Title: | Removal of unwanted variation for gene-gene correlations and related analysis |
---|---|
Description: | RUVcorr applies global removal of unwanted variation (a ridged version of RUV) to real and simulated gene expression data. |
Authors: | Saskia Freytag |
Maintainer: | Saskia Freytag <[email protected]> |
License: | GPL-2 |
Version: | 1.39.0 |
Built: | 2024-10-31 04:32:35 UTC |
Source: | https://github.com/bioc/RUVcorr |
assessQuality
assesses the quality of cleaning procedures in the context
of correlations when the true underlying correlation structure is known.
assessQuality( est, true, index = "all", methods = c("all", "fnorm", "wrong.sign") )
est |
A matrix of estimated gene expression values. |
true |
A matrix of true correlations. |
index |
A vector of indices of genes to be included in
the assessment; if |
methods |
The method used for quality assessment;
if |
The squared Frobenius norm used for assessQuality
has the following structure:
$$F = \frac{\lVert E - T \rVert_F^2}{s}$$
Here, $E$ and $T$ denote the lower triangles of the
estimated and true Fisher transformed correlation matrices, respectively,
and $s$ denotes the number of elements in $E$ and $T$.
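As a rough illustration of the two measures, the following base R sketch computes a per-element squared Frobenius norm on Fisher-transformed lower triangles and the share of correlations with the wrong sign. The helper `quality_sketch` and the simulated inputs are hypothetical, and the normalisation is assumed from the description above; this is not the package's own implementation.

```r
# Hypothetical helper (not part of RUVcorr): compares an estimated and a true
# correlation matrix on their lower triangles.
quality_sketch <- function(est_cor, true_cor) {
  lt <- lower.tri(est_cor)
  e <- est_cor[lt]
  t0 <- true_cor[lt]
  # Fisher transformation: atanh(r) = 0.5 * log((1 + r) / (1 - r))
  d <- atanh(e) - atanh(t0)
  c(fnorm = sum(d^2) / length(d),            # squared Frobenius norm per element
    wrong.sign = mean(sign(e) != sign(t0)))  # share of correlations with wrong sign
}

set.seed(1)
p <- 6
Sigma <- matrix(0.5, p, p)
diag(Sigma) <- 1                                         # assumed true correlations
X <- matrix(rnorm(300 * p), nrow = 300) %*% chol(Sigma)  # rows = arrays, columns = genes
res <- quality_sketch(cor(X), Sigma)
res
```

Comparing a matrix with itself gives zero for both measures, which is a quick sanity check for any such helper.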
assessQuality
returns a vector of the requested quality assessments.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
assessQuality(Y$Y, Y$Sigma, index=1:100, methods="wrong.sign")
assessQuality(Y$Y, Y$Sigma, index=1:100, methods="fnorm")
background
returns background genes for judging the quality of the cleaning.
These genes are supposed to represent the majority of genes. The positive control and
negative control genes should be excluded.
background(Y, nBG, exclude, nc_index)
Y |
A matrix of gene expression values or an object of
the class |
nBG |
An integer setting the number of background genes. |
exclude |
A vector of indices of genes to exclude. |
nc_index |
A vector of indices of negative controls (also excluded from being background genes). |
background
returns a vector of randomly chosen indices.
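The selection can be pictured with a few lines of base R. This is a minimal sketch under the assumption that background genes are drawn uniformly at random from all genes that are neither excluded nor negative controls; `background_sketch` is a hypothetical helper, not the package function.

```r
# Hypothetical helper (not part of RUVcorr): draw nBG background gene indices
# at random, skipping excluded genes and negative controls.
background_sketch <- function(n_genes, nBG, exclude, nc_index) {
  candidates <- setdiff(seq_len(n_genes), union(exclude, nc_index))
  stopifnot(nBG <= length(candidates))
  # Index via sample.int to avoid sample()'s special case for length-one input
  candidates[sample.int(length(candidates), nBG)]
}

set.seed(2)
bg <- background_sketch(n_genes = 500, nBG = 20, exclude = 1:100, nc_index = 251:500)
bg
```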
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
background(Y, nBG=20, exclude=1:100, nc_index=251:500)
calculateThreshold
returns the proportion of prioritised genes from a random selection
for each supplied threshold. Furthermore, this function fits a loess curve to the estimated points.
This allows the calculation of a threshold for the prioritisation of genes.
calculateThreshold( X, exclude, index.ref, set.size = length(index.ref), Weights = NULL, thresholds = seq(0.05, 1, 0.05), anno = NULL, Factor = NULL, cpus = 1, parallel = FALSE )
X |
A matrix of gene expression values. |
exclude |
A vector of indices of genes to exclude. |
index.ref |
A vector of indices of reference genes used for prioritisation. |
set.size |
An integer giving the size of the set of genes that are to be prioritised. |
Weights |
An object of class |
thresholds |
A vector of thresholds; values should be in the range |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
cpus |
An integer giving the number of cores that are supposed to be used. |
parallel |
A logical value indicating whether parallel computing should be used. |
The proportion of prioritised random genes is estimated by drawing 1000 random sets of genes and calculating how many would be prioritised at every given threshold. A gene is prioritised if at least one of its correlations with a known reference gene is above the given threshold.
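The estimation procedure can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: it uses plain (unweighted) absolute Pearson correlations, random Gaussian data, and 100 instead of 1000 random sets to keep it quick. All object names are chosen for this example.

```r
set.seed(42)
X <- matrix(rnorm(50 * 200), nrow = 50)   # 50 arrays (rows), 200 genes (columns)
index.ref <- 1:10                         # reference genes
thresholds <- seq(0.05, 1, 0.05)
candidates <- setdiff(seq_len(ncol(X)), index.ref)

# Largest absolute correlation of each candidate gene with any reference gene
max_cor <- apply(abs(cor(X[, candidates], X[, index.ref])), 1, max)

n_sets <- 100     # fewer than the 1000 sets described above, for speed
set.size <- 10
sets <- replicate(n_sets, sample(length(max_cor), set.size))  # set.size x n_sets

# Proportion of genes prioritised at each threshold, averaged over the random sets
prop <- sapply(thresholds, function(th) mean(max_cor[sets] > th))

fit <- loess(prop ~ thresholds)   # smooth curve through the estimated points
```

Because the same random sets are reused for every threshold, the estimated proportions cannot increase as the threshold grows.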
calculateThreshold
returns an object of class Threshold.
An object of class Threshold is a list with the following components:
Prop.values
A vector of the proportion of prioritised genes.
Thresholds
A vector containing the values in thresholds.
loess.estimate
An object of class loess.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
calculateThreshold(Y$Y, exclude=seq(251,500,1), index.ref=seq_len(10), Weights=weights, anno=anno, Factor="Factor")
compareRanks
calculates the difference between the ranks of known reference
gene pairs in two versions of the same data.
compareRanks(Y, Y.hat, ref_index, no.random = 1000, exclude_index)
Y |
A matrix of raw gene expression values. |
Y.hat |
A matrix of cleaned gene expression values. |
ref_index |
A vector of indices referring to genes of interest. |
no.random |
An integer giving the number of random genes. |
exclude_index |
A vector of indices to be excluded from the selection of random genes. |
The correlations between all random genes and reference genes are calculated
(including correlations between random and reference genes) using the two versions of
the data. The correlations are then ranked according to their absolute value (highest
to lowest). The ranks of the reference gene pairs are extracted. For a particular
reference gene pair, the difference in the ranks between the two versions of the data
is calculated:
Rank in Y - Rank in Y.hat
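The ranking step might look like the following base R sketch. It is only an illustration: the "cleaned" data are a stand-in obtained by removing the first singular component, all genes in the toy matrix are ranked (no separate random selection), and `rank_pairs` is a hypothetical helper.

```r
set.seed(7)
Y <- matrix(rnorm(40 * 30), nrow = 40)            # raw data: 40 arrays, 30 genes
s <- svd(Y)
# Stand-in for a cleaned version: remove the first singular component
Y.hat <- Y - s$d[1] * tcrossprod(s$u[, 1], s$v[, 1])

# Rank all gene pairs by absolute correlation (rank 1 = highest) and
# return the ranks of the pairs formed by the reference genes.
rank_pairs <- function(M, ref_index) {
  C <- abs(cor(M))
  lt <- lower.tri(C)
  r <- matrix(NA_real_, nrow(C), ncol(C))
  r[lt] <- rank(-C[lt])
  sub <- r[ref_index, ref_index]     # assumes ref_index is sorted increasingly
  sub[lower.tri(sub)]
}

ref_index <- 1:5
rank_diff <- rank_pairs(Y, ref_index) - rank_pairs(Y.hat, ref_index)
rank_diff
```

A positive difference means the reference pair moved towards the top of the ranking in the second version of the data.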
compareRanks
returns a vector of the differences in ranks of
the correlations of reference gene pairs estimated using raw or cleaned data.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nu=0, kW=10)
compareRanks(Y$Y, Y.hat, ref_index=1:30, no.random=100, exclude_index=c(31:100,251:500))
correlationPlot
produces a correlation plot to compare true and estimated gene-gene correlations.
correlationPlot( true, est, plot.genes = sample(seq_len(dim(true)[1]), 18), boxes = TRUE, title, line = -1 )
true |
A matrix of true gene-gene correlation values. |
est |
A matrix of estimated gene expression values. |
plot.genes |
A vector of indices of genes used in plotting; the suggested length of this vector is 18. |
boxes |
A logical scalar to indicate whether boxes
are drawn around sets of 6 genes; only available if |
title |
A character string describing the title of the plot. |
line |
The margin line on which the title is placed, starting at 0 and counting outwards. |
The upper triangle of the correlation plot shows the true gene-gene correlation values, while the lower triangle of the correlation plot shows the gene-gene correlation values calculated from the estimated gene expression values. This is possible because correlation matrices are symmetric.
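The layout of such a plot can be illustrated by building the combined matrix directly. The sketch below uses base R and simulated data; the object names are chosen for this example and the plotting itself is left out.

```r
set.seed(3)
p <- 6
true_cor <- matrix(0.4, p, p)
diag(true_cor) <- 1                                      # assumed true correlations
X <- matrix(rnorm(100 * p), nrow = 100) %*% chol(true_cor)
est_cor <- cor(X)                                        # estimated from expression values

# Lower triangle: estimated correlations; upper triangle: true correlations
combined <- est_cor
combined[upper.tri(combined)] <- true_cor[upper.tri(true_cor)]
round(combined, 2)
```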
correlationPlot
returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
correlationPlot(Y$Sigma, Y$Y, title="Raw", plot.genes=c(sample(1:100, 6), sample(101:250, 6), sample(251:500, 6)))
ECDFPlot
generates empirical cumulative distribution
functions (ECDF) for gene-gene correlation values.
ECDFPlot(X, Y, index = "all", col.X = "red", col.Y = "black", title, legend)
X |
A matrix or list of matrices of estimated gene-gene correlations. |
Y |
A matrix of reference gene-gene correlations (i.e. underlying known correlation structure). |
index |
A vector of indices of genes of interest. |
col.X |
The color or colors for ECDF as estimated from |
col.Y |
The color for ECDF as estimated from |
title |
A character string describing title of plot. |
legend |
A vector describing |
ECDFPlot
returns a plot.
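For readers who want the underlying numbers rather than the plot, an ECDF of gene-gene correlations can be computed with base R's `ecdf`. The sketch assumes that each gene pair is counted once (lower triangle only), which may differ from what ECDFPlot does internally; the data are simulated.

```r
set.seed(4)
p <- 8
truth <- matrix(0.3, p, p)
diag(truth) <- 1                                   # assumed reference correlations
X <- matrix(rnorm(60 * p), nrow = 60) %*% chol(truth)
est <- cor(X)

# Each gene pair is counted once by using the lower triangle only
F_est <- ecdf(est[lower.tri(est)])
F_truth <- ecdf(truth[lower.tri(truth)])

F_est(0.3)     # share of estimated correlations at or below 0.3
F_truth(0.3)   # share of reference correlations at or below 0.3
```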
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 10, check.input=TRUE)
Y.hat.cor<-cor(Y.hat)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0, mfrow=c(1, 1))
ECDFPlot(Y.hat.cor, Y$Sigma, index=1:100, title="Simulated data", legend=c("RUV", "Truth"))
ECDFPlot(list(Y.hat.cor, cor(Y$Y)), Y$Sigma, index=1:100, title="Simulated data", legend=c("RUV", "Raw", "Truth"), col.Y="black")
eigenvaluePlot
plots the ratio of the ith eigenvalue
of the SVD of the negative controls to the eigenvalue total.
eigenvaluePlot(Y, nc_index, k = 10, center = TRUE, title = "Eigenvalue Plot")
Y |
A matrix of gene expressions. |
nc_index |
A vector of indices for the negative controls. |
k |
A numeric value giving the number of eigenvalues that should be displayed. |
center |
A logical scalar indicating whether centering is needed. |
title |
A character string describing title. |
eigenvaluePlot
returns a plot.
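The plotted quantity can be approximated in base R as below. This is a sketch under the assumption that the eigenvalues are the squared singular values of the (centered) negative control matrix; the package may scale them differently. Data and index choices are made up for the example.

```r
set.seed(5)
Y <- matrix(rnorm(50 * 120), nrow = 50)       # 50 arrays, 120 genes
nc_index <- 61:120                            # assumed negative controls
k <- 10

Yc <- scale(Y[, nc_index], center = TRUE, scale = FALSE)   # centre each gene
d <- svd(Yc)$d
# Assumption: eigenvalues are taken as squared singular values
ratio <- d^2 / sum(d^2)
head(ratio, k)
```

A sharp drop after the first few ratios suggests that a small number of dimensions captures most of the variation in the negative controls.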
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
eigenvaluePlot(Y$Y, nc_index=251:500, k=20, center=TRUE)
empNegativeControls
finds suitable negative controls in real or simulated data.
empNegativeControls(Y, exclude, smoothing = 0.1, nc)
## Default S3 method:
empNegativeControls(Y, exclude, smoothing = 0.1, nc)
## S3 method for class 'simulateGEdata'
empNegativeControls(Y, exclude, smoothing = 0.1, nc)
Y |
A matrix of gene expression values or an
object of the class |
exclude |
A vector of indices to be excluded from being chosen as negative controls. |
smoothing |
A numerical scalar determining the amount of smoothing to be applied. |
nc |
An integer setting the number of negative controls. |
First, the mean expression of every gene (except the excluded genes) is calculated and the genes are assigned to bins accordingly. The bins have the size of the smoothing parameter. In each bin the function picks a number of negative control genes proportional to the total number of genes in the bin. The picked genes in each bin have the lowest interquartile ranges of all genes in the respective bin.
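The binning procedure can be sketched in base R as follows. This is a simplified illustration, not the package's code: `emp_nc_sketch` is a hypothetical helper, bins are formed by flooring the gene means to multiples of the smoothing value, and per-bin counts are rounded, so the total may differ slightly from `nc`.

```r
# Hypothetical helper (not part of RUVcorr): pick low-variability genes per mean bin.
emp_nc_sketch <- function(Y, exclude, smoothing = 0.1, nc) {
  keep <- setdiff(seq_len(ncol(Y)), exclude)
  m <- colMeans(Y[, keep, drop = FALSE])             # mean expression per gene
  iqr <- apply(Y[, keep, drop = FALSE], 2, IQR)      # interquartile range per gene
  bin <- floor(m / smoothing)                        # bins of width `smoothing`
  picked <- unlist(lapply(split(seq_along(keep), bin), function(ix) {
    # Number of picks proportional to the bin size (rounded)
    n_pick <- min(length(ix), round(nc * length(ix) / length(keep)))
    ix[order(iqr[ix])][seq_len(n_pick)]              # lowest IQR first
  }), use.names = FALSE)
  keep[picked]
}

set.seed(10)
mu <- runif(300, 5, 10)                              # gene-specific mean expression
Y <- matrix(rnorm(100 * 300, mean = rep(mu, each = 100)), nrow = 100)
nc_idx <- emp_nc_sketch(Y, exclude = 1:100, smoothing = 0.1, nc = 50)
length(nc_idx)
```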
empNegativeControls
returns a vector of indices of
empirically chosen negative controls.
For simulated data it is advisable to use the known negative controls or restrict the empirical choice to the known negative controls by excluding all other genes.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=TRUE)
empNegativeControls(Y, exclude=1:100, nc=100)
findWeights
returns a list of variances and weights based on the correlation
between genes for each level of a factor found in the annotation. This function is
typically used to find the weights of each individual in the data set.
findWeights(X, anno, Factor)
X |
A matrix of gene expression values. |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
Note that because the calculation of weights includes finding correlations between all genes,
this function might take some time. Hence, recalculation of weights is not advisable and
should be avoided. However, the inverse variances can often be used to calculate new weights.
In particular, when $w_i$ denotes the weight of the $i$-th level and
$\sigma_i^2$ the variance as calculated from the gene-gene correlations:
$$w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2}$$
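Recomputing weights from stored inverse variances might look like the sketch below. It assumes that weights are inverse variances normalised to sum to one over the levels in use; the numbers and level names are made up, and a plain named vector stands in for the list stored in Inv.Sigma.

```r
# Inverse variances per factor level (made-up values for illustration)
inv_sigma <- c(batch1 = 2.0, batch2 = 1.0, batch3 = 0.5, batch4 = 0.25)

# New weights for a subset of levels, without recomputing any correlations
use <- c("batch1", "batch2", "batch3")
w <- inv_sigma[use] / sum(inv_sigma[use])
w
```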
findWeights
returns output of the class Weights.
An object of class Weights is a list with the following components:
Weights
A list containing the weights of each level of Factor.
Inv.Sigma
A list containing the inverse variances of each level of Factor.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
findWeights(Y$Y, anno, "Factor")
genePlot
plots the means vs. the interquartile ranges of the gene
expression values of all genes, with the option to highlight interesting sets of genes.
genePlot(Y, index = NULL, legend = NULL, col.h = "red", title)
Y |
A matrix of gene expression values or an object of the class |
index |
A vector of indices of genes of interest to be displayed in a different color, if |
legend |
A character string describing the highlighted genes. |
col.h |
The color of the highlighted genes. |
title |
A character string describing the title of the plot. |
genePlot
returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=TRUE)
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0)
genePlot(Y, index=1:100, legend="Expressed genes", title="IQR-Mean Plot")
histogramPlot
plots histograms of correlation values in expression data and
its reference.
histogramPlot( X, Y, legend, breaks = 40, title, col.X = "red", col.Y = "black", line = NULL )
X |
A matrix or a list of matrices of estimated gene-gene correlations. |
Y |
A matrix of reference gene-gene correlations (i.e. known underlying correlation structure). |
legend |
A vector of character strings describing the data contained in |
breaks |
one of:
In the last three cases the number is a suggestion only; as the
breakpoints will be set to |
title |
A character string describing title. |
col.X |
A vector or character string defining the color/colors associated with the data contained in |
col.Y |
The color associated with the data in |
line |
A vector giving the line type. |
The usage above sets breaks = 40 by default; "Sturges" is the default in hist.
Other names for which algorithms are supplied are "Scott" and "FD" / "Freedman-Diaconis".
Case is ignored and partial matching is used. Alternatively, a function can be supplied which will compute the
intended number of breaks or the actual breakpoints as a function of x.
histogramPlot
returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 10, check.input=FALSE)
Y.hat.cor<-cor(Y.hat[,1:100])
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0, mfrow=c(1, 1))
histogramPlot(Y.hat.cor, Y$Sigma[1:100, 1:100], title="Simulated data", legend=c("RUV", "Truth"))
try(dev.off(), silent=TRUE)
histogramPlot(list(Y.hat.cor, cor(Y$Y[, 1:100])), Y$Sigma[1:100, 1:100], title="Simulated data", col.Y="black", legend=c("RUV", "Raw", "Truth"))
is.optimizeParameters
checks if an object is of class optimizeParameters.
is.optimizeParameters(x)
x |
An object. |
is.optimizeParameters
returns a logical scalar;
TRUE if the object is of the class optimizeParameters.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
opt<-optimizeParameters(Y, kW.hat=c(1,5,10), nu.hat=c(100,1000), nc_index=251:500, methods=c("fnorm"), cpus=1, parallel=FALSE)
opt
is.optimizeParameters(opt)
is.simulateGEdata
checks if an object is of class simulateGEdata.
is.simulateGEdata(x)
x |
An object. |
is.simulateGEdata
returns a logical scalar;
TRUE if the object is of the class simulateGEdata.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=TRUE)
is.simulateGEdata(Y)
is.Threshold
checks if an object is of class Threshold.
is.Threshold(x)
x |
An object. |
is.Threshold
returns a logical scalar;
TRUE if the object is of the class Threshold.
Saskia Freytag
is.Weights
checks if an object is of class Weights.
is.Weights(x)
x |
An object. |
is.Weights
returns a logical scalar;
TRUE if the object is of the class Weights.
Saskia Freytag
optimizeParameters
returns the optimal parameters to be
used in the removal of unwanted variation procedure when using simulated data.
optimizeParameters( Y, kW.hat = seq(5, 25, 5), nu.hat = c(0, 10, 100, 1000, 10000), nc_index, methods = c("all", "fnorm", "wrong.sign"), cpus = 1, parallel = FALSE, check.input = FALSE )
Y |
An object of the class |
kW.hat |
A vector of integers for |
nu.hat |
A vector of values for |
nc_index |
A vector of indices of the negative controls
used in |
methods |
The method used for quality assessment;
if |
cpus |
A number specifying how many workers to use for parallel computing. |
parallel |
Logical: if |
check.input |
Logical; if |
The simulated data are cleaned using removal of unwanted variation with all combinations of the input parameters. The quality of each cleaning is judged by the Frobenius norm of the difference between the correlations estimated from the cleaned data and the known correlations, or by the percentage of correlations estimated to have the wrong sign.
optimizeParameters
returns output of the class optimizeParameters.
An object of class optimizeParameters is a list containing the
following components:
All.results
A matrix of output of the quality assessment for all combinations of input parameters.
Compare.raw
A vector of the quality assessment for the uncorrected data.
Optimal.parameter
A matrix or a vector giving the optimal parameter combination.
Saskia Freytag
assessQuality, RUVNaiveRidge, funcPara
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
opt<-optimizeParameters(Y, kW.hat=c(1,5,10), nu.hat=c(100,1000), nc_index=251:500, methods=c("fnorm"), cpus=1, parallel=FALSE, check.input=TRUE)
opt
PCAPlot
generates principal component plots with the possibility
to color arrays according to a known factor.
PCAPlot( Y, comp = c(1, 2), anno = NULL, Factor = NULL, numeric = FALSE, new.legend = NULL, title )
Y |
A matrix of gene expression values or an object of class |
comp |
A vector of length 2 specifying which principal components to use. |
anno |
A dataframe or a matrix containing the annotation of the arrays. |
Factor |
A character string describing the column name of
|
numeric |
A logical scalar indicating whether |
new.legend |
A vector describing the names used for labelling; if |
title |
A character string giving the title. |
PCAPlot
returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
PCAPlot(Y$Y, title="")
## Create random annotation file
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0, mfrow=c(1, 1))
PCAPlot(Y$Y, anno=anno, Factor="Factor", numeric=TRUE, title="")
plot.optimizeParameters
generates a heatmap of the quality assessment values
stored in the object of class optimizeParameters.
## S3 method for class 'optimizeParameters'
plot( x, main = colnames(opt$All.results)[seq(3, dim(opt$All.results)[2], 1)], ... )
x |
An object of the class |
main |
A character string describing title of plot. |
... |
Further arguments passed to or from other methods. |
The black point in the heatmap denotes the optimal parameter combination.
plot.optimizeParameters
returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=2, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
opt<-optimizeParameters(Y, kW.hat=c(1,5,10), nu.hat=c(100,100000), nc_index=seq(251,500,1), methods=c("fnorm"), cpus=1, parallel=FALSE)
try(dev.off(), silent=TRUE)
plot(opt, main="Heatmap Plot")
plotDesign
returns a plot with different color strips representing
different factors relating to the study design.
plotDesign(anno, Factors, anno.names = Factors, orderby = NULL)
anno |
A dataframe or matrix containing the annotation of the study. |
Factors |
A vector of factors that should be plotted. |
anno.names |
A vector containing the names, the default |
orderby |
A character describing an element in |
plotDesign
returns a plot.
Saskia Freytag
library(bladderbatch)
data(bladderdata)
expr.meta <- pData(bladderEset)
plotDesign(expr.meta, c("cancer", "outcome", "batch"), c("Diagnosis", "Outcome", "Batch"), orderby="batch")
plotThreshold
plots objects of class Threshold.
plotThreshold(x, main = "", legend, col = NULL, ...)
x |
An object of class |
main |
A character string describing the title of the plot. |
legend |
A vector of character strings describing the different |
col |
A vector giving the colors, if |
... |
Further arguments passed to or from other methods. |
plotThreshold
returns a plot.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:4, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
Thresh<-calculateThreshold(Y$Y, exclude=1:100, index.ref=1:10, Weights=weights, anno=anno, Factor="Factor")
plotThreshold(Thresh)
print.simulateGEdata
is the print method for objects of the class simulateGEdata.
## S3 method for class 'simulateGEdata'
print(x, ...)
x |
An object of the class |
... |
Further arguments passed to or from other methods. |
print.simulateGEdata
returns the information about the simulation and
the first 5 rows and 5 columns of all matrices.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=FALSE)
Y
prioritise
returns a set of genes from a candidate set of genes that are
correlated above a provided threshold with at least one of the provided reference
genes.
prioritise(X, ref_index, cand_index, anno, Factor, Weights, threshold)
X |
A matrix of gene expression values. |
ref_index |
A vector of indices of reference genes. |
cand_index |
A vector of indices of candidate genes. |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
Weights |
An object of class |
threshold |
A value in the range |
prioritise
returns a matrix with three columns. The first column gives
the names of the genes that were prioritised, while the second column gives the
number of correlations above the threshold for the gene in question. The third
column gives the sum of the absolute values of all correlations with reference genes
above the threshold.
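The three columns can be reproduced in spirit with base R. The sketch below is an illustration only: it uses plain (unweighted) absolute Pearson correlations on simulated data with a planted shared factor, and the column names `n.above` and `sum.abs` are chosen for this example.

```r
set.seed(6)
n <- 80
shared <- rnorm(n)                                   # factor shared by genes 1 to 20
X <- matrix(rnorm(n * 60), nrow = n)
X[, 1:20] <- X[, 1:20] + 1.5 * shared
colnames(X) <- paste0("g", seq_len(ncol(X)))

ref_index <- 1:5
cand_index <- 6:60
threshold <- 0.5

cors <- abs(cor(X[, cand_index], X[, ref_index]))    # candidates x references
above <- cors > threshold
hits <- rowSums(above) > 0                           # at least one correlation above threshold
result <- data.frame(
  gene = colnames(X)[cand_index][hits],
  n.above = rowSums(above)[hits],
  sum.abs = rowSums(cors * above)[hits]
)
result
```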
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=TRUE)
colnames(Y$Y)<-1:dim(Y$Y)[2]
anno<-as.matrix(sample(1:5, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
prioritise(Y$Y, 1:10, 51:150, anno, "Factor", weights, 0.6)
RLEPlot
generates three different types of
relative log expression plots for high-dimensional data.
RLEPlot( X, Y, center = TRUE, name, title, method = c("IQR.points", "IQR.boxplots", "minmax"), anno = NULL, Factor = NULL, numeric = FALSE, new.legend = NULL, outlier = FALSE )
X |
A matrix of gene expression values. |
Y |
A matrix of gene expression values. |
center |
A logical scalar; |
name |
A vector of characters describing the data contained in
|
title |
A character string describing the title of the plot. |
method |
The type of RLE plot to be displayed; possible inputs are
|
anno |
A dataframe or a matrix containing the annotation of
arrays in |
Factor |
A character string corresponding to a column name of
|
numeric |
A logical scalar indicating whether |
new.legend |
A vector describing the names used for labelling; if |
outlier |
A logical indicating whether outliers should be plotted; only
applicable when |
There are three different RLE plots that can be generated using RLEPlot:
"IQR.points"
Median expression vs. interquartile range of every array.
"IQR.boxplots"
Boxplots of the 25% and 75% quantile of all arrays.
"minmax"
Ordinary RLE plots for the 5 arrays with the smallest and largest interquartile ranges.
Note that normal RLE plots are not supplied as they are not very suitable for high-dimensional data.
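The summaries behind these plots can be computed with a few lines of base R. This is a sketch under the usual definition of relative log expression (each gene's median across arrays is subtracted), assuming log-scale data with arrays in rows; taking five arrays at each extreme is one reading of the "minmax" description, and all object names are chosen for this example.

```r
set.seed(8)
X <- matrix(rnorm(30 * 200, mean = 8), nrow = 30)   # 30 arrays (rows), 200 genes (columns)

# Relative log expression: subtract each gene's median across arrays
rle <- sweep(X, 2, apply(X, 2, median))

# Per-array summaries of the kind displayed by the "IQR.points" method
array_median <- apply(rle, 1, median)
array_iqr <- apply(rle, 1, IQR)

# Arrays with the smallest and largest IQR, as singled out by "minmax"
ord <- order(array_iqr)
extremes <- c(head(ord, 5), tail(ord, 5))
```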
RLEPlot
returns a plot.
Saskia Freytag, Terry Speed
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 10, check.input=TRUE)
try(dev.off(), silent=TRUE)
par(mar=c(5.1, 4.1, 4.1, 2.1), mgp=c(3, 1, 0), las=0)
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="IQR.points")
try(dev.off(), silent=TRUE)
par(mfrow=c(1, 1))
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="IQR.boxplots")
try(dev.off(), silent=TRUE)
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="minmax")
#Create a random annotation file
anno<-as.matrix(sample(1:4, dim(Y.hat)[1], replace=TRUE))
colnames(anno)<-"Factor"
try(dev.off(), silent=TRUE)
RLEPlot(Y$Y, Y.hat, name=c("Raw", "RUV"), title="", method="IQR.points", anno=anno, Factor="Factor", numeric=TRUE)
RUVcorr applies global removal of unwanted variation (the ridged version of RUV) to real and simulated gene expression data.
All gene expression data are assumed to be in the following format:
Rows correspond to arrays.
Columns correspond to genes.
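Many expression matrices are stored with genes in rows and arrays in columns, so they may need to be transposed first. A minimal base R sketch, in which the toy matrix `expr` and its dimension names are hypothetical:

```r
# Hypothetical toy matrix in a genes-by-arrays layout
set.seed(1)
expr <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4,
               dimnames = list(paste0("gene", 1:6), paste0("array", 1:4)))

# Transpose so that rows are arrays and columns are genes
Y <- t(expr)
dim(Y)
```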
Saskia Freytag
RUVNaiveRidge applies the ridged version of global removal of unwanted variation to simulated or real gene expression data.
RUVNaiveRidge(Y, center = TRUE, nc_index, nu, kW, check.input = FALSE)

## Default S3 method:
RUVNaiveRidge(Y, center = TRUE, nc_index, nu, kW, check.input = FALSE)

## S3 method for class 'simulateGEdata'
RUVNaiveRidge(Y, center = TRUE, nc_index, nu, kW, check.input = FALSE)
Y |
A matrix of gene expression values or an object of
class |
center |
A logical scalar; if |
nc_index |
A vector of indices of negative controls. |
nu |
A numeric scalar value of |
kW |
An integer setting the number of dimensions for the estimated noise. |
check.input |
A logical scalar; if |
The parameter kW controls how much noise is cleaned, whereas the parameter nu controls the amount of ridging to deal with possible dependence of the noise and the factor of interest.
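The roles of kW and nu can be illustrated with a base R sketch of a ridged RUV step. This is an assumption-laden illustration, not the package source: the helper `ruv_ridge_sketch`, the toy data, and the specific estimator (unwanted factors taken from the leading kW singular vectors of the negative controls, then removed by ridge regression with penalty nu) are choices made for this example.

```r
# Hypothetical helper, not part of RUVcorr
ruv_ridge_sketch <- function(Y, nc_index, nu, kW, center = TRUE) {
  if (center) Y <- scale(Y, center = TRUE, scale = FALSE)
  # Estimate kW unwanted factors from the negative control genes via SVD
  s <- svd(Y[, nc_index, drop = FALSE])
  W <- s$u[, seq_len(kW), drop = FALSE] %*% diag(s$d[seq_len(kW)], nrow = kW)
  # Ridge regression of every gene on W; nu = 0 is ordinary least squares
  alpha <- solve(crossprod(W) + nu * diag(kW), crossprod(W, Y))
  Y - W %*% alpha
}

# Toy data: 30 arrays by 40 genes, the last 20 genes are negative controls
set.seed(1)
m <- 30; n <- 40; nc <- 21:40
W_true <- matrix(rnorm(m * 2), m, 2)
alpha_true <- matrix(rnorm(2 * n), 2, n)
signal <- matrix(0, m, n)
signal[, 1:20] <- rnorm(m) %o% rep(1, 20)  # shared signal in the first 20 genes
Y <- signal + W_true %*% alpha_true + matrix(rnorm(m * n, sd = 0.1), m, n)

Y_hat <- ruv_ridge_sketch(Y, nc_index = nc, nu = 0, kW = 2)
Y_hat_ridged <- ruv_ridge_sketch(Y, nc_index = nc, nu = 10, kW = 2)
```

In this sketch a larger kW removes more directions of variation, while a larger nu shrinks the fitted noise term and therefore removes less of it.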
RUVNaiveRidge returns a matrix of the cleaned (RUV-treated) centered gene expression values.
Saskia Freytag, Laurent Jacob
Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=FALSE)
Y
Y.hat<-RUVNaiveRidge(Y, center=TRUE, nc_index=251:500, 0, 9, check.input=TRUE)
cor(Y.hat[,1:5])
Y$Sigma[1:5,1:5]
Y.hat<-RUVNaiveRidge(Y, center=FALSE, nc_index=251:500, 0, 10, check.input=TRUE)
cor(Y.hat[,1:5])
Y$Sigma[1:5,1:5]
simulateGEdata returns simulated noisy gene expression values of specified size and their underlying gene-gene correlation.
simulateGEdata( n, m, k, size.alpha, corr.strength, g = NULL, Sigma.eps = 0.1, nc, ne, intercept = TRUE, check.input = FALSE )
n |
An integer setting the number of genes. |
m |
An integer setting the number of arrays. |
k |
An integer setting the number of dimensions of the noise term,
controls dimension of |
size.alpha |
A numeric scalar giving the maximal and
minimal absolute value of |
corr.strength |
An integer controlling the dimension of |
g |
An integer value between [1, min( |
Sigma.eps |
A numeric scalar setting the amount of random variation in
|
nc |
An integer setting the number of negative controls. |
ne |
An integer setting the number of strongly expressed genes. |
intercept |
A logical value indicating whether the systematic noise has an intercept. |
check.input |
A logical scalar; if |
This function generates log2-transformed expression values of n genes in m arrays. The expression values consist of true expression and noise:

$$Y = X\beta + W\alpha + \epsilon$$

The dimensions of the matrices $X$ and $\beta$ are used to control the size of the correlation between the genes. It is possible to simulate three different classes of genes:
correlated genes expressed with true log2-transformed values from 0 to 16
correlated genes expressed with true log2-transformed values with mean 0
uncorrelated genes with true log2-transformed expression equal to 0 (negative controls)
The negative controls are always the last nc genes in the data, whereas the strongly expressed genes are always the first ne genes in the data.
The parameter intercept controls whether the systematic noise has an offset or not. Note that the intercept is one dimension of $W$.
It is possible either to simulate data in which $W$ and $X$ are independent by setting g to NULL, or to increase the correlation between $W$ and $X$ by increasing g.
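The structure of such a simulation can be sketched in base R, assuming the additive RUV model in which the data are the sum of a signal term $X\beta$, a systematic noise term $W\alpha$ and random noise $\epsilon$. This is not the code of simulateGEdata: all object names, dimensions and distributions below are hypothetical, and the sketch covers only the independent case (no shared dimensions between $W$ and $X$).

```r
set.seed(1)
m <- 50    # arrays (rows)
n <- 100   # genes (columns)
k <- 3     # dimensions of the systematic noise
p <- 5     # dimensions of the true signal (hypothetical name)
nc <- 20   # negative controls, placed as the last nc genes

# True expression: low-rank signal, switched off for negative controls
X <- matrix(rnorm(m * p), m, p)
beta <- matrix(rnorm(p * n), p, n)
beta[, (n - nc + 1):n] <- 0

# Systematic noise drawn independently of X, plus random noise
W <- matrix(rnorm(m * k), m, k)
alpha <- matrix(runif(k * n, min = -2, max = 2), k, n)
eps <- matrix(rnorm(m * n, sd = 0.1), m, n)

truth <- X %*% beta
noise <- W %*% alpha
Y <- truth + noise + eps
```

Because the negative control columns of `beta` are zero, those genes carry only noise, which is what makes them usable for estimating the unwanted variation.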
simulateGEdata returns output of the class simulateGEdata.
An object of class simulateGEdata is a list with the following components:
Truth
A matrix containing the values of $X\beta$, the true expression.
Y
A matrix containing the values in $Y$, the simulated expression.
Noise
A matrix containing the values in $W\alpha$, the systematic noise.
Sigma
A matrix containing the true gene-gene correlations, as defined by $X\beta$.
Info
A matrix containing some of the general information about the simulation.
Saskia Freytag, Johann Gagnon-Bartsch
Jacob L., Gagnon-Bartsch J., Speed T. Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed. Berkeley Technical Reports (2012).
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=TRUE)
Y
Y<-simulateGEdata(500, 500, 10, 2, 5, g=3, Sigma.eps=0.1, 250, 100, intercept=TRUE, check.input=TRUE)
Y
wcor returns correlations weighted according to a provided object of class Weights.
wcor(X, anno, Factor, Weights)
X |
A matrix of gene expression values. |
anno |
A dataframe or a matrix containing the annotation of arrays in |
Factor |
A character string corresponding to a column name of |
Weights |
An object of class |
wcor returns a matrix.
Saskia Freytag
Y<-simulateGEdata(500, 500, 10, 2, 5, g=NULL, Sigma.eps=0.1, 250, 100, intercept=FALSE, check.input=FALSE)
anno<-as.matrix(sample(1:5, dim(Y$Y)[1], replace=TRUE))
colnames(anno)<-"Factor"
weights<-findWeights(Y$Y, anno, "Factor")
wcor(Y$Y[,1:5], anno, "Factor", weights)