Title: | Adaptive Gene Picking for Microarray Expression Data Analysis |
---|---|
Description: | Functions to Analyze Microarray (Gene Expression) Data. |
Authors: | Brian S. Yandell <[email protected]> |
Maintainer: | Brian S. Yandell <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.79.0 |
Built: | 2024-10-31 03:28:36 UTC |
Source: | https://github.com/bioc/pickgene |
The function plots contours for the odds that points on microarray show differential expression between two conditions (e.g. Cy3 and Cy5 dye channels on the same microarray).
em.ggb(x, y, theta, start = c(2,1.2,2.7), pprior = 2, printit = FALSE, tol = 1e-9, offset = 0 )
em.ggb(x, y, theta, start = c(2,1.2,2.7), pprior = 2, printit = FALSE, tol = 1e-9, offset = 0 )
x |
first condition expression levels |
y |
second condition expression levels |
theta |
four parameters |
start |
starting estimates for theta |
pprior |
Beta hyperparameter for prob |
printit |
print iterations if TRUE |
tol |
parameter tolerance for convergence |
offset |
offset added to xx and yy before taking log (can help with negative adjusted values) |
Fit Gamma/Gamma/Bernoulli model (equal marginal distributions) The model has spot intensities x ~ Gamma(a,b); y ~ Gamma(a,c). The shape parameters b and c are ~ Gamma(a0,nu). With probability p, b = c; otherwise b != c. All spots are assumed to be independent.
Four parameter vector theta
after convergence.
Michael Newton
MA Newton, CM Kendziorski, CS Richmond, FR Blattner and KW Tsui (2000) “On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data,” J Computational Biology 00: 000-000.
## Not run: em.ggb( x, y ) ## End(Not run)
## Not run: em.ggb( x, y ) ## End(Not run)
The function created a model matrix of orthogonal contrasts to be used by pickgene.
model.pickgene(faclevel, facnames = letters[seq(length(faclevel))], contrasts.fac = "contr.poly", collapse = "+", show = NULL, renorm = 1, modelexpr = formula(paste("~", paste(facnames, collapse = collapse))), contrasts.list = contr.list)
model.pickgene(faclevel, facnames = letters[seq(length(faclevel))], contrasts.fac = "contr.poly", collapse = "+", show = NULL, renorm = 1, modelexpr = formula(paste("~", paste(facnames, collapse = collapse))), contrasts.list = contr.list)
faclevel |
vector with number of levels for each factor |
facnames |
vector of factor names (default = "a", "b", ...) |
contrasts.fac |
vector of contrast types |
collapse |
"+" for additive model, "*" for full model with interactions |
show |
vector of contrast numbers to show (default is all) |
renorm |
vector to renormalize contrasts (e.g., use |
modelexpr |
model formula |
contrasts.list |
list of contrasts indexed by |
Creates a model matrix data frame with first column having all 1's and other columns having contrasts.
Result of call to model.matrix
Brian Yandell
model.pickgene(c(2,3), c("sex","genotype"))
model.pickgene(c(2,3), c("sex","genotype"))
The function plots contours for the odds that points on microarray show differential expression between two conditions (e.g. Cy3 and Cy5 dye channels on the same microarray).
oddsplot(x, y, theta, by.level = 10, rotate = FALSE, offset = 0, main = "", xlab = xlabs, ylab = ylabs, col = NULL, cex = c(0.25, 0.75), shrink = FALSE, lims = range(c(x, y)))
oddsplot(x, y, theta, by.level = 10, rotate = FALSE, offset = 0, main = "", xlab = xlabs, ylab = ylabs, col = NULL, cex = c(0.25, 0.75), shrink = FALSE, lims = range(c(x, y)))
x |
first condition expression levels |
y |
second condition expression levels |
theta |
four parameters from |
by.level |
odds plot contours increase by this level |
rotate |
rotate to average versus ratio if TRUE, otherwise plot conditions against each other |
offset |
offset for |
main |
main title for plot |
xlab |
horizontal axis label (default if |
ylab |
vertical axis label (default if |
col |
color of points (if NULL, use black for non-changing points, blue for changing points) |
cex |
character expansion (use |
shrink |
use shrinkage on expression levels if TRUE (default is FALSE) |
lims |
limits for plot area |
Fit Gamma/Gamma/Bernoulli model (equal marginal distributions) The model has spot intensities x ~ Gamma(a,b); y ~ Gamma(a,c). The shape parameters b and c are ~ Gamma(a0,nu). With probability p, b = c; otherwise b != c. All spots are assumed to be independent.
Log odds for all points in original order.
Michael Newton
MA Newton, CM Kendziorski, CS Richmond, FR Blattner and KW Tsui (2000) “On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data,” J Computational Biology 00: 000-000.
## Not run: oddsplot( x, y ) ## End(Not run)
## Not run: oddsplot( x, y ) ## End(Not run)
The function picks plots the average intensity versus linear contrasts (currently linear, quadratic up to cubic) across experimental conditions. Critical line is determine according to Bonferroni-like multiple comparisons, allowing SD to vary with intensity.
pickgene(data, geneID = 1:nrow(data), overalllevel = 0.05, npickgene = -1, marginal = FALSE, rankbased = TRUE, allrank = FALSE, meanrank = FALSE, offset = 0, modelmatrix = model.pickgene(faclevel, facnames, contrasts.fac, collapse, show, renorm), faclevel = ncol(data), facnames = letters[seq(length(faclevel))], contrasts.fac = "contr.poly", show = NULL, main = "", renorm = 1, drop.negative = FALSE, plotit = npickgene < 1, mfrow = c(nr, nc), mfcol = NULL, ylab = paste(shownames, "Trend"), ...)
pickgene(data, geneID = 1:nrow(data), overalllevel = 0.05, npickgene = -1, marginal = FALSE, rankbased = TRUE, allrank = FALSE, meanrank = FALSE, offset = 0, modelmatrix = model.pickgene(faclevel, facnames, contrasts.fac, collapse, show, renorm), faclevel = ncol(data), facnames = letters[seq(length(faclevel))], contrasts.fac = "contr.poly", show = NULL, main = "", renorm = 1, drop.negative = FALSE, plotit = npickgene < 1, mfrow = c(nr, nc), mfcol = NULL, ylab = paste(shownames, "Trend"), ...)
data |
data matrix |
geneID |
gene identifier (default |
overalllevel |
overall significance level (default |
npickgene |
number of genes to pick (default |
marginal |
additive model if TRUE, include interactions if FALSE |
rankbased |
use ranks if TRUE, log tranform if FALSE |
allrank |
rank all chips together if true, otherwise rank separately |
meanrank |
show mean abundance as rank if TRUE |
offset |
offset for log transform |
modelmatrix |
model matrix with first row all 1's and other rows
corresponding to design contrasts; automatically created by call
to |
faclevel |
number of factor levels for each factor |
facnames |
factor names |
contrasts.fac |
type of contrasts |
show |
vector of contrast numbers to show (default is all) |
main |
vector of main titles for plots (default is none) |
renorm |
vector to renormalize contrasts (e.g. use |
drop.negative |
drop negative values in log transform |
plotit |
plot if TRUE |
mfrow |
|
mfcol |
|
ylab |
vertical axis labels |
... |
parameters for |
Infer genes that differentially express across conditions using a robust
data-driven method. Adjusted gene expression levels A
are
replaced by qnorm(rank(A))
, followed by robustscale
estimation of center and spread. Then Bonferroni-style gene by gene
tests are performed and displayed graphically.
Data frame containing significant genes with the following information:
pick |
data frame with picked genes |
score |
data frame with center and spread for plotting |
Each of these is a list with elements for each contrast.
The pick
data frame elements have the following information:
probe |
gene identifier |
average |
average gene intensity |
fold1 |
positive fold change |
fold2 |
negative fold change |
pvalue |
Bonferroni-corrected p-value |
The score
data frame elements have the following:
x |
mean expression level (antilog scale) |
y |
contrast (antilog scale) |
center |
center for contrast |
scale |
scale (spread) for contrast |
lower |
lower test limit |
upper |
upper test limit |
Yi Lin and Brian Yandell
Y Lin, BS Yandell and ST Nadler (2000) “Robust Data-Driven Inference for Gene Expression Microarray Experiments,” Technical Report, Department of Statistics, UW-Madison.
## Not run: pickgene( data ) ## End(Not run)
## Not run: pickgene( data ) ## End(Not run)
Smoothing spline estimate of median and mean absolute deviation (MAD).
robustscale(y, x, nslice=400, corcenter=TRUE, decrease=TRUE)
robustscale(y, x, nslice=400, corcenter=TRUE, decrease=TRUE)
y |
response |
x |
predictor |
nslice |
number of slices (should be "large") |
corcenter |
correct for center |
decrease |
force MAD to decrease with |
This divides data into roughly many nslice
slices and computes
median and mean absolute deviation (mad
) for each slice. These are
then smoothed using smooth.spline
.
Data frame containing significant genes with the following information:
center |
estimate of center median |
scale |
MAD estimate of scale |
x |
ordered |
y |
|
Yi Lin
## Not run: robustscale(y,x) ## End(Not run)
## Not run: robustscale(y,x) ## End(Not run)
Example simulations
multipickgene
### Note: This uses old pickgene #detail of the model (7-8). (first run does not include meas error \eta_i) #par(mfrow=c(3,3)) t<-rnorm(10000,4,2) changes1<-rep(0,10000) changes1[1:500]<-rnorm(500) t1<-t+changes1 changes2<-rep(0,10000) changes2[1:500]<-rnorm(500) t2<-t+changes2 s<-rnorm(10000,0,0.1) cx<-3 cy<-2 t1<-t1+rnorm(10000,0,0.1) t2<-t2+rnorm(10000,0,0.1) x<-cx*exp(t1) y<-cy*exp(t2) #x<-cx*exp(t1)+rnorm(10000,0,50) #y<-cy*exp(t2)+rnorm(10000,0,40) xx<-qnorm(rank(x)/(10000+1)) yy<-qnorm(rank(y)/(10000+1)) #hist(x,breaks=100) #hist(y,breaks=100) #plot(x,y) #hist(y[x<=0],breaks=20) #hist(x[y<=0],breaks=20) #plot(xx,yy) topgenepick<-multipickgene( cbind(xx,yy),condi=0:1,geneID=1:10000, d=1, npickgene=500)$pick[[1]]$probe abchangesrank<-rank((-1)*abs(t1-t2)) count <- rep(NA,500) for( i in 1:500 ) { topipick <- topgenepick[1:i] count[i] <- sum( abchangesrank[topipick] <= i ) } ## Figure 2 plot( 1:500, 1:500, type="n", xlab="Rank of 500 most changed genes by our procedure", ylab="Number similarly ranked by the 'optimal' procedure", xaxs="i", yaxs="i" ) lines( 1:500, count, type="s", lty=1, lwd=2 ) abline(0,1) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="rank1.ps" ) #again, but with the additive noise. (includes \eta_i) par(mfrow=c(2,2)) t<-rnorm(10000,4,2) changes1<-rep(0,10000) changes1[1:500]<-rnorm(500) t1<-t+changes1 changes2<-rep(0,10000) changes2[1:500]<-rnorm(500) t2<-t+changes2 s<-rnorm(10000,0,0.1) cx<-3 cy<-2 t1<-t1+rnorm(10000,0,0.1) t2<-t2+rnorm(10000,0,0.1) ### note that noise is very large here (50,40) x<-cx*exp(t1)+rnorm(10000,0,50) y<-cy*exp(t2)+rnorm(10000,0,40) xx<-qnorm(rank(x)/(10000+1)) yy<-qnorm(rank(y)/(10000+1)) hist(x,breaks=100) hist(y,breaks=100) plot(x,y,cex=0.4) #hist(y[x<=0],breaks=20) #hist(x[y<=0],breaks=20) plot(xx,yy,cex=0.4) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="simudata.ps" ) topgenepick<-multipickgene(cbind(xx,yy),condi=0:1,geneID=1:10000, d=1, npickgene=500)$pick[[1]]$probe abchangesrank<-rank((-1)*abs(t1-t2)) count <- rep(NA,500) for( i in 1:500 ) { topipick <- topgenepick[1:i] count[i] <- sum( abchangesrank[topipick] <= i ) } par(mfrow=c(1,1)) # figure 4 plot( 1:500, 1:500, type="n", xlab="Rank of 500 most changed genes by our procedure", ylab="Number similarly ranked by the 'optimal' procedure", xaxs="i", yaxs="i" ) lines( 1:500, count, type="s", lty=1, lwd=2 ) abline(0,1) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="rank2.ps" ) ### Figure 5 genepick <- multipickgene( cbind(xx,yy), condi=0:1, geneID=1:10000, d=1) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="simutest.ps" )$pick[[1]]$probe npick<-length(genepick$pickedgene) genepick$pickedgene npick count[npick]
### Note: This uses old pickgene #detail of the model (7-8). (first run does not include meas error \eta_i) #par(mfrow=c(3,3)) t<-rnorm(10000,4,2) changes1<-rep(0,10000) changes1[1:500]<-rnorm(500) t1<-t+changes1 changes2<-rep(0,10000) changes2[1:500]<-rnorm(500) t2<-t+changes2 s<-rnorm(10000,0,0.1) cx<-3 cy<-2 t1<-t1+rnorm(10000,0,0.1) t2<-t2+rnorm(10000,0,0.1) x<-cx*exp(t1) y<-cy*exp(t2) #x<-cx*exp(t1)+rnorm(10000,0,50) #y<-cy*exp(t2)+rnorm(10000,0,40) xx<-qnorm(rank(x)/(10000+1)) yy<-qnorm(rank(y)/(10000+1)) #hist(x,breaks=100) #hist(y,breaks=100) #plot(x,y) #hist(y[x<=0],breaks=20) #hist(x[y<=0],breaks=20) #plot(xx,yy) topgenepick<-multipickgene( cbind(xx,yy),condi=0:1,geneID=1:10000, d=1, npickgene=500)$pick[[1]]$probe abchangesrank<-rank((-1)*abs(t1-t2)) count <- rep(NA,500) for( i in 1:500 ) { topipick <- topgenepick[1:i] count[i] <- sum( abchangesrank[topipick] <= i ) } ## Figure 2 plot( 1:500, 1:500, type="n", xlab="Rank of 500 most changed genes by our procedure", ylab="Number similarly ranked by the 'optimal' procedure", xaxs="i", yaxs="i" ) lines( 1:500, count, type="s", lty=1, lwd=2 ) abline(0,1) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="rank1.ps" ) #again, but with the additive noise. (includes \eta_i) par(mfrow=c(2,2)) t<-rnorm(10000,4,2) changes1<-rep(0,10000) changes1[1:500]<-rnorm(500) t1<-t+changes1 changes2<-rep(0,10000) changes2[1:500]<-rnorm(500) t2<-t+changes2 s<-rnorm(10000,0,0.1) cx<-3 cy<-2 t1<-t1+rnorm(10000,0,0.1) t2<-t2+rnorm(10000,0,0.1) ### note that noise is very large here (50,40) x<-cx*exp(t1)+rnorm(10000,0,50) y<-cy*exp(t2)+rnorm(10000,0,40) xx<-qnorm(rank(x)/(10000+1)) yy<-qnorm(rank(y)/(10000+1)) hist(x,breaks=100) hist(y,breaks=100) plot(x,y,cex=0.4) #hist(y[x<=0],breaks=20) #hist(x[y<=0],breaks=20) plot(xx,yy,cex=0.4) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="simudata.ps" ) topgenepick<-multipickgene(cbind(xx,yy),condi=0:1,geneID=1:10000, d=1, npickgene=500)$pick[[1]]$probe abchangesrank<-rank((-1)*abs(t1-t2)) count <- rep(NA,500) for( i in 1:500 ) { topipick <- topgenepick[1:i] count[i] <- sum( abchangesrank[topipick] <= i ) } par(mfrow=c(1,1)) # figure 4 plot( 1:500, 1:500, type="n", xlab="Rank of 500 most changed genes by our procedure", ylab="Number similarly ranked by the 'optimal' procedure", xaxs="i", yaxs="i" ) lines( 1:500, count, type="s", lty=1, lwd=2 ) abline(0,1) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="rank2.ps" ) ### Figure 5 genepick <- multipickgene( cbind(xx,yy), condi=0:1, geneID=1:10000, d=1) ## Not run: dev.print( hor=F, height=6.5, width=6.5, file="simutest.ps" )$pick[[1]]$probe npick<-length(genepick$pickedgene) genepick$pickedgene npick count[npick]