Title: | QC Pipeline and Data Analysis Tools for High-Dimensional Illumina mRNA Expression Data |
---|---|
Description: | QC pipeline and data analysis tools for high-dimensional Illumina mRNA expression data. |
Authors: | Weiliang Qiu [aut, cre], Brandon Guo [aut, ctb], Christopher Anderson [aut, ctb], Barbara Klanderman [aut, ctb], Vincent Carey [aut, ctb], Benjamin Raby [aut, ctb] |
Maintainer: | Weiliang Qiu <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.37.0 |
Built: | 2024-10-31 01:12:18 UTC |
Source: | https://github.com/bioc/iCheck |
Draw scatter plots for top results in whole-genome-wide analysis to test for the association of probes to a continuous-type phenotype variable.
boxPlots( resFrame, es, col.resFrame = c("probeIDs", "stats", "pval", "p.adj"), var.pheno = "sex", var.probe = "TargetID", var.gene = "Symbol", var.chr = "Chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "boxPlots.ps")
boxPlots( resFrame, es, col.resFrame = c("probeIDs", "stats", "pval", "p.adj"), var.pheno = "sex", var.probe = "TargetID", var.gene = "Symbol", var.chr = "Chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "boxPlots.ps")
resFrame |
A data frame stores testing results, which must contain columns that indicate probe id, test statistic, p-value and optionally adjusted p-value. |
es |
An |
col.resFrame |
A vector of characters indicating column names of |
var.pheno |
character. the name of continuous-type phenotype variable that is used to test the association of this variable to probes. |
var.probe |
character. the name of feature variable indicating probe id. |
var.gene |
character. the name of feature variable indicating gene symbol. |
var.chr |
character. the name of feature variable indicating chromosome number. |
nTop |
integer. indicating how many top tests will be used to draw the scatter plot. |
myylab |
character. indicating y-axis label. |
datExtrFunc |
name of the function to extract genomic data. For
an |
fileFlag |
logic. indicating if plot should be saved to an external figure file. |
fileFormat |
character. indicating the figure file type. Possible values are “ps”, “pdf”, or “jpeg”. All other values will produce “png” file. |
fileName |
character. indicating figure file name (file extension should be specified). For example,
you set |
Value 0
will be returned if no error occurs.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) res.limma = lmFitWrapper( es = es.sim, formula = ~as.factor(memSubj), pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE) boxPlots( resFrame=res.limma$frame, es=es.sim, col.resFrame = c("probeIDs", "stats", "pval"), var.pheno = "memSubj", var.probe = "probe", var.gene = "gene", var.chr = "chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "boxPlots.ps")
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) res.limma = lmFitWrapper( es = es.sim, formula = ~as.factor(memSubj), pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE) boxPlots( resFrame=res.limma$frame, es=es.sim, col.resFrame = c("probeIDs", "stats", "pval"), var.pheno = "memSubj", var.probe = "probe", var.gene = "gene", var.chr = "chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "boxPlots.ps")
Draw estimated density plots for all arrays.
densityPlots( es, requireLog2 = TRUE, myxlab = "expression level", mymain = "density plots", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "densityPlots.ps")
densityPlots( es, requireLog2 = TRUE, myxlab = "expression level", mymain = "density plots", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "densityPlots.ps")
es |
An |
requireLog2 |
logic. indicating if log2 transformation is required before estimating densities. |
myxlab |
character. indicating x-axis label. |
mymain |
character. indicating title of the plot. |
datExtrFunc |
name of the function to extract genomic data. For
an |
fileFlag |
logic. indicating if plot should be saved to an external figure file. |
fileFormat |
character. indicating the figure file type. Possible values are “ps”, “pdf”, or “jpeg”. All other values will produce “png” file. |
fileName |
character. indicating figure file name (file extension should be specified). For example,
you set |
A list object, the -th element is the object returned by
function
density
for the -th array.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) densityPlots( es = es.sim, requireLog2 = FALSE, myxlab = "expression level", mymain = "density plots", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "densityPlots.ps")
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) densityPlots( es = es.sim, requireLog2 = FALSE, myxlab = "expression level", mymain = "density plots", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "densityPlots.ps")
Generate a simple ExpressionSet object.
genExprSet( ex, pDat, fDat = NULL, annotation = "lumiHumanAll.db")
genExprSet( ex, pDat, fDat = NULL, annotation = "lumiHumanAll.db")
ex |
A matrix of expression levels. Rows are gene probes and columns are arrays. |
pDat |
A data frame describing arrays. Rows are arrays and columns are variables
describing arrays. The row names of |
fDat |
A data frame describing gene probes. Rows are gene probes and columns
are variables describing gene probes. The rownames of |
annotation |
character string. Indicating the annotation library
(e.g. |
an ExpressionSet object.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
Generating simulated data set from conditional normal distributions.
genSimData.BayesNormal( nCpGs, nCases, nControls, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 0.001, applier = lapply)
genSimData.BayesNormal( nCpGs, nCases, nControls, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 0.001, applier = lapply)
nCpGs |
integer. Number of genes. |
nCases |
integer. Number of cases. |
nControls |
integer. Number of controls. |
mu.n |
numeric. mean of the conditional normal distribution for controls. See details. |
mu.c |
numeric. mean of the conditional normal distribution for cases. See details. |
d0 |
integer. degree of freedom for scale-inverse chi squared distribution. See details. |
s02 |
numeric. scaling parameter for scale-inverse chi squared distribution for controls. See details. |
s02.c |
numeric. scaling parameter for scale-inverse chi squared distribution for cases. See details. |
testPara |
character string. indicating if the test is for testing equal mean, equal variance, or both. |
outlierFlag |
logical. indicating if outliers would be generated. If |
eps |
numeric. if |
applier |
function name to do |
Based on Phipson and Oshlack's (2014) simulation algorithm.
For each CpG site, variance of the DNA methylation was first sampled from an
scaled inverse chi-squared distribution with degree of freedom
and scaling parameter
:
.
M value for each CpG was then sampled from a normal distribution
with mean
and variance equal to the simulated variance
.
For cases, the variance was first generated from
.
M value for each CpG was then sampled from a normal distribution
with mean
and variance equal to the simulated variance
.
An ExpressionSet object. The phenotype data of the ExpressionSet object
contains 2 columns: arrayID
(array id) and memSubj (subject
membership, i.e., case (memSubj=1
) or control (memSubj=0
)).
The feature data of the ExpressionSet object contains 4 elements:
probe
(probe id), gene
(psuedo gene symbol), chr
(psuedo chromosome number), and memGenes
(indicating if a gene is differentially expressed (when testPara="mean"
)
or indicating if a gene is differentially variable (when testPara="var"
) ).
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
Phipson B, Oshlack A. DiffVar: A new method for detecting differential variability with application to methylation in cancer and aging. Genome Biol 2014; 15:465
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim)
Get principal components of arrays.
getPCAFunc(es, labelVariable = "subjID", hybName = "Hybridization_Name", requireLog2 = TRUE, corFlag = FALSE )
getPCAFunc(es, labelVariable = "subjID", hybName = "Hybridization_Name", requireLog2 = TRUE, corFlag = FALSE )
es |
An |
labelVariable |
A character string.
The name of a phenotype data variable use to
label the arrays in the boxplots. By default,
|
hybName |
character string. indicating the phenotype variable |
requireLog2 |
logical. |
corFlag |
logical. Indicating if correlation matrix ( |
A list with 3 elements:
es.s |
An |
pcs |
An object returned by the function |
requireLog2 |
logical. The same value as the input |
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) pca.obj = getPCAFunc(es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, corFlag = FALSE )
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) pca.obj = getPCAFunc(es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, corFlag = FALSE )
Perform glm test for all gene probes.
glmWrapper(es, formula = FEV1 ~ xi + age + gender, pos.var.interest = 1, family = gaussian, logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "ProbeID", gene.var = "Symbol", chr.var = "Chromosome", applier = lapply, verbose = TRUE)
glmWrapper(es, formula = FEV1 ~ xi + age + gender, pos.var.interest = 1, family = gaussian, logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "ProbeID", gene.var = "Symbol", chr.var = "Chromosome", applier = lapply, verbose = TRUE)
es |
An LumiBatch object.
|
formula |
An object of class |
pos.var.interest |
integer. Indicates which covariate
in the right-hand-size of |
family |
By default is gaussian. refer to |
logit |
logical. Indicate if the gene probes will be logit transformed.
For example, for DNA methylation data, one might want to logit transformation
for the beta-value ( |
pvalAdjMethod |
One of p-value adjustment methods provided by
the R function |
alpha |
Significance level. A test is claimed to be significant
if the adjusted p-value |
probeID.var |
character string. Name of the variable indicating probe ID in feature data set. |
gene.var |
character string. Name of the variable indicating gene symbol in feature data set. |
chr.var |
character string. Name of the variable indicating chromosome number in feature data set. |
applier |
By default, it is lapply. If the library multicore is available, can use mclapply to replace lappy. |
verbose |
logical. Determine if intermediate output need to be suppressed. By default
|
This function applies R function glm
for each gene probe.
A list with the following elements:
n.sig |
Number of significant tests after p-value adjustment. |
frame |
A data frame containing test results sorted according
to the ascending order of unadjusted p-values for the covariate
of the interest. The data frame contains
7 columns: |
statMat |
A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. The rows are ordered according to the ascending order of unadjusted p-values for the covariate of the interest. |
pvalMat |
A matrix containing pvalues for all covariates and for all probes. Rows are probes and columns are covariates. The rows are ordered according to the ascending order of unadjusted p-values for the covariate of the interest. |
pval.quantile |
Quantiles (minimum, 25
for each covariate including intercept provided in the
input argument |
frame.unsorted |
A data frame containing test results.
The data frame contains
7 columns: |
statMat.unsorted |
A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. |
pvalMat.unsorted |
A matrix containing pvalues for all covariates and for all probes. Rows are probes and columns are covariates. |
memGenes |
A numeric vector indicating the cluster membership
of probes (unsorted).
|
memGenes2 |
A numeric vector indicating the cluster membership
of probes (unsorted).
|
mu1 |
Mean expression levels for arrays for probe cluster 1
(average taking across all probes with |
mu2 |
Mean expression levels for arrays for probe cluster 2
(average taking across all probes with |
mu3 |
Mean expression levels for arrays for probe cluster 3
(average taking across all probes with |
resMat |
A matrix with |
If the covariate of the interest is a factor or interaction term with more than 2 levels, then the p-value of the likelihood ratio test might be more appropriate than the smallest p-value for the covariate of the interest.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) res.glm = glmWrapper( es = es.sim, formula = xi~as.factor(memSubj), pos.var.interest = 1, family = gaussian, logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", applier = lapply, verbose = TRUE)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) res.glm = glmWrapper( es = es.sim, formula = xi~as.factor(memSubj), pos.var.interest = 1, family = gaussian, logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", applier = lapply, verbose = TRUE)
Perform glm test for all gene probes.
lkhrWrapper(es, formulaReduced = FEV1 ~ xi + gender, formulaFull = FEV1 ~ xi + age + gender, family = gaussian, logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "ProbeID", gene.var = "Symbol", chr.var = "Chromosome", applier = lapply, verbose = TRUE)
lkhrWrapper(es, formulaReduced = FEV1 ~ xi + gender, formulaFull = FEV1 ~ xi + age + gender, family = gaussian, logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "ProbeID", gene.var = "Symbol", chr.var = "Chromosome", applier = lapply, verbose = TRUE)
es |
An LumiBatch object.
|
formulaReduced |
An object of class |
formulaFull |
An object of class |
family |
By default is gaussian. refer to |
logit |
logical. Indicate if the gene probes will be logit transformed.
For example, for DNA methylation data, one might want to logit transformation
for the beta-value ( |
pvalAdjMethod |
One of p-value adjustment methods provided by
the R function |
alpha |
Significance level. A test is claimed to be significant
if the adjusted p-value |
probeID.var |
character string. Name of the variable indicating probe ID in feature data set. |
gene.var |
character string. Name of the variable indicating gene symbol in feature data set. |
chr.var |
character string. Name of the variable indicating chromosome number in feature data set. |
applier |
By default, it is lapply. If the library multicore is available, can use mclapply to replace lappy. |
verbose |
logical. Determine if intermediate output need to be suppressed. By default
|
This function applies R functions lrtest
in R package lmtest
and glm
for each gene probe.
A list with the following elements:
frame |
A data frame containing test results sorted according
to the ascending order of unadjusted p-values for the covariate
of the interest. The data frame contains
8 columns: |
frame.unsorted |
A data frame containing test results.
unordered |
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) set.seed(1234567) es.sim$age = rnorm(ncol(es.sim), mean=50, sd=5) res.lkh = lkhrWrapper( es = es.sim, formulaReduced = xi ~ memSubj, formulaFull = xi ~ memSubj + age, family = gaussian(), logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", applier = lapply, verbose = TRUE)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) set.seed(1234567) es.sim$age = rnorm(ncol(es.sim), mean=50, sd=5) res.lkh = lkhrWrapper( es = es.sim, formulaReduced = xi ~ memSubj, formulaFull = xi ~ memSubj + age, family = gaussian(), logit = FALSE, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", applier = lapply, verbose = TRUE)
A wrapper function for the function 'lmFit' of the R Bioconductor package 'limma' for paired data.
lmFitPaired( esDiff, formula = ~1, pos.var.interest = 0, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var="ProbeID", gene.var = "Symbol", chr.var = "Chromosome", verbose = TRUE)
lmFitPaired( esDiff, formula = ~1, pos.var.interest = 0, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var="ProbeID", gene.var = "Symbol", chr.var = "Chromosome", verbose = TRUE)
esDiff |
An LumiBatch object containing log2 difference between cases and controls.
|
formula |
An object of class |
pos.var.interest |
integer. Indicates which covariate on the right-hand-side of |
pvalAdjMethod |
One of p-value adjustment methods provided by
the R function |
alpha |
Significance level. A test is claimed to be significant
if the adjusted p-value |
probeID.var |
character string. Name of the variable indicating probe ID in feature data set. |
gene.var |
character string. Name of the variable indicating gene symbol in feature data set. |
chr.var |
character string. Name of the variable indicating chromosome number in feature data set. |
verbose |
logical. Determine if intermediate output need to be suppressed. By default
|
This is a wrapper function of R Bioconductor functions
lmFit
and eBayes
for paired data
to make it easier to input design and
output list of significant results.
A list with the following elements:
n.sig |
Number of significant tests after p-value adjustment. |
frame |
A data frame containing test results sorted according
to the ascending order of unadjusted p-values for the intercept.
The data frame contains
7 columns: |
statMat |
A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. The rows are ordered according to the ascending order of unadjusted p-values for the intercept. |
pvalMat |
A matrix containing pvalues for all covariates and for all probes. Rows are probes and columns are covariates. The rows are ordered according to the ascending order of unadjusted p-values for the intercept. |
pval.quantile |
Quantiles (minimum, 25
for all covariates including intercept provided in the
input argument |
frame.unsorted |
A data frame containing test results.
The data frame contains
7 columns: |
statMat.unsorted |
A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. |
pvalMat.unsorted |
A matrix containing pvalues for all covariates and for all probes. Rows are probes and columns are covariates. |
memGenes |
A numeric vector indicating the cluster membership
of probes (unsorted).
|
memGenes2 |
A numeric vector indicating the cluster membership
of probes (unsorted).
|
mu1 |
Mean expression levels for arrays for probe cluster 1
(average taking across all probes with |
mu2 |
Mean expression levels for arrays for probe cluster 2
(average taking across all probes with |
mu3 |
Mean expression levels for arrays for probe cluster 3
(average taking across all probes with |
ebFit |
object returned by R Bioconductor function |
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) # although the generated data is not from # paired design, we use it to illusrate the # usage of the function lmFitPaired res.limma = lmFitPaired( es = es.sim, formula = ~as.factor(memSubj), pos.var.interest = 0, # the intercept is what we are interested pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) # although the generated data is not from # paired design, we use it to illusrate the # usage of the function lmFitPaired res.limma = lmFitPaired( es = es.sim, formula = ~as.factor(memSubj), pos.var.interest = 0, # the intercept is what we are interested pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE)
A wrapper function for the function 'lmFit' of the R Bioconductor package 'limma'.
lmFitWrapper( es, formula = ~as.factor(gender), pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "ProbeID", gene.var = "Symbol", chr.var = "Chromosome", verbose = TRUE)
lmFitWrapper( es, formula = ~as.factor(gender), pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "ProbeID", gene.var = "Symbol", chr.var = "Chromosome", verbose = TRUE)
es |
An LumiBatch object.
|
formula |
An object of class |
pos.var.interest |
integer. Indicates which covariate on the right-hand-side of |
pvalAdjMethod |
One of p-value adjustment methods provided by
the R function |
alpha |
Significance level. A test is claimed to be significant
if the adjusted p-value |
probeID.var |
character string. Name of the variable indicating probe ID in feature data set. |
gene.var |
character string. Name of the variable indicating gene symbol in feature data set. |
chr.var |
character string. Name of the variable indicating chromosome number in feature data set. |
verbose |
logical. Determine if intermediate output need to be suppressed. By default
|
This is a wrapper function of R Bioconductor functions
lmFit
and eBayes
to make it easier to input design and
output list of significant results.
A list with the following elements:
n.sig |
Number of significant tests after p-value adjustment. |
frame |
A data frame containing test results sorted according
to the ascending order of unadjusted p-values for the covariate of the
interest. The data frame contains
7 columns: |
statMat |
A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. The rows are ordered according to the ascending order of unadjusted p-values for the covariate of the interest. |
pvalMat |
A matrix containing pvalues for all covariates and for all probes. Rows are probes and columns are covariates. The rows are ordered according to the ascending order of unadjusted p-values for the covariate of the interest. |
pval.quantile |
Quantiles (minimum, 25
for all covariates including intercept provided in the
input argument |
frame.unsorted |
A data frame containing test results.
The data frame contains
7 columns: |
statMat.unsorted |
A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. |
pvalMat.unsorted |
A matrix containing pvalues for all covariates and for all probes. Rows are probes and columns are covariates. |
memGenes |
A numeric vector indicating the cluster membership
of probes (unsorted).
|
memGenes2 |
A numeric vector indicating the cluster membership
of probes (unsorted).
|
mu1 |
Mean expression levels for arrays for probe cluster 1
(average taking across all probes with |
mu2 |
Mean expression levels for arrays for probe cluster 2
(average taking across all probes with |
mu3 |
Mean expression levels for arrays for probe cluster 3
(average taking across all probes with |
ebFit |
object returned by R Bioconductor function |
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) res.limma = lmFitWrapper( es = es.sim, formula = ~as.factor(memSubj), pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) res.limma = lmFitWrapper( es = es.sim, formula = ~as.factor(memSubj), pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE)
Output slots (exprs, pData, fData) of an LumiBatch object into 3 text files.
LumiBatch2Table( es, probeID.var="ProbeID", gene.var="Symbol", chr.var="Chromosome", sep = ",", quote = FALSE, filePrefix = "test", fileExt = "csv")
LumiBatch2Table( es, probeID.var="ProbeID", gene.var="Symbol", chr.var="Chromosome", sep = ",", quote = FALSE, filePrefix = "test", fileExt = "csv")
es |
An LumiBatch object |
probeID.var |
character string. Name of the variable indicating probe ID in feature data set. |
gene.var |
character string. Name of the variable indicating gene symbol in feature data set. |
chr.var |
character string. Name of the variable indicating chromosome number in feature data set. |
sep |
Field delimiter for the output text files |
quote |
logical. Indicating if any character or factor. See also |
filePrefix |
Prefix of the names of the output files. |
fileExt |
File extension of the names of the output files. |
Suppose filePrefix="test"
and fileExt=".csv"
.
Then, the file names of the 3 output files are:
“test_exprs.csv”, “test_pDat.csv”,
and “test_fDat.csv”, respectively.
None.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) LumiBatch2Table( es = es.sim, probeID.var="probe", gene.var="gene", chr.var="chr", sep = ",", quote = FALSE, filePrefix = "test", fileExt = "csv")
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) LumiBatch2Table( es = es.sim, probeID.var="probe", gene.var="gene", chr.var="chr", sep = ",", quote = FALSE, filePrefix = "test", fileExt = "csv")
Scatter plot of first 2 principal components.
pca2DPlot(pcaObj, plot.dim = c(1,2), labelVariable = "subjID", hybName = "Hybridization_Name", outFileName = "test_pca_raw.pdf", title = "Scatter plot of pcas", plotOutPutFlag = FALSE, mar = c(5, 4, 4, 2) + 0.1, lwd = 1.5, equalRange = TRUE, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, cex.legend = 1.5, cex = 1.5, cex.lab = 1.5, cex.axis = 1.5, legendPosition = "topright", ...)
pca2DPlot(pcaObj, plot.dim = c(1,2), labelVariable = "subjID", hybName = "Hybridization_Name", outFileName = "test_pca_raw.pdf", title = "Scatter plot of pcas", plotOutPutFlag = FALSE, mar = c(5, 4, 4, 2) + 0.1, lwd = 1.5, equalRange = TRUE, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, cex.legend = 1.5, cex = 1.5, cex.lab = 1.5, cex.axis = 1.5, legendPosition = "topright", ...)
pcaObj |
An object returned by the function |
plot.dim |
A vector of 2 positive-integer-value integer specifying which 2 pcas will be plot. |
labelVariable |
The name of a column of the phenotype data matrix. The elements of the column will replace the column names of the expression data matrix. |
hybName |
character string. indicating the phenotype variable |
outFileName |
Name of the figure file to be created. |
title |
Title of the scatter plot. |
plotOutPutFlag |
logical. |
mar |
A numerical vector of the form 'c(bottom, left, top, right)'
which gives the number of lines of margin to be specified on
the four sides of the plot. The default is 'c(5, 4, 4, 2) +
0.1'. see |
lwd |
The line width, a _positive_ number, defaulting to '1'.
see |
equalRange |
logical. Indicating if the x-axis and y-axis have the same range. |
xlab |
Label of x axis. |
ylab |
Label of y axis. |
xlim |
Range of x axis. |
ylim |
Range of y axis. |
cex.legend |
Font size for legend. |
cex |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default.
see |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. see |
legendPosition |
Position of legend. Possible values are “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”. |
... |
Arguments to be passed to |
A matrix of PCA scores. Each column corresponds to a principal component.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) pca.obj = getPCAFunc(es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, corFlag = FALSE ) pca2DPlot(pcaObj = pca.obj, plot.dim = c(1,2), labelVariable = "subjID", hybName = "memSubj", plotOutPutFlag = FALSE, cex.legend = 0.5, legendPosition = "topright")
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) pca.obj = getPCAFunc(es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, corFlag = FALSE ) pca2DPlot(pcaObj = pca.obj, plot.dim = c(1,2), labelVariable = "subjID", hybName = "memSubj", plotOutPutFlag = FALSE, cex.legend = 0.5, legendPosition = "topright")
Scatter plot of 3 specified principal components.
pca3DPlot(pcaObj, plot.dim = c(1,2, 3), labelVariable = "subjID", hybName = "Hybridization_Name", outFileName = "test_pca_raw.pdf", title = "Scatter plot of pcas", plotOutPutFlag = FALSE, mar = c(5, 4, 4, 2) + 0.1, lwd = 1.5, equalRange = TRUE, xlab = NULL, ylab = NULL, zlab = NULL, xlim = NULL, ylim = NULL, zlim = NULL, cex.legend = 1.5, cex = 1.5, cex.lab = 1.5, cex.axis = 1.5, legendPosition = "topright", ...)
pca3DPlot(pcaObj, plot.dim = c(1,2, 3), labelVariable = "subjID", hybName = "Hybridization_Name", outFileName = "test_pca_raw.pdf", title = "Scatter plot of pcas", plotOutPutFlag = FALSE, mar = c(5, 4, 4, 2) + 0.1, lwd = 1.5, equalRange = TRUE, xlab = NULL, ylab = NULL, zlab = NULL, xlim = NULL, ylim = NULL, zlim = NULL, cex.legend = 1.5, cex = 1.5, cex.lab = 1.5, cex.axis = 1.5, legendPosition = "topright", ...)
pcaObj |
An object returned by the function |
plot.dim |
A vector of 3 positive-integer-value integer specifying which 3 pcas will be plot. |
labelVariable |
The name of a column of the phenotype data matrix. The elements of the column will replace the column names of the expression data matrix. |
hybName |
character string. indicating the phenotype variable |
outFileName |
Name of the figure file to be created. |
title |
Title of the scatter plot. |
plotOutPutFlag |
logical. |
mar |
A numerical vector of the form 'c(bottom, left, top, right)'
which gives the number of lines of margin to be specified on
the four sides of the plot. The default is 'c(5, 4, 4, 2) +
0.1'. see |
lwd |
The line width, a _positive_ number, defaulting to '1'.
see |
equalRange |
logical. Indicating if the x-axis and y-axis have the same range. |
xlab |
Label of x axis. |
ylab |
Label of y axis. |
zlab |
Label of z axis. |
xlim |
Range of x axis. |
ylim |
Range of y axis. |
zlim |
Range of z axis. |
cex.legend |
Font size for legend. |
cex |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default.
see |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. see |
legendPosition |
Position of legend. Possible values are “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”. |
... |
Arguments to be passed to |
A matrix of PCA scores. Each column corresponds to a principal component.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) pca.obj = getPCAFunc(es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, corFlag = FALSE ) pca3DPlot(pcaObj = pca.obj, plot.dim = c(1,2,3), labelVariable = "subjID", hybName = "memSubj", plotOutPutFlag = FALSE, cex.legend = 0.5, legendPosition = "topright")
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) pca.obj = getPCAFunc(es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, corFlag = FALSE ) pca3DPlot(pcaObj = pca.obj, plot.dim = c(1,2,3), labelVariable = "subjID", hybName = "memSubj", plotOutPutFlag = FALSE, cex.legend = 0.5, legendPosition = "topright")
Plot trajectories of probe profiles across arrays
plotCurves( dat, curveNames, fileName, plotOutPutFlag=FALSE, requireLog2 = FALSE, cex = 1, ylim = NULL, xlab = "", ylab = "intensity", lwd = 3, main = "Trajectory plot", mar = c(10, 4, 4, 2) + 0.1, las = 2, cex.axis=1, ...)
plotCurves( dat, curveNames, fileName, plotOutPutFlag=FALSE, requireLog2 = FALSE, cex = 1, ylim = NULL, xlab = "", ylab = "intensity", lwd = 3, main = "Trajectory plot", mar = c(10, 4, 4, 2) + 0.1, las = 2, cex.axis=1, ...)
dat |
Numeric data matrix. Rows are probes; columns are arrays. |
curveNames |
Probe names. |
fileName |
file name of output figure. |
plotOutPutFlag |
logical. |
requireLog2 |
logical. |
cex |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default.
see |
ylim |
Range of y axis. |
xlab |
Label of x axis. |
ylab |
Label of y axis. |
lwd |
The line width, a _positive_ number, defaulting to '1'.
see |
main |
Main title of the plot. |
mar |
A numerical vector of the form 'c(bottom, left, top, right)'
which gives the number of lines of margin to be specified on
the four sides of the plot. The default is 'c(5, 4, 4, 2) +
0.1'. see |
las |
'las' numeric in 0,1,2,3; the style of axis labels. 0 - always parallel to the axis, 1 - always horizontal, 2 - always perpendicular to the axis, or 3 - always vertical. see |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. see |
... |
Arguments to be passed to |
no return value.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) # plot trajectories of the first 5 genes plotCurves( dat = exprs(es.sim)[1:5,], curveNames = featureNames(es.sim)[1:5], plotOutPutFlag=FALSE, cex = 0.5, requireLog2 = FALSE)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) # plot trajectories of the first 5 genes plotCurves( dat = exprs(es.sim)[1:5,], curveNames = featureNames(es.sim)[1:5], plotOutPutFlag=FALSE, cex = 0.5, requireLog2 = FALSE)
Plot trajectories of specific QC probes (e.g., biotin, cy3_hyb, housekeeping gene probes, low stringency probes, etc.) across arrays
plotQCCurves( esQC, probes = c("biotin", "cy3_hyb", "housekeeping", "low_stringency_hyb", "signal", "p95p05"), labelVariable = "subjID", hybName = "Hybridization_Name", reporterGroupName = "Reporter_Group_Name", requireLog2 = TRUE, projectName = "test", plotOutPutFlag = FALSE, cex = 1, ylim = NULL, xlab = "", ylab = "intensity", lwd = 3, mar = c(10, 4, 4, 2) + 0.1, las = 2, cex.axis = 1, sortFlag = TRUE, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA), ...)
plotQCCurves( esQC, probes = c("biotin", "cy3_hyb", "housekeeping", "low_stringency_hyb", "signal", "p95p05"), labelVariable = "subjID", hybName = "Hybridization_Name", reporterGroupName = "Reporter_Group_Name", requireLog2 = TRUE, projectName = "test", plotOutPutFlag = FALSE, cex = 1, ylim = NULL, xlab = "", ylab = "intensity", lwd = 3, mar = c(10, 4, 4, 2) + 0.1, las = 2, cex.axis = 1, sortFlag = TRUE, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA), ...)
esQC |
ExpressionSet object of QC probe profiles. |
probes |
A character vectors of QC probe names. By default, it includes the following probe names “biotin”, “cy3_hyb”, “housekeeping”, “low_stringency_hyb”, “signal”, “p95p05”. For “signal”, trajectories of 5th, 25th, 50th, 75th, and 95th percentiles of the expression levels of all QC probes will be ploted. For “p95p05”, the trajectory of the ratio of 95th percentile to 5th percentile of the expression levels of all QC probes will be ploted. |
labelVariable |
A character string.
The name of a phenotype data variable use to
label the arrays in the boxplots. By default,
|
hybName |
character string. indicating the phenotype variable |
reporterGroupName |
character string. indicating feature variable |
requireLog2 |
logical. |
projectName |
A character string. Name of the project. The plots will be saved as pdf format files,
the names of which have the format
|
plotOutPutFlag |
logical. |
cex |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default.
see |
ylim |
Range of y axis. |
xlab |
Label of x axis. |
ylab |
Label of y axis. |
lwd |
The line width, a _positive_ number, defaulting to '1'.
see |
mar |
A numerical vector of the form 'c(bottom, left, top, right)'
which gives the number of lines of margin to be specified on
the four sides of the plot. The default is 'c(5, 4, 4, 2) +
0.1'. see |
las |
'las' numeric in 0,1,2,3; the style of axis labels. 0 - always parallel to the axis, 1 - always horizontal, 2 - always perpendicular to the axis, or 3 - always vertical. see |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. see |
sortFlag |
logical. Indicates if arrays need to be sorted according to
|
varSort |
A vector of phenotype variable names to be used to sort the samples
of |
timeFormat |
A vector of time format for the possible time variables in |
... |
Arguments to be passed to |
no return value.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) esQC.sim = genSimData.BayesNormal(nCpGs = 10, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(esQC.sim) fDat = fData(esQC.sim) esQC.sim$Hybridization_Name = sampleNames(esQC.sim) fDat$Reporter_Group_Name = c( rep("biotin", 5), rep("housekeeping", 5)) fData(esQC.sim)=fDat # plot trajectories of the QC probes plotQCCurves( esQC = esQC.sim, probes = c("biotin", "housekeeping"), labelVariable = "subjID", hybName = "Hybridization_Name", reporterGroupName = "Reporter_Group_Name", requireLog2 = FALSE, plotOutPutFlag = FALSE, sortFlag = FALSE)
# generate simulated data set from conditional normal distribution set.seed(1234567) esQC.sim = genSimData.BayesNormal(nCpGs = 10, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(esQC.sim) fDat = fData(esQC.sim) esQC.sim$Hybridization_Name = sampleNames(esQC.sim) fDat$Reporter_Group_Name = c( rep("biotin", 5), rep("housekeeping", 5)) fData(esQC.sim)=fDat # plot trajectories of the QC probes plotQCCurves( esQC = esQC.sim, probes = c("biotin", "housekeeping"), labelVariable = "subjID", hybName = "Hybridization_Name", reporterGroupName = "Reporter_Group_Name", requireLog2 = FALSE, plotOutPutFlag = FALSE, sortFlag = FALSE)
Plot trajectories of the ratio of 95th percentile to 5th percentile of sample probe profiles across arrays.
plotSamplep95p05( es, labelVariable = "subjID", hybName = "Hybridization_Name", requireLog2 = FALSE, projectName = "test", plotOutPutFlag = FALSE, cex = 1, ylim = NULL, xlab = "", ylab = "", lwd = 1.5, mar = c(10, 4, 4, 2) + 0.1, las = 2, cex.axis=1.5, title = "Trajectory of p95/p05", cex.legend = 1.5, cex.lab = 1.5, legendPosition = "topright", cut1 = 10, cut2 = 6, sortFlag = TRUE, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA), verbose = FALSE, ...)
plotSamplep95p05( es, labelVariable = "subjID", hybName = "Hybridization_Name", requireLog2 = FALSE, projectName = "test", plotOutPutFlag = FALSE, cex = 1, ylim = NULL, xlab = "", ylab = "", lwd = 1.5, mar = c(10, 4, 4, 2) + 0.1, las = 2, cex.axis=1.5, title = "Trajectory of p95/p05", cex.legend = 1.5, cex.lab = 1.5, legendPosition = "topright", cut1 = 10, cut2 = 6, sortFlag = TRUE, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA), verbose = FALSE, ...)
es |
ExpressionSet object of Sample probe profiles. |
labelVariable |
A character string.
The name of a phenotype data variable use to
label the arrays in the boxplots. By default,
|
hybName |
character string. indicating the phenotype variable |
requireLog2 |
logical. |
projectName |
A character string. Name of the project. The plots will be saved as pdf format files,
the names of which have the format
|
plotOutPutFlag |
logical. |
cex |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default.
see |
ylim |
Range of y axis. |
xlab |
Label of x axis. |
ylab |
Label of y axis. |
lwd |
The line width, a _positive_ number, defaulting to '1'.
see |
mar |
A numerical vector of the form 'c(bottom, left, top, right)'
which gives the number of lines of margin to be specified on
the four sides of the plot. The default is 'c(5, 4, 4, 2) +
0.1'. see |
las |
'las' numeric in 0,1,2,3; the style of axis labels. 0 - always parallel to the axis, 1 - always horizontal, 2 - always perpendicular to the axis, or 3 - always vertical. see |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. see |
title |
Figure title. |
cex.legend |
Font size of legend text. |
cex.lab |
The magnification to be used for x and y labels relative to the current setting of cex. |
legendPosition |
Position of legend. Possible values are “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”. |
cut1 |
second horiztonal line setting the cutoff for the ratio |
cut2 |
second horiztonal line setting the cutoff for the ratio |
sortFlag |
logical. Indicates if arrays need to be sorted according to
|
varSort |
A vector of phenotype variable names to be used to sort the samples
of |
timeFormat |
A vector of time format for the possible time variables in |
verbose |
logical. Determine if intermediate output need to be suppressed. By default
|
... |
Arguments to be passed to |
The trajectory of the ratio of 95 to 5
A list of 2 elements. The first element is the 2 x n matrix, where n is the number of arrays. The first row of the matrix is the 5-th percentile and the second row of the matrix is the 95-th percentile.
The second element is the ratio of the 95-th percentile to the 5-th percentile.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) es.sim$Batch_Run_Date = 1:ncol(es.sim) es.sim$Chip_Barcode = 1:ncol(es.sim) es.sim$Chip_Address = 1:ncol(es.sim) plotSamplep95p05( es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, projectName = "test", plotOutPutFlag = FALSE, title = "Trajectory of p95/p05", cex.legend = 0.5, legendPosition = "topright", sortFlag = TRUE, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA), verbose = FALSE)
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) es.sim$Batch_Run_Date = 1:ncol(es.sim) es.sim$Chip_Barcode = 1:ncol(es.sim) es.sim$Chip_Address = 1:ncol(es.sim) plotSamplep95p05( es = es.sim, labelVariable = "subjID", hybName = "memSubj", requireLog2 = FALSE, projectName = "test", plotOutPutFlag = FALSE, title = "Trajectory of p95/p05", cex.legend = 0.5, legendPosition = "topright", sortFlag = TRUE, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA), verbose = FALSE)
Plot trajectories of quantiles across arrays.
quantilePlot( dat, fileName, probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1), plotOutPutFlag = FALSE, requireLog2 = FALSE, sortFlag = TRUE, cex = 1, ylim = NULL, xlab = "", ylab = "intensity", lwd = 3, main = "Trajectory plot of quantiles", mar = c(15, 4, 4, 2) + 0.1, las = 2, cex.axis = 1)
quantilePlot( dat, fileName, probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1), plotOutPutFlag = FALSE, requireLog2 = FALSE, sortFlag = TRUE, cex = 1, ylim = NULL, xlab = "", ylab = "intensity", lwd = 3, main = "Trajectory plot of quantiles", mar = c(15, 4, 4, 2) + 0.1, las = 2, cex.axis = 1)
dat |
Expression data. Rows are gene probes; columns are arrays. |
fileName |
File name of output figure. |
probs |
quantiles (any real values between the interval |
plotOutPutFlag |
logical. |
requireLog2 |
logical. |
sortFlag |
logical. |
cex |
numerical value giving the amount by which plotting text
and symbols should be magnified relative to the default.
see |
ylim |
Range of y axis. |
xlab |
Label of x axis. |
ylab |
Label of y axis. |
lwd |
The line width, a _positive_ number, defaulting to '1'.
see |
main |
Charater string. main title of the plot. |
mar |
A numerical vector of the form 'c(bottom, left, top, right)'
which gives the number of lines of margin to be specified on
the four sides of the plot. The default is 'c(5, 4, 4, 2) +
0.1'. see |
las |
'las' numeric in 0,1,2,3; the style of axis labels. 0 - always parallel to the axis, 1 - always horizontal, 2 - always perpendicular to the axis, or 3 - always vertical. see |
cex.axis |
The magnification to be used for axis annotation relative to the current setting of cex. see |
The quantile matrix with row quantiles and column array.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) png(file="qplot.png") quantilePlot( dat = exprs(es.sim), probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1), plotOutPutFlag = FALSE, requireLog2 = FALSE, sortFlag = TRUE) dev.off()
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) png(file="qplot.png") quantilePlot( dat = exprs(es.sim), probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1), plotOutPutFlag = FALSE, requireLog2 = FALSE, sortFlag = TRUE) dev.off()
Draw heatmap of square of correlations among arrays.
R2PlotFunc( es, hybName = "Hybridization_Name", arrayType = c("all", "replicates", "GC"), GCid = c("128115", "Hela", "Brain"), probs = seq(0, 1, 0.25), col = gplots::greenred(75), labelVariable = "subjID", outFileName = "test_R2_raw.pdf", title = "Raw Data R^2 Plot", requireLog2 = FALSE, plotOutPutFlag = FALSE, las = 2, keysize = 1, margins = c(10, 10), sortFlag = TRUE, varSort=c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat=c("%m/%d/%Y", NA, NA), ...)
R2PlotFunc( es, hybName = "Hybridization_Name", arrayType = c("all", "replicates", "GC"), GCid = c("128115", "Hela", "Brain"), probs = seq(0, 1, 0.25), col = gplots::greenred(75), labelVariable = "subjID", outFileName = "test_R2_raw.pdf", title = "Raw Data R^2 Plot", requireLog2 = FALSE, plotOutPutFlag = FALSE, las = 2, keysize = 1, margins = c(10, 10), sortFlag = TRUE, varSort=c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat=c("%m/%d/%Y", NA, NA), ...)
es |
ExpressionSet object of QC probe profiles. |
hybName |
character string. indicating the phenotype variable |
arrayType |
A character string indicating if the correlations are calculated based on all arrays, arrays with replicates, or genetic control arrays. |
GCid |
A vector of character string. symbols for genetic control samples. The symbols can be more than one. |
probs |
A vector of probabilities specify the quantiles of correlations to be output. |
col |
colors used for the image. see the function |
labelVariable |
A character string indicating how to label the arrays. |
outFileName |
A character string. The name of output file. |
title |
Title of the plot. |
requireLog2 |
logical. |
plotOutPutFlag |
logical. |
las |
'las' numeric in 0,1,2,3; the style of axis labels. 0 - always parallel to the axis, 1 - always horizontal, 2 - always perpendicular to the axis, or 3 - always vertical. see |
keysize |
numeric value indicating the size of the key.
see the function |
margins |
numeric vector of length 2 containing the margins.
see the function |
sortFlag |
logical. Indicates if arrays need to be sorted according to
|
varSort |
A vector of phenotype variable names to be used to sort the samples
of |
timeFormat |
A vector of time format for the possible time variables in |
... |
Arguments to be passed to |
A list with 3 elments. The first element R2Mat
is
the matrix of squared correlation.
The second element R2vec
is the vector of
the upper triangle of the matrix
of squared correlation (diagnoal elements are excluded).
The third element R2vec.within.req
is the vector of
within-replicate ,
that is, any element in
R2vec.within.req
is the squared correlation
coefficient between two arrays/replicates for a subject.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) es.sim$Batch_Run_Date = 1:ncol(es.sim) es.sim$Chip_Barcode = 1:ncol(es.sim) es.sim$Chip_Address = 1:ncol(es.sim) # draw heatmap for the first 5 subjects png(file="r2plot.png") R2PlotFunc( es = es.sim[, 1:5], hybName = "memSubj", arrayType = c("all", "replicates", "GC"), GCid = c("128115", "Hela", "Brain"), probs = seq(0, 1, 0.25), col = gplots::greenred(75), labelVariable = "subjID", outFileName = "test_R2_raw.pdf", title = "Raw Data R^2 Plot", requireLog2 = FALSE, plotOutPutFlag = FALSE, las = 2, keysize = 1, margins = c(10, 10), sortFlag = TRUE, varSort=c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat=c("%m/%d/%Y", NA, NA)) dev.off()
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) es.sim$Batch_Run_Date = 1:ncol(es.sim) es.sim$Chip_Barcode = 1:ncol(es.sim) es.sim$Chip_Address = 1:ncol(es.sim) # draw heatmap for the first 5 subjects png(file="r2plot.png") R2PlotFunc( es = es.sim[, 1:5], hybName = "memSubj", arrayType = c("all", "replicates", "GC"), GCid = c("128115", "Hela", "Brain"), probs = seq(0, 1, 0.25), col = gplots::greenred(75), labelVariable = "subjID", outFileName = "test_R2_raw.pdf", title = "Raw Data R^2 Plot", requireLog2 = FALSE, plotOutPutFlag = FALSE, las = 2, keysize = 1, margins = c(10, 10), sortFlag = TRUE, varSort=c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat=c("%m/%d/%Y", NA, NA)) dev.off()
Draw scatter plots for top results in whole-genome-wide analysis to test for the association of probes to a continuous-type phenotype variable.
scatterPlots( resFrame, es, col.resFrame = c("probeIDs", "stats", "pval", "p.adj"), var.pheno = "bmi", outcomeFlag = FALSE, fitLineFlag = TRUE, var.probe = "TargetID", var.gene = "Symbol", var.chr = "Chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "scatterPlots.ps")
scatterPlots( resFrame, es, col.resFrame = c("probeIDs", "stats", "pval", "p.adj"), var.pheno = "bmi", outcomeFlag = FALSE, fitLineFlag = TRUE, var.probe = "TargetID", var.gene = "Symbol", var.chr = "Chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "scatterPlots.ps")
resFrame |
A data frame stores testing results, which must contain columns that indicate probe id, test statistic, p-value and optionally adjusted p-value. |
es |
An |
col.resFrame |
A vector of characters indicating column names of |
var.pheno |
character. the name of continuous-type phenotype variable that is used to test the association of this variable to probes. |
outcomeFlag |
logic. indicating if |
fitLineFlag |
logic. indicating if a fitted line |
var.probe |
character. the name of feature variable indicating probe id. |
var.gene |
character. the name of feature variable indicating gene symbol. |
var.chr |
character. the name of feature variable indicating chromosome number. |
nTop |
integer. indicating how many top tests will be used to draw the scatter plot. |
myylab |
character. indicating y-axis label. |
datExtrFunc |
name of the function to extract genomic data. For
an |
fileFlag |
logic. indicating if plot should be saved to an external figure file. |
fileFormat |
character. indicating the figure file type. Possible values are “ps”, “pdf”, or “jpeg”. All other values will produce “png” file. |
fileName |
character. indicating figure file name (file extension should be specified). For example,
you set |
Value 0
will be returned if no error occurs.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) # generate phenotype age es.sim$age = rnorm(ncol(es.sim), mean=50, sd=5) res.limma = lmFitWrapper( es = es.sim, formula = ~age, pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE) scatterPlots( resFrame=res.limma$frame, es=es.sim, col.resFrame = c("probeIDs", "stats", "pval"), var.pheno = "age", outcomeFlag = FALSE, fitLineFlag = TRUE, var.probe = "probe", var.gene = "gene", var.chr = "chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "scatterPlots.ps")
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) # generate phenotype age es.sim$age = rnorm(ncol(es.sim), mean=50, sd=5) res.limma = lmFitWrapper( es = es.sim, formula = ~age, pos.var.interest = 1, pvalAdjMethod = "fdr", alpha = 0.05, probeID.var = "probe", gene.var = "gene", chr.var = "chr", verbose = TRUE) scatterPlots( resFrame=res.limma$frame, es=es.sim, col.resFrame = c("probeIDs", "stats", "pval"), var.pheno = "age", outcomeFlag = FALSE, fitLineFlag = TRUE, var.probe = "probe", var.gene = "gene", var.chr = "chr", nTop = 20, myylab = "expression level", datExtrFunc = exprs, fileFlag = FALSE, fileFormat = "ps", fileName = "scatterPlots.ps")
Sort the order of samples for an ExpressionSet object.
sortExpressionSet( es, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA) )
sortExpressionSet( es, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA) )
es |
An ExpressionSet. |
varSort |
A vector of phenotype variable names to be used to sort the samples
of |
timeFormat |
A vector of time format for the possible time variables in |
An ExpressionSet object with samples sorted based on the variables
indicated in varSort
.
Weiliang Qiu <[email protected]>, Brandon Guo <[email protected]>, Christopher Anderson <[email protected]>, Barbara Klanderman <[email protected]>, Vincent Carey <[email protected]>, Benjamin Raby <[email protected]>
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) es.sim$Batch_Run_Date = 1:ncol(es.sim) es.sim$Chip_Barcode = 1:ncol(es.sim) es.sim$Chip_Address = 1:ncol(es.sim) es.sim2 = sortExpressionSet( es = es.sim, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA) )
# generate simulated data set from conditional normal distribution set.seed(1234567) es.sim = genSimData.BayesNormal(nCpGs = 100, nCases = 20, nControls = 20, mu.n = -2, mu.c = 2, d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var", outlierFlag = FALSE, eps = 1.0e-3, applier = lapply) print(es.sim) es.sim$Batch_Run_Date = 1:ncol(es.sim) es.sim$Chip_Barcode = 1:ncol(es.sim) es.sim$Chip_Address = 1:ncol(es.sim) es.sim2 = sortExpressionSet( es = es.sim, varSort = c("Batch_Run_Date", "Chip_Barcode", "Chip_Address"), timeFormat = c("%m/%d/%Y", NA, NA) )