Title: | Preprocessing tools for oligonucleotide arrays |
---|---|
Description: | A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files). |
Authors: | Benilton Carvalho and Rafael Irizarry |
Maintainer: | Benilton Carvalho <[email protected]> |
License: | LGPL (>= 2) |
Version: | 1.71.0 |
Built: | 2025-01-16 06:14:39 UTC |
Source: | https://github.com/bioc/oligo |
The oligo package provides tools to preprocess different oligonucleotide arrays types: expression, tiling, SNP and exon chips. The supported manufacturers are Affymetrix and NimbleGen.
It offers support to large datasets (when the bigmemory is loaded) and can execute preprocessing tasks in parallel (if, in addition to bigmemory, the snow package is also loaded).
The package will read the raw intensity files (CEL for Affymetrix; XYS for NimbleGen) and allow the user to perform analyses starting at the feature-level.
Reading in the intensity files require the existence of data packages that contain the chip specific information (X/Y coordinates; feature types; sequence). These data packages packages are built using the pdInfoBuilder package.
For Affymetrix SNP arrays, users are asked to download the already built annotation packages from BioConductor. This is because these packages contain metadata that are not automatically created. The following annotation packages are available:
50K Xba - pd.mapping50kxba.240 50K Hind - pd.mapping50khind.240 250K Sty - pd.mapping250k.sty 250K Nsp - pd.mapping250k.nsp GenomeWideSnp 5 (SNP 5.0) - pd.genomewidesnp.5 GenomeWideSnp 6 (SNP 6.0) - pd.genomewidesnp.6
For users interested in genotype calls for SNP 5.0 and 6.0 arrays, we strongly recommend the use use the crlmm package, which implements a more efficient version of CRLMM.
Benilton Carvalho - [email protected]
Carvalho, B.; Bengtsson, H.; Speed, T. P. & Irizarry, R. A. Exploration, Normalization, and Genotype Calls of High Density Oligonucleotide SNP Array Data. Biostatistics, 2006.
Function to compute the amounts of each nucleotide in a sequence.
basecontent(seq)
basecontent(seq)
seq |
character vector of length |
matrix
with n
rows and 4 columns with the counts for
each base.
sequences <- c("ATATATCCCCG", "TTTCCGAGC") basecontent(sequences)
sequences <- c("ATATATCCCCG", "TTTCCGAGC") basecontent(sequences)
Simplified interface to PLM.
basicPLM(pmMat, pnVec, normalize = TRUE, background = TRUE, transfo = log2, method = c('plm', 'plmr', 'plmrr', 'plmrc'), verbose = TRUE)
basicPLM(pmMat, pnVec, normalize = TRUE, background = TRUE, transfo = log2, method = c('plm', 'plmr', 'plmrr', 'plmrc'), verbose = TRUE)
pmMat |
Matrix of intensities to be processed. |
pnVec |
Probeset names |
normalize |
Logical flag: normalize? |
background |
Logical flag: background adjustment? |
transfo |
function: function to be used for data transformation prior to summarization. |
method |
Name of the method to be used for normalization. 'plm' is the usual PLM model; 'plmr' is the (row and column) robust version of PLM; 'plmrr' is the row-robust version of PLM; 'plmrc' is the column-robust version of PLM. |
verbose |
Logical flag: verbose. |
A list with the following components:
Estimates |
A (length(pnVec) x ncol(pmMat)) matrix with probeset summaries. |
StdErrors |
A (length(pnVec) x ncol(pmMat)) matrix with standard errors of 'Estimates'. |
Residuals |
A (nrow(pmMat) x ncol(pmMat)) matrix of residuals. |
Currently, only RMA-bg-correction and quantile normalization are allowed.
Benilton Carvalho
rcModelPLM
,
rcModelPLMr
,
rcModelPLMrr
,
rcModelPLMrc
,
basicRMA
set.seed(1) pms <- 2^matrix(rnorm(1000), nc=20) colnames(pms) <- paste("sample", 1:20, sep="") pns <- rep(letters[1:10], each=5) res <- basicPLM(pms, pns, TRUE, TRUE) res[['Estimates']][1:4, 1:3] res[['StdErrors']][1:4, 1:3] res[['Residuals']][1:20, 1:3]
set.seed(1) pms <- 2^matrix(rnorm(1000), nc=20) colnames(pms) <- paste("sample", 1:20, sep="") pns <- rep(letters[1:10], each=5) res <- basicPLM(pms, pns, TRUE, TRUE) res[['Estimates']][1:4, 1:3] res[['StdErrors']][1:4, 1:3] res[['Residuals']][1:20, 1:3]
Simple interface to RMA.
basicRMA(pmMat, pnVec, normalize = TRUE, background = TRUE, bgversion = 2, destructive = FALSE, verbose = TRUE, ...)
basicRMA(pmMat, pnVec, normalize = TRUE, background = TRUE, bgversion = 2, destructive = FALSE, verbose = TRUE, ...)
pmMat |
Matrix of intensities to be processed. |
pnVec |
Probeset names. |
normalize |
Logical flag: normalize? |
background |
Logical flag: background adjustment? |
bgversion |
Version of background correction. |
destructive |
Logical flag: use destructive methods? |
verbose |
Logical flag: verbose. |
... |
Not currently used. |
Matrix.
set.seed(1) pms <- 2^matrix(rnorm(1000), nc=20) colnames(pms) <- paste("sample", 1:20, sep="") pns <- rep(letters[1:10], each=5) res <- basicRMA(pms, pns, TRUE, TRUE) res[, 1:3]
set.seed(1) pms <- 2^matrix(rnorm(1000), nc=20) colnames(pms) <- paste("sample", 1:20, sep="") pns <- rep(letters[1:10], each=5) res <- basicRMA(pms, pns, TRUE, TRUE) res[, 1:3]
Boxplot for observed (log-)intensities in a FeatureSet-like object (ExpressionFeatureSet, ExonFeatureSet, SnpFeatureSet, TilingFeatureSet) and ExpressionSet.
## S4 method for signature 'FeatureSet' boxplot(x, which=c("pm", "mm", "bg", "both", "all"), transfo=log2, nsample=10000, target = "mps1", ...) ## S4 method for signature 'ExpressionSet' boxplot(x, which, transfo=identity, nsample=10000, ...)
## S4 method for signature 'FeatureSet' boxplot(x, which=c("pm", "mm", "bg", "both", "all"), transfo=log2, nsample=10000, target = "mps1", ...) ## S4 method for signature 'ExpressionSet' boxplot(x, which, transfo=identity, nsample=10000, ...)
x |
a |
which |
character defining what probe types are to be used in the plot. |
transfo |
a function to transform the data before plotting. See 'Details'. |
nsample |
number of units to sample and build the plot. |
... |
arguments to be passed to the default boxplot method. |
The 'transfo' argument will set the transformation to be used. For raw data, 'transfo=log2' is a common practice. For summarized data (which are often in log2-scale), no transformation is needed (therefore 'transfo=identity').
The boxplot methods for FeatureSet
and Expression
use a
sample (via sample
) of the probes/probesets to produce the
plot. Therefore, the user interested in reproducibility is advised to
use set.seed
.
Returns chromosome information.
pmChr(object)
pmChr(object)
object |
|
chromosome()
returns the chromosomal information for all probes
and pmChr()
subsets the output to the PM probes only (if a
TilingFeatureSet object).
Vector with chromosome information.
Performs genotype calls via CRLMM (Corrected Robust Linear Model with Maximum-likelihood based distances).
crlmm(filenames, outdir, batch_size=40000, balance=1.5, minLLRforCalls=c(5, 1, 5), recalibrate=TRUE, verbose=TRUE, pkgname, reference=TRUE) justCRLMM(filenames, batch_size = 40000, minLLRforCalls = c(5, 1, 5), recalibrate = TRUE, balance = 1.5, phenoData = NULL, verbose = TRUE, pkgname = NULL, tmpdir=tempdir())
crlmm(filenames, outdir, batch_size=40000, balance=1.5, minLLRforCalls=c(5, 1, 5), recalibrate=TRUE, verbose=TRUE, pkgname, reference=TRUE) justCRLMM(filenames, batch_size = 40000, minLLRforCalls = c(5, 1, 5), recalibrate = TRUE, balance = 1.5, phenoData = NULL, verbose = TRUE, pkgname = NULL, tmpdir=tempdir())
filenames |
character vector with the filenames. |
outdir |
directory where the output (and some tmp files) files will be saved. |
batch_size |
integer defining how many SNPs should be processed at a time. |
recalibrate |
Logical - should recalibration be performed? |
balance |
Control parameter to balance homozygotes and heterozygotes calls. |
minLLRforCalls |
Minimum thresholds for genotype calls. |
verbose |
Logical. |
phenoData |
|
pkgname |
alt. pdInfo package to be used |
reference |
logical, defaulting to TRUE ... |
tmpdir |
Directory where temporary files are going to be stored at. |
SnpCallSetPlus
object.
Create set of colors, interpolating through a set of preferred colors.
darkColors(n) seqColors(n) seqColors2(n) divColors(n)
darkColors(n) seqColors(n) seqColors2(n) divColors(n)
n |
integer determining number of colors to be generated |
darkColors
is based on the Dark2 palette in RColorBrewer, therefore
useful to describe qualitative features of the data.
seqColors
is based on Blues and generates a gradient of blues, therefore
useful to describe quantitative features of the data. seqColors2
behaves similarly, but it is based on OrRd (white-orange-red).
divColors
is based on the RdBu pallete in RColorBrewer, therefore
useful to describe quantitative features ranging on two extremes.
x <- 1:10 y <- 1:10 cols1 <- darkColors(10) cols2 <- seqColors(10) cols3 <- divColors(10) cols4 <- seqColors2(10) plot(x, y, col=cols1, xlim=c(1, 13), pch=19, cex=3) points(x+1, y, col=cols2, pch=19, cex=3) points(x+2, y, col=cols3, pch=19, cex=3) points(x+3, y, col=cols4, pch=19, cex=3) abline(0, 1, lty=2) abline(-1, 1, lty=2) abline(-2, 1, lty=2) abline(-3, 1, lty=2)
x <- 1:10 y <- 1:10 cols1 <- darkColors(10) cols2 <- seqColors(10) cols3 <- divColors(10) cols4 <- seqColors2(10) plot(x, y, col=cols1, xlim=c(1, 13), pch=19, cex=3) points(x+1, y, col=cols2, pch=19, cex=3) points(x+2, y, col=cols3, pch=19, cex=3) points(x+3, y, col=cols4, pch=19, cex=3) abline(0, 1, lty=2) abline(-1, 1, lty=2) abline(-2, 1, lty=2) abline(-3, 1, lty=2)
Fits robust Probe Level linear Models to all the (meta)probesets
in an FeatureSet
. This is carried out
on a (meta)probeset by (meta)probeset basis.
fitProbeLevelModel(object, background=TRUE, normalize=TRUE, target="core", method="plm", verbose=TRUE, S4=TRUE, ...)
fitProbeLevelModel(object, background=TRUE, normalize=TRUE, target="core", method="plm", verbose=TRUE, S4=TRUE, ...)
object |
|
background |
Do background correction? |
normalize |
Do normalization? |
target |
character vector describing the summarization target. Valid values are: 'probeset', 'core' (Gene/Exon), 'full' (Exon), 'extended' (Exon). |
method |
summarization method to be used. |
verbose |
verbosity flag. |
S4 |
return final value as an S4 object ( |
... |
subset to be passed down to |
fitProbeLevelModel
returns an oligoPLM
object, if S4=TRUE
; otherwise, it will return a list.
This is the initial port of fitPLM
to oligo. Some features
found on the original work by Ben Bolstad (in the affyPLM package) may
not be yet available. If you found one of this missing
characteristics, please contact Benilton Carvalho.
This is a simplified port from Ben Bolstad's work implemented in the affyPLM package. Problems with the implementation in oligo should be reported to Benilton Carvalho.
Bolstad, BM (2004) Low Level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization. PhD Dissertation. University of California, Berkeley.
rma
, summarizationMethods
, subset
if (require(oligoData)){ data(nimbleExpressionFS) fit <- fitProbeLevelModel(nimbleExpressionFS) image(fit) NUSE(fit) RLE(fit) }
if (require(oligoData)){ data(nimbleExpressionFS) fit <- fitProbeLevelModel(nimbleExpressionFS) image(fit) NUSE(fit) RLE(fit) }
Estimate affinity coefficients using sequence information and splines.
getAffinitySplineCoefficients(intensities, sequences)
getAffinitySplineCoefficients(intensities, sequences)
intensities |
Intensity matrix |
sequences |
Probe sequences |
Matrix with estimated coefficients.
getBaseProfile
Computes and, optionally, lots nucleotide profile, describing the sequence effect on intensities.
getBaseProfile(coefs, probeLength = 25, plot = FALSE, ...)
getBaseProfile(coefs, probeLength = 25, plot = FALSE, ...)
coefs |
affinity spline coefficients. |
probeLength |
length of probes |
plot |
logical. Plots profile? |
... |
arguments to be passed to matplot. |
Invisibly returns a matrix with estimated effects.
Get container information for NimbleGen Tiling Arrays. This is useful for better identification of control probes.
getContainer(object, probeType)
getContainer(object, probeType)
object |
A |
probeType |
String describing which probes to query ('pm', 'bg') |
'character' vector with container information.
This will read the summaries written to disk and return them to the
user as a SnpCallSetPlus
or SnpCnvCallSetPlus
object.
getCrlmmSummaries(tmpdir)
getCrlmmSummaries(tmpdir)
tmpdir |
directory where CRLMM saved the results to. |
If the data were from SNP 5.0 or 6.0 arrays, the function will return
a SnpCnvCallSetPlus
object. It will return a SnpCallSetPlus
object, otherwise.
Gets NetAffx Biological Annotations saved in the annotation package (Exon and Gene ST Affymetrix arrays).
getNetAffx(object, type = "probeset")
getNetAffx(object, type = "probeset")
object |
'ExpressionSet' object (eg., result of rma()) |
type |
Either 'probeset' or 'transcript', depending on what type of summaries were obtained. |
This retrieves NetAffx annotation saved in the (pd) annotation package - annotation(object). It is only available for Exon ST and Gene ST arrays.
The 'type' argument should match the summarization target used to generate 'object'. The 'rma' method allows for two targets: 'probeset' (target='probeset') and 'transcript' (target='core', target='full', target='extended').
'AnnotatedDataFrame' that can be used as featureData(object)
Benilton Carvalho
This function will (try to) extract the color information for
NimbleGen arrays. This is useful when using read.xysfiles2
to
parse XYS files for Tiling applications.
getNgsColorsInfo(path = ".", pattern1 = "_532", pattern2 = "_635", ...)
getNgsColorsInfo(path = ".", pattern1 = "_532", pattern2 = "_635", ...)
path |
path where to look for files |
pattern1 |
pattern to match files supposed to go to the first channel |
pattern2 |
pattern to match files supposed to go to the second channel |
... |
extra arguments for |
Many NimbleGen samples are identified following the pattern sampleID_532.XYS / sampleID_635.XYS.
The function suggests sample names if all the filenames follow the standard above.
A data.frame with, at least, two columns: 'channel1' and 'channel2'. A third column, 'sampleNames', is returned if the filenames follow the sampleID_532.XYS / sampleID_635.XYS standard.
Benilton Carvalho <[email protected]>
Retrieve platform design object.
getPlatformDesign(object) getPD(object)
getPlatformDesign(object) getPD(object)
object |
|
Retrieve platform design object.
platformDesign
or PDInfo
object.
A tool to simplify the selection of probe information, so user does not need to use the SQL approaches.
getProbeInfo(object, field, probeType = "pm", target = "core", sortBy = c("fid", "man_fsetid", "none"), ...)
getProbeInfo(object, field, probeType = "pm", target = "core", sortBy = c("fid", "man_fsetid", "none"), ...)
object |
|
field |
character string with names of field(s) of interest to be obtained from database. |
probeType |
character string: 'pm' or 'mm' |
target |
Used only for Exon or Gene ST arrays: 'core', 'full', 'extended', 'probeset'. |
sortBy |
Field to be used for sorting. |
... |
Arguments to be passed to |
A data.frame
with the probe level information.
The code allows for querying info on MM probes, however it has been used mostly on PM probes.
Benilton Carvalho
if (require(oligoData)){ data(affyGeneFS) availProbeInfo(affyGeneFS) probeInfo <- getProbeInfo(affyGeneFS, c('fid', 'x', 'y', 'chrom')) head(probeInfo) ## Selecting antigenomic background probes agenGene <- getProbeInfo(affyGeneFS, field=c('fid', 'fsetid', 'type'), target='probeset', subset= type == 'control->bgp->antigenomic') head(agenGene) }
if (require(oligoData)){ data(affyGeneFS) availProbeInfo(affyGeneFS) probeInfo <- getProbeInfo(affyGeneFS, c('fid', 'x', 'y', 'chrom')) head(probeInfo) ## Selecting antigenomic background probes agenGene <- getProbeInfo(affyGeneFS, field=c('fid', 'fsetid', 'type'), target='probeset', subset= type == 'control->bgp->antigenomic') head(agenGene) }
Accessors for physical array coordinates.
getX(object, type) getY(object, type)
getX(object, type) getY(object, type)
object |
|
type |
'character' defining the type of the probes to be queried. Valid options are 'pm', 'mm', 'bg' |
A vector with the requested coordinates.
## Not run: x <- read.celfiles(list.celfiles()) theXpm <- getX(x, "pm") theYpm <- getY(x, "pm") ## End(Not run)
## Not run: x <- read.celfiles(list.celfiles()) theXpm <- getX(x, "pm") theYpm <- getY(x, "pm") ## End(Not run)
Plot the density estimates for each sample
## S4 method for signature 'FeatureSet' hist(x, transfo=log2, which=c("pm", "mm", "bg", "both", "all"), nsample=10000, target = "mps1", ...) ## S4 method for signature 'ExpressionSet' hist(x, transfo=identity, nsample=10000, ...)
## S4 method for signature 'FeatureSet' hist(x, transfo=log2, which=c("pm", "mm", "bg", "both", "all"), nsample=10000, target = "mps1", ...) ## S4 method for signature 'ExpressionSet' hist(x, transfo=identity, nsample=10000, ...)
x |
|
transfo |
a function to transform the data before plotting. See 'Details'. |
nsample |
number of units to sample and build the plot. |
which |
set of probes to be plotted ("pm", "mm", "bg", "both", "all"). |
... |
arguments to be passed to |
The 'transfo' argument will set the transformation to be used. For raw data, 'transfo=log2' is a common practice. For summarized data (which are often in log2-scale), no transformation is needed (therefore 'transfo=identity').
The hist methods for FeatureSet
and Expression
use a
sample (via sample
) of the probes/probesets to produce the
plot (unless nsample > nrow(x)). Therefore, the user interested in reproducibility is advised to
use set.seed
.
Produces a pseudo-image (graphics::image
) for each sample.
## S4 method for signature 'FeatureSet' image(x, which, transfo=log2, ...) ## S4 method for signature 'PLMset' image(x, which=0, type=c("weights","resids", "pos.resids","neg.resids","sign.resids"), use.log=TRUE, add.legend=FALSE, standardize=FALSE, col=NULL, main, ...)
## S4 method for signature 'FeatureSet' image(x, which, transfo=log2, ...) ## S4 method for signature 'PLMset' image(x, which=0, type=c("weights","resids", "pos.resids","neg.resids","sign.resids"), use.log=TRUE, add.legend=FALSE, standardize=FALSE, col=NULL, main, ...)
x |
|
which |
integer indices of samples to be plotted (optional). |
transfo |
function to be applied to the data prior to plotting. |
type |
Type of statistics to be used. |
use.log |
Use log. |
add.legend |
Add legend. |
standardize |
Standardize residuals. |
col |
Colors to be used. |
main |
Main title. |
... |
parameters to be passed to |
if(require(oligoData) & require(pd.hg18.60mer.expr)){ data(nimbleExpressionFS) par(mfrow=c(1, 2)) image(nimbleExpressionFS, which=4) ## fit <- fitPLM(nimbleExpressionFS) ## image(fit, which=4) plot(1) ## while fixing fitPLM TODO }
if(require(oligoData) & require(pd.hg18.60mer.expr)){ data(nimbleExpressionFS) par(mfrow=c(1, 2)) image(nimbleExpressionFS, which=4) ## fit <- fitPLM(nimbleExpressionFS) ## image(fit, which=4) plot(1) ## while fixing fitPLM TODO }
This function implements the SNPRMA method for summarization of SNP data. It works directly with the CEL files, saving memory.
justSNPRMA(filenames, verbose = TRUE, phenoData = NULL, normalizeToHapmap = TRUE)
justSNPRMA(filenames, verbose = TRUE, phenoData = NULL, normalizeToHapmap = TRUE)
filenames |
character vector with the filenames. |
verbose |
logical flag for verbosity. |
phenoData |
a |
normalizeToHapmap |
Normalize to Hapmap? Should always be TRUE, but it's kept here for future use. |
SnpQSet
or a SnpCnvQSet
, depending on the array type.
## snprmaResults <- justSNPRMA(list.celfiles())
## snprmaResults <- justSNPRMA(list.celfiles())
Lists the XYS files.
list.xysfiles(...)
list.xysfiles(...)
... |
parameters to be passed to |
The functions interface list.files
and the user is asked
to check that function for further details.
Character vector with the filenames.
list.xysfiles()
list.xysfiles()
Create MA plots using a reference array (if one channel) or using channel2 as reference (if two channel).
MAplot(object, ...) ## S4 method for signature 'FeatureSet' MAplot(object, what=pm, transfo=log2, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'TilingFeatureSet' MAplot(object, what=pm, transfo=log2, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'PLMset' MAplot(object, what=coefs, transfo=identity, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'matrix' MAplot(object, what=identity, transfo=identity, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'ExpressionSet' MAplot(object, what=exprs, transfo=identity, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...)
MAplot(object, ...) ## S4 method for signature 'FeatureSet' MAplot(object, what=pm, transfo=log2, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'TilingFeatureSet' MAplot(object, what=pm, transfo=log2, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'PLMset' MAplot(object, what=coefs, transfo=identity, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'matrix' MAplot(object, what=identity, transfo=identity, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...) ## S4 method for signature 'ExpressionSet' MAplot(object, what=exprs, transfo=identity, groups, refSamples, which, pch=".", summaryFun=rowMedians, plotFun=smoothScatter, main="vs pseudo-median reference chip", pairs=FALSE, ...)
object |
|
what |
function to be applied on |
transfo |
function to transform the data prior to plotting. |
groups |
factor describing groups of samples that will be combined prior to plotting. If missing, MvA plots are done per sample. |
refSamples |
integers (indexing samples) to define which subjects
will be used to compute the reference set. If missing, a
pseudo-reference chip is estimated using |
which |
integer (indexing samples) describing which samples are to be plotted. |
pch |
same as |
summaryFun |
function that operates on a matrix and returns a vector that will be used to summarize data belonging to the same group (or reference) on the computation of grouped-stats. |
plotFun |
function to be used for plotting. Usually
|
main |
string to be used in title. |
pairs |
logical flag to determine if a matrix of MvA plots is to be generated |
... |
Other arguments to be passed downstream, like |
MAplot will take the following extra arguments:
subset
: indices of elements to be plotted to reduce
impact of plotting 100's thousands points (if pairs=FALSE only);
span
: see loess
;
family.loess
: see loess
;
addLoess
: logical flag (default TRUE) to add a loess
estimate;
parParams
: list of params to be passed to par() (if
pairs=TRUE only);
Plot
Benilton Carvalho - based on Ben Bolstad's original MAplot function.
if(require(oligoData) & require(pd.hg18.60mer.expr)){ data(nimbleExpressionFS) nimbleExpressionFS groups <- factor(rep(c('brain', 'UnivRef'), each=3)) data.frame(sampleNames(nimbleExpressionFS), groups) MAplot(nimbleExpressionFS, pairs=TRUE, ylim=c(-.5, .5), groups=groups) }
if(require(oligoData) & require(pd.hg18.60mer.expr)){ data(nimbleExpressionFS) nimbleExpressionFS groups <- factor(rep(c('brain', 'UnivRef'), each=3)) data.frame(sampleNames(nimbleExpressionFS), groups) MAplot(nimbleExpressionFS, pairs=TRUE, ylim=c(-.5, .5), groups=groups) }
Accessors and replacement methods for the PM/MM/BG matrices.
intensity(object) mm(object, subset = NULL, target='core') pm(object, subset = NULL, target='core') bg(object, subset = NULL) mm(object, subset = NULL, target='core')<-value pm(object, subset = NULL, target='core')<-value bg(object)<-value
intensity(object) mm(object, subset = NULL, target='core') pm(object, subset = NULL, target='core') bg(object, subset = NULL) mm(object, subset = NULL, target='core')<-value pm(object, subset = NULL, target='core')<-value bg(object)<-value
object |
|
subset |
Not implemented yet. |
value |
|
target |
One of 'probeset', 'core', 'full', 'extended'. This is ignored if the array design is something other than Gene ST or Exon ST. |
For all objects but TilingFeatureSet
, these methods will
return matrices. In case of TilingFeatureSet
objects, the
value is a 3-dimensional array (probes x samples x channels).
intensity
will return the whole intensity matrix associated to
the object. pm
, mm
, bg
will return the respective
PM/MM/BG matrix.
When applied to ExonFeatureSet
or GeneFeatureSet
objects, pm
will return the PM matrix at the transcript level
('core' probes) by default. The user should set the target
argument accordingly if something else is desired. The valid values
are: 'probeset' (Exon and Gene arrays), 'core' (Exon and Gene arrays),
'full' (Exon arrays) and 'extended' (Exon arrays).
The target
argument has no effects when used on designs other
than Gene and Exon ST.
if (require(maqcExpression4plex) & require(pd.hg18.60mer.expr)){ xysPath <- system.file("extdata", package="maqcExpression4plex") xysFiles <- list.xysfiles(xysPath, full.name=TRUE) ngsExpressionFeatureSet <- read.xysfiles(xysFiles) pm(ngsExpressionFeatureSet)[1:10,] }
if (require(maqcExpression4plex) & require(pd.hg18.60mer.expr)){ xysPath <- system.file("extdata", package="maqcExpression4plex") xysFiles <- list.xysfiles(xysPath, full.name=TRUE) ngsExpressionFeatureSet <- read.xysfiles(xysFiles) pm(ngsExpressionFeatureSet)[1:10,] }
Extracts the indexes for PM, MM or background probes.
mmindex(object, ...) pmindex(object, ...) bgindex(object, ...)
mmindex(object, ...) pmindex(object, ...) bgindex(object, ...)
object |
|
... |
Extra arguments, not yet implemented |
The indices are ordered by 'fid', i.e. they follow the order that the probes appear in the CEL/XYS files.
A vector of integers representing the rows of the intensity matrix that correspond to PM, MM or background probes.
## How pm() works ## Not run: x <- read.celfiles(list.celfiles()) pms0 <- pm(x) pmi <- pmindex(x) pms1 <- exprs(x)[pmi,] identical(pms0, pms1) ## End(Not run)
## How pm() works ## Not run: x <- read.celfiles(list.celfiles()) pms0 <- pm(x) pmi <- pmindex(x) pms1 <- exprs(x)[pmi,] identical(pms0, pms1) ## End(Not run)
Accessor to the (PM/MM/background) probe sequences.
mmSequence(object) pmSequence(object, ...) bgSequence(object, ...)
mmSequence(object) pmSequence(object, ...) bgSequence(object, ...)
object |
|
... |
additional arguments |
A DNAStringSet containing the PM/MM/background probe sequence associated to the array.
The functions or variables listed here are no longer part of 'oligo'
fitPLM(...) coefs(...) resids(...)
fitPLM(...) coefs(...) resids(...)
... |
Arguments. |
fitPLM
was replaced by fitProbeLevelModel
, allowing faster execution and providing more specific models. fitPLM
was based in the code written by Ben Bolstad in the affyPLM
package. However, all the model-fitting functions are now in the package preprocessCore
, on which fitProbeLevelModel
depends.
coefs
and resids
, like fitPLM
, were inherited from the affyPLM
package. They were replaced respectively by coef
and residuals
, because this is how these statistics are called everywhere else in R
.
"oligoPLM"
A class to represent Probe Level Models.
Objects can be created by calls of the form
fitProbeLevelModel(FeatureSetObject)
, where
FeatureSetObject
is an object obtained through
read.celfiles
or read.xysfiles
, representing intensities
observed for different probes (which are grouped in probesets or
meta-probesets) across distinct samples.
chip.coefs
:"matrix"
with chip/sample effects -
probeset-level
description
:"MIAME"
compliant description
information.
phenoData
:"AnnotatedDataFrame"
with phenotypic
data.
protocolData
:"AnnotatedDataFrame"
with
protocol data.
probe.coefs
:"numeric"
vector with probe effects
weights
:"matrix"
with weights - probe-level
residuals
:"matrix"
with residuals - probe-level
se.chip.coefs
:"matrix"
with standard errors
for chip/sample coefficients
se.probe.coefs
:"numeric"
vector with standard
errors for probe effects
residualSE
:scale - residual standard error
geometry
:array geometry used for plots
method
:"character"
string describing method
used for PLM
manufacturer
:"character"
string with manufacturer name
annotation
:"character"
string with the name of
the annotation package
narrays
:"integer"
describing the number of arrays
nprobes
:"integer"
describing the number of
probes before summarization
nprobesets
:"integer"
describing the number of
probesets after summarization
signature(object = "oligoPLM")
:
accessor/replacement method to annotation slot
signature(x = "oligoPLM")
: boxplot method
signature(object = "oligoPLM")
:
accessor/replacement method to coef slot
signature(object = "oligoPLM")
:
accessor/replacement method to coefs.probe slot
signature(object = "oligoPLM")
:
accessor/replacement method to geometry slot
signature(x = "oligoPLM")
: image method
signature(object = "oligoPLM")
:
accessor/replacement method to manufacturer slot
signature(object = "oligoPLM")
:
accessor/replacement method to method slot
signature(x = "oligoPLM")
: accessor/replacement
method to ncol slot
signature(object = "oligoPLM")
:
accessor/replacement method to nprobes slot
signature(object = "oligoPLM")
:
accessor/replacement method to nprobesets slot
signature(object = "oligoPLM")
:
accessor/replacement method to residuals slot
signature(object = "oligoPLM")
:
accessor/replacement method to residualSE slot
signature(object = "oligoPLM")
:
accessor/replacement method to se slot
signature(object = "oligoPLM")
:
accessor/replacement method to se.probe slot
signature(object = "oligoPLM")
: show method
signature(object = "oligoPLM")
:
accessor/replacement method to weights slot
signature(x = "oligoPLM")
: Boxplot of Normalized
Unscaled Standard Errors (NUSE) or NUSE values.
signature(x = "oligoPLM")
: Relative Log Expression
boxplot or values.
signature(x = "oligoPLM")
: Convert to ExpressionSet.
This is a port from Ben Bolstad's work implemented in the affyPLM package. Problems with the implementation in oligo should be reported to the package's maintainer.
Bolstad, BM (2004) Low Level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization. PhD Dissertation. University of California, Berkeley.
## TODO: review code and fix broken ## Not run: if (require(oligoData)){ data(nimbleExpressionFS) fit <- fitProbeLevelModel(nimbleExpressionFS) image(fit) NUSE(fit) RLE(fit) } ## End(Not run)
## TODO: review code and fix broken ## Not run: if (require(oligoData)){ data(nimbleExpressionFS) fit <- fitProbeLevelModel(nimbleExpressionFS) image(fit) NUSE(fit) RLE(fit) } ## End(Not run)
Methods for Present/Absent Calls are meant to provide means of assessing whether or not each of the (PM) intensities are compatible with observations generated by background probes.
paCalls(object, method, ..., verbose=TRUE) ## S4 method for signature 'ExonFeatureSet' paCalls(object, method, verbose = TRUE) ## S4 method for signature 'GeneFeatureSet' paCalls(object, method, verbose = TRUE) ## S4 method for signature 'ExpressionFeatureSet' paCalls(object, method, ..., verbose = TRUE)
paCalls(object, method, ..., verbose=TRUE) ## S4 method for signature 'ExonFeatureSet' paCalls(object, method, verbose = TRUE) ## S4 method for signature 'GeneFeatureSet' paCalls(object, method, verbose = TRUE) ## S4 method for signature 'ExpressionFeatureSet' paCalls(object, method, ..., verbose = TRUE)
object |
Exon/Gene/Expression-FeatureSet object. |
method |
String defining what method to use. See 'Details'. |
... |
Additional arguments passed to MAS5. See 'Details' |
verbose |
Logical flag for verbosity. |
For Whole Transcript arrays (Exon/Gene) the valid options for
method
are 'DABG' (p-values for each probe) and 'PSDABG'
(p-values for each probeset). For Expression arrays, the only option
currently available for method
is 'MAS5'.
ABOUT MAS5 CALLS:
The additional arguments that can be passed to MAS5 are:
alpha1
: a significance threshold in (0, alpha2);
alpha2
: a significance threshold in (alpha1, 0.5);
tau
: a small positive constant;
ignore.saturated
: if TRUE, do the saturation correction described in
the paper, with a saturation level of 46000;
This function performs the hypothesis test:
H0: median(Ri) = tau, corresponding to absence of transcript H1: median(Ri) > tau, corresponding to presence of transcript
where Ri = (PMi - MMi) / (PMi + MMi) for each i a probe-pair in the probe-set represented by data.
The p-value that is returned estimates the usual quantity:
Pr(observing a more "present looking" probe-set than data | data is absent)
So that small p-values imply presence while large ones imply absence of transcript. The detection call is computed by thresholding the p-value as in:
call "P" if p-value < alpha1 call "M" if alpha1 <= p-value < alpha2 call "A" if alpha2 <= p-value
A matrix (of dimension dim(PM) if method="DABG" or "MAS5"; of dimension length(unique(probeNames(object))) x ncol(object) if method="PSDABG") with p-values for P/A Calls.
Benilton Carvalho
Clark et al. Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol (2007) vol. 8 (4) pp. R64
Liu, W. M. and Mei, R. and Di, X. and Ryder, T. B. and Hubbell, E. and Dee, S. and Webster, T. A. and Harrington, C. A. and Ho, M. H. and Baid, J. and Smeekens, S. P. (2002) Analysis of high density expression microarrays with signed-rank call algorithms, Bioinformatics, 18(12), pp. 1593–1599.
Liu, W. and Mei, R. and Bartell, D. M. and Di, X. and Webster, T. A. and Ryder, T. (2001) Rank-based algorithms for analysis of microarrays, Proceedings of SPIE, Microarrays: Optical Technologies and Informatics, 4266.
Affymetrix (2002) Statistical Algorithms Description Document, Affymetrix Inc., Santa Clara, CA, whitepaper. http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
## Not run: if (require(oligoData) & require(pd.huex.1.0.st.v2)){ data(affyExonFS) ## Get only 2 samples for example dabgP = paCalls(affyExonFS[, 1:2]) dabgPS = paCalls(affyExonFS[, 1:2], "PSDABG") head(dabgP) ## for probe head(dabgPS) ## for probeset } ## End(Not run)
## Not run: if (require(oligoData) & require(pd.huex.1.0.st.v2)){ data(affyExonFS) ## Get only 2 samples for example dabgP = paCalls(affyExonFS[, 1:2]) dabgPS = paCalls(affyExonFS[, 1:2], "PSDABG") head(dabgP) ## for probe head(dabgPS) ## for probeset } ## End(Not run)
The plotM
methods are meant to plot log-ratios for different
classes of data.
Plot log-ratio for SNP data for sample i.
Plot log-ratio for SNP data for sample i.
Plot log-ratio for SNP data for sample i.
Plot log-ratio for Tiling data for sample i.
Accessor to the allelic information for PM probes.
pmAllele(object)
pmAllele(object)
object |
|
Accessor to the fragment length for PM probes.
pmFragmentLength(object, enzyme, type=c('snp', 'cn'))
pmFragmentLength(object, enzyme, type=c('snp', 'cn'))
object |
|
enzyme |
Enzyme to be used for query. If missing, all enzymes are used. |
type |
Type of probes to be used: 'snp' for SNP probes; 'cn' for Copy Number probes. |
A list of length equal to the number of enzymes used for digestion. Each element of the list is a data.frame containing:
row
: the row used to link to the PM matrix;
length
: expected fragment length.
There is not a 1:1 relationship between probes and expected fragment
length. For one enzyme, a given probe may be associated to multiple
fragment lengths. Therefore, the number of rows in the data.frame may
not match the number of PM probes and the row
column should be
used to match the fragment length with the PM matrix.
pmPosition
will return the genomic position for the
(PM) probes.
pmPosition(object) pmOffset(object)
pmPosition(object) pmOffset(object)
object |
|
pmPosition
will return genomic position for PM probes on a
tiling array.
pmOffset
will return the offset information for PM probes on
SNP arrays.
Returns the strand information for PM probes (0 - sense / 1 - antisense).
pmStrand(object)
pmStrand(object)
object |
|
Accessors to featureset names.
probeNames(object, subset = NULL, ...) probesetNames(object, ...)
probeNames(object, subset = NULL, ...) probesetNames(object, ...)
object |
|
subset |
not implemented yet. |
... |
Arguments (like 'target') passed to downstream methods. |
probeNames
returns a string with the probeset names for *each probe*
on the array. probesetNames
, on the other hand, returns the
*unique probeset names*.
Reads CEL files.
read.celfiles(..., filenames, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, checkType=TRUE) read.celfiles2(channel1, channel2, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, checkType=TRUE)
read.celfiles(..., filenames, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, checkType=TRUE) read.celfiles2(channel1, channel2, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, rm.mask=FALSE, rm.outliers=FALSE, rm.extra=FALSE, checkType=TRUE)
... |
names of files to be read. |
filenames |
a |
channel1 |
a |
channel2 |
a |
pkgname |
alternative data package to be loaded. |
phenoData |
|
featureData |
|
experimentData |
|
protocolData |
|
notes |
|
verbose |
|
sampleNames |
|
rm.mask |
|
rm.outliers |
|
rm.extra |
|
checkType |
|
When using 'affyio' to read in CEL files, the user can read compressed CEL files (CEL.gz). Additionally, 'affyio' is much faster than 'affxparser'.
The function guesses which annotation package to use from the header
of the CEL file. The user can also provide the name of the annotaion
package to be used (via the pkgname
argument). If the
annotation package cannot be loaded, the function returns an
error. If the annotation package is not available from BioConductor,
one can use the pdInfoBuilder
package to build one.
ExpressionFeatureSet |
if Expresssion arrays |
ExonFeatureSet |
if Exon arrays |
SnpFeatureSet |
if SNP arrays |
TilingFeatureSet |
if Tiling arrays |
if(require(pd.mapping50k.xba240) & require(hapmap100kxba)){ celPath <- system.file("celFiles", package="hapmap100kxba") celFiles <- list.celfiles(celPath, full.name=TRUE) affySnpFeatureSet <- read.celfiles(celFiles) }
if(require(pd.mapping50k.xba240) & require(hapmap100kxba)){ celPath <- system.file("celFiles", package="hapmap100kxba") celFiles <- list.celfiles(celPath, full.name=TRUE) affySnpFeatureSet <- read.celfiles(celFiles) }
NimbleGen provides XYS files which are read by this function.
read.xysfiles(..., filenames, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, checkType=TRUE) read.xysfiles2(channel1, channel2, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, checkType=TRUE)
read.xysfiles(..., filenames, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, checkType=TRUE) read.xysfiles2(channel1, channel2, pkgname, phenoData, featureData, experimentData, protocolData, notes, verbose=TRUE, sampleNames, checkType=TRUE)
... |
file names |
filenames |
|
channel1 |
a |
channel2 |
a |
pkgname |
|
phenoData |
|
featureData |
|
experimentData |
|
protocolData |
|
notes |
|
verbose |
|
sampleNames |
|
checkType |
|
The function will read the XYS files provided by NimbleGen Systems and return an object of class FeatureSet.
The function guesses which annotation package to use from the header
of the XYS file. The user can also provide the name of the annotaion
package to be used (via the pkgname
argument). If the
annotation package cannot be loaded, the function returns an
error. If the annotation package is not available from BioConductor,
one can use the pdInfoBuilder
package to build one.
ExpressionFeatureSet |
if Expresssion arrays |
TilingFeatureSet |
if Tiling arrays |
if (require(maqcExpression4plex) & require(pd.hg18.60mer.expr)){ xysPath <- system.file("extdata", package="maqcExpression4plex") xysFiles <- list.xysfiles(xysPath, full.name=TRUE) ngsExpressionFeatureSet <- read.xysfiles(xysFiles) }
if (require(maqcExpression4plex) & require(pd.hg18.60mer.expr)){ xysPath <- system.file("extdata", package="maqcExpression4plex") xysFiles <- list.xysfiles(xysPath, full.name=TRUE) ngsExpressionFeatureSet <- read.xysfiles(xysFiles) }
This function read the different summaries generated by crlmm.
readSummaries(type, tmpdir)
readSummaries(type, tmpdir)
type |
type of summary of |
tmpdir |
directory containing the output saved by crlmm |
On the 50K and 250K arrays, given a SNP, there are probes on both strands (sense and antisense). For this reason, the options 'alleleA-sense', 'alleleA-antisense', 'alleleB-sense' and 'alleleB-antisense' should be used **only** with such arrays (XBA, HIND, NSP or STY).
On the SNP 5.0 and SNP 6.0 platforms, this distinction does not exist in terms of algorithm (note that the actual strand could be queried from the annotation package). For these arrays, options 'alleleA', 'alleleB' are the ones to be used.
The options calls
, llr
and conf
will return,
respectivelly, the CRLMM calls, log-likelihood ratios (for devel
purpose **only**) and CRLMM confidence calls matrices.
Matrix with values of summaries.
Robust Multichip Average preprocessing methodology. This strategy allows background subtraction, quantile normalization and summarization (via median-polish).
## S4 method for signature 'ExonFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core") ## S4 method for signature 'HTAFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core") ## S4 method for signature 'ExpressionFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL) ## S4 method for signature 'GeneFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core") ## S4 method for signature 'SnpCnvFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL)
## S4 method for signature 'ExonFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core") ## S4 method for signature 'HTAFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core") ## S4 method for signature 'ExpressionFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL) ## S4 method for signature 'GeneFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL, target="core") ## S4 method for signature 'SnpCnvFeatureSet' rma(object, background=TRUE, normalize=TRUE, subset=NULL)
object |
Exon/HTA/Expression/Gene/SnpCnv-FeatureSet object. |
background |
Logical - perform RMA background correction? |
normalize |
Logical - perform quantile normalization? |
subset |
To be implemented. |
target |
Level of summarization (only for Exon/Gene arrays) |
signature(object = "ExonFeatureSet")
When applied to an ExonFeatureSet
object, rma
can
produce summaries at different levels: probeset (as defined in the PGF),
core genes (as defined in the core.mps file), full genes (as defined in
the full.mps file) or extended genes (as defined in the extended.mps
file). To determine the level for summarization, use the target
argument.
signature(object = "ExpressionFeatureSet")
When used on an ExpressionFeatureSet
object, rma
produces
summaries at the probeset level (as defined in the CDF or NDF files,
depending on the manufacturer).
signature(object = "GeneFeatureSet")
When applied to a GeneFeatureSet
object, rma
can produce
summaries at different levels: probeset (as defined in the PGF) and
'core genes' (as defined in the core.mps file). To determine the level for
summarization, use the target
argument.
signature(object = "HTAFeatureSet")
When applied to a HTAFeatureSet
object, rma
can produce
summaries at different levels: probeset (as defined in the PGF) and
'core genes' (as defined in the core.mps file). To determine the level for
summarization, use the target
argument.
signature(object = "SnpCnvFeatureSet")
If used on a SnpCnvFeatureSet
object (ie., SNP 5.0 or SNP 6.0
arrays), rma
will produce summaries for the CNV probes. Note that
this is an experimental feature for internal (and quick) assessment of
CNV probes. We recommend the use of the 'crlmm' package, which contains
a Copy Number tool specifically designed for these data.
Rafael. A. Irizarry, Benjamin M. Bolstad, Francois Collin, Leslie M. Cope, Bridget Hobbs and Terence P. Speed (2003), Summaries of Affymetrix GeneChip probe level data Nucleic Acids Research 31(4):e15
Bolstad, B.M., Irizarry R. A., Astrand M., and Speed, T.P. (2003), A Comparison of Normalization Methods for High Density O ligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2):185-193
Irizarry, RA, Hobbs, B, Collin, F, Beazer-Barclay, YD, Antonellis, KJ, Scherf, U, Speed, TP (2003) Exploration, Normalizati on, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics. Vol. 4, Number 2: 249-264
if (require(maqcExpression4plex) & require(pd.hg18.60mer.expr)){ xysPath <- system.file("extdata", package="maqcExpression4plex") xysFiles <- list.xysfiles(xysPath, full.name=TRUE) ngsExpressionFeatureSet <- read.xysfiles(xysFiles) summarized <- rma(ngsExpressionFeatureSet) show(summarized) }
if (require(maqcExpression4plex) & require(pd.hg18.60mer.expr)){ xysPath <- system.file("extdata", package="maqcExpression4plex") xysFiles <- list.xysfiles(xysPath, full.name=TRUE) ngsExpressionFeatureSet <- read.xysfiles(xysFiles) summarized <- rma(ngsExpressionFeatureSet) show(summarized) }
Retrieves date information in CEL/XYS files.
runDate(object)
runDate(object)
object |
'FeatureSet' object. |
Creates design matrix for sequences.
sequenceDesignMatrix(seqs)
sequenceDesignMatrix(seqs)
seqs |
|
This assumes all sequences are 25bp long.
The design matrix is often used when the objecive is to adjust intensities by sequence.
Matrix with length(seqs) rows and 75 columns.
genSequence <- function(x) paste(sample(c("A", "T", "C", "G"), 25, rep=TRUE), collapse="", sep="") seqs <- sapply(1:10, genSequence) X <- sequenceDesignMatrix(seqs) Y <- rnorm(10, mean=12, sd=2) Ydemean <- Y-mean(Y) X[1:10, 1:3] fit <- lm(Ydemean~X) coef(fit)
genSequence <- function(x) paste(sample(c("A", "T", "C", "G"), 25, rep=TRUE), collapse="", sep="") seqs <- sapply(1:10, genSequence) X <- sequenceDesignMatrix(seqs) Y <- rnorm(10, mean=12, sd=2) Ydemean <- Y-mean(Y) X[1:10, 1:3] fit <- lm(Ydemean~X) coef(fit)
This function preprocess SNP arrays.
snprma(object, verbose = TRUE, normalizeToHapmap = TRUE)
snprma(object, verbose = TRUE, normalizeToHapmap = TRUE)
object |
|
verbose |
Verbosity flag. |
normalizeToHapmap |
internal |
A SnpQSet
object.
These are tools to preprocess microarray data. They include background correction, normalization and summarization methods.
backgroundCorrectionMethods() normalizationMethods() summarizationMethods() backgroundCorrect(object, method=backgroundCorrectionMethods(), copy=TRUE, extra, subset=NULL, target='core', verbose=TRUE) summarize(object, probes=rownames(object), method="medianpolish", verbose=TRUE, ...) ## S4 method for signature 'FeatureSet' normalize(object, method=normalizationMethods(), copy=TRUE, subset=NULL,target='core', verbose=TRUE, ...) ## S4 method for signature 'matrix' normalize(object, method=normalizationMethods(), copy=TRUE, verbose=TRUE, ...) ## S4 method for signature 'ff_matrix' normalize(object, method=normalizationMethods(), copy=TRUE, verbose=TRUE, ...) normalizeToTarget(object, targetDist, method="quantile", copy=TRUE, verbose=TRUE)
backgroundCorrectionMethods() normalizationMethods() summarizationMethods() backgroundCorrect(object, method=backgroundCorrectionMethods(), copy=TRUE, extra, subset=NULL, target='core', verbose=TRUE) summarize(object, probes=rownames(object), method="medianpolish", verbose=TRUE, ...) ## S4 method for signature 'FeatureSet' normalize(object, method=normalizationMethods(), copy=TRUE, subset=NULL,target='core', verbose=TRUE, ...) ## S4 method for signature 'matrix' normalize(object, method=normalizationMethods(), copy=TRUE, verbose=TRUE, ...) ## S4 method for signature 'ff_matrix' normalize(object, method=normalizationMethods(), copy=TRUE, verbose=TRUE, ...) normalizeToTarget(object, targetDist, method="quantile", copy=TRUE, verbose=TRUE)
object |
Object containing probe intensities to be preprocessed. |
method |
String determining which method to use at that preprocessing step. |
targetDist |
Vector with the target distribution |
probes |
Character vector that identifies the name of the probes represented
by the rows of |
copy |
Logical flag determining if data must be copied before processing (TRUE), or if data can be overwritten (FALSE). |
subset |
Not yet implemented. |
target |
One of the following values: 'core', 'full', 'extended', 'probeset'. Used only with Gene ST and Exon ST designs. |
extra |
Extra arguments to be passed to other methods. |
verbose |
Logical flag for verbosity. |
... |
Arguments to be passed to methods. |
Number of rows of object
must match the length of
probes
.
backgroundCorrectionMethods
and normalizationMethods
will return a character vector with the methods implemented currently.
backgroundCorrect
, normalize
and
normalizeToTarget
will return a matrix with same dimensions as
the input matrix. If they are applied to a FeatureSet object, the PM
matrix will be used as input.
The summarize
method will return a matrix with
length(unique(probes))
rows and ncol(object)
columns.
ns <- 100 nps <- 1000 np <- 10 intensities <- matrix(rnorm(ns*nps*np, 8000, 400), nc=ns) ids <- rep(as.character(1:nps), each=np) bgCorrected <- backgroundCorrect(intensities) normalized <- normalize(bgCorrected) summarizationMethods() expression <- summarize(normalized, probes=ids) intensities[1:20, 1:3] expression[1:20, 1:3] target <- rnorm(np*nps) normalizedToTarget <- normalizeToTarget(intensities, target) if (require(oligoData) & require(pd.hg18.60mer.expr)){ ## Example of normalization with real data data(nimbleExpressionFS) boxplot(nimbleExpressionFS, main='Original') for (mtd in normalizationMethods()){ message('Normalizing with ', mtd) res <- normalize(nimbleExpressionFS, method=mtd, verbose=FALSE) boxplot(res, main=mtd) } }
ns <- 100 nps <- 1000 np <- 10 intensities <- matrix(rnorm(ns*nps*np, 8000, 400), nc=ns) ids <- rep(as.character(1:nps), each=np) bgCorrected <- backgroundCorrect(intensities) normalized <- normalize(bgCorrected) summarizationMethods() expression <- summarize(normalized, probes=ids) intensities[1:20, 1:3] expression[1:20, 1:3] target <- rnorm(np*nps) normalizedToTarget <- normalizeToTarget(intensities, target) if (require(oligoData) & require(pd.hg18.60mer.expr)){ ## Example of normalization with real data data(nimbleExpressionFS) boxplot(nimbleExpressionFS, main='Original') for (mtd in normalizationMethods()){ message('Normalizing with ', mtd) res <- normalize(nimbleExpressionFS, method=mtd, verbose=FALSE) boxplot(res, main=mtd) } }