Title: | Classes for high-throughput arrays supported by oligo and crlmm |
---|---|
Description: | This package contains class definitions, validity checks, and initialization methods for classes used by the oligo and crlmm packages. |
Authors: | Benilton Carvalho and Robert Scharpf |
Maintainer: | Benilton Carvalho <[email protected]> and Robert Scharpf <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.69.0 |
Built: | 2024-12-29 07:46:57 UTC |
Source: | https://github.com/bioc/oligoClasses |
Provides a listing of available Affymetrix platforms currently supported by the R package oligo
affyPlatforms()
A vector of class character.
R. Scharpf
affyPlatforms()
A class for storing the locus-level summaries of the normalized intensities
Objects can be created by calls of the form new("AlleleSet", assayData, phenoData, featureData, experimentData, annotation, protocolData, ...).
assayData
:Object of class "AssayData"
phenoData
:Object of class "AnnotatedDataFrame"
featureData
:Object of class "AnnotatedDataFrame"
experimentData
:Object of class "MIAME"
annotation
:Object of class "character"
protocolData
:Object of class "AnnotatedDataFrame"
.__classVersion__
:Object of class "Versions"
Class "eSet", directly.
Class "VersionedBiobase", by class "eSet", distance 2.
Class "Versioned", by class "eSet", distance 3.
signature(object = "AlleleSet")
: extract allele-specific summaries. For 50K (Xba and Hind) and 250K (Sty and
Nsp) arrays, an additional argument (strand) must be used (allowed
values: 'sense', 'antisense').
signature(object = "AlleleSet")
: tests if
data contains allele summaries on both strands for a given SNP.
signature(object = "SnpFeatureSet")
: tests if
data contains allele summaries on both strands for a given SnpFeatureSet.
signature(object = "AlleleSet")
: link to database connection.
signature(object = "AlleleSet")
: average
intensities (across alleles)
signature(object = "AlleleSet")
: log-ratio
(Allele A vs. Allele B)
R. Scharpf
showClass("AlleleSet")
## an empty AlleleSet
x <- new("matrix")
new("AlleleSet", senseAlleleA=x, senseAlleleB=x, antisenseAlleleA=x, antisenseAlleleB=x)
##or
new("AlleleSet", alleleA=x, alleleB=x)
annotationPackages
will return a character vector of the names of annotation packages.
annotationPackages()
a character vector of the names of annotation packages
Batch statistics used for estimating copy number are stored as AssayData in the 'batchStatistics' slot of the CNSet class. Each element in the AssayData must have the same number of rows and columns. Rows correspond to features and columns correspond to batch.
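The shape constraint above (rows are features, columns are batches) can be illustrated with a minimal base-R sketch. It assumes a plain features x samples matrix and a character vector of batch labels; `summarizeByBatch` is a hypothetical helper, not an oligoClasses function, and the median is only a stand-in for whatever statistics crlmm actually stores.

```r
# Hypothetical helper: collapse a features x samples matrix into a
# features x batches matrix, one column per batch level.
summarizeByBatch <- function(x, batch, FUN = median) {
  batch <- as.factor(batch)
  res <- vapply(levels(batch),
                function(b) apply(x[, batch == b, drop = FALSE], 1, FUN),
                numeric(nrow(x)))
  # vapply() returns a plain vector when nrow(x) == 1, so force a matrix
  matrix(res, nrow = nrow(x),
         dimnames = list(rownames(x), levels(batch)))
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("snp", 1:5), paste0("s", 1:4)))
batch <- c("plate1", "plate1", "plate2", "plate2")
stats <- summarizeByBatch(x, batch)
dim(stats)  # 5 features x 2 batches
```

Every element summarized this way has the same number of rows (features) and columns (batches), which is the property required of the AssayData elements in this slot.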
A virtual Class: No objects may be created from it.
signature(object = "AssayData")
: ...
signature(object = "AssayData")
: ...
signature(object = "AssayData", allele = "character")
: ...
signature(object = "AssayData", allele = "character")
: ...
signature(object = "AssayData", allele = "character")
: ...
lM
: Extracts entire list of linear model parameters.
corr
: The within-genotype correlation of log2(A) and log2(B) intensities.
nu
: The intercept for the linear model. The linear model is
fit to the A and B alleles independently.
phi
: The slope for the linear model. The linear model is fit
independently to the A and B alleles.
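To make the roles of nu (intercept) and phi (slope) concrete, here is a simplified sketch for one SNP in one batch. It assumes a plain least-squares regression of the A-allele intensity on the number of A alleles implied by the genotype call (calls coded 1 = AA, 2 = AB, 3 = BB, as returned by calls()). `fitAlleleLine` is hypothetical, and crlmm's actual estimator is more involved; this only shows what the two parameters mean.

```r
# Hypothetical helper: fit intensity = nu + phi * (copies of allele A).
fitAlleleLine <- function(intensity, calls) {
  copiesA <- 3L - calls  # AA -> 2, AB -> 1, BB -> 0
  fit <- lm(intensity ~ copiesA)
  c(nu = unname(coef(fit)[1]), phi = unname(coef(fit)[2]))
}

set.seed(2)
calls <- rep(1:3, each = 20)
intensityA <- 200 + 500 * (3L - calls) + rnorm(60, sd = 10)
est <- fitAlleleLine(intensityA, calls)
est
```

For the B allele the same fit would use `calls - 1L` (BB -> 2, AB -> 1, AA -> 0) as the copy count, which is why the model is fit to each allele independently.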
library(crlmm)
library(Biobase)
data(cnSetExample, package="crlmm")
cnSet <- cnSetExample
isCurrent(cnSet)
## names of elements in the object
assayDataElementNames(batchStatistics(cnSet))
## Accessors for linear model parameters
## -- Included here primarily as a check that accessors are working
## -- Values are all NA until CN estimation is performed using the crlmm package
## subsetting
cnSet[1:10, ]
## accessors for parameters
nu(cnSet, "A")[1:10, ]
nu(cnSet, "B")[1:10, ]
phi(cnSet, "A")[1:10, ]
phi(cnSet, "B")[1:10, ]
The eSetList-derived classes have an assayDataList slot instead of an assayData slot.
AssayDataList(storage.mode = c("lockedEnvironment", "environment", "list"), ...)
storage.mode |
See |
... |
Named lists of matrices |
environment
R.Scharpf
r <- replicate(5, matrix(rnorm(25),5,5), simplify=FALSE)
r <- lapply(r, function(x,dns) {dimnames(x) <- dns; return(x)}, dns=list(letters[1:5], LETTERS[1:5]))
ad <- AssayDataList(r=r)
ls(ad)
Accessor for slot assayDataList in Package oligoClasses
signature(object = "gSetList")
An object inheriting from class gSetList
.
signature(object = "oligoSetList")
An object inheriting from class gSetList
.
Copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. While 'batch' is often unknown, a useful surrogate is often the scan date of the arrays (e.g., the month of the calendar year) or the 96-well chemistry plate on which the samples were arrayed during lab processing.
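As a sketch of the scan-date surrogate, assuming the dates are already available as Date values (the dates below are made up for illustration), grouping by calendar month gives one batch label per sample, and the distinct labels play the role of the batch names:

```r
# Made-up scan dates, one per sample.
scanDates <- as.Date(c("2009-03-02", "2009-03-17", "2009-04-01",
                       "2009-04-20", "2009-05-05"))
batch <- format(scanDates, "%Y-%m")  # one label per sample
batch
unique(batch)                        # the distinct batch names
```

A character vector of this form is the kind of value supplied as `batch` when constructing a CNSet.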
batch(object) batchNames(object) batchNames(object) <- value
object |
An object of class |
value |
For 'batchNames', the value must be a character vector of the unique batch names. |
The method 'batch' returns a character
vector that has the same
length as the number of samples in the CNSet
object.
R. Scharpf
a <- matrix(1:25, 5, 5)
colnames(a) <- letters[1:5]
object <- new("CNSet", alleleA=a, batch=rep("batch1", 5))
batch(object)
batchNames(object)
The batchStatistics
slot contains statistics estimated
from each batch that are used to derive copy number estimates.
batchStatistics(object) batchStatistics(object) <- value
object |
An object of class |
value |
An object of class |
An object of class AssayData
for slot
batchStatistics
is initialized automatically when
creating a new CNSet
instance. Required in the call to
new
is a factor called batch
whose unique values
determine the number of columns for each assay data
element.
batchStatistics
is an accessor for the slot
batchStatistics
that returns an object of class
AssayData
.
CNSet-class
, batchNames
, batch
"BeadStudioSet"
A container for log R ratios and B allele frequencies from SNP arrays.
Objects can be created by calls of the form new("BeadStudioSet", assayData, phenoData, featureData, experimentData, annotation, protocolData, baf, lrr, ...).
featureData
:Object of class "GenomeAnnotatedDataFrame"
assayData
:Object of class "AssayData"
phenoData
:Object of class "AnnotatedDataFrame"
experimentData
:Object of class "MIAxE"
annotation
:Object of class "character"
protocolData
:Object of class "AnnotatedDataFrame"
genome
:Object of class "character"
.__classVersion__
:Object of class "Versions"
Class "gSet", directly.
Class "eSet", by class "gSet", distance 2.
Class "VersionedBiobase", by class "gSet", distance 3.
Class "Versioned", by class "gSet", distance 4.
In the methods below, object
has class BeadStudioSet
.
baf(object)
: accessor for the matrix of B allele frequencies.
baf(object) <- value
: replacement method for B allele frequencies: value
must be a matrix of integers.
as(object, "data.frame")
: coerce to data.frame with column headers 'lrr',
'baf', 'x' (physical position with unit Mb), 'id', and 'is.snp'.
Used for plotting with lattice.
copyNumber(object)
: accessor for log R ratios.
copyNumber(object) <- value
: replacement method for
the log R ratios
signature(.Object = "BeadStudioSet")
:
constructs an instance of the class
lrr(object)
: accessor for matrix of log R ratios
lrr(object) <- value
: replacement method for log R ratios: value
should be a matrix or an ff_matrix.
show(object)
: print a short summary of the
BeadStudioSet
object.
updateObject(object)
: update a BeadStudioSet
object.
R. Scharpf
new("BeadStudioSet")
Container for log R ratios and B allele frequencies stored by chromosome.
assayDataList
:Object of class "AssayData"
phenoData
:Object of class "AnnotatedDataFrame"
featureDataList
:Object of class "list"
chromosome
:Object of class "integer"
annotation
:Object of class "character"
genome
:Object of class "character"
indicating
the genome build. Valid entries are "hg18" and "hg19".
clone2(object, id, prefix="",...)
Performs a deep copy of the ff objects in the assay data elements
of object
. A new object of the same class will be
instantiated. The ff objects in the instantiated object will point to
ff files on disk with prefix given by the argument prefix
.
A use-case for such a function is that one may want to perform wave
correction on the log R ratios in object
, but keep a copy of
the original unadjusted log R ratios. If object
is not copied
using clone2
prior to wave correction, the log R ratios will be
updated on disk and the original, unadjusted log R ratios will no
longer be available.
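The reason a deep copy is needed is that ff objects have reference semantics: two R names can point at the same data on disk. The sketch below uses R environments, which share that reference behaviour, as a stand-in for ff objects (an illustrative assumption, not how clone2 is implemented); `makeStore` and `deepCopy` are hypothetical helpers.

```r
# Environments stand in for ff objects: assignment copies the reference.
makeStore <- function(lrr) {
  e <- new.env()
  e$lrr <- lrr
  e
}
# Copy every binding into a fresh environment (analogous to clone2()).
deepCopy <- function(store) list2env(as.list(store), envir = new.env())

original <- makeStore(c(0.10, -0.25, 0.40))
alias  <- original            # same underlying store, no copy made
backup <- deepCopy(original)  # independent copy
alias$lrr <- alias$lrr - mean(alias$lrr)  # stand-in for wave correction
original$lrr  # changed, because 'alias' refers to the same store
backup$lrr    # still the unadjusted values
```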
baf(object)
An accessor for the B allele frequencies
(BAFs). The accessor returns a list where each element of the list is
a matrix of the BAFs for the corresponding element in the SetList
object. While the BAFs have a range [0, 1], they are often saved
internally as integers by multiplying the original BAFs by 1000.
Users can restore the original scale by dividing by 1000.
lrr(object)
An accessor for the log R ratios, an
estimate of the copy number (presumably relative to diploid copy
number) at each marker on a SNP array. The accessor returns a list
where each element of the list is a matrix of the log R ratios for the
corresponding element in the SetList object. The log R ratios are
often saved internally as integers by multiplying the original LRRs by
100 in order to reduce the memory footprint of large studies. Users
can restore the original scale by dividing by 100.
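A minimal sketch of the integer encoding described above, assuming values are rounded to the nearest integer after scaling (BAFs by 1000, log R ratios by 100). It shows that dividing restores the original scale, up to the precision lost in rounding:

```r
baf <- c(0, 0.503, 0.998, 1)
lrr <- c(-0.4567, 0, 0.1234)
bafInt <- as.integer(round(baf * 1000))  # stored representation of BAFs
lrrInt <- as.integer(round(lrr * 100))   # stored representation of LRRs
bafInt / 1000  # original BAF scale
lrrInt / 100   # original LRR scale, to two decimal places
```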
R. Scharpf
See supporting packages for methods defined for the class.
Parses cel file dates from the header of .CEL files for the Affymetrix platform
celfileDate(filename)
celfileDate(filename)
filename |
Name of cel file |
character string
H. Jaffee
require(hapmapsnp6)
path <- system.file("celFiles", package="hapmapsnp6")
celfiles <- list.celfiles(path, full.names=TRUE)
dts <- sapply(celfiles, celfileDate)
Returns the complete cel file (including path) for a CNSet object
celfileName(object)
celfileName(object)
object |
An object of class |
Character string vector.
If the CEL files for an experiment are relocated, the datadir
should be updated accordingly. See examples.
R. Scharpf
## Not run:
if(require(crlmm)){
  data(cnSetExample, package="crlmm")
  celfileName(cnSetExample)
}
## End(Not run)
Only loads an object if the object name is not in the global environment. If not in the global environment and the file exists, the object is loaded (by default). If the file does not exist, the function FUN is run.
checkExists(.name, .path = ".", .FUN, .FUN2, .save.it=TRUE, .load.it, ...)
.name |
Character string giving name of object in global environment |
.path |
Path to where the object is saved. |
.FUN |
Function to be executed if <name> is not in the global environment and the file does not exist. |
.FUN2 |
Not currently used. |
.save.it |
Logical. Whether to save the object to the directory indicated by
|
.load.it |
Logical. If .load.it is TRUE, we try to load the object from the
indicated |
... |
Additional arguments passed to FUN. |
Could be anything; depends on what .FUN and .FUN2 return.
Future versions could return a 0 or 1 indicating whether the function performed as expected.
R. Scharpf
## use a subdirectory so that unlink() below does not delete the session's tempdir()
path <- file.path(tempdir(), "checkExistsExample")
dir.create(path)
x <- 3+6
## x is already in the global environment
x <- checkExists("x", .path=path, .FUN=function(y, z) y+z, y=3, z=6)
rm(x)
## x is neither in the global environment nor saved: .FUN is run and the result saved
x <- checkExists("x", .path=path, .FUN=function(y, z) y+z, y=3, z=6)
rm(x)
##now there is a file called x.rda in path. The file will be loaded
x <- checkExists("x", .path=path, .FUN=function(y, z) y+z, y=3, z=6)
rm(x)
unlink(path, recursive=TRUE)
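The lookup order described for checkExists (global environment first, then a saved file, then running .FUN) can be sketched as follows. `checkExistsSketch` is a hypothetical re-implementation for illustration only; the `<name>.rda` file name is an assumption taken from the example's comment, and .FUN2 and .load.it are omitted.

```r
checkExistsSketch <- function(.name, .path = ".", .FUN, .save.it = TRUE, ...) {
  # 1) already in the global environment: return it unchanged
  if (exists(.name, envir = globalenv(), inherits = FALSE))
    return(get(.name, envir = globalenv()))
  # 2) a saved file exists: load it into a scratch environment
  fname <- file.path(.path, paste0(.name, ".rda"))
  if (file.exists(fname)) {
    tmp <- new.env()
    load(fname, envir = tmp)
    return(get(.name, envir = tmp))
  }
  # 3) otherwise compute the object and optionally save it
  res <- .FUN(...)
  if (.save.it) {
    tmp <- new.env()
    assign(.name, res, envir = tmp)
    save(list = .name, file = fname, envir = tmp)
  }
  res
}

path <- file.path(tempdir(), "checkExistsSketch")
dir.create(path, showWarnings = FALSE)
first  <- checkExistsSketch("cachedSum", .path = path,
                            .FUN = function(y, z) y + z, y = 3, z = 6)
# The second call finds cachedSum.rda, so .FUN is never evaluated.
second <- checkExistsSketch("cachedSum", .path = path,
                            .FUN = function(y, z) stop("not called"), y = 3, z = 6)
unlink(path, recursive = TRUE)
```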
Checks whether a eSet
-derived class (e.g., a SnpSet
or
CNSet
object) is ordered by chromosome and
physical position
checkOrder(object, verbose = FALSE) chromosomePositionOrder(object, ...)
object |
A |
verbose |
Logical. |
... |
additional arguments to |
Checks whether the object is ordered by chromosome and physical position.
Logical
R. Scharpf
data(oligoSetExample)
if(!checkOrder(oligoSet)){
  oligoSet <- chromosomePositionOrder(oligoSet)
}
checkOrder(oligoSet)
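The ordering that checkOrder tests for can be sketched with base R's `order()`: markers are sorted by integer chromosome, then by physical position within chromosome. `isOrderedSketch` is a hypothetical helper; the integer chromosome coding matters because sorting character labels would place "10" before "2".

```r
chrom <- c(2L, 1L, 1L, 23L, 2L)
pos   <- c(500L, 300L, 100L, 50L, 200L)
# Ordered if sorting by (chromosome, position) leaves the markers in place.
isOrderedSketch <- function(chrom, pos) {
  o <- order(chrom, pos)
  identical(o, seq_along(o))
}
isOrderedSketch(chrom, pos)        # FALSE
o <- order(chrom, pos)
isOrderedSketch(chrom[o], pos[o])  # TRUE
```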
Methods for function chromosome
in package oligoClasses
The methods for chromosome
extract the chromosome (represented
as an integer) for each marker in an eSet
-derived class or an
AnnotatedDataFrame
-derived class.
signature(object = "AnnotatedDataFrame")
Accessor for chromosome.
signature(object = "eSet")
If 'chromosome' is included in
fvarLabels(object)
, the integer representation of the
chromosome will be returned. Otherwise, an error is thrown.
signature(object = "GenomeAnnotatedDataFrame")
Accessor for chromosome. If annotation was not available due to a missing or non-existent annotation package, the value returned by the accessor will be a vector of zeros.
(chromosome(object) <- value)
: Assign chromosome to the
AnnotatedDataFrame
slot of an eSet
-derived object
.
signature(object = "RangedDataCNV")
Accessor for chromosome.
Integer representation: chr X = 23, chr Y = 24, chr XY = 25. Symbols M, Mt, and MT are coded as 26.
chromosome2integer(c(1:22, "X", "Y", "XY", "M"))
Coerces character string for chromosome in the pd. annotation packages to integers
chromosome2integer(chrom) integer2chromosome(intChrom)
chromosome2integer(chrom) integer2chromosome(intChrom)
chrom |
A one- or two-letter character string (e.g., "1", "X", "Y", "MT", "XY") |
intChrom |
An integer vector with possible values 1-26 |
This is useful when sorting SNPs in an object by chromosome and physical position: it ensures that the sorting is done in the same way for different objects.
integer2chromosome
returns a vector of character string
indicating the chromosome the same length
as intChrom
. chromosome2integer
returns a vector of
integers the same length as the number of elements in the chrom
vector.
R. Scharpf
chromosome2integer(c(1:22, "X", "Y", "XY", "M"))
integer2chromosome(chromosome2integer(c(1:22, "X", "Y", "XY", "M")))
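The integer coding described above (X = 23, Y = 24, XY = 25, M/Mt/MT = 26) can be sketched in a few lines. `chrom2intSketch` is a hypothetical re-implementation for illustration; use chromosome2integer itself in practice.

```r
chrom2intSketch <- function(chrom) {
  chrom <- as.character(chrom)
  codes <- c(X = 23L, Y = 24L, XY = 25L, M = 26L, Mt = 26L, MT = 26L)
  out <- suppressWarnings(as.integer(chrom))  # autosomes "1".."22"
  special <- chrom %in% names(codes)
  out[special] <- codes[chrom[special]]       # sex and mitochondrial labels
  out
}
chrom2intSketch(c(1:22, "X", "Y", "XY", "M", "MT"))
```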
CNSet is a container for intermediate data and parameters pertaining to allele-specific copy number estimation. Methods for CNSet objects, including accessors for linear model parameters and allele-specific copy number are included here.
An object from the class is not generally intended to
be initialized by the user, but returned by the
genotype
function in the crlmm
package.
The following creates a very basic CNSet
with
assayData
containing the required elements.
new("CNSet", alleleA=new("matrix"), alleleB=new("matrix"), call=new("matrix"),
callProbability=new("matrix"), batch=new("factor"))
batch
:Object of class "factor"
batchStatistics
:Object of class "AssayData"
assayData
:Object of class "AssayData"
phenoData
:Object of class "AnnotatedDataFrame"
featureData
:Object of class "AnnotatedDataFrame"
experimentData
:Object of class "MIAME"
annotation
:Object of class "character"
protocolData
:Object of class "AnnotatedDataFrame"
datadir
:Object of class "list"
mixtureParams
:Object of class "matrix"
.__classVersion__
:Object of class "Versions"
The argument object
for the following methods is a CNSet
.
object[i, j]
: subset the CNSet
object by
markers (i) and/or samples (j).
A(object)
: accessor for the normalized intensities of
allele A
A(object) <- value
: replace intensities for the A
allele intensities by value
. The object value
must be
a matrix
, ff_matrix
, or ffdf
.
allele(object, allele)
: accessor for the normalized
intensities for the A or B allele. The argument for allele
must be either 'A' or 'B'
B(object)
: accessor for the normalized intensities of
allele B
B(object) <- value
: replace intensities for the B
allele intensities by value
. The object value
must be
a matrix
, ff_matrix
, or ffdf
.
batch(object)
: vector of batch labels for each sample.
batchNames(object)
: the unique batch names
batchNames(object) <- value
: relabel the batches
calls(object)
: accessor for genotype calls coded as 1
(AA), 2 (AB), or 3 (BB). Nonpolymorphic markers are NA
.
confs(object)
: accessor for the genotype confidence scores.
close(object)
: close any open file connections to
ff
objects stored in the CNSet
object.
as(object, "oligoSnpSet")
: coerce a CNSet
object to an object of class oligoSnpSet
– a container for
the total copy number and genotype calls.
corr(object)
: the correlation of the A and B
intensities within each genotype.
flags(object)
: flags to indicate possible problems with
the copy number estimation. Not fully implemented at this point.
new("CNSet")
: instantiating a CNSet
object.
nu(object, allele)
: accessor for the intercept
(background) for the A and B alleles. The value of allele
must be 'A' or 'B'.
open(object)
open file connections for all ff
objects stored in the CNSet
object.
phi(object, allele)
: accessor for the slope for the A
and B alleles. The value of allele
must be 'A' or 'B'.
sigma2(object, allele)
: accessor for the within
genotype variance
tau2(object, allele)
: accessor for background variance
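Since calls() returns integer codes rather than genotype labels, a small base-R sketch of the translation may help. It assumes a plain integer matrix with the coding given above (1 = AA, 2 = AB, 3 = BB, NA for nonpolymorphic markers); the matrix is made up for illustration.

```r
gt <- matrix(c(1L, 2L, 3L, NA, 2L, 1L), nrow = 3,
             dimnames = list(paste0("marker", 1:3), c("s1", "s2")))
# Index the label vector by the codes; NA codes give NA labels.
gtLabels <- matrix(c("AA", "AB", "BB")[as.vector(gt)],
                   nrow = nrow(gt), dimnames = dimnames(gt))
gtLabels
```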
R. Scharpf
new("CNSet")
"CopyNumberSet"
Container for storing total copy number estimates and confidence scores of the copy number estimates.
Objects can be created by calls of the form new("CopyNumberSet", assayData, phenoData, featureData, experimentData, annotation, protocolData, copyNumber, cnConfidence, ...).
assayData
:Object of class "AssayData"
phenoData
:Object of class "AnnotatedDataFrame"
featureData
:Object of class "AnnotatedDataFrame"
experimentData
:Object of class "MIAxE"
annotation
:Object of class "character"
protocolData
:Object of class "AnnotatedDataFrame"
.__classVersion__
:Object of class "Versions"
Class "eSet", directly.
Class "VersionedBiobase", by class "eSet", distance 2.
Class "Versioned", by class "eSet", distance 3.
signature(object = "CopyNumberSet")
: ...
signature(object = "CopyNumberSet", value = "matrix")
: ...
signature(from = "CNSet", to = "CopyNumberSet")
: ...
signature(object = "CopyNumberSet")
: ...
signature(object = "CopyNumberSet", value = "matrix")
: ...
signature(.Object = "CopyNumberSet")
: ...
This container is primarily for platforms for which genotypes are
unavailable. As oligoSnpSet
extends this class, methods
related to total copy number that do not depend on genotypes can be
defined at this level.
R. Scharpf
For genotyping platforms, total copy number estimates and genotype
calls can be stored in the oligoSnpSet
class.
showClass("CopyNumberSet")
cnset <- new("CopyNumberSet")
ls(Biobase::assayData(cnset))
Accessors for CopyNumberSet objects
copyNumber(object, ...) cnConfidence(object) copyNumber(object) <- value cnConfidence(object) <- value
object |
|
... |
Ignored for |
value |
matrix |
copyNumber
returns a matrix of copy number estimates or
relative copy number estimates. Since the copy number estimates
are stored as integers (copy number * 100), the matrix returned
by the copyNumber
accessor will need to be divided by a
factor of 100 to transform the measurements back to the original
copy number scale.
cnConfidence
returns a matrix of confidence scores for
the copy number estimates. These are also represented as
integers and will require a back-transformation to the original
scale.
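The back-transformations can be sketched in base R. The copy number factor of 100 comes from the text above; the confidence transform `1 - exp(-i / 1000)` (and its inverse) is an assumption about what the package's i2p/p2i helpers compute, so check those functions before relying on it. `p2iSketch` and `i2pSketch` are hypothetical names.

```r
cn <- c(1.98, 2.03, 0.97)             # made-up copy number estimates
cnInt <- as.integer(round(cn * 100))  # integer representation
cnInt / 100                           # back to the copy number scale

# Assumed confidence encoding: integer = -1000 * log(1 - p).
p2iSketch <- function(p) as.integer(round(-1000 * log(1 - p)))
i2pSketch <- function(i) 1 - exp(-i / 1000)
conf <- c(0.5, 0.9, 0.99)
confInt <- p2iSketch(conf)
i2pSketch(confInt)                    # approximately recovers 'conf'
```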
library(Biobase)
data(locusLevelData)
path <- system.file("extdata", package="oligoClasses")
fd <- readRDS(file.path(path, "genomeAnnotatedDataFrameExample.rds"))
## the following command creates an 'oligoSnpSet' object, storing
## an integer representation of the log2 copy number in the 'copyNumber' element
## of the assayData. Genotype calls and genotype confidence scores are also stored
## in the assayData.
oligoSet <- new("oligoSnpSet",
                copyNumber=integerMatrix(log2(locusLevelData[["copynumber"]]/100), 100),
                call=locusLevelData[["genotypes"]],
                callProbability=integerMatrix(locusLevelData[["crlmmConfidence"]], 1),
                annotation=locusLevelData[["platform"]],
                featureData=fd,
                genome="hg19")
## There are several accessors for the oligoSnpSet class.
icn <- copyNumber(oligoSet)
range(icn) ## integer scale
lcn <- icn/100
range(lcn) ## log2 copy number
## confidence scores for the genotypes are also represented on an integer scale
ipr <- snpCallProbability(oligoSet)
range(ipr) ## integer scale
## for genotype confidence scores, the helper function i2p
## converts back to a probability scale
pr <- i2p(ipr)
range(pr)
## The helper function confs is a shortcut, extracting the
## integer-based confidence scores and transforming to the
## probability scale
pr2 <- confs(oligoSet)
all.equal(pr, pr2)
## To extract information on the annotation of the SNPs, one can use
position(oligoSet)
chromosome(oligoSet)
## the position and chromosome coordinates were extracted from build hg19
genomeBuild(oligoSet)
Creates ff objects (array-like) using settings (path) defined by oligoClasses.
createFF(name, dim, vmode = "double", initdata = NULL)
name |
Prefix for filename. |
dim |
Dimensions. |
vmode |
Mode. |
initdata |
NULL. |
ff object.
This function is meant to be used by developers.
ff
This function will return the SQLite connection to the database associated to objects used in oligo.
db(object)
object |
Object of valid class. See methods. |
SQLite connection.
object of class FeatureSet
object of class SnpCallSet
object of class DBPDInfo
object of class SnpLevelSet
Benilton Carvalho
## db(object)
A class for Platform Design Information objects, stored using a database approach
Objects can be created by calls of the form new("DBPDInfo", ...)
.
getdb
:Object of class "function"
tableInfo
:Object of class "data.frame"
manufacturer
:Object of class "character"
genomebuild
:Object of class "character"
geometry
:Object of class "integer"
with length
2 (rows x columns)
string describing annotation package associated to object
Example of ExpressionFeatureSet Object.
data(efsExample)
Object belongs to ExpressionFeatureSet class.
data(efsExample) class(efsExample)
Accessor for the 'exprs'/'se.exprs' slot of FeatureSet-like objects
Expression matrix for objects of this class. Usually results of preprocessing algorithms, like RMA.
General container 'exprs' inherited from eSet
General container 'exprs' inherited from eSet, not yet used.
Accessor for slot featureDataList in Package oligoClasses
signature(object = "gSetList")
An object inheriting from class gSetList
.
Classes to store data from Expression/Exon/SNP/Tiling arrays at the feature level.
The FeatureSet class is VIRTUAL. Therefore users are not able to create instances of this class directly.
Objects for FeatureSet-like classes can be created by calls of the form:
new(CLASSNAME, assayData, manufacturer, platform, exprs,
phenoData, featureData, experimentData, annotation, ...)
.
But the preferred way is using parsers like
read.celfiles
and read.xysfiles
.
manufacturer
:Object of class "character"
assayData
:Object of class "AssayData"
phenoData
:Object of class "AnnotatedDataFrame"
featureData
:Object of class "AnnotatedDataFrame"
experimentData
:Object of class "MIAME"
annotation
:Object of class "character"
.__classVersion__
:Object of class "Versions"
signature(.Object = "FeatureSet")
: show object contents
signature(.Object = "SnpFeatureSet")
:
checks whether object contains data for both strands simultaneously.
Returns TRUE for 50K/250K Affymetrix SNP chips, which contain data for
both strands, and FALSE for SNP 5.0 and SNP 6.0 chips, which contain data
for one strand at a time.
Benilton Carvalho
eSet
, VersionedBiobase
, Versioned
set.seed(1)
tmp <- 2^matrix(rnorm(100), ncol=4)
rownames(tmp) <- 1:25
colnames(tmp) <- paste("sample", 1:4, sep="")
efs <- new("ExpressionFeatureSet", exprs=tmp)
A virtual Class: No objects may be created from it.
.S3Class
:Object of class "character"
Class "oldClass", directly.
signature(object = "ff_matrix")
: ...
showClass("ff_matrix")
"ff_or_matrix"
A class union of 'ffdf', 'ff_matrix', and 'matrix'
A virtual Class: No objects may be created from it.
signature(object = "ff_or_matrix")
: ...
R. Scharpf
showClass("ff_or_matrix")
Extends the class definitions of package ff to S4.
A virtual Class: No objects may be created from it.
.S3Class
:Object of class ffdf
Class "oldClass", directly.
Class "list_or_ffdf", directly.
No methods defined with class "ffdf" in the signature.
CNSet
objects can contain ff
-derived objects that
contain pointers to files on disk, or ordinary matrices. Here we
define open and close methods for ordinary matrices and vectors that
simply pass back the original matrix/vector.
open(con, ...) openff(object) closeff(object)
con |
matrix or vector |
object |
A |
... |
Ignored |
not applicable
R. Scharpf
open(rnorm(15))
open(matrix(rnorm(15), 5,3))
Used to flag SNPs with low minor allele frequencies, or for possible problems during the CN estimation step. Currently, this is primarily for internal use.
flags(object)
object |
An object of class |
A matrix
or ff_matrix
object with rows
corresponding to markers and columns corresponding to batch.
x <- matrix(runif(250*96*2, 0, 2), 250, 96*2)
test1 <- new("CNSet", alleleA=x, alleleB=x, call=x, callProbability=x,
             batch=as.character(rep(letters[1:2], each=96)))
dim(flags(test1))
Miscellaneous generics. Methods defined in packages that depend on oligoClasses
baf(object) lrr(object)
object |
A |
R. Scharpf
"GenomeAnnotatedDataFrame"
AnnotatedDataFrame with genomic coordinates (chromosome, position)
varMetadata
:Object of class "data.frame"
data
:Object of class "data.frame"
dimLabels
:Object of class "character"
.__classVersion__
:Object of class "Versions"
Class "AnnotatedDataFrame", directly.
Class "Versioned", by class "AnnotatedDataFrame", distance 2.
as(from, "GenomeAnnotatedDataFrame")
:
Coerce an object of class AnnotatedDataFrame
to a
GenomeAnnotatedDataFrame
.
makeFeatureGRanges(object, genome, ...)
:
Construct a GRanges
instance from a
GenomeAnnotatedDataFrame
object. genome
is a
character string indicating the UCSC build. Supported builds are
"hg18" and "hg19", but are platform specific. In particular, some
platforms only support build hg19 at this time.
updateObject(object)
:
For updating a GenomeAnnotatedDataFrame
chromosome(object)
, chromosome(object) <- value
Get or set chromosome.
isSnp(object)
:
Many platforms include polymorphic and nonpolymorphic markers. isSnp
evaluates to TRUE
if the marker is polymorphic.
position(object)
:
Physical position in the genome
getArm(object, genome)
:
Retrieve character vector indicating the chromosome arm of each
marker in object
. genome
should indicate which genome
build was used to define the chromosomal locations (currently, only
UCSC genome builds 'hg18' and 'hg19' supported for this function).
R. Scharpf
GenomeAnnotatedDataFrameFrom
is a convenience for creating
GenomeAnnotatedDataFrame
objects.
Use the method with GenomeAnnotatedDataFrameFrom(object,
annotationPkg, genome, ...)
; the argument annotationPkg
must be specified for matrix
and AssayData
classes.
signature(object="assayData")
This method creates a GenomeAnnotatedDataFrame
using
feature names and dimensions of an AssayData
object as
a template.
signature(object="matrix")
This method creates a GenomeAnnotatedDataFrame
using row
names and dimensions of a matrix
object as a template.
signature(object="NULL"): This method (called with NULL as the object) creates an empty GenomeAnnotatedDataFrame.
signature(object="array"): This method (called with an array as the object) creates a GenomeAnnotatedDataFrame using the first dimension of the array (the number of rows is the number of features).
R. Scharpf
require(Biobase)
minReqVersion <- "1.0.2"
## packageVersion() compares version numbers properly; comparing the
## character string returned by packageDescription() does not
if (require(human370v1cCrlmm) &&
    packageVersion("human370v1cCrlmm") >= minReqVersion) {
  x <- matrix(1:25, 5, 5,
              dimnames = list(c("rs10000092", "rs1000055", "rs100016",
                                "rs10003241", "rs10004197"), NULL))
  gd <- GenomeAnnotatedDataFrameFrom(x, annotationPkg = "human370v1cCrlmm",
                                     genome = "hg18")
  pData(gd)
  chromosome(gd)
  position(gd)
}
Returns the genome build. This information comes from the annotation package and is given as an argument during the package creation process.
genomeBuild(object)
object |
Supported objects include |
character string
Supported builds are the UCSC genome builds 'hg18' and 'hg19'.
showMethods("genomeBuild", where="package:oligoClasses")
For a given array, geometry returns its physical geometry.
geometry(object)
object |
|
if (require(pd.mapping50k.xba240)) geometry(pd.mapping50k.xba240)
Methods to compute average log-intensities and log-ratios across alleles, within strand.
getA(object)
getM(object)
A(object, ...)
B(object, ...)
object |
|
... |
arguments to be passed to |
For SNP data, SNPRMA summarizes the SNP information into 4 quantities (log2-scale):
antisenseThetaA: antisense allele A. (Not applicable to Affymetrix 5.0 and 6.0 platforms.)
antisenseThetaB: antisense allele B. (Not applicable to Affymetrix 5.0 and 6.0 platforms.)
senseThetaA: sense allele A. (Not applicable to Affymetrix 5.0 and 6.0 platforms.)
senseThetaB: sense allele B. (Not applicable to Affymetrix 5.0 and 6.0 platforms.)
alleleA: allele A, Affymetrix 5.0 and 6.0 platforms.
alleleB: allele B, Affymetrix 5.0 and 6.0 platforms.
The average log-intensities are given by (antisenseThetaA + antisenseThetaB)/2 and (senseThetaA + senseThetaB)/2.
The log-ratios are given by antisenseThetaA - antisenseThetaB and senseThetaA - senseThetaB.
For tiling data, getM and getA return the log-ratio and the average log-intensity computed across channels:
M = log2(channel1) - log2(channel2)
A = (log2(channel1) + log2(channel2))/2
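Both the per-strand SNP summaries and the tiling channels reduce to the same two element-wise operations on log2-scale matrices. The base-R sketch below makes that explicit. computeMA is a hypothetical helper, not an oligoClasses function, and the toy matrices are made up for illustration.

```r
## computeMA() is a hypothetical helper (not an oligoClasses function).
## Both inputs are log2-scale matrices with identical dimensions.
computeMA <- function(x1, x2) {
  stopifnot(identical(dim(x1), dim(x2)))
  list(M = x1 - x2,         # log-ratio
       A = (x1 + x2) / 2)   # average log-intensity
}

## Toy sense-strand summaries: 3 SNPs x 2 samples, log2 scale
senseThetaA <- matrix(c(10, 9, 11, 8, 12, 10), nrow = 3)
senseThetaB <- matrix(c(8, 9, 7, 10, 12, 9), nrow = 3)
sense <- computeMA(senseThetaA, senseThetaB)
sense$M
sense$A

## Tiling arrays: log2-transform the raw channel intensities first
channel1 <- matrix(c(1024, 512), nrow = 2)
channel2 <- matrix(c(256, 512), nrow = 2)
tiling <- computeMA(log2(channel1), log2(channel2))
tiling$M   # 2 and 0
tiling$A   # 9 and 9
```

The antisense strand is handled the same way by passing antisenseThetaA and antisenseThetaB.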
When large data support is enabled with the ff package, the AssayData elements of an AlleleSet object can be ff_matrix or ffdf objects, in which case pointers to the ff objects are stored in the assay data. The functions open and close can be used to open or close the connection, respectively.
A 3-dimensional array (SNPs x samples x strands) with the requested measure, when the input is SNP data from 50K or 250K arrays.
A 2-dimensional array (SNPs x samples), when the input is from SNP 5.0 or SNP 6.0 arrays.
A 2-dimensional array, when the input is from tiling arrays.
Gets a bar of a given length.
getBar(width = getOption("width"))
width: desired length of the bar.
character string.
Benilton S Carvalho
message(getBar())
Load chromosome sequence lengths for UCSC genome build hg18 or hg19
getSequenceLengths(build)
build: character string, "hg18" or "hg19".
The chromosome sequence lengths for UCSC builds hg18 and hg19 were extracted from the packages BSgenome.Hsapiens.UCSC.hg18 and BSgenome.Hsapiens.UCSC.hg19, respectively.
A named integer vector of chromosome lengths.
R. Scharpf
getSequenceLengths("hg18")
getSequenceLengths("hg19")
if (require("GenomicRanges")) {
  ## from GenomicRanges
  sl <- getSequenceLengths("hg18")[c("chr1", "chr2", "chr3")]
  gr <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
                ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
                strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
                score = 1:10,
                GC = seq(1, 0, length.out = 10),
                seqlengths = sl)
  metadata(gr) <- list(genome = "hg18")
  gr
  metadata(gr)
}
Methods for GRanges objects
findOverlaps(query, subject, ...): Find the feature indices in subject that overlap the genomic intervals in query, where query is a GRanges object and subject is a gSet-derived object. Additional arguments to the findOverlaps method in the IRanges package can be passed through the ... argument.
object is an instance of the GRanges class.
coverage2(object): For the GRanges and GRangesList objects returned by the hidden Markov model implemented in the "VanillaICE" package and by the segmentation algorithm in the "MinimumDistance" package, the intervals are annotated by the number of probes (markers) for SNPs and nonpolymorphic regions. coverage2 and numberProbes are convenient accessors for these annotations.
genomeBuild(object): Accessor for the UCSC genome build.
numberProbes(object): Integer vector indicating the number of probes (markers) for each range in object. Equivalent to coverage2.
state(object): Accessor for the elementMetadata column 'state', when applicable. This column contains the index of the inferred copy number state for the various HMM methods defined in VanillaICE.
library(IRanges)
library(GenomicRanges)
gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6),
               state = 3L, numberProbes = 100L)
## convenience functions
state(gr1)
numberProbes(gr1)
gr2 <- GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(7, 13), width = 3),
               state = c(2L, 2L), numberProbes = c(200L, 250L))
gr3 <- GRanges(seqnames = c("chr1", "chr2"),
               ranges = IRanges(c(1, 4), c(3, 9)),
               state = c(1L, 4L), numberProbes = c(300L, 350L))
## Ranges organized by sample
grl <- GRangesList("sample1" = gr1, "sample2" = gr2, "sample3" = gr3)
sampleNames(grl)  ## same as names(grl)
numberProbes(grl)
chromosome(grl)
state(grl)
gr <- stack(grl)
sampleNames(gr)
chromosome(gr)
state(gr)
Container for objects with genomic annotation on SNPs
A virtual Class: No objects may be created from it.
featureData: Object of class "GenomeAnnotatedDataFrame"
assayData: Object of class "AssayData"
phenoData: Object of class "AnnotatedDataFrame"
experimentData: Object of class "MIAxE"
annotation: Object of class "character"
protocolData: Object of class "AnnotatedDataFrame"
genome: Object of class "character"
.__classVersion__: Object of class "Versions"
Class "eSet", directly.
Class "VersionedBiobase", by class "eSet", distance 2.
Class "Versioned", by class "eSet", distance 3.
The object for the methods below is an instance of a class that extends the virtual class gSet.
checkOrder(object): checks that the object is ordered by chromosome and physical position. Returns a logical.
chromosome(object): accessor for the chromosome in the GenomeAnnotatedDataFrame slot.
chromosome(object) <- value: replacement method for the chromosome in the GenomeAnnotatedDataFrame slot. value must be an integer vector.
db(object): database connection.
genomeBuild(object), genomeBuild(object) <- value: Get or set the UCSC genome build. Supported builds are hg18 and hg19.
getArm(object): Character vector indicating the chromosomal arm for each marker in object.
isSnp(object): whether the marker is polymorphic. Returns a logical vector.
makeFeatureGRanges(object): Construct an instance of the GRanges class from a GenomeAnnotatedDataFrame.
position(object): integer vector of the genomic positions.
show(object): Print a concise summary of object.
R. Scharpf
showClass("gSet")
Virtual Class for Lists of eSets.
A virtual Class: No objects may be created from it.
assayDataList: Object of class "AssayData"
phenoData: Object of class "AnnotatedDataFrame"
protocolData: Object of class "AnnotatedDataFrame"
experimentData: Object of class "MIAME"
featureDataList: Object of class "list"
chromosome: Object of class "vector"
annotation: Object of class "character"
genome: Object of class "character"
object is an instance of a gSetList-derived class.
annotation(object): character string indicating the package used to provide annotation for the features on the array.
chromosome(object): Returns the chromosome corresponding to each element in the gSetList object.
elementNROWS(object): Returns the number of rows for each list of assays. In most gSetList-derived classes, the assays are organized by chromosome, and elementNROWS returns the number of markers for each chromosome.
genomeBuild(object), genomeBuild(object) <- value: Get or set the UCSC genome build. Supported builds are hg18 and hg19.
makeFeatureGRanges(object, ...): Create a GRanges object for the featureData. The featureData is stored as a list; this method stacks the featureData from each list element. Metadata columns in the GRanges object include the physical position ('position'), a SNP indicator ('isSnp'), and the chromosome. The genome build is extracted from object using the method genomeBuild.
R. Scharpf
oligoSetList, BeadStudioSetList
showClass("gSetList")
Probabilities estimated in the crlmm package are often stored as integers to save memory. We provide a few utility functions to go back and forth between the probability and integer representations.
i2p(i)
p2i(p)
i: A matrix or vector of integers.
p: A matrix or vector of probabilities.
The value returned by i2p is 1 - exp(-i/1000).
The value returned by p2i is as.integer(-1000*log(1-p)).
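The two formulas can be checked with a few lines of base R. The functions below are hypothetical re-implementations written only to make the arithmetic visible; in practice, use i2p and p2i from oligoClasses. Because the conversion truncates toward zero, a round trip through the integer scale loses a little precision.

```r
## Hypothetical re-implementations of the documented formulas
i2pSketch <- function(i) 1 - exp(-i / 1000)
p2iSketch <- function(p) {
  i <- trunc(-1000 * log(1 - p))   # same truncation as as.integer()
  storage.mode(i) <- "integer"     # unlike as.integer(), keeps matrix dims
  i
}

p2iSketch(0.5)               # 693, since -1000 * log(0.5) is about 693.15
i2pSketch(693)               # about 0.4999, not exactly 0.5
i2pSketch(p2iSketch(0.99))   # close to 0.99, but not exact
m <- matrix(c(0.1, 0.5, 0.9, 0.99), nrow = 2)
p2iSketch(m)                 # a 2 x 2 integer matrix
```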
i2p(693)
p2i(0.5)
i2p(p2i(0.5))
Initialize big matrices or vectors appropriately (conditioned on the status of support for large datasets - see Details).
initializeBigMatrix(name=basename(tempfile()), nr=0L, nc=0L, vmode = "integer", initdata = NA)
initializeBigVector(name=basename(tempfile()), n=0L, vmode = "integer", initdata = NA)
initializeBigArray(name=basename(tempfile()), dim=c(0L,0L,0L), vmode="integer", initdata=NA)
name: prefix to be used for the file stored on disk.
nr: number of rows.
nc: number of columns.
n: length of the vector.
vmode: mode, "integer" or "double".
initdata: initial value (default NA).
dim: integer vector indicating the dimensions of the array to initialize.
These functions are meant to be used by developers. They provide means to appropriately create big vectors or matrices for packages like oligo and crlmm (and friends). These objects are created conditioned on the status of support for large datasets.
If the 'ff' package is loaded (in the search path), then an 'ff' object is returned. A regular R vector or array is returned otherwise.
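The conditional behaviour described above can be pictured as follows. This is a simplified, hypothetical sketch: initializeBigMatrixSketch is not part of oligoClasses, it checks the search path for ff the way isPackageLoaded is described to, and it omits the name argument that the real function uses for the file on disk.

```r
## Hypothetical sketch of the ff / in-memory switch (not the oligoClasses source)
initializeBigMatrixSketch <- function(nr = 0L, nc = 0L, vmode = "integer",
                                      initdata = NA) {
  if ("package:ff" %in% search()) {
    ## ff is attached: return a disk-backed ff matrix
    ff::ff(initdata, dim = c(nr, nc), vmode = vmode)
  } else {
    ## ff is not attached: return an ordinary R matrix of the requested mode
    m <- matrix(initdata, nrow = nr, ncol = nc)
    storage.mode(m) <- vmode
    m
  }
}

x <- initializeBigMatrixSketch(nr = 5L, nc = 3L)
class(x)
```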
x <- initializeBigVector("test", 10)
class(x)
x
if (isPackageLoaded("ff")) finalizer(x) <- "delete"
rm(x)
initializeBigMatrix(nr=5L, nc=5L)
initializeBigArray(dim=c(10, 5, 3))
Coerce numeric matrix to matrix of integers, retaining dimnames.
integerMatrix(x, scale = 100)
integerArray(x, scale = 100)
x |
a |
scale |
scalar (numeric). If not 1, |
A matrix or array of integers.
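The coercion can be illustrated in base R. integerMatrixSketch is hypothetical, and because the description of scale is truncated above, the multiply-then-round behaviour is an assumption rather than a statement about the package source. Dividing the result by scale approximately recovers the original values.

```r
## Hypothetical sketch: scale, round, and store as integer, keeping dimnames.
## Rounding (rather than truncation) is an assumption.
integerMatrixSketch <- function(x, scale = 100) {
  if (scale != 1) x <- x * scale
  out <- round(x)
  storage.mode(out) <- "integer"   # keeps dim and dimnames
  out
}

x <- matrix(c(0.123, -1.5, 2.004, 0.5), nrow = 2,
            dimnames = list(c("a", "b"), NULL))
i <- integerMatrixSketch(x, scale = 100)
i          # integer matrix, row names "a" and "b" retained
i / 100    # approximately x
```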
R. Scharpf
x <- matrix(rnorm(10), 5, 2)
rownames(x) <- letters[1:5]
i <- integerMatrix(x, scale = 100)
Check if object is an ff-matrix object.
is.ffmatrix(object)
object: object to be checked.
Logical.
This function is meant to be used by developers.
if (isPackageLoaded("ff")) {
  x1 <- ff(vmode = "double", dim = c(10, 2))
  is.ffmatrix(x1)
}
x1 <- matrix(0, nrow = 10, ncol = 2)
is.ffmatrix(x1)
Checks if package is loaded.
isPackageLoaded(pkg)
pkg: package to be checked.
Checks if package name is in the search path.
Logical.
search
isPackageLoaded("oligoClasses")
isPackageLoaded("ff")
isPackageLoaded("snow")
Methods for function isSnp in package oligoClasses
Return an indicator for whether the marker is polymorphic (value 1) or nonpolymorphic (value 0).
signature(object = "character", pkgname = "character"): Return an indicator for whether each marker identifier in object is polymorphic. pkgname must be one of the supported platform-specific annotation packages.
signature(object = "eSet", pkgname = "ANY"): If 'isSnp' is included in fvarLabels(object), an indicator for polymorphic markers is returned. Otherwise, an error is thrown.
signature(object = "GenomeAnnotatedDataFrame", pkgname = "ANY"): Accessor for the indicator of whether each marker is polymorphic. If annotation was not available because the annotation package is missing or does not exist, the accessor returns a vector of zeros.
Retrieves the array type.
kind(object)
object |
|
String: "Expression", "Exon", "SNP" or "Tiling"
if (require(pd.mapping50k.xba240)) {
  data(sfsExample)
  Biobase::annotation(sfsExample) <- "pd.mapping50k.xba240"
  kind(sfsExample)
}
Set/check large dataset options.
ldSetOptions(nsamples=100, nprobesets=20000, path=getwd(), verbose=FALSE)
ldStatus(verbose=FALSE)
ldPath(path)
nsamples: number of samples to be processed at once.
nprobesets: number of probesets to be processed at once.
path: path where large dataset objects are stored.
verbose: verbosity (logical).
Some functions in oligo/crlmm can process data in batches to minimize the memory footprint. When this feature is used, resources from the 'ff' package are used (possibly combined with cluster resources set in options() via the 'snow' package).
Methods that are executed sample by sample can use ocSamples() to automatically define how many samples are processed at once (on a compute node). Similarly, methods applied to probesets can use ocProbesets(). Users should set these options appropriately.
ldStatus checks the support for large datasets.
ldPath checks where ff files are stored.
Benilton S Carvalho
ocSamples, ocProbesets
ldStatus(TRUE)
Number of samples for FeatureSet-like objects.
Number of samples
Suppress package startup messages when loading a library.
library2(...)
... |
arguments to |
R. Scharpf
library2("Biobase")
Function used to get a list of CEL files.
list.celfiles(..., listGzipped=FALSE)
... |
Passed to |
listGzipped: Logical. List .CEL.gz files?
Character vector with filenames.
Quite often users want to use this function to pass filenames to other methods. In these situations, it is safer to use the argument 'full.names=TRUE'.
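A stand-in makes it clear why full.names matters. listCelSketch is hypothetical: it assumes list.celfiles filters list.files output by a case-insensitive .CEL extension, also accepting .CEL.gz when listGzipped is TRUE. Those matching rules are assumptions, not taken from the package source.

```r
## Hypothetical stand-in for list.celfiles(); the matching rules are assumptions
listCelSketch <- function(..., listGzipped = FALSE) {
  pattern <- if (listGzipped) "\\.cel(\\.gz)?$" else "\\.cel$"
  list.files(..., pattern = pattern, ignore.case = TRUE)
}

## A throw-away directory with a few files
path <- tempfile("celdir")
dir.create(path)
file.create(file.path(path, c("s1.CEL", "s2.cel", "s3.CEL.gz", "notes.txt")))

listCelSketch(path)                       # base names only
listCelSketch(path, full.names = TRUE)    # usable from any working directory
listCelSketch(path, listGzipped = TRUE)   # also picks up s3.CEL.gz
```

The base names returned by the first call only resolve correctly when path is the working directory, which is why full.names=TRUE is the safer choice.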
if (require(hapmapsnp5)) {
  path <- system.file("celFiles", package = "hapmapsnp5")
  ## only the filenames
  list.celfiles(path)
  ## the filenames with full path...
  ## very useful when genotyping samples not in the working directory
  list.celfiles(path, full.names = TRUE)
} else {
  ## this returns nothing if there are no CEL files
  ## in the working directory
  list.celfiles(getwd())
}
This object is a list containing the basic data elements required for the HMM
data(locusLevelData)
A list
The basic assay data elements that can be used for fitting the HMM are:
1. a mapping of platform identifiers to chromosome and physical position
2. (optional) a matrix of copy number estimates
3. (optional) a matrix of confidence scores for the copy number estimates (e.g., inverse standard deviations)
4. (optional) a matrix of genotype calls
5. (optional) CRLMM confidence scores for the genotype calls
At least (2) or (4) is required. The locusLevelData is a list that contains (1), (2), (4), and (5).
A HapMap sample on the Affymetrix 50k platform. Chromosomal alterations were simulated. The last 100 SNPs on chromosome 2 are, in fact, a repeat of the first 100 SNPs on chromosome 1 – this was added for internal use.
data(locusLevelData)
str(locusLevelData)
Construct a GRanges object from several possible feature-level classes. The conversion is useful for subsequent ranged-data queries, such as findOverlaps, countOverlaps, etc.
makeFeatureGRanges(object, ...)
object |
A |
... |
See the |
A GRanges object.
R. Scharpf
findOverlaps, GRanges, GenomeAnnotatedDataFrame
library(oligoClasses)
library(GenomicRanges)
library(Biobase)
library(foreach)
registerDoSEQ()
data(oligoSetExample, package = "oligoClasses")
oligoSet <- oligoSet[chromosome(oligoSet) == 1, ]
makeFeatureGRanges(oligoSet)
Manufacturer ID for FeatureSet-like and DBPDInfo-like objects.
Manufacturer ID
ocLapply is an lapply-like function that checks whether ff/snow are loaded and whether the cluster variable is set; if so, FUN is executed on the cluster. If these requirements are not met, lapply is used instead.
ocLapply(X, FUN, ..., neededPkgs)
X: first argument to FUN.
FUN: function to be executed.
...: additional arguments to FUN.
neededPkgs: packages needed to execute FUN on the compute nodes.
neededPkgs is needed when parallel computing is expected to be used. These packages are loaded on the compute nodes before FUN is executed.
A list of length length(X).
Benilton S Carvalho
lapply, parStatus
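The fallback logic can be sketched with the parallel package that ships with R. ocLapplySketch is hypothetical and is not the oligoClasses implementation. It looks only for a cluster object stored in options("cluster"), the convention the parStatus description mentions, and skips the ff/snow checks that ocLapply performs. When a cluster is present, it loads neededPkgs on the nodes; otherwise it falls back to lapply.

```r
## Hypothetical sketch of ocLapply-style dispatch (not the oligoClasses source).
## The real function also checks that ff/snow are loaded; this sketch only
## looks for a cluster stored in options("cluster").
ocLapplySketch <- function(X, FUN, ..., neededPkgs = character()) {
  cl <- getOption("cluster")
  if (is.null(cl)) {
    return(lapply(X, FUN, ...))          # serial fallback
  }
  for (pkg in neededPkgs) {
    ## load each required package on every compute node before running FUN
    parallel::clusterCall(cl, library, pkg, character.only = TRUE)
  }
  parallel::parLapply(cl, X, FUN, ...)
}

## With no cluster option set, this runs serially through lapply()
ocLapplySketch(1:3, function(x, k) x * k, k = 2)
```

With a cluster registered, for example via options(cluster = parallel::makeCluster(2)), the same call would run on the workers. Release them afterwards with parallel::stopCluster().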
Tools to simplify the management of clusters via the 'snow' package and the handling of large datasets through the 'ff' package.
ocSamples(n)
ocProbesets(n)
n: integer representing the maximum number of samples/probesets to be processed simultaneously on a compute node.
Some methods in the oligo/crlmm packages, like backgroundCorrect, normalize, summarize and rma, can use a cluster (set through the 'foreach' package). The use of cluster features is conditioned on the availability of the 'ff' (used to provide shared objects across compute nodes) and 'foreach' packages.
To use a cluster, 'oligo/crlmm' checks for two requirements: 1) 'ff' is loaded; 2) an adaptor for the parallel backend (like 'doMPI', 'doSNOW', 'doMC') is loaded and registered.
If only the 'ff' package is available and loaded (in addition to the caller package - 'oligo' or 'crlmm'), these methods will allow the user to analyze datasets that would not fit in RAM at the expense of performance.
In the situations above (large datasets and cluster), oligo/crlmm uses the options ocSamples and ocProbesets to limit the amount of RAM used by the machine(s). For example, if ocSamples is set to 100, steps like background correction and normalization process (in RAM) 100 samples simultaneously on each compute node. If ocProbesets is set to 10K, then summarization processes 10K probesets at a time on each machine.
In both scenarios (large dataset and/or cluster use), there is a penalty in performance because data are written to disk (to either minimize memory footprint or share data across compute nodes).
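The batching controlled by ocSamples and ocProbesets can be pictured as splitting indices into consecutive chunks. makeBatches is a hypothetical helper for illustration only; it does not reproduce how oligo or crlmm schedule work across nodes.

```r
## Hypothetical helper: consecutive batches of at most `batchSize` indices
makeBatches <- function(n, batchSize) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / batchSize))
}

## e.g. 250 samples with an ocSamples-style limit of 100 per batch
batches <- makeBatches(250, 100)
lengths(batches)   # 100, 100 and 50 samples per batch
```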
Benilton Carvalho
if (require(doMC)) {
  registerDoMC()
  ## tasks like summarize()
}
An example instance of the oligoSnpSet class.
data(oligoSetExample)
Created from the simulated locusLevelData provided in this package.
## Not run:
## 'oligoSetExample' created by the following
data(locusLevelData)
oligoSet <- new("oligoSnpSet",
                copyNumber = integerMatrix(log2(locusLevelData[["copynumber"]]/100), 100),
                call = locusLevelData[["genotypes"]],
                callProbability = locusLevelData[["crlmmConfidence"]],
                annotation = locusLevelData[["platform"]],
                genome = "hg19")
oligoSet <- oligoSet[!is.na(chromosome(oligoSet)), ]
oligoSet <- oligoSet[chromosome(oligoSet) < 3, ]
## End(Not run)
data(oligoSetExample)
oligoSet
Methods for oligoSnpSet class
In the following code, object is an instance of the oligoSnpSet class.
new("oligoSnpSet", ...): Instantiates an object of class oligoSnpSet. The assayData elements of the oligoSnpSet class can include matrices of genotype calls, confidence scores for the genotype calls, B allele frequencies, absolute or relative copy number, and confidence scores for the copy number estimates. Each matrix should be coerced to an integer scale prior to assignment to the oligoSnpSet object. Validity methods defined for the class will fail if the matrices are not integers. See examples for additional details.
baf(object): Accessor for the integer representation of the B allele frequencies. The value returned by this method can be divided by 1000 to obtain B allele frequencies on the original [0,1] scale.
baf(object) <- value: Assign an integer representation of the B allele frequencies to the 'baf' element of the assayData slot. value must be a matrix of integers. See the examples for help converting BAFs to a matrix of integers.
Checks whether oligo/crlmm can use parallel resources (this requires the ff and snow packages, in addition to options(cluster=makeCluster(...))).
parStatus()
logical
Benilton S Carvalho
This function checks whether a given package is available on Bioconductor and, if so, installs it.
pdPkgFromBioC(pkgname, lib = .libPaths()[1], verbose = TRUE)
pkgname: character. Name of the package to be installed.
lib: character. Path where the package should be installed.
verbose: logical. Verbosity flag.
Internet connection required.
Logical: TRUE if package was found, downloaded and installed; FALSE otherwise.
Benilton Carvalho
download.packages
## Not run:
pdPkgFromBioC("pd.mapping50k.xba240")
## End(Not run)
Platform Information
platform information
This method returns the fragment length for PM probes. On AffySNPPDInfo objects, it returns the length of the fragment that contains the SNP in question.
Methods for function position in package oligoClasses
The methods for position extract the physical position, stored as an integer, of each marker in an eSet-derived class or an AnnotatedDataFrame-derived class.
signature(object = "AnnotatedDataFrame"): Accessor for physical position.
signature(object = "eSet"): If 'position' is included in fvarLabels(object), the physical position is returned. Otherwise, an error is thrown.
signature(object = "GenomeAnnotatedDataFrame"): Accessor for physical position. If annotation was not available because the annotation package is missing or does not exist, the accessor returns a vector of zeros.
This function checks for the existence of a given package and loads it if available. If the package is not available, the function checks its availability on Bioconductor, downloads it and installs it.
requireAnnotation(pkgname, lib=.libPaths()[1], verbose = TRUE)
pkgname: character. Package name (usually an annotation package).
lib: character. Path where packages should be installed.
verbose: logical. Verbosity flag.
Logical: TRUE if package is available or FALSE if package unavailable for download.
Benilton Carvalho
install.packages
## Not run:
requireAnnotation("pd.mapping50k.xba240")
## End(Not run)
Package loaders for clusters.
requireClusterPkgSet(packages)
requireClusterPkg(pkg, character.only)
packages: character vector with the names of the packages to be loaded on the compute nodes.
pkg: name of a package, given as a name or a literal character string.
character.only: a logical indicating whether ‘pkg’ can be assumed to be a character string.
requireClusterPkgSet applies require for a set of packages on the cluster nodes.
requireClusterPkg applies require for *ONE* package on the cluster nodes and accepts every argument taken by require.
Logical.
Benilton S Carvalho
require
Returns sample names for FeatureSet-like objects.
Sample names
Example of SnpCnvQSet object.
data(scqsExample)
Object belongs to SnpCnvQSet class.
data(scqsExample)
class(scqsExample)
Tools to simplify the management of clusters via the 'snow' package and the handling of large datasets through the 'ff' package.
setCluster(...)
getCluster()
delCluster()
... |
arguments to be passed to |
Some methods in the oligo/crlmm packages, like backgroundCorrect, normalize, summarize and rma, can use a cluster (set through the 'snow' package). The use of cluster features is conditioned on the availability of the 'ff' (used to provide shared objects across compute nodes) and 'snow' packages.
To use a cluster, 'oligo/crlmm' checks for three requirements: 1) 'ff' is loaded; 2) 'snow' is loaded; and 3) the 'cluster' option is set (e.g., via options(cluster=makeCluster(...)) or setCluster(...)).
If only the 'ff' package is available and loaded (in addition to the caller package - 'oligo' or 'crlmm'), these methods will allow the user to analyze datasets that would not fit in RAM at the expense of performance.
In the situations above (large datasets and cluster), oligo/crlmm uses the options ocSamples and ocProbesets to limit the amount of RAM used by the machine(s). For example, if ocSamples is set to 100, steps like background correction and normalization process (in RAM) 100 samples simultaneously on each compute node. If ocProbesets is set to 10K, then summarization processes 10K probesets at a time on each machine.
In both scenarios (large dataset and/or cluster use), there is a penalty in performance because data are written to disk (to either minimize memory footprint or share data across compute nodes).
Benilton Carvalho
Example of SnpFeatureSet object.
data(sfsExample)
Object belongs to SnpFeatureSet class
data(sfsExample)
class(sfsExample)
Utility functions for accessing data in SnpSet objects.
calls(object)
calls(object) <- value
confs(object, transform=TRUE)
confs(object) <- value
object: A SnpSet object.
transform: Logical. Whether to transform the integer representation of the confidence score (used for memory efficiency) to a probability. See details.
value: A matrix.
calls returns the genotype calls. CRLMM stores genotype calls as integers (1 - AA; 2 - AB; 3 - BB).
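The integer codes can be mapped back to genotype labels with ordinary indexing. This base-R sketch uses a made-up call matrix; the decoding code is illustrative and is not an oligoClasses function.

```r
## Toy integer genotype calls: 4 SNPs x 2 samples (1 = AA, 2 = AB, 3 = BB)
theCalls <- matrix(c(1L, 2L, 3L, 2L, 3L, 3L, 1L, NA), nrow = 4)

## Index the label vector by the codes; NA calls stay NA
genotypes <- c("AA", "AB", "BB")[as.vector(theCalls)]
dim(genotypes) <- dim(theCalls)   # restore the SNP x sample layout
genotypes
```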
confs returns the confidences associated with the genotype calls. The current implementation of CRLMM stores the confidences as integers to save memory on disk by using the transformation
round(-1000*log(1-p)),
where 'p' is the posterior probability of the call. confs is a convenience function that transforms the integer representation back to a probability. Note that if the assayData elements of the SnpSet objects are ff_matrix or ffdf, the confs function will return a warning. For such objects, one should first subset the ff object and coerce it to a matrix, then apply the above conversion. The function snpCallProbability is the accessor for the callProbability slot of SnpSet objects. See the examples below.
checkOrder checks whether the object is ordered by chromosome and physical position, evaluating to TRUE or FALSE.
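The ordering test can be expressed with order(). isOrderedSketch is a hypothetical helper that takes chromosome and position vectors directly. checkOrder itself works on the object, and its exact implementation may differ.

```r
## Hypothetical helper: TRUE when markers are sorted by chromosome,
## then by physical position within chromosome
isOrderedSketch <- function(chromosome, position) {
  identical(order(chromosome, position), seq_along(chromosome))
}

isOrderedSketch(c(1L, 1L, 2L), c(100L, 200L, 50L))   # TRUE
isOrderedSketch(c(1L, 2L, 1L), c(100L, 50L, 200L))   # FALSE
```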
Note that the replacement method confs<- expects a matrix of probabilities and will automatically convert the probabilities to an integer representation. See details for the conversion.
The accessor snpCallProbability is an accessor for the 'callProbability' element of the assayData. The name can be misleading, however, as the accessor will not return a probability if the call probabilities are represented as integers.
The helper function p2i converts probabilities to integers, and i2p converts integers to probabilities.
See order and checkOrder.
theCalls <- matrix(sample(1:3, 20, replace = TRUE), ncol = 2)
p <- matrix(runif(20), ncol = 2)
integerRepresentation <- matrix(as.integer(round(-1000 * log(1 - p))), 10, 2)
obj <- new("SnpSet2", call = theCalls, callProbability = integerRepresentation)
calls(obj)
confs(obj)  ## coerces to probability scale
int <- Biobase::snpCallProbability(obj)  ## not necessarily a probability
p3 <- i2p(int)  ## to convert back to a probability
"SnpSet2"
A container for genotype calls and confidence scores. Similar to the SnpSet class in Biobase, but SnpSet2 extends gSet directly, whereas SnpSet extends eSet.
Useful properties of gSet include the genome slot and the GenomeAnnotatedDataFrame.
Objects can be created by calls of the form new("SnpSet2", assayData, phenoData, featureData, experimentData, annotation, protocolData, call, callProbability, genome, ...).
genome: Object of class "character" indicating the UCSC genome build. Supported builds are 'hg18' and 'hg19'.
assayData: Object of class "AssayData".
phenoData: Object of class "AnnotatedDataFrame".
featureData: Object of class "AnnotatedDataFrame".
experimentData: Object of class "MIAxE".
annotation: Object of class "character".
protocolData: Object of class "AnnotatedDataFrame".
.__classVersion__: Object of class "Versions".
Class "gSet", directly.
Class "eSet", by class "gSet", distance 2.
Class "VersionedBiobase", by class "gSet", distance 3.
Class "Versioned", by class "gSet", distance 4.
The argument object for the following methods is an instance of the SnpSet2 class.
calls(object), calls(object) <- value: Gets or sets the genotype calls. value can be a matrix or an ff_matrix.
confs(object), confs(object) <- value: Gets or sets the genotype confidence scores. value can be a matrix or an ff_matrix.
snpCallProbability(object), snpCallProbability(object) <- value: Gets or sets the genotype confidence scores.
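The getter/replacement pattern described above (the getter returns probabilities, the replacement method stores an integer representation) can be sketched with the methods package. Everything here is hypothetical and simplified for illustration: the class MiniSnpSet and the generics miniConfs and miniConfs<- are not part of oligoClasses and omit all eSet/gSet machinery.

```r
library(methods)

# Hypothetical, stripped-down stand-in for SnpSet2 (not the package's class):
# only the two genotype matrices, with confidences stored as integers.
setClass("MiniSnpSet", slots = c(call = "matrix", callProbability = "matrix"))

setGeneric("miniConfs", function(object) standardGeneric("miniConfs"))
setGeneric("miniConfs<-", function(object, value) standardGeneric("miniConfs<-"))

# Getter: convert the stored integers back to probabilities.
setMethod("miniConfs", "MiniSnpSet", function(object) {
  1 - exp(-object@callProbability / 1000)
})

# Replacement method: accept probabilities, store their integer representation.
setReplaceMethod("miniConfs", "MiniSnpSet", function(object, value) {
  int <- round(-1000 * log(1 - value))
  storage.mode(int) <- "integer"
  object@callProbability <- int
  validObject(object)
  object
})

obj <- new("MiniSnpSet",
           call = matrix(1L, nrow = 2, ncol = 2),
           callProbability = matrix(0L, nrow = 2, ncol = 2))
p <- matrix(c(0.5, 0.9, 0.99, 0.1), nrow = 2, ncol = 2)
miniConfs(obj) <- p
obj@callProbability  # raw integers, analogous to what snpCallProbability() returns
miniConfs(obj)       # probabilities again, up to rounding error
```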
R. Scharpf
showClass("SnpSet2")
new("SnpSet2")
A class to store locus-level summaries of the quantile normalized intensities, genotype calls, and genotype confidence scores
Objects can be created by calls of the form new("SnpSuperSet", alleleA=alleleA, alleleB=alleleB, call=call, callProbability=callProbability, ...).
assayData: Object of class "AssayData".
phenoData: Object of class "AnnotatedDataFrame".
featureData: Object of class "AnnotatedDataFrame".
experimentData: Object of class "MIAME".
annotation: Object of class "character".
protocolData: Object of class "AnnotatedDataFrame".
.__classVersion__: Object of class "Versions".
Class "AlleleSet", directly.
Class "SnpSet", directly.
Class "eSet", by class "AlleleSet", distance 2.
Class "VersionedBiobase", by class "AlleleSet", distance 3.
Class "Versioned", by class "AlleleSet", distance 4.
No methods defined with class "SnpSuperSet" in the signature.
R. Scharpf
showClass("SnpSuperSet")
## empty object from the class
x <- new("matrix")
new("SnpSuperSet", alleleA=x, alleleB=x, call=x, callProbability=x)
Tools to distribute objects across nodes or by length.
splitIndicesByLength(x, lg, balance=FALSE)
splitIndicesByNode(x)
Argument | Description |
---|---|
x | object to be split |
lg | length of each group |
balance | logical. Currently ignored |
splitIndicesByLength splits x into groups of length lg.
splitIndicesByNode splits x into N groups, where N is the number of compute nodes available.
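The two splitting strategies can be sketched in base R. The helpers split_by_length and split_by_groups are hypothetical illustrations, not the package source: the package determines the number of groups from the available compute nodes, whereas this sketch takes it as an explicit argument n_groups.

```r
# Hypothetical sketch of splitting by length: consecutive groups of lg
# elements; the last group is shorter when length(x) is not a multiple of lg.
split_by_length <- function(x, lg) {
  split(x, ceiling(seq_along(x) / lg))
}

# Hypothetical sketch of splitting into a fixed number of groups: n_groups
# contiguous, roughly equal-sized pieces (passed explicitly here).
split_by_groups <- function(x, n_groups) {
  unname(split(x, cut(seq_along(x), n_groups, labels = FALSE)))
}

x <- 1:100
lengths(split_by_length(x, 8))  # twelve groups of 8, then one group of 4
lengths(split_by_groups(x, 4))  # four groups of 25
```

Both helpers keep neighbouring elements together, which suits workers that each process a contiguous chunk of markers.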
List.
Benilton S Carvalho
split
x <- 1:100
splitIndicesByLength(x, 8)
splitIndicesByLength(x, 8, balance=TRUE)
splitIndicesByNode(x)
An example SnpQSet instance.
data(sqsExample)
An object of class SnpQSet.
data(sqsExample)
class(sqsExample)
Methods for RangedSummarizedExperiment.
## S4 method for signature 'RangedSummarizedExperiment'
baf(object)
## S4 method for signature 'RangedSummarizedExperiment'
chromosome(object, ...)
## S4 method for signature 'RangedSummarizedExperiment'
isSnp(object, ...)
## S4 method for signature 'RangedSummarizedExperiment'
lrr(object)
Argument | Description |
---|---|
object | A RangedSummarizedExperiment object. |
... | Ignored. |
baf and lrr are accessors for the B allele frequency and log R ratio assays (matrices or arrays), respectively.
chromosome returns the seqnames of the rowRanges.
isSnp returns a logical vector with one element per marker in rowRanges, indicating whether the marker targets a SNP (nonpolymorphic markers are FALSE).