Title: | CDF Environment Maker |
---|---|
Description: | This package has two functions. One reads a Affymetrix chip description file (CDF) and creates a hash table environment containing the location/probe set membership mapping. The other creates a package that automatically loads that environment. |
Authors: | Rafael A. Irizarry <[email protected]>, Laurent Gautier <[email protected]>, Wolfgang Huber <[email protected]>, Ben Bolstad <[email protected]> |
Maintainer: | James W. MacDonald <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.83.0 |
Built: | 2024-12-06 06:11:31 UTC |
Source: | https://github.com/bioc/makecdfenv |
A class describing the content a Chip Description File.
The class Cdf
was designed to store the content of Affymetrix's
Chip Definition Files (CDF). This early class is no longer widely used
in the package affy. Environments (called cdfenvs
) are
preferred to have an efficient mapping between probe set identifiers
and indexes (needed to access/subset particular probe intensities).
Unless one needs to access every information contained in a
CDF file, the cdfenvs
will be preferred.
The following comments are only relevant to someone with interest in
what is in a CDF file.
The name associated
to each probe is not unique, it corresponds to the gene name. It is
very common to have a name repeated 40 times (20 perfect matches and 20
mismatches). Probes can be grouped by pairs: a perfect
match (PM) probe has its mismatch (MM) counterpart. The two probes in a
pair differ by one base (usually located in the middle of the
sequence). The information relative to this particular base are stored
in pbase
and tbase
. At a given position (x,y), having
pbase[x,y] == tbase[x,y] means having a MM while having pbase[x,y] !=
tbase[x,y] means having a perfect mismatch if and only if one of the
bases is A while the other is TRUE, or one is G and the other is C. The
function
pmormm
returns TRUE
for PMs and FALSE
for
MMs.
To know more about the file structure of a CDF file, one has to refer to the parsing code (mostly in C).
new('Cdf',
cdfName = ...., # Object of class character
name = ...., # Object of class matrix
name.levels = ...., # Object of class character
pbase = ...., # Object of class matrix
pbase.levels = ...., # Object of class character
tbase = ...., # Object of class matrix
tbase.levels = ...., # Object of class character
atom = ...., # Object of class matrix
)
cdfName
:the CDF name tag. Used to link with the right CEL files
name
:Object of class "matrix" of 'factors' for the gene names corresponding to the probes.
name.levels
:Object of class "character" containing
the levels corresponding to name
.
pbase
:Object of class "matrix" of pbase levels.
pbase.levels
:Object of class "character" containing
the levels corresponding to pbase
.
tbase
:Object of class "matrix" of tbase levels.
tbase.levels
:Object of class "character" containing
the levels corresponding to tbase
.
atom
:Object of class "matrix" of atom or probe numbers.
(Cdf): An accesor function for the
ato,
slot.
(Cdf): A replacement function for
atom
slot.
(Cdf): An accesor function for the
name.levels
slot.
(Cdf): A replacement function for
name.levels
slot.
(Cdf): An accesor function for the
pbase
slot.
(Cdf): A replacement function for
pbase
slot.
(Cdf): An accesor function for the
pbase.levels
slot.
(Cdf): A replacement function for
pbase.levels
slot.
(Cdf): renders information about the Cdf
object in a concise way on stdout.
(Cdf): An accesor function for the
tbase
slot.
(Cdf): A replacement function for
tbase
slot.
(Cdf): An accesor function for the
tbase.levels
slot.
(Cdf): A replacement function for
tbase.levels
slot.
L. Gautier <[email protected]>
read.cdffile
,
make.cdf.env
,
make.cdf.package
Get specific information stored in Affymetrix data files (CEL or CDF).
getInfoInFile(filename, type, unit, property, compress = NULL)
getInfoInFile(filename, type, unit, property, compress = NULL)
filename |
a file name |
type |
|
unit |
the 'unit' to find the information |
property |
the 'property' of interest |
compress |
a boolean |
The data files seem to have a structure of 'units'. A 'unit' start
with something between square brackets (ex: "[UNIT1]"). The parameter
unit
let one specify in which unit the information of interest
is located. The 'properties' are constituted of an identifier, the
sign '=' and the value for the property (ex: "numCells=12000").
If you are planning to use this function, I assume you know the what is in CEL and CDF files in details and that you know what are doing.
a character with the value of the 'property'.
The code goes through the file until the rigth 'unit' is found. Then it looks for the right 'property' further down (eventually going through the next units if the property is not found).
read.celfile
, read.cdffile
, whatcdf
Reads an Affymetrix chip description file (CDF) and creates an environment used as a hash table for the probe set mapping to location.
make.cdf.env(filename, cdf.path = getwd(), compress = FALSE, return.env.only = TRUE, verbose = TRUE)
make.cdf.env(filename, cdf.path = getwd(), compress = FALSE, return.env.only = TRUE, verbose = TRUE)
filename |
Character. Filename of the CDF file - without the path prefix! |
cdf.path |
Character. Path to the CDF file. |
compress |
Logical. If |
return.env.only |
Logical. If |
verbose |
Logical. If |
Normally, this function should not be called directly. The
preferred way to handle CDF information is to use
make.cdf.package
to build a package, and to
install it into R. The CDF information can then either be invoked
automatically by the package affy
, or can be loaded manually by
calling, for example, library(hgu133a)
.
Some R installations (typically on Windows) do not offer all the tools that are necessary for package building. In such situations, this function may be called directly. Please see the vignette for details - type:
openVignette("makecdfenv")
Return values:
env
is an environment, used as a hash table.
For every probe set name we have a matrix with 2 columns. The first
column contains the PM locations and the second column the MM
locations. For PM only chips the MM column will have NAs.
syms
is a list that contains chip-specific (i.e., CDF-file
specific) information that can be used in the construction of the help
files for the CDF package.
Depending on the argument return.env.only
,
either the environment, or a list with two elements,
env
and syms
. See details.
Rafael A. Irizarry, Wolfgang Huber
env <- make.cdf.env("Hu6800.CDF.gz", cdf.path=system.file("extdata", package="makecdfenv"), compress=TRUE) length(ls(env)) get("U53347_at", env)
env <- make.cdf.env("Hu6800.CDF.gz", cdf.path=system.file("extdata", package="makecdfenv"), compress=TRUE) length(ls(env)) get("U53347_at", env)
This function reads an Affymetrix chip description file (CDF) and creates an R package that when loaded has the CDF environment available for use.
make.cdf.package(filename, packagename = NULL, cdf.path = getwd(), package.path = getwd(), compress = FALSE, author = "The Bioconductor Project", maintainer = "Biocore Package Maintainer <[email protected]>", version = packageDescription("makecdfenv", fields ="Version"), species = NULL, unlink = FALSE, verbose = TRUE)
make.cdf.package(filename, packagename = NULL, cdf.path = getwd(), package.path = getwd(), compress = FALSE, author = "The Bioconductor Project", maintainer = "Biocore Package Maintainer <[email protected]>", version = packageDescription("makecdfenv", fields ="Version"), species = NULL, unlink = FALSE, verbose = TRUE)
filename |
Character. Filename of the CDF file - without the path prefix! |
packagename |
Character. Name wanted for the package. |
cdf.path |
Character. Path to the CDF file. |
package.path |
Character. Path where the package will be created. |
compress |
Logical. If |
author |
Character. What to put in the |
maintainer |
Character. What to put in the |
version |
Character. What to put in the version
field. Should be a of the form |
species |
Character. Must be specified using the format e.g., Homo\_sapiens |
unlink |
Logical. If |
verbose |
Logical. If |
The function is called for its side effect, creating a package. By default the package name will be the name of the CDF file made lower case and with special characters removed (i.e. only alpha-numeric).
In general one would want to use the name given in by
cleancdfname(abatch@cdfName)
with abatch
an
AffyBatch
object obtained, for
example, using ReadAffy
. This is the
package name that the
affy
package looks for by default.
If the user has a CEL file, called filename
, the recommended
package name for the environment is
cleancdfname(whatcdf(filename))
. This usually coincides with
the default.
Please see the vignette for more details.
If success, the function returns the name of the created package.
Rafael A. Irizarry, Wolfgang Huber
pkgpath <- tempdir() make.cdf.package("Hu6800.CDF.gz", cdf.path=system.file("extdata", package="makecdfenv"), compress=TRUE, species = "Homo_sapiens", package.path = pkgpath) dir(pkgpath)
pkgpath <- tempdir() make.cdf.package("Hu6800.CDF.gz", cdf.path=system.file("extdata", package="makecdfenv"), compress=TRUE, species = "Homo_sapiens", package.path = pkgpath) dir(pkgpath)
Determing if intensities on an array from the cdffile are PM, MM, or neither.
pmormm(cdf)
pmormm(cdf)
cdf |
A |
The intensities on a chip fall in three categories: perfect match (PM), mismatch (MM) or unknown (No information about them is contained in the CDF file, yet they were found informative (details to come)). PM are coded as TRUE, MM as FALSE and unknown as NA.
A matrix (see section 'details').
L. Gautier <[email protected]>
Read the data contained in a CDF file
read.cdffile(file, compress=FALSE)
read.cdffile(file, compress=FALSE)
file |
the name of the CDF file |
compress |
whether the file is compressed or not |
This function is intended for use by
make.cdf.env
, which in turn is called by
make.cdf.package
. User may not have much
benefit from calling this function directly.
In order to save memory, the name corresponding to each value in the
CEL is a factor. As in R factor
objects
cannot be also of type matrix
, the names corresponding to the indices
were stored in a vector of type character
called
name.levels
.
The same thing was done with the pbase and cbase information (more for
consistency than by conviction that some memory
could be saved here).
Returns a Cdf-class
object.
A Cdf
object is not a cdf environment, which will be needed for
the computation of expression values from the probe intensities in a
AffyBatch
.
Laurent Gautier ([email protected])
fn <- system.file("extdata", "Hu6800.CDF.gz", package="makecdfenv") mycdf <- read.cdffile(fn, compress=TRUE) mycdf
fn <- system.file("extdata", "Hu6800.CDF.gz", package="makecdfenv") mycdf <- read.cdffile(fn, compress=TRUE) mycdf