Package 'bigmelon'

Title:	Illumina methylation array analysis for large experiments
Description:	Methods for working with Illumina arrays using gdsfmt.
Authors:	Tyler J. Gorrie-Stone [aut], Ayden Saffari [aut], Karim Malki [aut], Leonard C. Schalkwyk [cre, aut]
Maintainer:	Leonard C. Schalkwyk <[email protected]>
License:	GPL-3
Version:	1.33.0
Built:	2025-03-29 06:26:25 UTC
Source:	https://github.com/bioc/bigmelon

Help Index

Write large-scale Illumina array datasets to CoreArray Genomic Data Structure (GDS) data files for efficient storage, access, and modification.
Append a MethyLumiSet object to a preexisting gds file
Copy gds node to a backup folder within gds object
Bigmelon accessors
Bigmelon Quantile Normalization methods.
tea.gds - description and validation for gds objects
Bumphunter using bigmelon
Small MethyLumi 450k data sets for testing
Chainsaw – modify gds file by subsetting all nodes
Combine two different gds objects together
Dasen Quantile Normalization by storing ranks
dim.gds.class S3 method returning dimensions of data represented by a gds file handle.
Coercion method for MethyLumiSet, RGChannelSet or MethylSet to CoreArray Genomic Data Structure (GDS) data file
Cell Proportion Estimation using bigmelon
Read finalreport files and convert to genomic data structure files
Convert Genomic Data Structure file to Methylumiset or Methylset object.
Download data from GEO and convert it into a gdsfmt object
Compute the quantiles and ranks for a given gdsn.node
Add data from Illumina IDAT files to a gds file.
Basic data filtering for Illumina methylation data in gds objects
Principal Component Analysis for high-dimensional data
Probe-Wise Outlier Detection for DNA methylation data.
Change the location of the paths for row and column names in a gds file.
Functions imported from wateRmelon

Write large-scale Illumina array datasets to CoreArray Genomic Data Structure (GDS) data files for efficient storage, access, and modification.

Description

Functions for storing Illumina array data as CoreArray Genomic Data Structure (GDS) data files (via the gdsfmt package), appending to these files, and applying array normalization methods from the wateRmelon package.

Details

Package:	bigmelon
Type:	Package
Version:	1.13.8
Date:	2020-02-24
License:	GPL3

Author(s)

Tyler Gorrie-Stone, Leonard C Schalkwyk, Ayden Saffari, Karim Malki. Who to contact: <[email protected]>, <[email protected]>

References

[1] Tyler J Gorrie-Stone, Melissa C Smart, Ayden Saffari, Karim Malki, Eilis Hannon, Joe Burrage, Jonathan Mill, Meena Kumari, Leonard C Schalkwyk: Bigmelon: tools for analysing large DNA methylation datasets, Bioinformatics, Volume 35, Issue 6, 15 March 2019, Pages 981-986. https://doi.org/10.1093/bioinformatics/bty713

[2] Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC: A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics, 14(1), 293.

Append a MethyLumiSet object to a preexisting gds file

Description

This function will append a MethyLumiSet object to a gds file and return a gds.class object.

Usage

app2gds(m, bmln)

Arguments

m

The MethyLumiSet object to be appended to the gds file, with the same number of rows as the gds file.

bmln

Either: A gds.class object

Or: A character string specifying the file path of an existing .gds file to write to.

Or: A character string specifying the file path of a new .gds file to write to.

Details

Currently this function accepts only a MethyLumiSet object as input. It will produce unexpected results if the number of rows of the new object does not match the existing .gds file. The function should fail noisily in that case; however, to prevent any errors from occurring, it is recommended that raw idat files are read in using readEPIC or appended with iadd or iadd2, which ensures that all rows are of the same length and have the same annotation.

Value

A gds.class object pointed towards the newly appended .gds file.

Author(s)

Leonard C Schalkwyk, Ayden Saffari, Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

#load example dataset
data(melon)

#split data into halves
melon_1 <- melon[,1:6]
melon_2 <- melon[,7:12]

#convert first half to gds
e <- es2gds(melon_1,'1_half_melon.gds')

#append second half to existing gds file
f <- app2gds(melon_2,e)

unlink("1_half_melon.gds")

Copy gds node to a backup folder within gds object

Description

This function will copy a designated gdsn.class object stored inside a gds object to a backup folder (aptly named backup). If the backup folder does not exist, it will be created. This is a wrapper to copyto.gdsn, which should be used directly if one wishes to copy a gds node to a separate gds file.

Usage

backup.gdsn(gds = NULL, node)

Arguments

`gds`	If NULL, function will call `getfolder.gdsn` to find the root node. Otherwise, user can specify a separate gds.class object to copy the specified node to.
`node`	gdsn.class object (a gds node) which can be specified using `index.gdsn`

Value

A gds object with an additional folder called backup that contains the supplied node.

Author(s)

Tyler Gorrie-Stone <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
nod <- index.gdsn(e, "betas")
backup.gdsn(gds = NULL, node = nod)
closefn.gds(e)
unlink("melon.gds")

Bigmelon accessors

Description

Functions to access data nodes in gds.class objects.

Usage


## S4 method for signature 'gds.class'
betas(object)
## S4 method for signature 'gds.class'
methylated(object)
## S4 method for signature 'gds.class'
unmethylated(object)
## S4 method for signature 'gds.class'
pvals(object)
## S4 method for signature 'gds.class'
fData(object)
## S4 method for signature 'gds.class'
pData(object)
## S4 method for signature 'gds.class'
QCmethylated(object)
## S4 method for signature 'gds.class'
QCunmethylated(object)
## S4 method for signature 'gds.class'
QCrownames(object)
## S4 method for signature 'gds.class'
getHistory(object)
## S4 method for signature 'gds.class'
colnames(x, do.NULL=TRUE, prefix=NULL)
## S4 method for signature 'gds.class'
rownames(x, do.NULL=TRUE, prefix=NULL)
## S4 method for signature 'gds.class'
exprs(object)
## S4 method for signature 'gds.class'
fot(x)

Arguments

object

A gds.class object.

for colnames and rownames:

`x`	A gds.class object.
`do.NULL`	logical. If 'FALSE' and names are 'NULL', names are created.
`prefix`	prefix: for created names.

Details

Each function will return the data stored in the corresponding node as either a gdsn.class object, matrix, or data.frame. The functions are named following the conventions of the methylumi package and perform similar functions.

Each function which returns a gdsn.class object can be further indexed using the '[' operator. This includes an additional name argument which optionally returns the named attributes to the data, as these are not stored inside the gdsn.node.

The QC functions will return the QC data as a matrix; these are separated into methylated values, unmethylated values and the rownames.

exprs will return a data.frame of beta values for all probes and all samples.

Value

Returns specified node representing the called accessor

Author(s)

Leonard C Schalkwyk, Ayden Saffari, Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon,'wat_melon.gds')
betas(e)
head(betas(e)[,])
methylated(e)[1:5, 1:3]
unmethylated(e)[1:3 ,1:5]
pvals(e)[1:5, 1:5]
head(fData(e))
head(pData(e))
head(colnames(e))
head(rownames(e))

closefn.gds(e)
unlink("wat_melon.gds")

Bigmelon Quantile Normalization methods.

Description

Set of functions that are used to perform quantile normalization methods on gds.class objects.

Usage

## S4 method for signature 'gds.class'
dasen(mns, fudge = 100, ret2 = FALSE, node="betas",...)
dasen.gds(gds, node, mns, uns, onetwo, roco, fudge, ret2)
qn.gdsn(gds, target, newnode)
db.gdsn(gds, mns, uns)
dfsfit.gdsn(gds, targetnode, newnode, roco, onetwo)

Arguments

`gds`	A gds.class object
`node`	The "name" of desired output `gdsn.class` node
`mns`	The `gdsn.class` node that corresponds to "methylated" intensities.
`uns`	The `gdsn.class` node that corresponds to "unmethylated" intensities.
`onetwo`	The `gdsn.class` node that corresponds to probe designs (in reference to 450k and EPIC arrays) OR character string pointing to location of `gdsn.class` node. e.g. "fData/DESIGN" OR vector containing probe design types of length > 1.
`roco`	This allows a background gradient model to be fit. This is split from data column names by default. roco=NULL disables model fitting (and speeds up processing), otherwise roco can be supplied as a character vector of strings like 'R01C01' (only 3rd and 6th characters used).
`fudge`	The value added to total intensity to prevent denominators close to zero when calculating betas. Default = 100.
`ret2`	If TRUE, appends the newly calculated methylated and unmethylated intensities to the original gds (as specified in the gds argument). Will overwrite the raw intensities.
`target`	Target `gdsn.class` node to perform normalization on. If using "*****.gds" method you do not need to specify this.
`targetnode`	Target `gdsn.class` node to perform normalization on. If using "*****.gds" method you do not need to specify this.
`newnode`	"name" of desired output `gdsn.class` node. If using "*****.gds" method you do not need to specify this.
`...`	Additional args such as roco or onetwo.
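
To illustrate how roco strings are interpreted, the following base-R sketch pulls the row and column digits out of Sentrix positions such as 'R01C01' using the 3rd and 6th characters, as described for the roco argument. The helper name parse_roco is hypothetical and is not part of bigmelon or wateRmelon.

```r
# Hypothetical helper (not a bigmelon function): split Sentrix positions
# of the form 'R01C01' into row and column indices.
parse_roco <- function(roco) {
  data.frame(
    row = as.integer(substr(roco, 3, 3)),  # 3rd character: row digit
    col = as.integer(substr(roco, 6, 6))   # 6th character: column digit
  )
}

roco <- c("R01C01", "R02C01", "R06C02")
parse_roco(roco)
```

Only single digits are read here, which matches the "3rd and 6th characters" rule and is sufficient for the standard chip layouts.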

Details

Each function performs a normalization method described within the wateRmelon package. The functions qn.gdsn, design.qn.gdsn, db.gdsn and dfsfit.gdsn are described to allow users to create their own custom normalization methods. Otherwise, calling dasen, dasen.gds, etc. will perform the necessary operations for quantile normalization.

Each 'named' normalization method will write a temporary gds object called "temp.gds" into the current working directory and it is removed when normalization is complete. Current methods supplied by default arguments will replace the raw betas with normalized betas, but leave the methylated and unmethylated intensities unprocessed.
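
The role of fudge can be shown with a small base-R sketch. It assumes the usual wateRmelon-style beta formula, methylated / (methylated + unmethylated + fudge); the matrices are made-up toy intensities rather than package data, and compute_betas is a hypothetical helper.

```r
# Toy methylated (mns) and unmethylated (uns) intensities: 3 probes x 2 samples.
mns <- matrix(c(1200, 5, 300, 900, 0, 450), nrow = 3)
uns <- matrix(c(300, 2, 2700, 100, 0, 50), nrow = 3)

# Assumed formula: the fudge term keeps the denominator away from zero.
compute_betas <- function(mns, uns, fudge = 100) {
  mns / (mns + uns + fudge)
}

betas_fudged <- compute_betas(mns, uns)             # finite everywhere
betas_raw    <- compute_betas(mns, uns, fudge = 0)  # 0/0 is NaN for probe 2, sample 2
```

With fudge = 0, a probe with no signal in either channel yields NaN, whereas the default offset returns a beta of 0 for the same probe.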

Value

Normalization methods return nothing but will affect the gds file and replace/add nodes given to the function.

Author(s)

Tyler J Gorrie-Stone <[email protected]>

Examples

data(melon)
e <- es2gds(melon,'wat_melon.gds')
dasen(e)
closefn.gds(e) # Close gds object
unlink('wat_melon.gds') # Delete Temp file

tea.gds - description and validation for gds objects

Description

Checks the validity, file location and read/write status of a gds object

Usage

bigPepo(path, gds, manifest, chunksize = NULL, force = TRUE, ...)

Details

For DNAm use with the bigmelon package. For interactive use the default verbose option also prints further details. Returns a list.

Bumphunter using bigmelon

Description

Estimate regions for which a genomic profile deviates from its baseline value. Originally implemented to detect differentially methylated genomic regions between two populations. Functions identically to bumphunter.

Usage

bumphunterEngine.gdsn(mat, design, chr = NULL, pos, cluster = NULL, coef = 2, cutoff = NULL, pickCutoff = FALSE, pickCutoffQ = 0.99, maxGap = 500, nullMethod=c("permutation","bootstrap"), smooth = FALSE, smoothFunction = locfitByCluster, useWeights = FALSE, B=ncol(permutations), permutations=NULL, verbose = TRUE, ...)

Arguments

`mat`	A gdsn.class object, e.g. betas(gfile).
`design`	Design matrix with rows representing samples and columns representing covariates. Regression is applied to each row of mat.
`chr`	A character vector with the chromosomes of each location.
`pos`	A numeric vector representing the chromosomal position.
`cluster`	The clusters of locations that are to be analyzed together. In the case of microarrays, the clusters are many times supplied by the manufacturer. If not available the function `clusterMaker` can be used to cluster nearby locations.
`coef`	An integer denoting the column of the design matrix containing the covariate of interest. The hunt for bumps will only be done for the estimate of this coefficient.
`cutoff`	A numeric value. Values of the estimate of the genomic profile above the cutoff or below the negative of the cutoff will be used as candidate regions. It is possible to give two separate values (upper and lower bounds). If one value is given, the lower bound is minus the value.
`pickCutoff`	Should bumphunter attempt to pick a cutoff using the permutation distribution?
`pickCutoffQ`	The quantile used for picking the cutoff using the permutation distribution.
`maxGap`	If cluster is not provided this maximum location gap will be used to define cluster via the `clusterMaker` function.
`nullMethod`	Method used to generate null candidate regions, must be one of ‘bootstrap’ or ‘permutation’ (defaults to ‘permutation’). However, if covariates in addition to the outcome of interest are included in the design matrix (ncol(design)>2), the ‘permutation’ approach is not recommended. See vignette and original paper for more information.
`smooth`	A logical value. If TRUE the estimated profile will be smoothed with the smoother defined by `smoothFunction`
`smoothFunction`	A function to be used for smoothing the estimate of the genomic profile. Two functions are provided by the package: `loessByCluster` and `runmedByCluster`.
`useWeights`	A logical value. If `TRUE` then the standard errors of the point-wise estimates of the profile function will be used as weights in the loess smoother `loessByCluster`. If the `runmedByCluster` smoother is used this argument is ignored.
`B`	An integer denoting the number of resamples to use when computing null distributions. This defaults to 0. If `permutations` is supplied that defines the number of permutations/bootstraps and `B` is ignored.
`permutations`	is a matrix with columns providing indexes to be used to scramble the data and create a null distribution when `nullMethod` is set to permutations. If the bootstrap approach is used this argument is ignored. If this matrix is not supplied and `B`>0 then these indexes are created using the function `sample`.
`verbose`	logical value. If `TRUE`, it writes out some messages indicating progress. If `FALSE` nothing should be printed.
`...`	further arguments to be passed to the smoother functions.

Details

This function is a direct replication of the bumphunter function by Rafael A. Irizarry, Martin J. Aryee, Kasper D. Hansen, and Shan Andrews.
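
As a rough illustration of the cutoff and maxGap arguments, the sketch below flags candidate locations whose estimate exceeds the cutoff in either direction and groups positions into clusters wherever consecutive gaps stay within maxGap. It is a simplified base-R stand-in written for this example, not the code used by bumphunter or clusterMaker, and it assumes a single chromosome with sorted positions.

```r
# Toy per-probe estimates and sorted positions on one chromosome.
pos <- c(100, 300, 450, 2000, 2200, 9000)
est <- c(0.30, 0.25, 0.02, -0.40, -0.35, 0.10)

cutoff <- 0.2   # a single value implies bounds of -cutoff and +cutoff
maxGap <- 500

# A new cluster starts whenever the gap to the previous position exceeds maxGap.
cluster <- cumsum(c(TRUE, diff(pos) > maxGap))

# Candidate locations lie above the upper bound or below the lower bound.
candidate <- est > cutoff | est < -cutoff

data.frame(pos, est, cluster, candidate)
```

Runs of neighbouring candidate locations within the same cluster and with the same sign are what would then be treated as candidate regions.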

Author(s)

Original function by Rafael A. Irizarry, Martin J. Aryee, Kasper D. Hansen, and Shan Andrews. Bigmelon implementation by Tyler Gorrie-Stone. Who to contact if this all goes horribly wrong: <[email protected]>

Small MethyLumi 450k data sets for testing

Description

Small MethyLumi 450k data sets intended for testing purposes only.

Usage

data(cantaloupe)
data(honeydew)

Format

cantaloupe: MethyLumiSet with assayData containing 841 features, 3 samples. honeydew: MethyLumiSet with assayData containing 841 features, 4 samples.

Value

Loads data into R

Chainsaw – modify gds file by subsetting all nodes

Description

Currently the '[' function for the gds.class objects used by bigmelon only subsets a single node. This function behaves more like what you would normally expect of a subsetting function: it returns a subset of the entire object. It may in future be a replacement for '[.gds.class'.

Usage

chainsaw(gfile, i = "", j = "", v = FALSE, cleanup = TRUE)

Arguments

`gfile`	A gds.class object.
`i`	Specifies rows (i.e. probes) in the desired subset, similar to the behaviour of '['.
`j`	Specifies columns (i.e. samples) in the desired subset, similar to the behaviour of '['.
`v`	If true, spew many messages.
`cleanup`	If true, run a cleanup function that can substantially reduce the file size.

Details

This function is intended for use in the preprocessing and QC phase of a DNA methylation workflow. For efficiency, bigmelon stores data in a file, and the gds.class object is a file handle. True to its name, chainsaw chops the underlying file; this is a side effect of the function and is not affected by assignment of the return value.

Value

a gds.class object. This is a handle to the same file that the gfile argument points to. It's not generally useful to have two handles to the same file, but it may make code more readable. In interactive use, if not assigned, the returned object is usefully pretty-printed.

Combine two different gds objects together

Description

Function will attempt to combine the shared gdsn.class nodes between two gds objects, depending on the dimensions of the primary gds.class object.

Usage

combo.gds(file, primary, secondary)

Arguments

`file`	Name of the new gds file to be created.
`primary`	A gds.class object.
`secondary`	A gds.class object.

Details

–EXPERIMENTAL– Will crudely combine shared nodes between primary and secondary based on the dimensions / rownames of the primary node. NAs will be coerced where probes are missing from secondary gds.

Currently will only look for nodes with the names "betas", "methylated", "unmethylated", "pvals" and "NBeads".
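
The row alignment described above can be pictured with plain matrices. This sketch assumes alignment by the primary row names via match(), with NA filled in for probes absent from the secondary object; it mirrors the described behaviour but is not the combo.gds implementation.

```r
# Toy beta matrices standing in for the "betas" nodes of two gds objects.
primary <- matrix(c(0.1, 0.5, 0.9), ncol = 1,
                  dimnames = list(c("cg01", "cg02", "cg03"), "sampleA"))
secondary <- matrix(c(0.8, 0.2), ncol = 1,
                    dimnames = list(c("cg03", "cg01"), "sampleB"))

# Reorder the secondary rows to follow the primary; unmatched probes become NA.
idx <- match(rownames(primary), rownames(secondary))
aligned <- secondary[idx, , drop = FALSE]
rownames(aligned) <- rownames(primary)

combined <- cbind(primary, aligned)
combined
```

Probes present only in the secondary matrix are dropped in this sketch, since the primary object defines the row set.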

Value

Returns (and creates) a new gds file in the specified location containing the combination of the two gds objects.

Note

Will lose information relating to "pData". We therefore recommend compiling a separate pData object manually and adding the combined pData after running the function.

Author(s)

Tyler Gorrie-Stone <[email protected]>

Examples

data(melon)
a <- es2gds(melon[,1:6], "primary.gds")
b <- es2gds(melon[,7:12], "secondary.gds")

ab <- combo.gds("combo.gds", primary = a, secondary = b)

closefn.gds(a)
unlink("primary.gds")
closefn.gds(b)
unlink("secondary.gds")
closefn.gds(ab)
unlink("combo.gds")

Dasen Quantile Normalization by storing ranks

Description

This performs an experimental variant of dasen normalisation for .gds format objects which stores the ranks of the methylated and unmethylated intensities inside of the normalised values and interpolates the quantiles when they are needed.

Notably this eliminates the secondary re-sorting pass required by quantile normalisation. The interpolation is instead performed downstream, either using computebeta.gds, which will produce normalised betas, or manually with '[', which will access the ranks and interpolate the specific quantiles as needed.
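
The idea of storing ranks and recovering normalised values on demand can be sketched in base R. This toy version assumes no missing values and no ties, and uses a direct lookup of the target quantiles by rank rather than the interpolation used by the package; none of the names below come from bigmelon.

```r
# Toy intensities: 4 probes x 3 samples.
x <- matrix(c(5, 2, 9, 4,
              3, 8, 1, 6,
              7, 10, 12, 11), nrow = 4)

# Target quantiles: mean of the sorted columns (standard quantile normalisation).
target <- rowMeans(apply(x, 2, sort))

# Store only the per-column ranks instead of the normalised values.
ranks <- apply(x, 2, rank)

# Normalised values are recovered on demand by looking up each rank in the target.
normalised <- apply(ranks, 2, function(r) target[r])
```

Because only ranks and one vector of quantiles are kept, any subset of rows or columns can be normalised later without re-sorting the full matrix.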

Usage

dasenrank(gds, mns, uns, onetwo, roco, calcbeta = NULL, perc = 1)
computebeta.gds(gds, new.node, mns, uns, fudge)

Arguments

`gds`	gds.class object which contains methylated and unmethylated intensities. The function will write two (four) nodes to this object called 'mnsrank' and 'unsrank' which contain the ranks of the given nodes.
`mns`	gdsn.class object OR character string that refers to location in gds that relates to the (raw) methylated intensities.
`uns`	gdsn.class object OR character string that refers to location in gds that relates to the (raw) unmethylated intensities.
`onetwo`	gdsn.class object OR character string that refers to location in gds that contains information relating to probe design OR vector of length equal to the number of rows in the array that contains 'I' and 'II' in accordance to Illumina HumanMethylation micro-array design.
`roco`	Sentrix (R0#C0#) position of all samples.
`calcbeta`	Default = NULL, if supplied with a string, a new gdsn.node will be made with supplied string, which will contain the calculated betas.
`perc`	A number between 0 and 1 that gives the proportion of columns used to normalise the data. Default is 1, but in case there are many samples to normalise, this number can be reduced to increase the speed of the code.
`new.node`	Character string depicting name of new betas node in given gds object.
`fudge`	Arbitrary value to offset low intensities

Details

calcbeta is a known bottleneck for this code! Also, this function is highly experimental.

Value

Nothing is returned to the R environment, however the supplied gds will have 4 or 5 gdsn.nodes added. These are: 'mnsrank', 'unsrank', 'isnamnsrank' (hidden), 'isnaunsrank'(hidden) and calcbeta if supplied. 'mnsrank' and 'unsrank' have been given some attributes - which contain the calculated quantiles from getquantilesandranks.

Author(s)

Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
#dasenrank(gds = e)
closefn.gds(e)
unlink("melon.gds")

dim.gds.class S3 method returning dimensions of data represented by a gds file handle.

Description

dim.gds.class S3 method returning dimensions of data represented by a gds file handle.

Usage

## S3 method for class 'gds.class'
dim(gfile, v = FALSE)

Coercion method for MethyLumiSet, RGChannelSet or MethylSet to CoreArray Genomic Data Structure (GDS) data file

Description

The es2gds function takes a MethyLumiSet, RGChannelSet or MethylSet data object and converts it into a CoreArray Genomic Data Structure (GDS) data file (via the gdsfmt package), returning this as a gds.class object for use with bigmelon.

Usage

es2gds(m, file, qc = TRUE)

Arguments

`m`	A MethyLumiSet, RGChannelSet or MethylSet object
`file`	A character string specifying the name of the .gds file to write to.
`qc`	When set to TRUE (default), data from control probes is included.

Value

A gds.class object, which points to the newly created .gds file.

Author(s)

Leonard C Schalkwyk, Ayden Saffari, Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

#load example dataset
data(melon)
#convert to gds
e <- es2gds(melon,'melon.gds')
closefn.gds(e)
unlink('melon.gds')

Cell Proportion Estimation using bigmelon

Description

Estimates the relative proportion of pure cell types within a sample, identical to estimateCellCounts. Currently, a reference data-set exists only for 450k arrays. As a result, if performed on EPIC data, the function will convert the gds to 450k array dimensions (this will not be memory efficient).

Usage

    estimateCellCounts.gds(
        gds,
        gdPlatform = c("450k", "EPIC", "27k"),
        mn = NULL,
        un = NULL,
        bn = NULL,
        perc = 1,
        compositeCellType = "Blood",
        probeSelect = "auto",
        cellTypes = c("CD8T","CD4T","NK","Bcell","Mono","Gran"),
        referencePlatform = c("IlluminaHumanMethylation450k",
            "IlluminaHumanMethylationEPIC",
            "IlluminaHumanMethylation27k"),
        returnAll = FALSE,
        meanPlot = FALSE,
        verbose=TRUE,
        ...)

Arguments

`gds`	An object of class gds.class, which contains (un)normalised methylated and unmethylated intensities
`gdPlatform`	Which micro-array platform was used to analyse the samples.
`mn`	'Name' of gdsn node within gds that contains methylated intensities, if NULL it will default to 'methylated' or 'mnsrank' if `dasenrank` was used prior
`un`	'Name' of gdsn node within gds that contains unmethylated intensities, if NULL it will default to 'unmethylated' or 'unsrank' if `dasenrank` was used prior
`bn`	'Name' of gdsn node within gds that contains (un)normalised beta intensities. If NULL, the function will default to 'betas'.
`perc`	Percentage of query-samples to use to normalise reference dataset. This should be 1 unless using a very large data-set which will allow for an increase in performance
`compositeCellType`	Which composite cell type is being deconvoluted. Should be either "Blood", "CordBlood", or "DLPFC"
`probeSelect`	How should probes be selected to distinguish cell types? Options include "both", which selects an equal number (50) of probes (with F-stat p-value < 1E-8) with the greatest magnitude of effect from the hyper- and hypo-methylated sides, and "any", which selects the 100 probes (with F-stat p-value < 1E-8) with the greatest magnitude of difference regardless of direction of effect. Default input "auto" will use "any" for cord blood and "both" otherwise, in line with previous versions of this function and/or our recommendations. Please see the references for more details.
`cellTypes`	Which cell types, from the reference object, should we use for the deconvolution? See details.
`referencePlatform`	The platform for the reference dataset; if the input `rgSet` belongs to another platform, it will be converted using `convertArray`.
`returnAll`	Should the composition table and the normalized user-supplied data be returned?
`verbose`	Should the function be verbose?
`meanPlot`	Whether to plot the average DNA methylation across the cell-type discriminating probes within the mixed and sorted samples.
`...`	Other arguments, i.e arguments passed to plots
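
The difference between the "both" and "any" strategies for probeSelect can be illustrated with a toy selection. The sketch below uses simulated effect sizes and p-values and a hypothetical helper, select_probes; it only mimics the selection rule described in the argument table and is not the code used by estimateCellCounts.

```r
set.seed(1)
n <- 1000
effect <- rnorm(n)            # simulated methylation difference for one cell type
pval   <- runif(n, 0, 2e-8)   # simulated F-stat p-values

# Hypothetical helper: pick probe indices under either strategy.
select_probes <- function(effect, pval, probeSelect = c("both", "any"),
                          threshold = 1e-8) {
  probeSelect <- match.arg(probeSelect)
  keep <- which(pval < threshold)
  if (probeSelect == "any") {
    # 100 probes with the largest absolute effect, regardless of direction
    ord <- keep[order(abs(effect[keep]), decreasing = TRUE)]
    head(ord, 100)
  } else {
    # 50 most hyper-methylated plus 50 most hypo-methylated probes
    ord <- keep[order(effect[keep], decreasing = TRUE)]
    c(head(ord, 50), tail(ord, 50))
  }
}

both_idx <- select_probes(effect, pval, "both")
any_idx  <- select_probes(effect, pval, "any")
```

The "both" strategy guarantees probes from each direction of effect, while "any" may return a set dominated by one direction if that is where the largest differences lie.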

Details

See estimateCellCounts for more information regarding the exact details. estimateCellCounts.gds differs slightly, as it will impose the quantiles of type I and II probes onto the reference dataset rather than normalising the two together. This is (1) more memory efficient and (2) faster, because it avoids normalising out the very small effect that the other 60 samples from the reference set would have on the remaining quantiles.

Optionally, a proportion of samples can be used to derive quantiles when there are more than 1000 samples in a dataset. This will further increase performance of the code at a cost of precision.

Read finalreport files and convert to genomic data structure files

Description

Function to easily load Illumina methylation data into a genomic data structure (GDS) file.

Usage

finalreport2gds(finalreport, gds, ...)

Arguments

`finalreport`	A filename of the text file exported from GenomeStudio
`gds`	The filename for the gds file to be created
`...`	Additional arguments passed to `methylumiR`

Details

Creates a .gds file.

Value

A gds.class object

Author(s)

Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

finalreport <- "finalreport.txt"
## Not run: finalreport2gds(finalreport, gds="finalreport.gds")

Convert Genomic Data Structure file to Methylumiset or Methylset object.

Description

Convert a Genomic Data Structure object back into a methylumi object, with subsetting features.

Usage

gds2mlumi(gds, i, j)
gds2mset(gds, i, j, anno)

Arguments

`gds`	a gds object
`i`	Index of rows
`j`	Index of Columns
`anno`	If NULL, function will attempt to guess the annotation to be used. Otherwise can be specified with either "27k", "450k", "epic" or "unknown".

Value

A methylumi object

Author(s)

Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
gds2mlumi(e)
closefn.gds(e)
unlink("melon.gds")

Download data from GEO and convert it into a gdsfmt object

Description

Uses the GEOquery R package to download a GSE Accession into the current working directory. This will only work for GSEs that have raw idat files associated with them.

Usage

geotogds(geo, gds, method = "wget", keepidat = F, keeptar = F, ...)

Arguments

`geo`	Either a GEO accession number ('GSE########') or a previously downloaded tarball 'GSE######.tar.gz'
`gds`	A character string that specifies the path and name of the .gds file you want to write to.
`method`	Character value to indicate which method should be used to download data from GEO. Default is 'wget'
`keepidat`	Logical, indicating whether the raw idat files in the working directory should be kept after parsing. If FALSE, the idat files will be removed.
`keeptar`	Logical, indicating whether the downloaded tarball should be kept after parsing. If FALSE, the tarball will be removed.
`...`	Additional Arguments to pass to other functions (if any)

Value

geotogds will return a gds.class object that points towards the newly created .gds file, with the majority of the downloaded contents inside.

Author(s)

Tyler Gorrie-Stone. Who to contact: <[email protected]>

Examples

#load example dataset
# gfile <- geotogds("GSE*******", "Nameoffile.gds")
# Will not work if gds has no idats submitted. May also fail if idats
# are not deposited in a way readily readable by readEPIC().
# closefn.gds(gfile)

Compute the quantiles and ranks for a given gdsn.node

Description

Used inside dasenrank to generate the quantiles for both type 'I' and type 'II' probes to normalise DNA methylation data using bigmelon.

Usage

getquantilesandranks(gds, node, onetwo, rank.node = NULL, perc = 1)

Arguments

`gds`	A gds.class object
`node`	A gdsn.class object, or a character string that refers to a node within supplied gds.
`onetwo`	A gdsn.class object, OR a character string that refers to the location in gds that contains information relating to probe design, OR a vector of length equal to the number of rows in the array that contains 'I' and 'II' in accordance with the Illumina HumanMethylation micro-array design. This can be obtained with fot(gds).
`rank.node`	Default = NULL. If supplied with a character string, the function will calculate the ranks of the given node and store them in gds. Additionally, the computed quantiles will instead be attached to rank.node as an attribute, which can be accessed with `get.attr.gdsn`.
`perc`	A number between 0 and 1 giving the proportion of columns that are used to normalise the data. Default is 1, but when there are many samples to normalise this number can be reduced to increase speed.

Details

Used in dasenrank, can be used externally for testing purposes.

Value

If rank.node is NULL, a list containing quantiles, intervals and the supplied probe design is returned. If rank.node was supplied, nothing is returned. Instead, a new node is created in the given gds with the otherwise returned list attached as an attribute, which can be accessed with get.attr.gdsn.
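The idea of computing a reference quantile distribution separately for type 'I' and type 'II' probes, using only a proportion of the columns, can be illustrated in base R. This is only a rough sketch on an in-memory matrix and is not the bigmelon implementation: the helper name `reference_quantiles`, the simulated data and the choice to average sorted columns are all assumptions, and the "intervals" element of the returned list is not reproduced here.

```r
# Hypothetical sketch, not the bigmelon implementation.
set.seed(1)
x <- matrix(rexp(60), nrow = 10, ncol = 6)    # 10 probes x 6 samples (simulated)
onetwo <- rep(c("I", "II"), times = c(4, 6))  # probe design for each row
perc <- 0.5                                   # use half of the columns

reference_quantiles <- function(x, onetwo, perc = 1) {
  # Pick a random subset of columns according to perc
  n_use <- max(1, ceiling(ncol(x) * perc))
  cols <- sample(ncol(x), n_use)
  # For each probe type, sort every selected column and average across columns
  # (assumes no missing values in x)
  lapply(split(seq_len(nrow(x)), onetwo), function(rows) {
    sorted <- apply(x[rows, cols, drop = FALSE], 2, sort)
    rowMeans(matrix(sorted, nrow = length(rows)))
  })
}

q <- reference_quantiles(x, onetwo, perc)

# Ranks of each value within its probe type, per sample
ranks <- apply(x, 2, function(col) ave(col, onetwo, FUN = rank))
```

Here `q$I` and `q$II` hold one reference value per probe of each type, and `ranks` has the same dimensions as `x`, which mirrors the idea of storing ranks in a separate node.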

Author(s)

Tyler Gorrie-Stone Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
output <- getquantilesandranks(gds = e, 'methylated', onetwo = fot(e), perc = 1, rank.node = NULL)
# Without returned output: ranks and quantiles are stored in the gds instead.
#getquantilesandranks(gds = e, 'methylated', onetwo = fot(e), perc = 1, rank.node = 'mnsrank')
closefn.gds(e)
unlink("melon.gds")

Add data from Illumina IDAT files to a gds file.

Description

iadd will add data from multiple specified idat files to a specified gds file, provided '/path/to/barcode' is a valid path. Barcode here means the first part of the idat file name, i.e. without '_(Red|Grn).idat'.

iadd2 will add data from all idat files that are stored within a single directory to a gds file.

idats2gds will add data from a set of barcodes to the same gds file, one by one. It can optionally handle idat files from different maps if force=TRUE. However, it will not combine data from arrays of differing types, e.g. 450k vs EPIC.
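As a small illustration of what "barcode" means here, the channel suffix can be stripped from idat file names in base R. The helper `idat_to_barcode` and the file paths below are hypothetical and are not part of bigmelon.

```r
# Hypothetical helper (not part of bigmelon): strip the channel suffix
# '_(Red|Grn).idat' from IDAT file names to recover the barcode prefix.
idat_to_barcode <- function(files) {
  unique(sub("_(Red|Grn)\\.idat$", "", files))
}

# Illustrative paths only
files <- c("/data/idats/5723646052_R02C02_Grn.idat",
           "/data/idats/5723646052_R02C02_Red.idat",
           "/data/idats/5723646052_R04C01_Grn.idat")

barcodes <- idat_to_barcode(files)
print(barcodes)
```

Each red/green pair collapses to a single '/path/to/barcode' string, which is the form described above for the `bar` and `barcodes` arguments.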

Usage

iadd(bar, gds, n = TRUE, force = TRUE, target_cpgs = NULL, ...)
iadd2(path, gds, chunksize = NULL, force=TRUE, ...)
idats2gds(barcodes, gds, n=TRUE, force = FALSE, ...)

Arguments

`bar`	The barcode for an IDAT file OR the file path of the file containing red or green channel intensities for that barcode (this will automatically locate and import both files regardless of which one you provide)
`path`	The file path where (multiple) IDAT files exist. iadd2 will process every idat within the specified directory.
`gds`	Either: A gds.class object Or: A character string specifying the name of an existing .gds file to write to. Or: A character string specifying the name of a new .gds file to write to
`chunksize`	If NULL, iadd2 will read in all barcodes in one go. Or if supplied with a numeric value, iadd2 will read in that number of idat files in batches
`n`	Logical, whether or not bead-counts are extracted from idat files.
`force`	Logical, whether or not rownames from the first idat file are applied to all idat files. Useful in combining together idat files of differing lengths.
`target_cpgs`	A vector of CpGs to specifically read in and set the dimensions of array to.
`barcodes`	A vector of barcodes to load into an existing gds file.
`...`	Additional arguments passed to wateRmelon's methylumIDATepic.

Value

Returns a gds.class object, which points to the appended .gds file.

Author(s)

Tyler Gorrie-Stone, Leonard C Schalkwyk, Ayden Saffari. Who to contact: <[email protected]>

Examples

if(require('minfiData')){
bd <- system.file('extdata', package='minfiData')
gfile <- iadd2(file.path(bd, '5723646052'), gds = 'melon.gds')
closefn.gds(gfile)
unlink('melon.gds')
}

Basic data filtering for Illumina methylation data in gds objects

Description

The pfilter function filters data sets based on bead count and detection p-values. The user can set their own thresholds or use the default pfilter settings. This specific function will take a Genomic Data Structure (GDS) file as input and perform pfilter similar to how pfilter in wateRmelon is performed.

Usage

## S4 method for signature 'gds.class'
pfilter(mn, perCount = NULL, pnthresh = NULL,
perc = NULL, pthresh = NULL)

Arguments

`mn`	a gds object OR node corresponding to methylated intensities
`perCount`	Threshold specifying which sites should be removed if they have a given percentage of samples with a beadcount <3, default = 5
`pnthresh`	cut off for detection p-value, default= 0.05
`perc`	remove sample having this percentage of sites with a detection p-value greater than pnthresh, default = 1
`pthresh`	Threshold specifying which sites should be removed if they have a given percentage of samples with a detection p-value greater than pnthresh, default = 1

Value

See pfilter. If called as pfilter.gds, the function returns a list containing two logical vectors, of length nrow and ncol respectively, which can be used to subset the data. Otherwise, if called as pfilter, the data will be subsetted automatically.
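The thresholds described above can be sketched on plain matrices in base R. This is an assumption-laden illustration rather than the wateRmelon or bigmelon code: the simulated `pvals` and `beads` matrices, the use of `<=` at the thresholds, and the choice to evaluate sites only on the retained samples are all guesses made for the example.

```r
# Hypothetical sketch of the filtering logic on in-memory matrices.
set.seed(42)
n_probes <- 200
n_samples <- 8
pvals <- matrix(runif(n_probes * n_samples, 0, 0.01), n_probes, n_samples)
beads <- matrix(sample(3:20, n_probes * n_samples, replace = TRUE),
                n_probes, n_samples)
pvals[, 3] <- 0.5       # one sample failing detection at every site
pvals[10, ] <- 0.5      # one site failing detection in every sample
beads[20, 1:4] <- 1     # one site with low bead counts in several samples

# Defaults as documented in the Arguments section
perCount <- 5; pnthresh <- 0.05; perc <- 1; pthresh <- 1

# Samples: keep those with at most perc % of sites above pnthresh
keep_samples <- colMeans(pvals > pnthresh) * 100 <= perc

# Sites: evaluated on the retained samples only (an assumption)
p_ok <- pvals[, keep_samples, drop = FALSE]
b_ok <- beads[, keep_samples, drop = FALSE]
keep_probes <- rowMeans(p_ok > pnthresh) * 100 <= pthresh &
  rowMeans(b_ok < 3) * 100 <= perCount

filters <- list(probes = keep_probes, samples = keep_samples)
```

The resulting list of two logical vectors has the same shape as the value described for pfilter.gds, so it could be used as `x[filters$probes, filters$samples]` on a matrix.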

Author(s)

Tyler Gorrie-Stone, Original (wateRmelon) Function by Ruth Pidsley Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
pfilter(e)
closefn.gds(e)
unlink("melon.gds")

Principal Component Analysis for high-dimensional data

Description

Performs principal components analysis on the given gds object and returns the results as an object of class "prcomp".

Usage

## S3 method for class 'gds.class'
prcomp(x, node.name, center = FALSE, scale. = FALSE,
rank. = NULL, retx = FALSE, tol = NULL, perc = 0.01,
npcs = NULL, parallel = NULL, method = c('quick', 'sorted'), verbose = FALSE, ...)

Arguments

`x`	A gds.class object.
`node.name`	Name of the gdsn.class node to learn principal components from
`center`	Logical value indicating whether variables should be shifted to be zero centered.
`scale.`	Logical value indicating whether the variables should be scaled to have unit variance
`tol`	a value indicating the magnitude below which components should be omitted.
`rank.`	(Still functional) Number of principal components to be returned
`retx`	a logical value indicating whether the rotated variables should be returned.
`perc`	The proportion of rows that should be used to calculate principal components. Ranges from 0 to 1; a value of 1 indicates that all rows will be used.
`npcs`	Number of principal components to be returned
`parallel`	Can supply either a cluster object (made from makeCluster) or an integer describing the number of cores to be used. This is only used if method="sorted".
`method`	Indicates which method to use out of "quick" and "sorted". "quick" stochastically selects a number of rows according to perc and then supplies them to svd. "sorted" determines the interquartile range for each row, then selects the top proportion (according to perc) of probes with the largest interquartile range and supplies the selected rows to svd.
`verbose`	A logical value indicating whether message outputs are displayed.
`...`	arguments passed to or from other methods. If "x" is a formula one might specify "scale." or "tol".

Details

The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using "eigen" on the covariance matrix. This is generally the preferred method for numerical accuracy. The "print" method for these objects prints the results in a nice format and the "plot" method produces a scree plot.
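The two row-selection strategies and the SVD step can be sketched in base R on an ordinary matrix. This is a hedged illustration, not the method's actual code: the simulated `betas` matrix, the helper `pca_from_rows`, the decision to treat samples as observations and the centering step are all assumptions made for the example.

```r
# Hypothetical sketch of "quick" vs "sorted" row selection followed by svd.
set.seed(7)
betas <- matrix(runif(1000 * 6), nrow = 1000, ncol = 6)  # probes x samples
perc <- 0.05
n_keep <- max(2, ceiling(nrow(betas) * perc))

# "quick": random subset of rows
rows_quick <- sample(nrow(betas), n_keep)

# "sorted": rows with the largest interquartile range
iqr <- apply(betas, 1, IQR)
rows_sorted <- order(iqr, decreasing = TRUE)[seq_len(n_keep)]

pca_from_rows <- function(x, rows) {
  sub <- t(x[rows, , drop = FALSE])                # samples as observations
  sub <- scale(sub, center = TRUE, scale = FALSE)  # center each probe
  s <- svd(sub)
  list(sdev = s$d / sqrt(nrow(sub) - 1),  # standard deviations of the PCs
       rotation = s$v,                    # loadings
       x = s$u %*% diag(s$d))             # rotated data (scores)
}

res_quick <- pca_from_rows(betas, rows_quick)
res_sorted <- pca_from_rows(betas, rows_sorted)
```

With centering enabled, the `sdev` values from this sketch agree with `stats::prcomp()` applied to the same transposed subset, which is a convenient sanity check when experimenting with different values of `perc`.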

Value

An object of prcomp class

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
prcomp(e, node.name="betas", perc=0.01, method='quick')
closefn.gds(e)
unlink("melon.gds")

Probe-Wise Outlier Detection for DNA methylation data.

Description

Performs outlier detection for each probe (row) using Tukey's interquartile range method.

Usage

pwod.gdsn(node, mul = 4)

Arguments

`node`	gdsn.class node that contains the data matrix to be filtered
`mul`	The number of interquartile ranges used to determine outlying probes. Default is 4 to ensure only very obvious outliers are removed.

Details

Detects outlying probes across arrays in methylumi and minfi objects.

Value

Nothing is returned. However, the supplied gds object (of which the node is a child) will have a new node with NAs interspersed where outliers are found.
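Tukey's interquartile range rule, applied row by row with a multiplier `mul`, can be sketched in base R on an in-memory matrix. The helper `pwod_sketch` and the toy data are hypothetical and only illustrate the rule; the package function operates on gds nodes instead.

```r
# Hypothetical sketch of row-wise Tukey outlier masking (not the package code).
pwod_sketch <- function(x, mul = 4) {
  t(apply(x, 1, function(row) {
    q <- quantile(row, c(0.25, 0.75), na.rm = TRUE)
    iqr <- q[2] - q[1]
    # Flag values further than mul interquartile ranges outside the quartiles
    out <- row < q[1] - mul * iqr | row > q[2] + mul * iqr
    row[which(out)] <- NA
    row
  }))
}

# Toy data: 5 probes x 20 samples with tightly clustered values
x <- t(sapply(1:5, function(i) seq(0.45, 0.55, length.out = 20) + i / 1000))
x[2, 7] <- 0.99   # plant one obvious outlier

cleaned <- pwod_sketch(x, mul = 4)
```

Only the planted value is replaced by NA in this toy example; a smaller `mul` would mask less extreme values as well.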

Author(s)

Tyler Gorrie-Stone Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds")
pwod(e)
closefn.gds(e)
unlink("melon.gds")

Change the location of the paths for row and column names in a gds file.

Description

Change how a gds object ascribes row and column names by changing the paths node of a gds file. Although this is a bit laborious, the row and column information is not preserved inside a node which contains data. Thus, the row and column names are attributed after the data has been accessed.

Usage

redirect.gds(gds, rownames, colnames)

Arguments

`gds`	`gds.class` object containing node named "paths".
`rownames`	Character string that points to named part of supplied gds that corresponds to rownames. e.g. "fData/Target_ID". Default = "fData/Probe_ID"
`colnames`	Character string that points to named part of supplied gds that corresponds to colnames. e.g. "pData/Sample_ID". Default = "pData/barcode"

Details

This function is particularly important within many functions inside bigmelon and may lead to downstream errors if the row and column names are not correctly specified. If data is read in through es2gds, the paths node should be correctly set up and all downstream analysis will carry out as normal. Will fail noisily if given a path that does not exist.

Value

Changes the gdsn.class node named "paths" to supplied rownames and colnames within supplied gds.class object.

Author(s)

Tyler J. Gorrie-Stone Who to contact: <[email protected]>

Examples

data(melon)
e <- es2gds(melon, "melon.gds") # Create gds object
redirect.gds(e, rownames = "fData/TargetID", colnames = "pData/sampleID")
# Deleting Temp files
closefn.gds(e)
unlink("melon.gds")


Functions imported from wateRmelon

Description

Manual page for methods for the extraneous functions from wateRmelon. For more details on specific functions, see the respective manual pages in wateRmelon.

Usage

## S4 method for signature 'gdsn.class,gdsn.class'
qual(norm, raw)
## S4 method for signature 'gds.class'
predictSex(x, x.probes=NULL, pc=2, plot=TRUE, irlba=TRUE, center=FALSE, scale.=FALSE)
## S4 method for signature 'gdsn.class'
predictSex(x, x.probes=NULL, pc=2, plot=TRUE, irlba=TRUE, center=FALSE, scale.=FALSE)

Arguments

`norm`	normalized node (gdsn.class)
`raw`	raw node (gdsn.class)
`x`	gds.class object or node corresponding to betas
`x.probes`	Default is NULL, but it is required to be supplied in bigmelon. Logical or numeric vector containing indices of X chromosome probes
`pc`	The principal component to guess sex across (usually the 2nd one in most cases)
`plot`	Logical, indicates whether or not to plot the prediction
`irlba`	Logical, indicate whether or not to use the faster method to generate principal components
`center`	Logical, indicate whether or not to center data around 0
`scale.`	Logical, indicate whether or not to scale data prior to prcomp

Details

For the full usage and description of any functions that link to this manual page, please visit the respective manual pages from wateRmelon.

Value

Returns expected output of functions from wateRmelon
