Title: | SEnsible Step-wise Analysis of DNA MEthylation BeadChips |
---|---|
Description: | Tools For analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines. |
Authors: | Wanding Zhou [aut, cre] , Wubin Ding [ctb], David Goldberg [ctb], Ethan Moyer [ctb], Bret Barnes [ctb], Timothy Triche [ctb], Hui Shen [aut] |
Maintainer: | Wanding Zhou <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.25.0 |
Built: | 2024-11-01 06:29:25 UTC |
Source: | https://github.com/bioc/sesame |
SEnsible and step-wise analysis of DNA methylation data
This package complements array functionalities that allow processing >10,000 samples in parallel on clusters.
package
Wanding Zhou [email protected], Hui Shen [email protected] Timothy J Triche Jr [email protected]
Zhou W, Triche TJ, Laird PW, Shen H (2018)
Useful links:
sdf <- readIDATpair(sub('_Grn.idat','',system.file( 'extdata','4207113116_A_Grn.idat',package='sesameData'))) ## The OpenSesame pipeline betas <- openSesame(sdf)
sdf <- readIDATpair(sub('_Grn.idat','',system.file( 'extdata','4207113116_A_Grn.idat',package='sesameData'))) ## The OpenSesame pipeline betas <- openSesame(sdf)
This function essentially merge existing probe masking with new probes to mask
addMask(sdf, probes)
addMask(sdf, probes)
sdf |
a |
probes |
a vector of probe IDs or a logical vector with TRUE representing masked probes |
a SigDF
with added mask
sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sum(addMask(sdf, c("cg14057072", "cg22344912"))$mask)
sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sum(addMask(sdf, c("cg14057072", "cg22344912"))$mask)
Aggregate test enrichment results
aggregateTestEnrichments(result_list, column = "estimate", return_df = FALSE)
aggregateTestEnrichments(result_list, column = "estimate", return_df = FALSE)
result_list |
a list of results from testEnrichment |
column |
the column name to aggregate (Default: estimate) |
return_df |
whether to return a merged data frame |
a matrix for all results
## pick some big TFBS-overlapping CpG groups cg_lists <- KYCG_getDBs("MM285.TFBS") queries <- cg_lists[(sapply(cg_lists, length) > 40000)] result_list <- lapply(queries, testEnrichment, "MM285.chromHMM") mtx <- aggregateTestEnrichments(result_list)
## pick some big TFBS-overlapping CpG groups cg_lists <- KYCG_getDBs("MM285.TFBS") queries <- cg_lists[(sapply(cg_lists, length) > 40000)] result_list <- lapply(queries, testEnrichment, "MM285.chromHMM") mtx <- aggregateTestEnrichments(result_list)
assemble plots
assemble_plots( betas, txns, probes, plt.txns, plt.mapLines, plt.cytoband, heat.height = NULL, mapLine.height = 0.2, show.probeNames = TRUE, show.samples.n = NULL, show.sampleNames = TRUE, sample.name.fontsize = 10, dmin = 0, dmax = 1 )
assemble_plots( betas, txns, probes, plt.txns, plt.mapLines, plt.cytoband, heat.height = NULL, mapLine.height = 0.2, show.probeNames = TRUE, show.samples.n = NULL, show.sampleNames = TRUE, sample.name.fontsize = 10, dmin = 0, dmax = 1 )
betas |
beta value |
txns |
transcripts GRanges |
probes |
probe GRanges |
plt.txns |
transcripts plot objects |
plt.mapLines |
map line plot objects |
plt.cytoband |
cytoband plot objects |
heat.height |
heatmap height (auto inferred based on rows) |
mapLine.height |
height of the map lines |
show.probeNames |
whether to show probe names |
show.samples.n |
number of samples to show (default: all) |
show.sampleNames |
whether to show sample names |
sample.name.fontsize |
sample name font size |
dmin |
data min |
dmax |
data max |
a grid object
Collapse betas by averagng probes with common probe ID prefix
betasCollapseToPfx(betas, BPPARAM = SerialParam())
betasCollapseToPfx(betas, BPPARAM = SerialParam())
betas |
either a named numeric vector or a numeric matrix (row: probes, column: samples) |
BPPARAM |
use MulticoreParam(n) for parallel processing |
either named numeric vector or a numeric matrix of collapsed beta value matrix
## input is a matrix m <- matrix(seq(0,1,length.out=9), nrow=3) rownames(m) <- c("cg00004963_TC21", "cg00004963_TC22", "cg00004747_TC21") colnames(m) <- c("A","B","C") betasCollapseToPfx(m) ## input is a vector m <- setNames(seq(0,1,length.out=3), c("cg00004963_TC21", "cg00004963_TC22", "cg00004747_TC21")) betasCollapseToPfx(m)
## input is a matrix m <- matrix(seq(0,1,length.out=9), nrow=3) rownames(m) <- c("cg00004963_TC21", "cg00004963_TC22", "cg00004747_TC21") colnames(m) <- c("A","B","C") betasCollapseToPfx(m) ## input is a vector m <- setNames(seq(0,1,length.out=3), c("cg00004963_TC21", "cg00004963_TC22", "cg00004747_TC21")) betasCollapseToPfx(m)
Logit transform a beta value vector to M-value vector.
BetaValueToMValue(b)
BetaValueToMValue(b)
b |
vector of beta values |
Convert beta-value to M-value (aka logit transform)
a vector of M values
BetaValueToMValue(c(0.1, 0.5, 0.9))
BetaValueToMValue(c(0.1, 0.5, 0.9))
require GenomicRanges
binSignals(probe.signals, bin.coords, probeCoords)
binSignals(probe.signals, bin.coords, probeCoords)
probe.signals |
probe signals |
bin.coords |
bin coordinates |
probeCoords |
probe coordinates |
bin signals
Compute GCT score for internal bisulfite conversion control. The function
takes a SigSet
as input. The higher the GCT score, the more likely
the incomplete conversion.
bisConversionControl(sdf, extR = NULL, extA = NULL, verbose = FALSE)
bisConversionControl(sdf, extR = NULL, extA = NULL, verbose = FALSE)
sdf |
a SigDF |
extR |
a vector of probe IDs for Infinium-I probes that extend to converted A |
extA |
a vector of probe IDs for Infinium-I probes that extend to original A |
verbose |
print more messages |
GCT score (the higher, the more incomplete conversion)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') bisConversionControl(sdf) ## For more recent platforms like EPICv2, MSA: ## One need extR and extA of other arrays using the sesameAnno ## Not run: mft = sesameAnno_buildManifestGRanges(sprintf( "%s/EPICv2/EPICv2.hg38.manifest.tsv.gz", "https://github.com/zhou-lab/InfiniumAnnotationV1/raw/main/Anno/"), columns="nextBase") extR = names(mft)[!is.na(mft$nextBase) & mft$nextBase=="R"] extA = names(mft)[!is.na(mft$nextBase) & mft$nextBase=="A"] ## End(Not run)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') bisConversionControl(sdf) ## For more recent platforms like EPICv2, MSA: ## One need extR and extA of other arrays using the sesameAnno ## Not run: mft = sesameAnno_buildManifestGRanges(sprintf( "%s/EPICv2/EPICv2.hg38.manifest.tsv.gz", "https://github.com/zhou-lab/InfiniumAnnotationV1/raw/main/Anno/"), columns="nextBase") extR = names(mft)[!is.na(mft$nextBase) & mft$nextBase=="R"] extA = names(mft)[!is.na(mft$nextBase) & mft$nextBase=="A"] ## End(Not run)
The effect size is defined by the maximum variation of a variable with all the other variables controled constant.
calcEffectSize(pred)
calcEffectSize(pred)
pred |
predictions |
a data.frame of effect sizes. Columns are different variables. Rows are different probes.
data <- sesameDataGet('HM450.76.TCGA.matched') res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo) head(calcEffectSize(res))
data <- sesameDataGet('HM450.76.TCGA.matched') res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo) head(calcEffectSize(res))
filter data matrix by factor completeness only works for discrete factors
checkLevels(betas, fc)
checkLevels(betas, fc)
betas |
matrix data |
fc |
factors, or characters |
a boolean vector whether there is non-NA value for each tested group for each probe
se0 <- sesameDataGet("MM285.10.SE.tissue")[1:100,] se_ok <- checkLevels(SummarizedExperiment::assay(se0), SummarizedExperiment::colData(se0)$tissue) sum(se_ok) # number of good probes se1 <- se0[se_ok,] sesameDataGet_resetEnv()
se0 <- sesameDataGet("MM285.10.SE.tissue")[1:100,] se_ok <- checkLevels(SummarizedExperiment::assay(se0), SummarizedExperiment::colData(se0)$tissue) sum(se_ok) # number of good probes se1 <- se0[se_ok,] sesameDataGet_resetEnv()
Lookup address and transform address to probe
chipAddressToSignal(dm, mft, min_beads = NULL)
chipAddressToSignal(dm, mft, min_beads = NULL)
dm |
data frame in chip address, 2 columns: cy3/Grn and cy5/Red |
mft |
a data frame with columns Probe_ID, M, U and col |
min_beads |
minimum bead counts, otherwise masked |
Translate data in chip address to probe address. Type I probes can be separated into Red and Grn channels. The methylated allele and unmethylated allele are at different addresses. For type II probes methylation allele and unmethylated allele are at the same address. Grn channel is for methylated allele and Red channel is for unmethylated allele. The out-of-band signals are type I probes measured using the other channel.
a SigDF, indexed by probe ID address
Perform copy number segmentation using the signals in the signal set.
The function takes a SigDF
for the target sample and a set of
normal SigDF
for the normal samples. An optional arguments specifies
the version of genome build that the inference will operate on. The function
outputs an object of class CNSegment
with signals for the segments (
seg.signals), the bin coordinates (
bin.coords) and bin signals (bin.signals).
cnSegmentation( sdf, sdfs.normal = NULL, genomeInfo = NULL, probeCoords = NULL, tilewidth = 50000, verbose = FALSE, return.probe.signals = FALSE )
cnSegmentation( sdf, sdfs.normal = NULL, genomeInfo = NULL, probeCoords = NULL, tilewidth = 50000, verbose = FALSE, return.probe.signals = FALSE )
sdf |
|
sdfs.normal |
a list of |
genomeInfo |
the genomeInfo files. The default is retrieved from sesameData. Alternative genomeInfo files can be found at https://github.com/zhou-lab/GenomeInfo |
probeCoords |
the probe coordinates in the corresponding genome if NULL (default), then the default genome assembly is used. Default genome is given by, e.g., sesameData_check_genome(NULL, "EPIC") For additional mapping, download the GRanges object from http://zwdzwd.github.io/InfiniumAnnotation and provide the following argument ..., probeCoords = sesameAnno_buildManifestGRanges("downloaded_file"),... to this function. |
tilewidth |
tile width for smoothing |
verbose |
print more messages |
return.probe.signals |
return probe-level instead of bin-level signal |
an object of CNSegment
sesameDataCache() ## Not run: sdfs <- sesameDataGet('EPICv2.8.SigDF') sdf <- sdfs[["K562_206909630040_R01C01"]] seg <- cnSegmentation(sdf) seg <- cnSegmentation(sdf, return.probe.signals=TRUE) visualizeSegments(seg) ## End(Not run)
sesameDataCache() ## Not run: sdfs <- sesameDataGet('EPICv2.8.SigDF') sdf <- sdfs[["K562_206909630040_R01C01"]] seg <- cnSegmentation(sdf) seg <- cnSegmentation(sdf, return.probe.signals=TRUE) visualizeSegments(seg) ## End(Not run)
calculates the pariwise overlap between given list of database sets using a distance metric.
compareDatbaseSetOverlap(databases = NA, metric = "Jaccard")
compareDatbaseSetOverlap(databases = NA, metric = "Jaccard")
databases |
List of vectors corresponding to the database sets of interest with associated meta data as an attribute to each element. Optional. (Default: NA) |
metric |
String representing the similarity metric to use. Optional. (Default: "Jaccard"). |
An upper triangular matrix containing a metric (Jaccard) comparing the pairwise distances between database sets.
Compare Strain SNPs with a reference panel
compareMouseStrainReference( betas = NULL, show_sample_names = FALSE, query_width = NULL )
compareMouseStrainReference( betas = NULL, show_sample_names = FALSE, query_width = NULL )
betas |
beta value vector or matrix (for multiple samples) |
show_sample_names |
whether to show sample name |
query_width |
optional argument for adjusting query width |
grid object that contrast the target sample with pre-built mouse strain reference
sesameDataCache() # if not done yet compareMouseStrainReference()
sesameDataCache() # if not done yet compareMouseStrainReference()
Compare mouse array data with mouse tissue references
compareMouseTissueReference( betas = NULL, ref = NULL, color = "blueYellow", query_width = 0.3 )
compareMouseTissueReference( betas = NULL, ref = NULL, color = "blueYellow", query_width = 0.3 )
betas |
matrix of betas for the target sample This argument is optional. If not given, only the reference will be shown. |
ref |
the reference beta values in SummarizedExperiment. This argument is optional. If not given, the reference will be downloaded from the sesameData package. |
color |
either blueYellow or fullJet |
query_width |
the width of the query beta value matrix |
grid object that contrast the target sample with pre-built mouse tissue reference
cat("Deprecated, see compareReference")
cat("Deprecated, see compareReference")
Compare array data with references (e.g., tissue, cell types)
compareReference( ref, betas = NULL, stop.points = NULL, query_width = 0.3, show_sample_names = FALSE )
compareReference( ref, betas = NULL, stop.points = NULL, query_width = 0.3, show_sample_names = FALSE )
ref |
the reference beta values in SummarizedExperiment. One can download them from the sesameData package. See examples. |
betas |
matrix of betas for the target sample This argument is optional. If not given, only the reference will be shown. |
stop.points |
stop points for the color palette. Default to blue, yellow. |
query_width |
the width of the query beta value matrix |
show_sample_names |
whether to show sample names (default: FALSE) |
grid object that contrast the target sample with references.
sesameDataCache() # if not done yet compareReference(sesameDataGet("MM285.tissueSignature")) sesameDataGet_resetEnv()
sesameDataCache() # if not done yet compareReference(sesameDataGet("MM285.tissueSignature")) sesameDataGet_resetEnv()
get the controls attributes
controls(sdf, verbose = FALSE)
controls(sdf, verbose = FALSE)
sdf |
a |
verbose |
print more messages |
the controls data frame
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') head(controls(sdf))
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') head(controls(sdf))
Convert Probe ID
convertProbeID( x, target_platform, source_platform = NULL, mapping = NULL, target_uniq = TRUE, include_new = FALSE, include_old = FALSE, return_mapping = FALSE )
convertProbeID( x, target_platform, source_platform = NULL, mapping = NULL, target_uniq = TRUE, include_new = FALSE, include_old = FALSE, return_mapping = FALSE )
x |
source probe IDs |
target_platform |
the platform to take the data to |
source_platform |
optional source platform |
mapping |
a liftOver mapping file. Typically this file contains empirical evidence whether a probe mapping is reliable. If given, probe ID-based mapping will be skipped. This is to perform more stringent probe ID mapping. |
target_uniq |
whether the target Probe ID should be kept unique. |
include_new |
if true, include mapping of added probes |
include_old |
if true, include mapping of deleted probes |
return_mapping |
return mapping table, instead of the target IDs. |
mapped probe IDs, or mapping table if return_mapping = T
createGeneNetwork creates database network using the Jaccard index.
createDBNetwork(databases)
createDBNetwork(databases)
databases |
Vector of probes corresponding to a single database set of interest. |
ggplot lollipop plot
Turn beta values into a UCSC browser track
createUCSCtrack(betas, output = NULL, platform = "HM450", genome = "hg38")
createUCSCtrack(betas, output = NULL, platform = "HM450", genome = "hg38")
betas |
a named numeric vector |
output |
output file name |
platform |
HM450, EPIC etc. |
genome |
hg38, mm10, ..., will infer if not given. For additional mapping, download the GRanges object from http://zwdzwd.github.io/InfiniumAnnotation and provide the following argument ..., genome = sesameAnno_buildManifestGRanges("downloaded_file"),... to this function. |
when output is null, return a data.frame, otherwise NULL
betas.tissue <- sesameDataGet('HM450.1.TCGA.PAAD')$betas ## add output to create an actual file df <- createUCSCtrack(betas.tissue) ## to convert to bigBed ## sort -k1,1 -k2,2n output.bed >output_sorted.bed ## bedToBigBed output_sorted.bed hg38.chrom output.bb
betas.tissue <- sesameDataGet('HM450.1.TCGA.PAAD')$betas ## add output to create an actual file df <- createUCSCtrack(betas.tissue) ## to convert to bigBed ## sort -k1,1 -k2,2n output.bed >output_sorted.bed ## bedToBigBed output_sorted.bed hg38.chrom output.bb
The function convert a data frame back to a list of sesameQC objects
dataFrame2sesameQC(df)
dataFrame2sesameQC(df)
df |
a publicQC data frame |
a list sesameQC objects
dbStats builds dataset for a given betas matrix composed of engineered features from the given database sets
dbStats( betas, databases, fun = mean, na.rm = TRUE, n_min = NULL, f_min = 0.1, long = FALSE )
dbStats( betas, databases, fun = mean, na.rm = TRUE, n_min = NULL, f_min = 0.1, long = FALSE )
betas |
matrix of beta values where probes are on the rows and samples are on the columns |
databases |
List of vectors corresponding to probe locations for which the features will be extracted |
fun |
aggregation function, default to mean |
na.rm |
whether to remove NA |
n_min |
min number of non-NA for aggregation function to apply, overrides f_min |
f_min |
min fraction of non-NA for aggregation function to apply |
long |
produce long-form result |
matrix with samples on the rows and database set on the columns
library(SummarizedExperiment) se <- sesameDataGet('MM285.467.SE.tissue20Kprobes') head(dbStats(assay(se), "MM285.chromHMM")[,1:3]) sesameDataGet_resetEnv()
library(SummarizedExperiment) se <- sesameDataGet('MM285.467.SE.tissue20Kprobes') head(dbStats(assay(se), "MM285.chromHMM")[,1:3]) sesameDataGet_resetEnv()
Mask SNP probe intensity mean by zero.
deIdentify(path, out_path = NULL, snps = NULL, mft = NULL, randomize = FALSE)
deIdentify(path, out_path = NULL, snps = NULL, mft = NULL, randomize = FALSE)
path |
input IDAT file |
out_path |
output IDAT file |
snps |
SNP definition, if not given, default to SNP probes |
mft |
sesame-compatible manifest if non-standard |
randomize |
whether to randomize the SNPs. if TRUE, randomize the signal intensities. one can use set.seed to reidentify the IDAT with the secret seed (see examples). If FALSE, this sets all SNP intensities to zero. |
NULL, changes made to the IDAT files
my_secret <- 13412084 set.seed(my_secret) temp_out <- tempfile("test") deIdentify(system.file( "extdata", "4207113116_A_Grn.idat", package = "sesameData"), temp_out, randomize = TRUE) unlink(temp_out)
my_secret <- 13412084 set.seed(my_secret) temp_out <- tempfile("test") deIdentify(system.file( "extdata", "4207113116_A_Grn.idat", package = "sesameData"), temp_out, randomize = TRUE) unlink(temp_out)
The function takes a SigDF
as input, computes detection p-value
using negative control probes' empirical distribution and returns a new
SigDF
with an updated mask slot.
detectionPnegEcdf(sdf, return.pval = FALSE, pval.threshold = 0.05)
detectionPnegEcdf(sdf, return.pval = FALSE, pval.threshold = 0.05)
sdf |
a |
return.pval |
whether to return p-values, instead of a
masked |
pval.threshold |
minimum p-value to mask |
a SigDF
, or a p-value vector if return.pval is TRUE
sdf <- sesameDataGet("EPIC.1.SigDF") sum(sdf$mask) sum(detectionPnegEcdf(sdf)$mask)
sdf <- sesameDataGet("EPIC.1.SigDF") sum(sdf$mask) sum(detectionPnegEcdf(sdf)$mask)
The function takes a matrix with probes on the rows and cell types on the columns and output a subset matrix and only probes that show discordant methylation levels among the cell types.
diffRefSet(g)
diffRefSet(g)
g |
a matrix with probes on the rows and cell types on the columns |
g a matrix with a subset of input probes (rows)
g = diffRefSet(getRefSet(platform='HM450')) sesameDataGet_resetEnv()
g = diffRefSet(getRefSet(platform='HM450')) sesameDataGet_resetEnv()
List all contrasts of a DMLSummary
dmContrasts(smry)
dmContrasts(smry)
smry |
a DMLSummary object |
a character vector of contrasts
data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:10,], ~type, meta=data$sampleInfo) dmContrasts(smry) sesameDataGet_resetEnv()
data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:10,], ~type, meta=data$sampleInfo) dmContrasts(smry) sesameDataGet_resetEnv()
The function takes a beta value matrix with probes on the rows and samples on the columns. It also takes a sample information data frame (meta) and formula for testing. The function outputs a list of coefficient tables for each factor tested.
DML(betas, fm, meta = NULL, BPPARAM = SerialParam())
DML(betas, fm, meta = NULL, BPPARAM = SerialParam())
betas |
beta values, matrix or SummarizedExperiment rows are probes and columns are samples. |
fm |
formula |
meta |
data frame for sample information, column names are predictor variables (e.g., sex, age, treatment, tumor/normal etc) and are referenced in formula. Rows are samples. When the betas argument is a SummarizedExperiment object, this is ignored. colData(betas) will be used instead. The row order of the data frame must match the column order of the beta value matrix. |
BPPARAM |
number of cores for parallel processing, default to SerialParam() Use MulticoreParam(mc.cores) for parallel processing. For Windows, try DoparParam or SnowParam. |
a list of test summaries, summary.lm objects
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:1000,], ~type, meta=data$sampleInfo) sesameDataGet_resetEnv()
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:1000,], ~type, meta=data$sampleInfo) sesameDataGet_resetEnv()
This function is also important for investigating factor interactions.
DMLpredict(betas, fm, pred = NULL, meta = NULL, BPPARAM = SerialParam())
DMLpredict(betas, fm, pred = NULL, meta = NULL, BPPARAM = SerialParam())
betas |
beta values, matrix or SummarizedExperiment rows are probes and columns are samples. |
fm |
formula |
pred |
new data for prediction, useful for studying effect size. This argument is a data.frame to specify new data. If the argument is NULL, all combinations of all contrasts will be used as input. It might not work if there is a continuous variable input. One may need to explicitly provide the input in a data frame. |
meta |
data frame for sample information, column names are predictor variables (e.g., sex, age, treatment, tumor/normal etc) and are referenced in formula. Rows are samples. When the betas argument is a SummarizedExperiment object, this is ignored. colData(betas) will be used instead. |
BPPARAM |
number of cores for parallel processing, default to SerialParam() Use MulticoreParam(mc.cores) for parallel processing. For Windows, try DoparParam or SnowParam. |
a SummarizedExperiment of predictions. The colData describes the input of the prediction.
data <- sesameDataGet('HM450.76.TCGA.matched') ## use all contrasts as new input res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo) ## specify new input res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo, pred = data.frame(type=c("Normal","Tumour"))) ## note that the prediction needs to be a factor of the same ## level structure as the original training data. pred = data.frame(type=factor(c("Normal"), levels=c("Normal","Tumour"))) res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo, pred = pred)
data <- sesameDataGet('HM450.76.TCGA.matched') ## use all contrasts as new input res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo) ## specify new input res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo, pred = data.frame(type=c("Normal","Tumour"))) ## note that the prediction needs to be a factor of the same ## level structure as the original training data. pred = data.frame(type=factor(c("Normal"), levels=c("Normal","Tumour"))) res <- DMLpredict(data$betas[1:10,], ~type, meta=data$sampleInfo, pred = pred)
This subroutine uses Euclidean distance to group CpGs and then combine p-values for each segment. The function performs DML test first if cf is NULL. It groups the probe testing results into differential methylated regions in a coefficient table with additional columns designating the segment ID and statistical significance (P-value) testing the segment.
DMR( betas, smry, contrast, platform = NULL, probe.coords = NULL, dist.cutoff = NULL, seg.per.locus = 0.5 )
DMR( betas, smry, contrast, platform = NULL, probe.coords = NULL, dist.cutoff = NULL, seg.per.locus = 0.5 )
betas |
beta values for distance calculation |
smry |
DML |
contrast |
the pair-wise comparison or contrast check colnames(attr(smry, "model.matrix")) if uncertain |
platform |
EPIC, HM450, MM285, ... |
probe.coords |
GRanges object that defines CG coordinates if NULL (default), then the default genome assembly is used. Default genome is given by, e.g., sesameData_check_genome(NULL, "EPIC") For additional mapping, download the GRanges object from http://zwdzwd.github.io/InfiniumAnnotation and provide the following argument ..., probe.coords = sesameAnno_buildManifestGRanges("downloaded_file"),... to this function. |
dist.cutoff |
cutoff of beta value differences for two neighboring CGs to be considered the same DMR (by default it's determined using the quantile function on seg.per.locus) |
seg.per.locus |
number of segments per locus higher value leads to more segments |
coefficient table with segment ID and segment P-value each row is a locus, multiple loci may share a segment ID if they are merged to the same segment. Records are ordered by Seg_Est.
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:1000,], ~type, meta=data$sampleInfo) colnames(attr(smry, "model.matrix")) # pick a contrast from here ## showing on a small set of 100 CGs merged_segs <- DMR(data$betas[1:1000,], smry, "typeTumour", platform="HM450") sesameDataGet_resetEnv()
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:1000,], ~type, meta=data$sampleInfo) colnames(attr(smry, "model.matrix")) # pick a contrast from here ## showing on a small set of 100 CGs merged_segs <- DMR(data$betas[1:1000,], smry, "typeTumour", platform="HM450") sesameDataGet_resetEnv()
The function takes a SigDF
as input and scale both the Grn and Red
signal to a reference (ref) level. If the reference level is not given, it
is set to the mean intensity of all the in-band signals. The function
returns a SigDF
with dye bias corrected.
dyeBiasCorr(sdf, ref = NULL)
dyeBiasCorr(sdf, ref = NULL)
sdf |
a |
ref |
reference signal level |
a normalized SigDF
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf.db <- dyeBiasCorr(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf.db <- dyeBiasCorr(sdf)
The function chose the reference signal level from a list of SigDF
.
The chosen sample has the smallest difference in Grn and Red signal
intensity as measured using the normalization control probes. In practice,
it doesn't matter which sample is chosen as long as the reference level
does not deviate much. The function returns a list of SigDF
s with
dye bias corrected.
dyeBiasCorrMostBalanced(sdfs)
dyeBiasCorrMostBalanced(sdfs)
sdfs |
a list of normalized |
a list of normalized SigDF
s
sesameDataCache() # if not done yet sdfs <- sesameDataGet('HM450.10.SigDF')[1:2] sdfs.db <- dyeBiasCorrMostBalanced(sdfs)
sesameDataCache() # if not done yet sdfs <- sesameDataGet('HM450.10.SigDF')[1:2] sdfs.db <- dyeBiasCorrMostBalanced(sdfs)
The function takes a SigDF
as input and scale both the Grn and Red
signal to a reference (ref) level. If the reference level is not given, it
is set to the mean intensity of all the in-band signals. The function
returns a SigDF
with dye bias corrected.
dyeBiasL(sdf, ref = NULL)
dyeBiasL(sdf, ref = NULL)
sdf |
a |
ref |
reference signal level |
a normalized SigDF
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf.db <- dyeBiasL(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf.db <- dyeBiasL(sdf)
This function compares the Type-I Red probes and Type-I Grn probes and
generates and mapping to correct signal of the two channels to the middle.
The function takes one single SigDF
and returns a SigDF
with dye bias corrected.
dyeBiasNL(sdf, mask = TRUE, verbose = FALSE) dyeBiasCorrTypeINorm(sdf, mask = TRUE, verbose = FALSE)
dyeBiasNL(sdf, mask = TRUE, verbose = FALSE) dyeBiasCorrTypeINorm(sdf, mask = TRUE, verbose = FALSE)
sdf |
a |
mask |
include masked probes in Infinium-I probes. No big difference is noted in practice. More probes are generally better. |
verbose |
print more messages |
a SigDF
after dye bias correction.
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf.db <- dyeBiasNL(sdf) sdf <- sesameDataGet('EPIC.1.SigDF') sdf <- dyeBiasCorrTypeINorm(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf.db <- dyeBiasNL(sdf) sdf <- sesameDataGet('EPIC.1.SigDF') sdf <- dyeBiasCorrTypeINorm(sdf)
ELiminate BAckground-dominated Reading (ELBAR)
ELBAR( sdf, return.pval = FALSE, pval.threshold = 0.05, margin = 0.05, capMU = 3000, delta.beta = 0.2, n.windows = 500 )
ELBAR( sdf, return.pval = FALSE, pval.threshold = 0.05, margin = 0.05, capMU = 3000, delta.beta = 0.2, n.windows = 500 )
sdf |
a |
return.pval |
whether to return p-values, instead of a SigDF |
pval.threshold |
minimum p-value to mask |
margin |
the percentile margin to define envelope, the smaller the value the more aggressive the masking. |
capMU |
the maximum M+U to search for intermediate betas |
delta.beta |
maximum beta value change from sheer background-dominated readings |
n.windows |
number of windows for smoothing |
a SigDF
with mask added
sdf <- sesameDataGet("EPIC.1.SigDF") sum(sdf$mask) sum(ELBAR(sdf)$mask)
sdf <- sesameDataGet("EPIC.1.SigDF") sum(sdf$mask) sum(ELBAR(sdf)$mask)
The method assumes only two components in the mixture: the leukocyte component and the target tissue component. The function takes the beta values matrix of the target tissue and the beta value matrix of the leukocyte. Both matrices have probes on the row and samples on the column. Row names should have probe IDs from the platform. The function outputs a single numeric describing the fraction of leukocyte.
estimateLeukocyte( betas.tissue, betas.leuko = NULL, betas.tumor = NULL, platform = c("EPIC", "HM450", "HM27") )
estimateLeukocyte( betas.tissue, betas.leuko = NULL, betas.tumor = NULL, platform = c("EPIC", "HM450", "HM27") )
betas.tissue |
tissue beta value matrix (#probes X #samples) |
betas.leuko |
leukocyte beta value matrix, if missing, use the SeSAMe default by infinium platform |
betas.tumor |
optional, tumor beta value matrix |
platform |
"HM450", "HM27" or "EPIC" |
leukocyte estimate, a numeric vector
betas.tissue <- sesameDataGet('HM450.1.TCGA.PAAD')$betas estimateLeukocyte(betas.tissue) sesameDataGet_resetEnv()
betas.tissue <- sesameDataGet('HM450.1.TCGA.PAAD')$betas estimateLeukocyte(betas.tissue) sesameDataGet_resetEnv()
Convert SNP from Infinium array to VCF file
formatVCF(sdf, anno, vcf = NULL, genome = "hg38", verbose = FALSE)
formatVCF(sdf, anno, vcf = NULL, genome = "hg38", verbose = FALSE)
sdf |
SigDF |
anno |
SNP variant annotation, available at https://github.com/zhou-lab/InfiniumAnnotationV1/tree/main/Anno/EPIC EPIC.hg38.snp.tsv.gz |
vcf |
output VCF file path, if NULL output to console |
genome |
genome |
verbose |
print more messages |
VCF file. If vcf is NULL, a data.frame is output to console. The data.frame does not contain VCF headers. Note the output vcf is not sorted.
sesameDataCacheAll() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') ## Not run: ## download anno from ## http://zwdzwd.github.io/InfiniumAnnotation ## output to console anno = read_tsv(sesameAnno_download("EPICv2.hg38.snp.tsv.gz")) head(formatVCF(sdf, anno)) ## End(Not run)
sesameDataCacheAll() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') ## Not run: ## download anno from ## http://zwdzwd.github.io/InfiniumAnnotation ## output to console anno = read_tsv(sesameAnno_download("EPICv2.hg38.snp.tsv.gz")) head(formatVCF(sdf, anno)) ## End(Not run)
Get allele frequency
getAFs(sdf, ...)
getAFs(sdf, ...)
sdf |
|
... |
additional options to getBetas |
allele frequency
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') af <- getAFs(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') af <- getAFs(sdf)
Takes a SigDF
as input and returns a numeric vector containing
extra allele frequencies based on Color-Channel-Switching (CCS) probes.
If no CCS probes exist in the SigDF
, then an numeric(0) is
returned.
getAFTypeIbySumAlleles(sdf, known.ccs.only = TRUE)
getAFTypeIbySumAlleles(sdf, known.ccs.only = TRUE)
sdf |
|
known.ccs.only |
consider only known CCS probes |
beta values
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') af <- getAFTypeIbySumAlleles(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') af <- getAFTypeIbySumAlleles(sdf)
sum.typeI is used for rescuing beta values on
Color-Channel-Switching CCS probes. The function takes a SigDF
and returns beta value except that Type-I in-band signal and out-of-band
signal are combined. This prevents color-channel switching due to SNPs.
getBetas( sdf, mask = TRUE, sum.TypeI = FALSE, collapseToPfx = FALSE, collapseMethod = c("mean", "minPval") )
getBetas( sdf, mask = TRUE, sum.TypeI = FALSE, collapseToPfx = FALSE, collapseMethod = c("mean", "minPval") )
sdf |
|
mask |
whether to use mask |
sum.TypeI |
whether to sum type I channels |
collapseToPfx |
remove replicate to prefix (e.g., cg number) and remove the suffix |
collapseMethod |
mean or minPval |
a numeric vector, beta values
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') betas <- getBetas(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') betas <- getBetas(sdf)
requires GenomicRanges, IRanges
getBinCoordinates(seqLength, gapInfo, tilewidth = 50000, probeCoords)
getBinCoordinates(seqLength, gapInfo, tilewidth = 50000, probeCoords)
seqLength |
chromosome information object |
gapInfo |
chromosome gap information |
tilewidth |
tile width for smoothing |
probeCoords |
probe coordinates |
bin.coords
get probe masking by mask names
getMask(platform = "EPICv2", mask_names = "recommended")
getMask(platform = "EPICv2", mask_names = "recommended")
platform |
EPICv2, EPIC, HM450, HM27, ... |
mask_names |
mask names (see listAvailableMasks) by default: "recommended" see recommendedMaskNames() for detail. |
a vector of probe ID
length(getMask("MSA", "recommended")) length(getMask("EPICv2", "recommended")) length(getMask("EPICv2", c("recommended", "M_SNPcommon_1pt"))) length(getMask("EPICv2", "M_mapping")) length(getMask("EPIC")) length(getMask("HM450")) length(getMask("MM285"))
length(getMask("MSA", "recommended")) length(getMask("EPICv2", "recommended")) length(getMask("EPICv2", c("recommended", "M_SNPcommon_1pt"))) length(getMask("EPICv2", "M_mapping")) length(getMask("EPIC")) length(getMask("HM450")) length(getMask("MM285"))
The function retrieves the curated reference DNA methylation status for a set of cell type names under the Infinium platform. Supported cell types include "CD4T", "CD19B", "CD56NK", "CD14Monocytes", "granulocytes", "scFat", "skin" etc. See package sesameData for more details. The function output a matrix with probes on the rows and specified cell types on the columns. 0 suggests unmethylation and 1 suggests methylation. Intermediate methylation and nonclusive calls are left with NA.
getRefSet(cells = NULL, platform = c("EPIC", "HM450"))
getRefSet(cells = NULL, platform = c("EPIC", "HM450"))
cells |
reference cell types |
platform |
EPIC or HM450 |
g, a 0/1 matrix with probes on the rows and specified cell types on the columns.
betas = getRefSet('CD4T', platform='HM450') sesameDataGet_resetEnv()
betas = getRefSet('CD4T', platform='HM450') sesameDataGet_resetEnv()
Impute of missing data of specific platform
imputeBetas( betas, platform = NULL, BPPARAM = SerialParam(), celltype = NULL, sd_max = 999 )
imputeBetas( betas, platform = NULL, BPPARAM = SerialParam(), celltype = NULL, sd_max = 999 )
betas |
named vector of beta values |
platform |
platform |
BPPARAM |
use MulticoreParam(n) for parallel processing |
celltype |
celltype/tissue context of imputation, if not given, will use nearest neighbor to determine. |
sd_max |
maximum standard deviation in imputation confidence |
imputed data, vector or matrix
betas = openSesame(sesameDataGet("EPIC.1.SigDF")) sum(is.na(betas)) betas2 = imputeBetas(betas, "EPIC") sum(is.na(betas2))
betas = openSesame(sesameDataGet("EPIC.1.SigDF")) sum(is.na(betas)) betas2 = imputeBetas(betas, "EPIC") sum(is.na(betas2))
Impute missing data based on genomic neighbors.
imputeBetasByGenomicNeighbors( betas, platform = NULL, BPPARAM = SerialParam(), max_neighbors = 3, max_dist = 10000 )
imputeBetasByGenomicNeighbors( betas, platform = NULL, BPPARAM = SerialParam(), max_neighbors = 3, max_dist = 10000 )
betas |
named vector of beta values |
platform |
platform |
BPPARAM |
use MulticoreParam(n) for parallel processing |
max_neighbors |
maximum neighbors to use for dense regions |
max_dist |
maximum distance to count as neighbor |
imputed data, vector or matrix
betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) sum(is.na(betas)) betas2 = imputeBetasByGenomicNeighbors(betas, "EPICv2") sum(is.na(betas2))
betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) sum(is.na(betas)) betas2 = imputeBetasByGenomicNeighbors(betas, "EPICv2") sum(is.na(betas2))
Impute Missing Values with Mean This function replaces missing values (NA) in a matrix, default is row means.
imputeBetasMatrixByMean(mx, axis = 1)
imputeBetasMatrixByMean(mx, axis = 1)
mx |
A matrix |
axis |
A single integer. Use 1 to impute column means (default), and 2 to impute row means. |
A matrix with missing values imputed.
mx <- cbind(c(1, 2, NA, 4), c(NA, 2, 3, 4)) imputeBetasMatrixByMean(mx, axis = 1) imputeBetasMatrixByMean(mx, axis = 2)
mx <- cbind(c(1, 2, NA, 4), c(NA, 2, 3, 4)) imputeBetasMatrixByMean(mx, axis = 1) imputeBetasMatrixByMean(mx, axis = 2)
This function uses both the built-in rsprobes as well as the type I Color-Channel-Switching probes to infer ethnicity.
inferEthnicity(sdf, verbose = FALSE)
inferEthnicity(sdf, verbose = FALSE)
sdf |
a |
verbose |
print more messages |
s better be background subtracted and dyebias corrected for best accuracy
Please note: the betas should come from SigDF *without* channel inference.
string of ethnicity
sdf <- sesameDataGet('EPIC.1.SigDF') ## inferEthnicity(sdf)
sdf <- sesameDataGet('EPIC.1.SigDF') ## inferEthnicity(sdf)
IGG => Type-I green that is inferred to be green IRR => Type-I red that is inferred to be red
inferInfiniumIChannel( sdf, switch_failed = FALSE, mask_failed = FALSE, verbose = FALSE, summary = FALSE )
inferInfiniumIChannel( sdf, switch_failed = FALSE, mask_failed = FALSE, verbose = FALSE, summary = FALSE )
sdf |
a |
switch_failed |
whether to switch failed probes (default to FALSE) |
mask_failed |
whether to mask failed probes (default to FALSE) |
verbose |
whether to print correction summary |
summary |
return summarized numbers only. |
a SigDF
, or numerics if summary == TRUE
sdf <- sesameDataGet('EPIC.1.SigDF') inferInfiniumIChannel(sdf)
sdf <- sesameDataGet('EPIC.1.SigDF') inferInfiniumIChannel(sdf)
We established our sex calling based on the CpGs hypermethylated in inactive X (XiH), CpGs hypomethylated in inactive X (XiL).
inferSex(betas, platform = NULL)
inferSex(betas, platform = NULL)
betas |
DNA methylation beta |
platform |
EPICv2, EPIC, HM450, MM285, etc. |
Note genotype abnormalities such as Dnmt genotype, XXY male (Klinefelter's), 45,X female (Turner's) can confuse the model sometimes. This function works on a single sample.
Inferred sex of sample
## EPICv2 input betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) inferSex(betas) ## Not run: ## MM285 input betas = openSesame(sesameDataGet("MM285.1.SigDF")) inferSex(betas) ## EPIC input betas = openSesame(sesameDataGet('EPIC.1.SigDF')) inferSex(betas) ## HM450 input betas = openSesame(sesameDataGet("HM450.10.SigDF")[[1]]) inferSex(betas) ## End(Not run)
## EPICv2 input betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) inferSex(betas) ## Not run: ## MM285 input betas = openSesame(sesameDataGet("MM285.1.SigDF")) inferSex(betas) ## EPIC input betas = openSesame(sesameDataGet('EPIC.1.SigDF')) inferSex(betas) ## HM450 input betas = openSesame(sesameDataGet("HM450.10.SigDF")[[1]]) inferSex(betas) ## End(Not run)
We infer species based on probes pvalues and alignment score. AUC was calculated for each specie, y_true is 1 or 0 for pval < threshold.pos or pval > threshold.neg, respeceively,
inferSpecies( sdf, topN = 1000, threshold.pos = 0.01, threshold.neg = 0.1, return.auc = FALSE, return.species = FALSE, verbose = FALSE )
inferSpecies( sdf, topN = 1000, threshold.pos = 0.01, threshold.neg = 0.1, return.auc = FALSE, return.species = FALSE, verbose = FALSE )
sdf |
a |
topN |
Top n positive and negative probes used to infer species. increase this number can sometimes improve accuracy (DEFAULT: 1000) |
threshold.pos |
pvalue < threshold.pos are considered positive (default: 0.01). |
threshold.neg |
pvalue > threshold.neg are considered negative (default: 0.2). |
return.auc |
return AUC calculated, override return.species |
return.species |
return a string to represent species |
verbose |
print more messaeges |
a SigDF
sdf <- sesameDataGet("MM285.1.SigDF") sdf <- inferSpecies(sdf) ## all available species all_species <- names(sesameDataGet(sprintf( "%s.addressSpecies", sdfPlatform(sdf)))$species)
sdf <- sesameDataGet("MM285.1.SigDF") sdf <- inferSpecies(sdf) ## all available species all_species <- names(sesameDataGet(sprintf( "%s.addressSpecies", sdfPlatform(sdf)))$species)
Infer strain information for mouse array
inferStrain( sdf, return.strain = FALSE, return.probability = FALSE, return.pval = FALSE, min_frac_dt = 0.2, verbose = FALSE )
inferStrain( sdf, return.strain = FALSE, return.probability = FALSE, return.pval = FALSE, min_frac_dt = 0.2, verbose = FALSE )
sdf |
SigDF |
return.strain |
return strain name |
return.probability |
return probability vector for all strains |
return.pval |
return p-value |
min_frac_dt |
minimum fraction of detected signal (DEFAULT: 0.2) otherwise, we give up strain inference and return NA. |
verbose |
print more messages |
a list of best guess, p-value of the best guess and the probabilities of all strains
sesameDataCache() # if not done yet sdf <- sesameDataGet('MM285.1.SigDF') inferStrain(sdf, return.strain = TRUE) sdf.strain <- inferStrain(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('MM285.1.SigDF') inferStrain(sdf, return.strain = TRUE) sdf.strain <- inferStrain(sdf)
inferTissue infers the tissue of a single sample (as identified through the branchIDs in the row data of the reference) by reporting independent composition through cell type deconvolution.
inferTissue( betas, reference = NULL, platform = NULL, abs_delta_beta_min = 0.3, auc_min = 0.99, coverage_min = 0.8, topN = 15 )
inferTissue( betas, reference = NULL, platform = NULL, abs_delta_beta_min = 0.3, auc_min = 0.99, coverage_min = 0.8, topN = 15 )
betas |
Named vector with probes and their corresponding beta value measurement |
reference |
Summarized Experiment with either hypomethylated or hypermethylated probe selection (row data), sample selection (column data), meta data, and the betas (assay) |
platform |
String representing the array type of the betas and reference |
abs_delta_beta_min |
Numerical value indicating the absolute minimum required delta beta for the probe selection criteria |
auc_min |
Numeric value corresponding to the minimum AUC value required for a probe to be considered |
coverage_min |
Numeric value corresponding to the minimum coverage requirement for a probe to be considered. Coverage is defined here as the proportion of samples without an NA value at a given probe. |
topN |
number of probes to at most use for each branch |
inferred tissue as a string
sesameDataCache() # if not done yet sdf <- sesameDataGet("MM285.1.SigDF") inferTissue(getBetas(dyeBiasNL(noob(sdf)))) sesameDataGet_resetEnv()
sesameDataCache() # if not done yet sdf <- sesameDataGet("MM285.1.SigDF") inferTissue(getBetas(dyeBiasNL(noob(sdf)))) sesameDataGet_resetEnv()
initialize a fileSet class by allocating appropriate storage
initFileSet(map_path, platform, samples, probes = NULL, inc = 4)
initFileSet(map_path, platform, samples, probes = NULL, inc = 4)
map_path |
path of file to map |
platform |
EPIC, HM450 or HM27, consistent with sdfPlatform(sdf) |
samples |
sample names |
probes |
probe names |
inc |
bytes per unit data storage |
a sesame::fileSet object
fset <- initFileSet('mybetas2', 'HM27', c('s1','s2'))
fset <- initFileSet('mybetas2', 'HM27', c('s1','s2'))
see sesameData_annoProbes if you'd like to annotate by genomic coordinates (in GRanges)
KYCG_annoProbes( query, databases, db_names = NULL, platform = NULL, sep = ",", indicator = FALSE, silent = FALSE )
KYCG_annoProbes( query, databases, db_names = NULL, platform = NULL, sep = ",", indicator = FALSE, silent = FALSE )
query |
probe IDs in a character vector |
databases |
character or actual database (i.e. list of probe IDs) |
db_names |
specific database (default to all databases) |
platform |
EPIC, MM285 etc. will infer from probe IDs if not given |
sep |
delimiter used in paste |
indicator |
return the indicator matrix instead of a concatenated annotation (in the case of have multiple annotations) |
silent |
suppress message |
named annotation vector, or indicator matrix
query <- names(sesameData_getManifestGRanges("MM285")) anno <- KYCG_annoProbes(query, "designGroup", silent = TRUE)
query <- names(sesameData_getManifestGRanges("MM285")) anno <- KYCG_annoProbes(query, "designGroup", silent = TRUE)
build gene-probe association database
KYCG_buildGeneDBs( query = NULL, platform = NULL, genome = NULL, max_distance = 10000, silent = FALSE )
KYCG_buildGeneDBs( query = NULL, platform = NULL, genome = NULL, max_distance = 10000, silent = FALSE )
query |
the query probe list. If NULL, use all the probes on the platform |
platform |
HM450, EPIC, MM285, Mammal40, will infer from query if not given |
genome |
hg38, mm10, ..., will infer if not given. For additional mapping, download the GRanges object from http://zwdzwd.github.io/InfiniumAnnotation and provide the following argument ..., genome = sesameAnno_buildManifestGRanges("downloaded_file"),... to this function. |
max_distance |
probe-gene distance for association |
silent |
suppress messages |
gene databases
query <- c("cg04707299", "cg13380562", "cg00480749") dbs <- KYCG_buildGeneDBs(query, platform = "EPIC") testEnrichment(query, dbs, platform = "EPIC")
query <- c("cg04707299", "cg13380562", "cg00480749") dbs <- KYCG_buildGeneDBs(query, platform = "EPIC") testEnrichment(query, dbs, platform = "EPIC")
Get databases by full or partial names of the database group(s)
KYCG_getDBs( group_nms, db_names = NULL, platform = NULL, summary = FALSE, allow_multi = FALSE, ignore.case = FALSE, type = NULL, silent = FALSE )
KYCG_getDBs( group_nms, db_names = NULL, platform = NULL, summary = FALSE, allow_multi = FALSE, ignore.case = FALSE, type = NULL, silent = FALSE )
group_nms |
database group names |
db_names |
name of the database, fetech only the given databases |
platform |
EPIC, HM450, MM285, ... If given, will restrict to that platform. |
summary |
return a summary of database instead of db itself |
allow_multi |
allow multiple groups to be returned for |
ignore.case |
ignore case or not |
type |
numerical, categorical, default: all |
silent |
no messages each query. |
a list of databases, return NULL if no database is found
dbs <- KYCG_getDBs("MM285.chromHMM") dbs <- KYCG_getDBs(c("MM285.chromHMM", "MM285.probeType"))
dbs <- KYCG_getDBs("MM285.chromHMM") dbs <- KYCG_getDBs(c("MM285.chromHMM", "MM285.probeType"))
List database group names
KYCG_listDBGroups(filter = NULL, path = NULL, type = NULL)
KYCG_listDBGroups(filter = NULL, path = NULL, type = NULL)
filter |
keywords for filtering |
path |
file path to downloaded knowledgebase sets |
type |
categorical, numerical (default: all) |
a list of db group names
head(KYCG_listDBGroups("chromHMM")) ## or KYCG_listDBGroups(path = "~/Downloads")
head(KYCG_listDBGroups("chromHMM")) ## or KYCG_listDBGroups(path = "~/Downloads")
Load database groups
KYCG_loadDBs(in_paths, group_use_filename = FALSE)
KYCG_loadDBs(in_paths, group_use_filename = FALSE)
in_paths |
folder that contains all databases |
group_use_filename |
whether to use file name for groups |
a list of db group names
## download regulatory annotations from ## http://zwdzwd.github.io/InfiniumAnnotation ## unzip the file if (FALSE) { dbs <- KYCG_loadDBs(path_to_unzipped_folder) }
## download regulatory annotations from ## http://zwdzwd.github.io/InfiniumAnnotation ## unzip the file if (FALSE) { dbs <- KYCG_loadDBs(path_to_unzipped_folder) }
The input data frame should have an "estimate" and a "FDR" columns.
KYCG_plotBar(df, y = "-log10(FDR)", n = 20, order_by = "FDR", label = FALSE)
KYCG_plotBar(df, y = "-log10(FDR)", n = 20, order_by = "FDR", label = FALSE)
df |
KYCG result data frame |
y |
the column to be plotted on y-axis |
n |
number of CG groups to plot |
order_by |
the column by which CG groups are ordered |
label |
whether to label significant bars |
Top CG groups are determined by estimate (descending order).
grid plot object
KYCG_plotBar(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=10, overlap=as.integer(runif(10,0,30)), group="g", dbname=seq_len(10)))
KYCG_plotBar(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=10, overlap=as.integer(runif(10,0,30)), group="g", dbname=seq_len(10)))
The input data frame should have an "estimate" and a "FDR" columns.
KYCG_plotDot( df, y = "-log10(FDR)", n = 20, order_by = "FDR", title = "Enriched Databases", label_by = "dbname", size_by = "overlap", color_by = "estimate", short_label = FALSE )
KYCG_plotDot( df, y = "-log10(FDR)", n = 20, order_by = "FDR", title = "Enriched Databases", label_by = "dbname", size_by = "overlap", color_by = "estimate", short_label = FALSE )
df |
KYCG result data frame |
y |
the column to be plotted on y-axis |
n |
number of CG groups to plot |
order_by |
the column by which CG groups are ordered |
title |
plot title |
label_by |
the column for label |
size_by |
the column by which CG group size plot |
color_by |
the column by which CG groups are colored |
short_label |
omit group in label |
Top CG groups are determined by estimate (descending order).
grid plot object (by ggplot)
KYCG_plotDot(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=runif(10,10,20), overlap=as.integer(runif(10,0,30)), group="g", dbname=seq_len(10)))
KYCG_plotDot(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=runif(10,10,20), overlap=as.integer(runif(10,0,30)), group="g", dbname=seq_len(10)))
plot enrichment test result
KYCG_plotEnrichAll( df, fdr_max = 25, n_label = 15, min_estimate = 0, short_label = TRUE )
KYCG_plotEnrichAll( df, fdr_max = 25, n_label = 15, min_estimate = 0, short_label = TRUE )
df |
test enrichment result data frame |
fdr_max |
maximum fdr for capping |
n_label |
number of database to label |
min_estimate |
minimum estimate |
short_label |
use short label |
grid object
query <- KYCG_getDBs("MM285.designGroup")[["PGCMeth"]] res <- testEnrichment(query, platform="MM285") KYCG_plotEnrichAll(res)
query <- KYCG_getDBs("MM285.designGroup")[["PGCMeth"]] res <- testEnrichment(query, platform="MM285") KYCG_plotEnrichAll(res)
creates a lollipop plot of log(estimate) given data with fields estimate.
KYCG_plotLollipop(df, label_column = "dbname", n = 20)
KYCG_plotLollipop(df, label_column = "dbname", n = 20)
df |
DataFrame where each row is a database name with its estimate. |
label_column |
column in df to be used as the label (default: dbname) |
n |
Integer representing the number of top enrichments to report. Optional. (Default: 10) |
ggplot lollipop plot
KYCG_plotLollipop(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=runif(10,10,20), overlap=as.integer(runif(10,0,30)), group="g", dbname=as.character(seq_len(10))))
KYCG_plotLollipop(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=runif(10,10,20), overlap=as.integer(runif(10,0,30)), group="g", dbname=as.character(seq_len(10))))
KYCG_plotManhattan makes a manhattan plot to summarize EWAS results
KYCG_plotManhattan( vals, platform = NULL, genome = NULL, title = NULL, label_min = 100, col = c("wheat1", "sienna3"), ylabel = "Value" )
KYCG_plotManhattan( vals, platform = NULL, genome = NULL, title = NULL, label_min = 100, col = c("wheat1", "sienna3"), ylabel = "Value" )
vals |
named vector of values (P,Q etc), vector name is Probe ID. |
platform |
String corresponding to the type of platform to use for retrieving GRanges coordinates of probes. Either MM285, EPIC, HM450, or HM27. If it is not provided, it will be inferred from the query set probeIDs (Default: NA). |
genome |
hg38, mm10, ..., will infer if not given. For additional mapping, download the GRanges object from http://zwdzwd.github.io/InfiniumAnnotation and provide the following argument ..., genome = sesameAnno_buildManifestGRanges("downloaded_file"),... to this function. |
title |
title for plot |
label_min |
Threshold above which data points will be labelled with Probe ID |
col |
color |
ylabel |
y-axis label |
a ggplot object
## see vignette for examples sesameDataGet_resetEnv()
## see vignette for examples sesameDataGet_resetEnv()
Plot meta gene or other meta genomic features
KYCG_plotMeta(betas, platform = NULL)
KYCG_plotMeta(betas, platform = NULL)
betas |
a named numeric vector or a matrix (row: probes; column: samples) |
platform |
if not given and x is a SigDF, will be inferred the meta features |
a grid plot object
sdf <- sesameDataGet("EPIC.1.SigDF") KYCG_plotMeta(getBetas(sdf))
sdf <- sesameDataGet("EPIC.1.SigDF") KYCG_plotMeta(getBetas(sdf))
Plot meta gene or other meta genomic features
KYCG_plotMetaEnrichment(result_list)
KYCG_plotMetaEnrichment(result_list)
result_list |
one or a list of testEnrichment |
a grid plot object
cg_lists <- KYCG_getDBs("MM285.TFBS") queries <- cg_lists[(sapply(cg_lists, length) > 40000)] result_list <- lapply(queries, testEnrichment, "MM285.metagene", silent=TRUE, platform="MM285") KYCG_plotMetaEnrichment(result_list)
cg_lists <- KYCG_getDBs("MM285.TFBS") queries <- cg_lists[(sapply(cg_lists, length) > 40000)] result_list <- lapply(queries, testEnrichment, "MM285.metagene", silent=TRUE, platform="MM285") KYCG_plotMetaEnrichment(result_list)
Plot point range for a list of enrichment testing results against the same set of databases
KYCG_plotPointRange(result_list)
KYCG_plotPointRange(result_list)
result_list |
a list of testEnrichment resultsx |
grid plot object
## pick some big TFBS-overlapping CpG groups cg_lists <- KYCG_getDBs("MM285.TFBS") queries <- cg_lists[(sapply(cg_lists, length) > 40000)] result_list <- lapply(queries, testEnrichment, "MM285.chromHMM", platform="MM285") KYCG_plotPointRange(result_list)
## pick some big TFBS-overlapping CpG groups cg_lists <- KYCG_getDBs("MM285.TFBS") queries <- cg_lists[(sapply(cg_lists, length) > 40000)] result_list <- lapply(queries, testEnrichment, "MM285.chromHMM", platform="MM285") KYCG_plotPointRange(result_list)
Plot Set Enrichment
KYCG_plotSetEnrichment(result, n_sample = 1000, n_presence = 200)
KYCG_plotSetEnrichment(result, n_sample = 1000, n_presence = 200)
result |
result object as returned from an element of the list of testEnrichmentSEA(..., prepPlot=TRUE) |
n_sample |
number of CpGs to sample |
n_presence |
number of overlap to sample for the plot |
grid object for plot
query <- KYCG_getDBs("KYCG.MM285.designGroup")[["VMR"]] db <- KYCG_getDBs("MM285.seqContextN", "distToTSS") res <- testEnrichmentSEA(query, db, prepPlot = TRUE) KYCG_plotSetEnrichment(res[[1]])
query <- KYCG_getDBs("KYCG.MM285.designGroup")[["VMR"]] db <- KYCG_getDBs("MM285.seqContextN", "distToTSS") res <- testEnrichmentSEA(query, db, prepPlot = TRUE) KYCG_plotSetEnrichment(res[[1]])
creates a volcano plot of -log2(p.value) and log(estimate) given data with fields estimate and p.value.
KYCG_plotVolcano(df, label_by = "dbname", alpha = 0.05)
KYCG_plotVolcano(df, label_by = "dbname", alpha = 0.05)
df |
DataFrame where each field is a database name with two fields for the estimate and p.value. |
label_by |
column in df to be used as the label (default: dbname) |
alpha |
Float representing the cut-off alpha value for the plot. Optional. (Default: 0.05) |
ggplot volcano plot
KYCG_plotVolcano(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=runif(10,10,20), overlap=as.integer(runif(10,0,30)), group="g", dbname=seq_len(10)))
KYCG_plotVolcano(data.frame( estimate=runif(10,0,10), FDR=runif(10,0,1), nD=runif(10,10,20), overlap=as.integer(runif(10,0,30)), group="g", dbname=seq_len(10)))
create a waterfall plot of log(estimate) given test enrichment
KYCG_plotWaterfall( df, order_by = "Log2(OR)", size_by = "-log10(FDR)", label_by = "dbname", n_label = 10 )
KYCG_plotWaterfall( df, order_by = "Log2(OR)", size_by = "-log10(FDR)", label_by = "dbname", n_label = 10 )
df |
data frame where each row is a database with test enrichment result |
order_by |
the column by which CG groups are ordered |
size_by |
the column by which CG group size plot |
label_by |
column in df to be used as the label (default: dbname) |
n_label |
number of datapoints to label |
grid
library(SummarizedExperiment) df <- rowData(sesameDataGet('MM285.tissueSignature')) query <- df$Probe_ID[df$branch == "fetal_brain" & df$type == "Hypo"] results <- testEnrichment(query, "TFBS", platform="MM285") KYCG_plotWaterfall(results)
library(SummarizedExperiment) df <- rowData(sesameDataGet('MM285.tissueSignature')) query <- df$Probe_ID[df$branch == "fetal_brain" & df$type == "Hypo"] results <- testEnrichment(query, "TFBS", platform="MM285") KYCG_plotWaterfall(results)
liftOver, see mLiftOver (renamed)
liftOver(...)
liftOver(...)
... |
see mLiftOver |
imputed data, vector, matrix, SigDF(s)
list existing quality masks for a SigDF
listAvailableMasks(platform, verbose = FALSE)
listAvailableMasks(platform, verbose = FALSE)
platform |
EPIC, MM285, HM450 etc |
verbose |
print more messages |
a tibble of masks
listAvailableMasks("EPICv2")
listAvailableMasks("EPICv2")
Deposit data of one sample to a fileSet (and hence to file)
mapFileSet(fset, sample, named_values)
mapFileSet(fset, sample, named_values)
fset |
a sesame::fileSet, as obtained via readFileSet |
sample |
sample name as a string |
named_values |
value vector named by probes |
a sesame::fileSet
## create two samples fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) ## a hypothetical numeric array (can be beta values, intensities etc) hypothetical <- setNames(runif(fset$n), fset$probes) ## map the numeric to file mapFileSet(fset, 's1', hypothetical) ## get data sliceFileSet(fset, 's1', 'cg00000292')
## create two samples fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) ## a hypothetical numeric array (can be beta values, intensities etc) hypothetical <- setNames(runif(fset$n), fset$probes) ## map the numeric to file mapFileSet(fset, 's1', hypothetical) ## get data sliceFileSet(fset, 's1', 'cg00000292')
Map the SDF (from overlap array platforms) Replicates are merged by picking the best detection
mapToMammal40(sdf)
mapToMammal40(sdf)
sdf |
a |
a named numeric vector for beta values
sdf <- sesameDataGet("Mammal40.1.SigDF") betas <- mapToMammal40(sdf[1:10,])
sdf <- sesameDataGet("Mammal40.1.SigDF") betas <- mapToMammal40(sdf[1:10,])
This is designed to counter tail inflation in Infinium I probes.
matchDesign(sdf, min_dbeta = 0.3)
matchDesign(sdf, min_dbeta = 0.3)
sdf |
SigDF |
min_dbeta |
the default algorithm perform 2-state quantile-normalization of the unmethylated and methylated modes separately. However, when the two modes are too close, we fall back to a one-mode normalization. The threshold defines the maximum inter-mode distance. |
SigDF
library(RPMM) sdf <- sesameDataGet("MM285.1.SigDF") sesameQC_plotBetaByDesign(sdf) sesameQC_plotBetaByDesign(matchDesign(sdf))
library(RPMM) sdf <- sesameDataGet("MM285.1.SigDF") sesameQC_plotBetaByDesign(sdf) sesameQC_plotBetaByDesign(matchDesign(sdf))
The function takes one single SigDF
and computes mean
intensity of all the in-band measurements. This includes all Type-I
in-band measurements and all Type-II probe measurements. Both methylated
and unmethylated alleles are considered. This function outputs a single
numeric for the mean.
meanIntensity(sdf, mask = TRUE)
meanIntensity(sdf, mask = TRUE)
sdf |
a |
mask |
whether to mask probes using mask column |
Note: mean in this case is more informative than median because methylation level is mostly bimodal.
mean of all intensities
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') meanIntensity(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') meanIntensity(sdf)
The function takes one single SigDF
and computes median
intensity of M+U for each probe. This function outputs a single
numeric for the median.
medianTotalIntensity(sdf, mask = TRUE)
medianTotalIntensity(sdf, mask = TRUE)
sdf |
a |
mask |
whether to mask probes using mask column |
median of all intensities
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') medianTotalIntensity(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') medianTotalIntensity(sdf)
Lift over beta values or SigDFs to another Infinium platform This function wraps ID conversion and provide optional imputation functionality.
mLiftOver( x, target_platform, source_platform = NULL, BPPARAM = SerialParam(), mapping = NULL, impute = FALSE, sd_max = 999, celltype = "Blood", ... )
mLiftOver( x, target_platform, source_platform = NULL, BPPARAM = SerialParam(), mapping = NULL, impute = FALSE, sd_max = 999, celltype = "Blood", ... )
x |
either named beta value (vector or matrix), probe IDs or SigDF(s) if input is a matrix, probe IDs should be in the row names if input is a numeric vector, probe IDs should be in the vector names. If input is a character vector, the input will be considered probe IDs. |
target_platform |
the platform to take the data to |
source_platform |
optional information of the source data platform (when there might be ambiguity). |
BPPARAM |
use MulticoreParam(n) for parallel processing |
mapping |
a liftOver mapping file. Typically this file contains empirical evidence whether a probe mapping is reliable. If given, probe ID-based mapping will be skipped. This is to perform more stringent probe ID mapping. |
impute |
whether to impute or not, default is FALSE |
sd_max |
the maximum standard deviation for filtering low confidence imputation. |
celltype |
the cell type / tissue context of imputation, if not given, will use nearest neighbor to find out. |
... |
extra arguments, see ?convertProbeID |
imputed data, vector, matrix, SigDF(s)
## Not run: sesameDataCache() ## lift SigDF sdf = sesameDataGet("EPICv2.8.SigDF")[["GM12878_206909630042_R08C01"]] dim(mLiftOver(sdf, "EPICv2")) dim(mLiftOver(sdf, "EPIC")) dim(mLiftOver(sdf, "HM450")) sdfs = sesameDataGet("EPICv2.8.SigDF")[1:2] sdfs_hm450 = mLiftOver(sdfs, "HM450") ## parallel processing sdfs_hm450 = mLiftOver(sdfs, "HM450", BPPARAM=BiocParallel::MulticoreParam(2)) sdf = sesameDataGet("EPIC.5.SigDF.normal")[[1]] dim(mLiftOver(sdf, "EPICv2")) dim(mLiftOver(sdf, "EPIC")) dim(mLiftOver(sdf, "HM450")) sdf = sesameDataGet("HM450.10.SigDF")[[1]] dim(mLiftOver(sdf, "EPICv2")) dim(mLiftOver(sdf, "EPIC")) dim(mLiftOver(sdf, "HM450")) ## lift beta values betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) betas_hm450 = mLiftOver(betas, "HM450", impute=TRUE) length(betas_hm450) sum(is.na(betas_hm450)) betas_hm450 <- mLiftOver(betas, "HM450", impute=FALSE) length(betas_hm450) sum(is.na(betas_hm450)) betas_epic1 <- mLiftOver(betas, "EPIC", impute=TRUE) length(betas_epic1) sum(is.na(betas_epic1)) betas_epic1 <- mLiftOver(betas, "EPIC", impute=FALSE) length(betas_epic1) sum(is.na(betas_epic1)) betas_matrix = openSesame(sesameDataGet("EPICv2.8.SigDF")[1:4]) dim(betas_matrix) betas_matrix_hm450 = mLiftOver(betas_matrix, "HM450", impute=T) dim(betas_matrix_hm450) ## parallel processing betas_matrix_hm450 = mLiftOver(betas_matrix, "HM450", impute=T, BPPARAM=BiocParallel::MulticoreParam(4)) ## use empirical evidence in mLiftOver mapping = sesameDataGet("liftOver.EPICv2ToEPIC") betas_matrix = openSesame(sesameDataGet("EPICv2.8.SigDF")[1:4]) dim(mLiftOver(betas_matrix, "EPIC", mapping = mapping)) ## compare to without using empirical evidence dim(mLiftOver(betas_matrix, "EPIC")) betas <- c("cg04707299"=0.2, "cg13380562"=0.9, "cg00000103"=0.1) head(mLiftOver(betas, "HM450", impute=TRUE)) betas <- c("cg00004963_TC21"=0, "cg00004963_TC22"=0.5, "cg00004747_TC21"=1.0) betas_hm450 <- mLiftOver(betas, "HM450", impute=TRUE) head(na.omit(mLiftOver(betas, "HM450", impute=FALSE))) ## lift probe IDs cg_epic2 = names(sesameData_getManifestGRanges("EPICv2")) head(mLiftOver(cg_epic2, "HM450")) cg_epic2 = grep("cg", names(sesameData_getManifestGRanges("EPICv2")), value=T) head(mLiftOver(cg_epic2, "HM450")) cg_hm450 = grep("cg", names(sesameData_getManifestGRanges("HM450")), value=T) head(mLiftOver(cg_hm450, "EPICv2")) rs_epic2 = grep("rs", names(sesameData_getManifestGRanges("EPICv2")), value=T) head(mLiftOver(rs_epic2, "HM450", source_platform="EPICv2")) probes_epic2 = names(sesameData_getManifestGRanges("EPICv2")) head(mLiftOver(probes_epic2, "EPIC")) head(mLiftOver(probes_epic2, "EPIC", target_uniq = TRUE)) head(mLiftOver(probes_epic2, "EPIC", include_new = FALSE)) head(mLiftOver(probes_epic2, "EPIC", include_old = FALSE)) head(mLiftOver(probes_epic2, "EPIC", return_mapping=TRUE)) ## End(Not run)
## Not run: sesameDataCache() ## lift SigDF sdf = sesameDataGet("EPICv2.8.SigDF")[["GM12878_206909630042_R08C01"]] dim(mLiftOver(sdf, "EPICv2")) dim(mLiftOver(sdf, "EPIC")) dim(mLiftOver(sdf, "HM450")) sdfs = sesameDataGet("EPICv2.8.SigDF")[1:2] sdfs_hm450 = mLiftOver(sdfs, "HM450") ## parallel processing sdfs_hm450 = mLiftOver(sdfs, "HM450", BPPARAM=BiocParallel::MulticoreParam(2)) sdf = sesameDataGet("EPIC.5.SigDF.normal")[[1]] dim(mLiftOver(sdf, "EPICv2")) dim(mLiftOver(sdf, "EPIC")) dim(mLiftOver(sdf, "HM450")) sdf = sesameDataGet("HM450.10.SigDF")[[1]] dim(mLiftOver(sdf, "EPICv2")) dim(mLiftOver(sdf, "EPIC")) dim(mLiftOver(sdf, "HM450")) ## lift beta values betas = openSesame(sesameDataGet("EPICv2.8.SigDF")[[1]]) betas_hm450 = mLiftOver(betas, "HM450", impute=TRUE) length(betas_hm450) sum(is.na(betas_hm450)) betas_hm450 <- mLiftOver(betas, "HM450", impute=FALSE) length(betas_hm450) sum(is.na(betas_hm450)) betas_epic1 <- mLiftOver(betas, "EPIC", impute=TRUE) length(betas_epic1) sum(is.na(betas_epic1)) betas_epic1 <- mLiftOver(betas, "EPIC", impute=FALSE) length(betas_epic1) sum(is.na(betas_epic1)) betas_matrix = openSesame(sesameDataGet("EPICv2.8.SigDF")[1:4]) dim(betas_matrix) betas_matrix_hm450 = mLiftOver(betas_matrix, "HM450", impute=T) dim(betas_matrix_hm450) ## parallel processing betas_matrix_hm450 = mLiftOver(betas_matrix, "HM450", impute=T, BPPARAM=BiocParallel::MulticoreParam(4)) ## use empirical evidence in mLiftOver mapping = sesameDataGet("liftOver.EPICv2ToEPIC") betas_matrix = openSesame(sesameDataGet("EPICv2.8.SigDF")[1:4]) dim(mLiftOver(betas_matrix, "EPIC", mapping = mapping)) ## compare to without using empirical evidence dim(mLiftOver(betas_matrix, "EPIC")) betas <- c("cg04707299"=0.2, "cg13380562"=0.9, "cg00000103"=0.1) head(mLiftOver(betas, "HM450", impute=TRUE)) betas <- c("cg00004963_TC21"=0, "cg00004963_TC22"=0.5, "cg00004747_TC21"=1.0) betas_hm450 <- mLiftOver(betas, "HM450", impute=TRUE) head(na.omit(mLiftOver(betas, "HM450", impute=FALSE))) ## lift probe IDs cg_epic2 = names(sesameData_getManifestGRanges("EPICv2")) head(mLiftOver(cg_epic2, "HM450")) cg_epic2 = grep("cg", names(sesameData_getManifestGRanges("EPICv2")), value=T) head(mLiftOver(cg_epic2, "HM450")) cg_hm450 = grep("cg", names(sesameData_getManifestGRanges("HM450")), value=T) head(mLiftOver(cg_hm450, "EPICv2")) rs_epic2 = grep("rs", names(sesameData_getManifestGRanges("EPICv2")), value=T) head(mLiftOver(rs_epic2, "HM450", source_platform="EPICv2")) probes_epic2 = names(sesameData_getManifestGRanges("EPICv2")) head(mLiftOver(probes_epic2, "EPIC")) head(mLiftOver(probes_epic2, "EPIC", target_uniq = TRUE)) head(mLiftOver(probes_epic2, "EPIC", include_new = FALSE)) head(mLiftOver(probes_epic2, "EPIC", include_old = FALSE)) head(mLiftOver(probes_epic2, "EPIC", return_mapping=TRUE)) ## End(Not run)
Convert M-value to beta-value (aka inverse logit transform)
MValueToBetaValue(m)
MValueToBetaValue(m)
m |
a vector of M values |
a vector of beta values
MValueToBetaValue(c(-3, 0, 3))
MValueToBetaValue(c(-3, 0, 3))
get negative control signal
negControls(sdf)
negControls(sdf)
sdf |
a SigDF |
a data frame of negative control signals
remove masked probes from SigDF
noMasked(sdf)
noMasked(sdf)
sdf |
input SigDF object |
a SigDF object without masked probes
sesameDataCache() sdf <- sesameDataGet("EPIC.1.SigDF") sdf <- pOOBAH(sdf) sdf_noMasked <- noMasked(sdf)
sesameDataCache() sdf <- sesameDataGet("EPIC.1.SigDF") sdf <- pOOBAH(sdf) sdf_noMasked <- noMasked(sdf)
The function takes a SigDF
and returns a modified SigDF
with background subtracted. Background was modelled in a normal distribution
and true signal in an exponential distribution. The Norm-Exp deconvolution
is parameterized using Out-Of-Band (oob) probes. For species-specific
processing, one should call inferSpecies on SigDF first. Multi-mapping
probes are excluded.
noob(sdf, combine.neg = TRUE, offset = 15)
noob(sdf, combine.neg = TRUE, offset = 15)
sdf |
a |
combine.neg |
whether to combine negative control probe. |
offset |
offset |
When combine.neg = TRUE, background will be parameterized by both negative control and out-of-band probes.
a new SigDF
with noob background correction
sdf <- sesameDataGet('EPIC.1.SigDF') sdf.nb <- noob(sdf)
sdf <- sesameDataGet('EPIC.1.SigDF') sdf.nb <- noob(sdf)
get normalization control signal from SigDF. The function optionally takes mean for each channel.
normControls(sdf, average = FALSE, verbose = FALSE)
normControls(sdf, average = FALSE, verbose = FALSE)
sdf |
a SigDF |
average |
whether to average |
verbose |
print more messages |
a data frame of normalization control signals
This function is a simple wrapper of noob + nonlinear dye bias correction + pOOBAH masking.
openSesame( x, prep = "QCDPB", prep_args = NULL, manifest = NULL, func = getBetas, BPPARAM = SerialParam(), platform = "", min_beads = 1, ... )
openSesame( x, prep = "QCDPB", prep_args = NULL, manifest = NULL, func = getBetas, BPPARAM = SerialParam(), platform = "", min_beads = 1, ... )
x |
SigDF(s), IDAT prefix(es) |
prep |
preprocessing code, see ?prepSesame |
prep_args |
optional preprocessing argument list, see ?prepSesame |
manifest |
optional dynamic manifest |
func |
either getBetas or getAFs, if NULL, then return SigDF list |
BPPARAM |
get parallel with MulticoreParam(n) |
platform |
optional platform string |
min_beads |
minimum bead number, probes with R or G smaller than this threshold will be masked. If NULL, no filtering based on bead count will be applied. Default to 1. |
... |
parameters to getBetas |
Please use mask=FALSE to turn off masking.
If the input is an IDAT prefix or a SigDF
, the output is
the beta value numerics.
a numeric vector for processed beta values
in_dir <- system.file("extdata", "", package = "sesameData") betas <- openSesame(in_dir) ## or IDATprefixes <- searchIDATprefixes(in_dir) betas <- openSesame(IDATprefixes)
in_dir <- system.file("extdata", "", package = "sesameData") betas <- openSesame(in_dir) ## or IDATprefixes <- searchIDATprefixes(in_dir) betas <- openSesame(IDATprefixes)
openSesame pipeline with file-backed storage
openSesameToFile(map_path, idat_dir, BPPARAM = SerialParam(), inc = 4)
openSesameToFile(map_path, idat_dir, BPPARAM = SerialParam(), inc = 4)
map_path |
path of file to be mapped (beta values file) |
idat_dir |
source IDAT directory |
BPPARAM |
get parallel with MulticoreParam(2) |
inc |
bytes per item data storage. increase to 8 if precision is important. Most cases 32-bit representation is enough. |
a sesame::fileSet
openSesameToFile('mybetas', system.file('extdata',package='sesameData'))
openSesameToFile('mybetas', system.file('extdata',package='sesameData'))
Generate some additional color palettes
palgen(pal, n = 150, space = "Lab")
palgen(pal, n = 150, space = "Lab")
pal |
a string for adhoc pals |
n |
the number of colors for interpolation |
space |
rgb or Lab |
a palette-generating function
library(pals) pal.bands(palgen("whiteturbo"))
library(pals) pal.bands(palgen("whiteturbo"))
This overcomes the issue of missing IDAT files. However, out-of-band signals will be missing or faked (sampled from a normal distribution).
parseGEOsignalMU( sigM, sigU, Probe_IDs, oob.mean = 500, oob.sd = 300, platform = NULL )
parseGEOsignalMU( sigM, sigU, Probe_IDs, oob.mean = 500, oob.sd = 300, platform = NULL )
sigM |
methylated signal, a numeric vector |
sigU |
unmethylated signal, a numirc vector |
Probe_IDs |
probe ID vector |
oob.mean |
assumed mean for out-of-band signals |
oob.sd |
assumed standard deviation for out-of-band signals |
platform |
platform code, will infer if not given |
SigDF
sigM <- c(11436, 6068, 2864) sigU <- c(1476, 804, 393) probes <- c("cg07881041", "cg23229610", "cg03513874") sdf <- parseGEOsignalMU(sigM, sigU, probes, platform = "EPIC")
sigM <- c(11436, 6068, 2864) sigU <- c(1476, 804, 393) probes <- c("cg07881041", "cg23229610", "cg03513874") sdf <- parseGEOsignalMU(sigM, sigU, probes, platform = "EPIC")
aka pOOBAH (p-vals by Out-Of-Band Array Hybridization)
pOOBAH( sdf, return.pval = FALSE, combine.neg = TRUE, pval.threshold = 0.05, verbose = FALSE )
pOOBAH( sdf, return.pval = FALSE, combine.neg = TRUE, pval.threshold = 0.05, verbose = FALSE )
sdf |
a |
return.pval |
whether to return p-values, instead of a
masked |
combine.neg |
whether to combine negative control probes with the out-of-band probes in simulating the signal background |
pval.threshold |
minimum p-value to mask |
verbose |
print more messages |
The function takes a SigDF
as input, computes detection p-value
using out-of-band probes empirical distribution and returns a new
SigDF
with an updated mask slot.
a SigDF
, or a p-value vector if return.pval is TRUE
sdf <- sesameDataGet("EPIC.1.SigDF") sum(sdf$mask) sum(pOOBAH(sdf)$mask)
sdf <- sesameDataGet("EPIC.1.SigDF") sum(sdf$mask) sum(pOOBAH(sdf)$mask)
The function takes a named numeric vector of beta values. The name attribute contains the probe ID (cg, ch or rs IDs). The function looks for overlapping probes and estimate age using different models.
predictAge(betas, model, na_fallback = FALSE, min_nonna = 10)
predictAge(betas, model, na_fallback = FALSE, min_nonna = 10)
betas |
a probeID-named vector of beta values |
model |
a model object from sesameDataGet. should contain param, intercept, response2age. default to the Horvath353 model. |
na_fallback |
use fall back values if na |
min_nonna |
the minimum number of non-NA values. |
You can get the models such as the Horvath aging model (Horvath 2013 Genome Biology) from sesameDataGet. The function outputs a single numeric of age in years.
Here are some built-in age models: Anno/HM450/Clock_Horvath353.rds Anno/HM450/Clock_Hannum.rds Anno/HM450/Clock_SkinBlood.rds Anno/EPIC/Clock_PhenoAge.rds Anno/MM285/Clock_Zhou347.rds see vignette inferences.html#Age__Epigenetic_Clock for details
age in the unit specified in the model (usually in year, but sometimes can be month, like in the mouse clocks).
betas <- sesameDataGet('HM450.1.TCGA.PAAD')$betas ## Not run: ## download age models from ## https://github.com/zhou-lab/InfiniumAnnotationV1/tree/main/Anno ## e.g., Anno/HM450/Clock_Horvath353.rds predictAge(betas, model) ## End(Not run)
betas <- sesameDataGet('HM450.1.TCGA.PAAD')$betas ## Not run: ## download age models from ## https://github.com/zhou-lab/InfiniumAnnotationV1/tree/main/Anno ## e.g., Anno/HM450/Clock_Horvath353.rds predictAge(betas, model) ## End(Not run)
The function takes a named numeric vector of beta values. The name attribute contains the probe ID (cg, ch or rs IDs). The function looks for overlapping probes and estimate age using Horvath aging model (Horvath 2013 Genome Biology). The function outputs a single numeric of age in years.
predictAgeHorvath353(betas)
predictAgeHorvath353(betas)
betas |
a probeID-named vector of beta values |
age in years
cat("Deprecated. See predictAge")
cat("Deprecated. See predictAge")
The function takes a named numeric vector of beta values. The name attribute contains the probe ID (cg, ch or rs IDs). The function looks for overlapping probes and estimate age using Horvath aging model (Horvath et al. 2018 Aging, 391 probes). The function outputs a single numeric of age in years.
predictAgeSkinBlood(betas)
predictAgeSkinBlood(betas)
betas |
a probeID-named vector of beta values |
age in years
cat("Deprecated. See predictAge")
cat("Deprecated. See predictAge")
The function takes a named numeric vector of beta values. The name attribute contains the probe ID. The function looks for overlapping probes and estimate age using an aging model built from 321 MM285 probes. The function outputs a single numeric of age in months. The clock is most accurate with the sesame preprocessing.
predictMouseAgeInMonth(betas, na_fallback = TRUE)
predictMouseAgeInMonth(betas, na_fallback = TRUE)
betas |
a probeID-named vector of beta values |
na_fallback |
use the fallback default for NAs. |
age in month
cat("Deprecated. See predictAge")
cat("Deprecated. See predictAge")
Mask SigDF by probe ID prefix
prefixMask(sdf, prefixes = NULL, invert = FALSE)
prefixMask(sdf, prefixes = NULL, invert = FALSE)
sdf |
SigDF |
prefixes |
prefix characters |
invert |
use the complement set |
SigDF
sdf <- resetMask(sesameDataGet("MM285.1.SigDF")) sum(prefixMask(sdf, c("ctl","rs"))$mask) sum(prefixMask(sdf, c("ctl"))$mask) sum(prefixMask(sdf, c("ctl","rs","ch"))$mask)
sdf <- resetMask(sesameDataGet("MM285.1.SigDF")) sum(prefixMask(sdf, c("ctl","rs"))$mask) sum(prefixMask(sdf, c("ctl"))$mask) sum(prefixMask(sdf, c("ctl","rs","ch"))$mask)
Mask all but C probes in SigDF
prefixMaskButC(sdf)
prefixMaskButC(sdf)
sdf |
SigDF |
SigDF
sdf <- resetMask(sesameDataGet("MM285.1.SigDF")) sum(prefixMaskButC(sdf)$mask)
sdf <- resetMask(sesameDataGet("MM285.1.SigDF")) sum(prefixMaskButC(sdf)$mask)
Mask all but CG probes in SigDF
prefixMaskButCG(sdf)
prefixMaskButCG(sdf)
sdf |
SigDF |
SigDF
sdf <- resetMask(sesameDataGet("MM285.1.SigDF")) sum(prefixMaskButCG(sdf)$mask)
sdf <- resetMask(sesameDataGet("MM285.1.SigDF")) sum(prefixMaskButCG(sdf)$mask)
Notes on the order of operation: 1. qualityMask and inferSpecies should go before noob and pOOBAH, otherwise the background is too high because of Multi, uk and other probes 2. dyeBias correction needs to happen early 3. channel inference before dyebias 4. noob should happen last, pOOBAH before noob because noob modifies oob
prepSesame(sdf, prep = "QCDPB", prep_args = NULL)
prepSesame(sdf, prep = "QCDPB", prep_args = NULL)
sdf |
SigDF |
prep |
code that indicates preprocessing functions and their execution order (functions on the left is executed first). |
prep_args |
optional argument list to individual functions, e.g., prepSesame(sdf, prep_args=list(Q=list(mask_names = "design_issue"))) sets qualityMask(sdf, mask_names = "design_issue") |
SigDF
sdf <- sesameDataGet("MM285.1.SigDF") sdf1 <- prepSesame(sdf, "QCDPB")
sdf <- sesameDataGet("MM285.1.SigDF") sdf1 <- prepSesame(sdf, "QCDPB")
List supported prepSesame functions
prepSesameList()
prepSesameList()
a data frame with code, func, description
prepSesameList()
prepSesameList()
Print DMLSummary object
## S3 method for class 'DMLSummary' print(x, ...)
## S3 method for class 'DMLSummary' print(x, ...)
x |
a DMLSummary object |
... |
extra parameter for print |
print DMLSummary result on screen
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') ## test the first 10 smry <- DML(data$betas[1:10,], ~type, meta=data$sampleInfo) smry sesameDataGet_resetEnv()
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') ## test the first 10 smry <- DML(data$betas[1:10,], ~type, meta=data$sampleInfo) smry sesameDataGet_resetEnv()
Print a fileSet
## S3 method for class 'fileSet' print(x, ...)
## S3 method for class 'fileSet' print(x, ...)
x |
a sesame::fileSet |
... |
stuff for print |
string representation
fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) fset
fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) fset
Extract the probe type field from probe ID This only works with the new probe ID system. See https://github.com/zhou-lab/InfiniumAnnotation for illustration
probeID_designType(Probe_ID)
probeID_designType(Probe_ID)
Probe_ID |
Probe ID |
a vector of '1' and '2' suggesting Infinium-I and Infinium-II
probeID_designType("cg36609548_TC21")
probeID_designType("cg36609548_TC21")
This function calculates the probe success rate using pOOBAH detection p-values. Probes that has a detection p-value higher than a specific threshold are considered failed probes.
probeSuccessRate(sdf, mask = TRUE, max_pval = 0.05)
probeSuccessRate(sdf, mask = TRUE, max_pval = 0.05)
sdf |
a |
mask |
whether or not we count the masked probes in SigDF |
max_pval |
the maximum p-value to consider detection success |
a fraction number as probe success rate
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') probeSuccessRate(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') probeSuccessRate(sdf)
Currently quality masking only supports three platforms see also listAvailableMasks(sdfPlatform(sdf))
qualityMask(sdf, mask_names = "recommended", verbose = TRUE)
qualityMask(sdf, mask_names = "recommended", verbose = TRUE)
sdf |
a |
mask_names |
a vector of masking groups, see listAvailableMasks use "recommended" for recommended masking. One can also combine "recommended" with other masking groups by specifying a vector, e.g., c("recommended", "M_mapping") |
verbose |
be verbose |
a filtered SigDF
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sum(qualityMask(sdf)$mask) sum(qualityMask(sdf, mask_names = NULL)$mask) ## list available masks, the dbname column listAvailableMasks(sdfPlatform(sdf)) listAvailableMasks("EPICv2")
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sum(qualityMask(sdf)$mask) sum(qualityMask(sdf, mask_names = NULL)$mask) ## list available masks, the dbname column listAvailableMasks(sdfPlatform(sdf)) listAvailableMasks("EPICv2")
This function only reads the meta-data.
readFileSet(map_path)
readFileSet(map_path)
map_path |
path of file to map (should contain valid _idx.rds index) |
a sesame::fileSet object
## create two samples fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) ## a hypothetical numeric array (can be beta values, intensities etc) hypothetical <- setNames(runif(fset$n), fset$probes) ## map the numeric to file mapFileSet(fset, 's1', hypothetical) ## read it from file fset <- readFileSet('mybetas2') ## get data sliceFileSet(fset, 's1', 'cg00000292')
## create two samples fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) ## a hypothetical numeric array (can be beta values, intensities etc) hypothetical <- setNames(runif(fset$n), fset$probes) ## map the numeric to file mapFileSet(fset, 's1', hypothetical) ## read it from file fset <- readFileSet('mybetas2') ## get data sliceFileSet(fset, 's1', 'cg00000292')
The function takes a prefix string that are shared with _Grn.idat
and _Red.idat. The function returns a SigDF
.
readIDATpair( prefix.path, manifest = NULL, platform = "", min_beads = NULL, controls = NULL, verbose = FALSE )
readIDATpair( prefix.path, manifest = NULL, platform = "", min_beads = NULL, controls = NULL, verbose = FALSE )
prefix.path |
sample prefix without _Grn.idat and _Red.idat |
manifest |
optional design manifest file |
platform |
EPIC, HM450 and HM27 etc. |
min_beads |
minimum bead number, probes with R or G smaller than this threshold will be masked. If NULL, no filtering based on bead count will be applied. |
controls |
optional control probe manifest file |
verbose |
be verbose? (FALSE) |
a SigDF
sdf <- readIDATpair(sub('_Grn.idat','',system.file( "extdata", "4207113116_A_Grn.idat", package = "sesameData")))
sdf <- readIDATpair(sub('_Grn.idat','',system.file( "extdata", "4207113116_A_Grn.idat", package = "sesameData")))
The returned name is the db name used in KYCG.mask
recommendedMaskNames()
recommendedMaskNames()
a named list of mask names
recommendedMaskNames()[["EPICv2"]] recommendedMaskNames()[["EPIC"]]
recommendedMaskNames()[["EPICv2"]] recommendedMaskNames()[["EPIC"]]
This requries setting a seed with a secret number that was used to de-identify the IDAT (see example). This requires a secret number that was used to de-idenitfy the IDAT
reIdentify(path, out_path = NULL, snps = NULL, mft = NULL)
reIdentify(path, out_path = NULL, snps = NULL, mft = NULL)
path |
input IDAT file |
out_path |
output IDAT file |
snps |
SNP definition, if not given, default to SNP probes |
mft |
sesame-compatible manifest if non-standard |
NULL, changes made to the IDAT files
temp_out <- tempfile("test") set.seed(123) reIdentify(system.file( "extdata", "4207113116_A_Grn.idat", package = "sesameData"), temp_out) unlink(temp_out)
temp_out <- tempfile("test") set.seed(123) reIdentify(system.file( "extdata", "4207113116_A_Grn.idat", package = "sesameData"), temp_out) unlink(temp_out)
Reset Masking
resetMask(sdf, verbose = FALSE)
resetMask(sdf, verbose = FALSE)
sdf |
a |
verbose |
print more messages |
a new SigDF
with mask reset to all FALSE
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sdf <- addMask(sdf, c("cg14057072", "cg22344912")) sum(sdf$mask) sum(resetMask(sdf)$mask)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sdf <- addMask(sdf, c("cg14057072", "cg22344912")) sum(sdf$mask) sum(resetMask(sdf)$mask)
This function takes a SigDF
and returns a modified SigDF
with background subtracted. scrub subtracts residual background using
background median
scrub(sdf)
scrub(sdf)
sdf |
a |
This function is meant to be used after noob.
a new SigDF
with noob background correction
sdf <- sesameDataGet('EPIC.1.SigDF') sdf.nb <- noob(sdf) sdf.nb.scrub <- scrub(sdf.nb)
sdf <- sesameDataGet('EPIC.1.SigDF') sdf.nb <- noob(sdf) sdf.nb.scrub <- scrub(sdf.nb)
This function takes a SigDF
and returns a modified SigDF
with background subtracted. scrubSoft subtracts residual background using a
noob-like procedure.
scrubSoft(sdf)
scrubSoft(sdf)
sdf |
a |
This function is meant to be used after noob.
a new SigDF
with noob background correction
sdf <- sesameDataGet('EPIC.1.SigDF') sdf.nb <- noob(sdf) sdf.nb.scrubSoft <- scrubSoft(sdf.nb)
sdf <- sesameDataGet('EPIC.1.SigDF') sdf.nb <- noob(sdf) sdf.nb.scrubSoft <- scrubSoft(sdf.nb)
read a table file to SigDF
sdf_read_table(fname, platform = NULL, verbose = FALSE, ...)
sdf_read_table(fname, platform = NULL, verbose = FALSE, ...)
fname |
file name |
platform |
array platform (will infer if not given) |
verbose |
print more information |
... |
additional argument to read.table |
read table file to SigDF
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') fname <- sprintf("%s/sigdf.txt", tempdir()) sdf_write_table(sdf, file=fname) sdf2 <- sdf_read_table(fname)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') fname <- sprintf("%s/sigdf.txt", tempdir()) sdf_write_table(sdf, file=fname) sdf2 <- sdf_read_table(fname)
write SigDF to table file
sdf_write_table(sdf, ...)
sdf_write_table(sdf, ...)
sdf |
the |
... |
additional argument to write.table |
write SigDF to table file
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf_write_table(sdf, file=sprintf("%s/sigdf.txt", tempdir()))
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sdf_write_table(sdf, file=sprintf("%s/sigdf.txt", tempdir()))
collapse to probe prefix
SDFcollapseToPfx(sdf)
SDFcollapseToPfx(sdf)
sdf |
a SigDF object |
a data frame with updated Probe_ID
Convenience function to output platform attribute of SigDF
sdfPlatform(sdf, verbose = FALSE)
sdfPlatform(sdf, verbose = FALSE)
sdf |
a SigDF object |
verbose |
print more messages |
the platform string for the SigDF object
sesameDataCache() sdf <- sesameDataGet('EPIC.1.SigDF') sdfPlatform(sdf)
sesameDataCache() sdf <- sesameDataGet('EPIC.1.SigDF') sdfPlatform(sdf)
The input is the directory name as a string. The function identifies all the IDAT files under the directory. The function returns a vector of such IDAT prefixes under the directory.
searchIDATprefixes(dir.name, recursive = TRUE, use.basename = TRUE)
searchIDATprefixes(dir.name, recursive = TRUE, use.basename = TRUE)
dir.name |
the directory containing the IDAT files. |
recursive |
search IDAT files recursively |
use.basename |
basename of each IDAT path is used as sample name This won't work in rare situation where there are duplicate IDAT files. |
the IDAT prefixes (a vector of character strings).
## only search what are directly under IDATprefixes <- searchIDATprefixes( system.file("extdata", "", package = "sesameData")) ## search files recursively is by default IDATprefixes <- searchIDATprefixes( system.file(package = "sesameData"), recursive=TRUE)
## only search what are directly under IDATprefixes <- searchIDATprefixes( system.file("extdata", "", package = "sesameData")) ## search files recursively is by default IDATprefixes <- searchIDATprefixes( system.file(package = "sesameData"), recursive=TRUE)
Segment bins using DNAcopy
segmentBins(bin.signals, bin.coords)
segmentBins(bin.signals, bin.coords)
bin.signals |
bin signals (input) |
bin.coords |
bin coordinates |
segment signal data frame
print package verison of sesame and depended packages to help troubleshoot installation issues.
sesame_checkVersion()
sesame_checkVersion()
print the version of sesame, sesameData, biocondcutor and R
sesame_checkVersion()
sesame_checkVersion()
Annotate a data.frame using manifest
sesameAnno_attachManifest( df, probe_id = "Probe_ID", platform = NULL, genome = NULL )
sesameAnno_attachManifest( df, probe_id = "Probe_ID", platform = NULL, genome = NULL )
df |
input data frame with Probe_ID as a column |
probe_id |
the Probe_ID column name, default to "Probe_ID" or rownames |
platform |
which array platform, guess from probe ID if not given |
genome |
the genome build, use default if not given |
a new data.frame with manifest attached
## Not run: df <- data.frame(Probe_ID = c("cg00101675_BC21", "cg00116289_BC21")) sesameAnno_attachManifest(df) ## End(Not run)
## Not run: df <- data.frame(Probe_ID = c("cg00101675_BC21", "cg00116289_BC21")) sesameAnno_attachManifest(df) ## End(Not run)
Build sesame ordering address file from tsv
sesameAnno_buildAddressFile(tsv)
sesameAnno_buildAddressFile(tsv)
tsv |
a platform name, a file path or a tibble/data.frame manifest file |
a list of ordering and controls
## Not run: tsv = sesameAnno_download("HM450.hg38.manifest.tsv.gz") addr <- sesameAnno_buildAddressFile(tsv) ## End(Not run)
## Not run: tsv = sesameAnno_download("HM450.hg38.manifest.tsv.gz") addr <- sesameAnno_buildAddressFile(tsv) ## End(Not run)
manifest tsv files can be downloaded from http://zwdzwd.github.io/InfiniumAnnotation
sesameAnno_buildManifestGRanges( tsv, genome = NULL, decoy = FALSE, columns = NULL )
sesameAnno_buildManifestGRanges( tsv, genome = NULL, decoy = FALSE, columns = NULL )
tsv |
a file path, a platform (e.g., EPIC), or a tibble/data.frame object |
genome |
a genome string, e.g., hg38, mm10 |
decoy |
consider decoy sequence in chromosome order |
columns |
the columns to include in the GRanges |
GRanges
## Not run: tsv = sesameAnno_download("HM450.hg38.manifest.tsv.gz") gr <- sesameAnno_buildManifestGRanges(tsv) ## direct access gr <- sesameAnno_buildManifestGRanges("HM450.hg38.manifest") ## End(Not run)
## Not run: tsv = sesameAnno_download("HM450.hg38.manifest.tsv.gz") gr <- sesameAnno_buildManifestGRanges(tsv) ## direct access gr <- sesameAnno_buildManifestGRanges("HM450.hg38.manifest") ## End(Not run)
see also http://zwdzwd.github.io/InfiniumAnnotation
sesameAnno_download(url, destfile = tempfile(basename(url)))
sesameAnno_download(url, destfile = tempfile(basename(url)))
url |
url or title of the annotation file |
destfile |
download to this file, a temp file if unspecified |
This function acts similarly as sesameAnno_get except that it directly download files without invoking BiocFileCache. This is needed in some situation because BiocFileCache may change the file name and downstream program may depend on the correct file names. It also lets you download files in a cleaner way without routing through BiocFileCache
the path to downloaded file
## Not run: ## avoid testing as this function uses external host sesameAnno_download("Test/3999492009_R01C01_Grn.idat") sesameAnno_download("EPIC.hg38.manifest.tsv.gz") sesameAnno_download("EPIC.hg38.snp.tsv.gz") ## End(Not run)
## Not run: ## avoid testing as this function uses external host sesameAnno_download("Test/3999492009_R01C01_Grn.idat") sesameAnno_download("EPIC.hg38.manifest.tsv.gz") sesameAnno_download("EPIC.hg38.snp.tsv.gz") ## End(Not run)
Read manifest file to a tsv format
sesameAnno_readManifestTSV(tsv_fn)
sesameAnno_readManifestTSV(tsv_fn)
tsv_fn |
tsv file path |
a manifest as a tibble
## Not run: tsv = sesameAnno_download("HM450.hg38.manifest.tsv.gz") mft <- sesameAnno_readManifestTSV(tsv) ## direct access mft <- sesameAnno_readManifestTSV("HM450.hg38.manifest") ## End(Not run)
## Not run: tsv = sesameAnno_download("HM450.hg38.manifest.tsv.gz") mft <- sesameAnno_readManifestTSV(tsv) ## direct access mft <- sesameAnno_readManifestTSV("HM450.hg38.manifest") ## End(Not run)
It is a function to call one or multiple sesameQC_calcStats functions
sesameQC_calcStats(sdf, funs = NULL)
sesameQC_calcStats(sdf, funs = NULL)
sdf |
a SigDF object |
funs |
a sesameQC_calcStats_* function or a list of them default to all functions. One can also use a string such as "detection" or c("detection", "intensity") to reduce typing |
currently supporting: detection, intensity, numProbes, channel, dyeBias, betas
a sesameQC object
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_calcStats(sdf) sesameQC_calcStats(sdf, "detection") sesameQC_calcStats(sdf, c("detection", "channel")) ## retrieve stats as a list sesameQC_getStats(sesameQC_calcStats(sdf, "detection")) ## or as data frames as.data.frame(sesameQC_calcStats(sdf, "detection"))
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_calcStats(sdf) sesameQC_calcStats(sdf, "detection") sesameQC_calcStats(sdf, c("detection", "channel")) ## retrieve stats as a list sesameQC_getStats(sesameQC_calcStats(sdf, "detection")) ## or as data frames as.data.frame(sesameQC_calcStats(sdf, "detection"))
Get stat numbers from an sesameQC object
sesameQC_getStats(qc, stat_names = NULL, drop = TRUE)
sesameQC_getStats(qc, stat_names = NULL, drop = TRUE)
qc |
a sesameQC object |
stat_names |
which stat(s) to retrieve, default to all. |
drop |
whether to drop to a string when stats_names has only one element. |
a list of named stats to be retrieved
sdf <- sesameDataGet("EPIC.1.SigDF") qc <- sesameQC_calcStats(sdf, "detection") sesameQC_getStats(qc, "frac_dt")
sdf <- sesameDataGet("EPIC.1.SigDF") qc <- sesameQC_calcStats(sdf, "detection") sesameQC_getStats(qc, "frac_dt")
By default, it plots median_beta_cg, median_beta_ch, RGratio, RGdistort, frac_dt
sesameQC_plotBar(qcs, keys = NULL)
sesameQC_plotBar(qcs, keys = NULL)
qcs |
a list of SigDFs |
keys |
optional, other key to plot, instead of the default keys can be found in the parenthesis of the print output of each sesameQC output. |
a bar plot comparing different QC metrics
sesameDataCache() # if not done yet sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] sesameQC_plotBar(lapply(sdfs, sesameQC_calcStats, "detection"))
sesameDataCache() # if not done yet sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] sesameQC_plotBar(lapply(sdfs, sesameQC_calcStats, "detection"))
Plot betas distinguishing different Infinium chemistries
sesameQC_plotBetaByDesign( sdf, prep = NULL, legend_pos = "top", mar = c(3, 3, 1, 1), main = "", ... )
sesameQC_plotBetaByDesign( sdf, prep = NULL, legend_pos = "top", mar = c(3, 3, 1, 1), main = "", ... )
sdf |
SigDF |
prep |
prep codes to step through |
legend_pos |
legend position (default: top) |
mar |
margin of layout when showing steps of prep |
main |
main title in plots |
... |
additional options to plot |
create a density plot
sdf <- sesameDataGet("EPIC.1.SigDF") sesameQC_plotBetaByDesign(sdf, prep="DB")
sdf <- sesameDataGet("EPIC.1.SigDF") sesameQC_plotBetaByDesign(sdf, prep="DB")
Plot SNP heatmap
sesameQC_plotHeatSNPs(sdfs, cluster = TRUE, filter.nonvariant = TRUE)
sesameQC_plotHeatSNPs(sdfs, cluster = TRUE, filter.nonvariant = TRUE)
sdfs |
beta value matrix, row: probes; column: samples |
cluster |
show clustered heatmap |
filter.nonvariant |
whether to filter nonvariant (range < 0.3) |
a grid graphics object
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] plt <- sesameQC_plotHeatSNPs(sdfs, filter.nonvariant = FALSE)
sdfs <- sesameDataGet("EPIC.5.SigDF.normal")[1:2] plt <- sesameQC_plotHeatSNPs(sdfs, filter.nonvariant = FALSE)
Plot Total Signal Intensities vs Beta Values This plot is helpful in revealing the extent of signal background and dye bias.
sesameQC_plotIntensVsBetas( sdf, mask = TRUE, use_max = FALSE, intens.range = c(5, 15), pal = "whiteturbo", ... )
sesameQC_plotIntensVsBetas( sdf, mask = TRUE, use_max = FALSE, intens.range = c(5, 15), pal = "whiteturbo", ... )
sdf |
a |
mask |
whether to remove probes that are masked |
use_max |
to use max(M,U) or M+U |
intens.range |
plot range of signal intensity |
pal |
color palette, whiteturbo, whiteblack, whitejet |
... |
additional arguments to smoothScatter |
create a total signal intensity vs beta value plot
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_plotIntensVsBetas(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_plotIntensVsBetas(sdf)
Plot red-green QQ-Plot using Infinium-I Probes
sesameQC_plotRedGrnQQ(sdf, main = "R-G QQ Plot", ...)
sesameQC_plotRedGrnQQ(sdf, main = "R-G QQ Plot", ...)
sdf |
a |
main |
plot title |
... |
additional options to qqplot |
create a qqplot
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_plotRedGrnQQ(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_plotRedGrnQQ(sdf)
This function compares the input sample with public data. Only overlapping metrics will be compared.
sesameQC_rankStats(qc, publicQC = NULL, platform = "EPIC")
sesameQC_rankStats(qc, publicQC = NULL, platform = "EPIC")
qc |
a sesameQC object |
publicQC |
public QC statistics, filtered from e.g.: EPIC.publicQC, MM285.publicQC and Mammal40.publicQC |
platform |
EPIC, MM285 or Mammal40, used when publicQC is not given |
a sesameQC
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_rankStats(sesameQC_calcStats(sdf, "intensity"))
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') sesameQC_rankStats(sesameQC_calcStats(sdf, "intensity"))
An S4 class to hold QC statistics
sesameQC object
stat
a list to store qc stats
Convert a list of sesameQC to data frame
sesameQCtoDF(qcs, cols = c("frac_dt_cg", "RGdistort", "RGratio"))
sesameQCtoDF(qcs, cols = c("frac_dt_cg", "RGdistort", "RGratio"))
qcs |
sesameQCs |
cols |
QC columns, use NULL to report all |
a data frame
sdf <- sesameDataGet("EPIC.1.SigDF") qcs <- sesameQC_calcStats(sdf, "detection") sesameQCtoDF(qcs)
sdf <- sesameDataGet("EPIC.1.SigDF") qcs <- sesameQC_calcStats(sdf, "detection") sesameQCtoDF(qcs)
sesamize function is deprecated. Please check https://github.com/zwdzwd/sesamize for previous scripts
sesamize(...)
sesamize(...)
... |
arguments for sesamize |
a message text for deprecated function
cat("Deprecated. see https://github.com/zwdzwd/sesamize")
cat("Deprecated. see https://github.com/zwdzwd/sesamize")
Set mask to only the probes specified
setMask(sdf, probes)
setMask(sdf, probes)
sdf |
a |
probes |
a vector of probe IDs or a logical vector with TRUE representing masked probes |
a SigDF
with added mask
sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sum(setMask(sdf, "cg14959801")$mask) sum(setMask(sdf, c("cg14057072", "cg22344912"))$mask)
sdf <- sesameDataGet('EPIC.1.SigDF') sum(sdf$mask) sum(setMask(sdf, "cg14959801")$mask) sum(setMask(sdf, c("cg14057072", "cg22344912"))$mask)
SigDF validation from a plain data frame
SigDF(df, platform = "EPIC", ctl = NULL)
SigDF(df, platform = "EPIC", ctl = NULL)
df |
a |
platform |
a string to specify the array platform |
ctl |
optional control probe data frame |
a SigDF
object
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF')
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF')
report M and U for regular probes
signalMU(sdf, mask = TRUE, MU = FALSE)
signalMU(sdf, mask = TRUE, MU = FALSE)
sdf |
a |
mask |
whether to apply mask |
MU |
add a column for M+U |
a data frame of M and U columns
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') head(signalMU(sdf))
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') head(signalMU(sdf))
Slice a fileSet with samples and probes
sliceFileSet(fset, samples = fset$samples, probes = fset$probes, memmax = 10^5)
sliceFileSet(fset, samples = fset$samples, probes = fset$probes, memmax = 10^5)
fset |
a sesame::fileSet, as obtained via readFileSet |
samples |
samples to query (default to all samples) |
probes |
probes to query (default to all probes) |
memmax |
maximum items to read from file to memory, to protect from accidental memory congestion. |
a numeric matrix of length(samples) columns and length(probes) rows
## create two samples fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) ## a hypothetical numeric array (can be beta values, intensities etc) hypothetical <- setNames(runif(fset$n), fset$probes) ## map the numeric to file mapFileSet(fset, 's1', hypothetical) ## get data sliceFileSet(fset, 's1', 'cg00000292')
## create two samples fset <- initFileSet('mybetas2', 'HM27', c('s1','s2')) ## a hypothetical numeric array (can be beta values, intensities etc) hypothetical <- setNames(runif(fset$n), fset$probes) ## map the numeric to file mapFileSet(fset, 's1', hypothetical) ## get data sliceFileSet(fset, 's1', 'cg00000292')
Extract slope information from DMLSummary
summaryExtractTest(smry)
summaryExtractTest(smry)
smry |
DMLSummary from DML command |
a table of slope and p-value
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:10,], ~type, meta=data$sampleInfo) slopes <- summaryExtractTest(smry) sesameDataGet_resetEnv()
sesameDataCache() # in case not done yet data <- sesameDataGet('HM450.76.TCGA.matched') smry <- DML(data$betas[1:10,], ~type, meta=data$sampleInfo) slopes <- summaryExtractTest(smry) sesameDataGet_resetEnv()
testEnrichment tests for the enrichment of set of probes (query set) in a number of features (database sets).
testEnrichment( query, databases = NULL, universe = NULL, alternative = "greater", include_genes = FALSE, platform = NULL, silent = FALSE )
testEnrichment( query, databases = NULL, universe = NULL, alternative = "greater", include_genes = FALSE, platform = NULL, silent = FALSE )
query |
Vector of probes of interest (e.g., significant probes) |
databases |
List of vectors corresponding to the database sets of interest with associated meta data as an attribute to each element. Optional. (Default: NA) |
universe |
Vector of probes in the universe set containing all of the probes to be considered in the test. If it is not provided, it will be inferred from the provided platform. (Default: NA). |
alternative |
"two.sided", "greater", or "less" |
include_genes |
include gene link enrichment testing |
platform |
String corresponding to the type of platform to use. Either MM285, EPIC, HM450, or HM27. If it is not provided, it will be inferred from the query set probeIDs (Default: NA). |
silent |
output message? (Default: FALSE) |
A data frame containing features corresponding to the test estimate, p-value, and type of test.
library(SummarizedExperiment) df <- rowData(sesameDataGet('MM285.tissueSignature')) query <- df$Probe_ID[df$branch == "B_cell"] res <- testEnrichment(query, "chromHMM", platform="MM285") sesameDataGet_resetEnv()
library(SummarizedExperiment) df <- rowData(sesameDataGet('MM285.tissueSignature')) query <- df$Probe_ID[df$branch == "B_cell"] res <- testEnrichment(query, "chromHMM", platform="MM285") sesameDataGet_resetEnv()
Estimates log2 Odds ratio
testEnrichmentFisher(query, database, universe, alternative = "greater")
testEnrichmentFisher(query, database, universe, alternative = "greater")
query |
Vector of probes of interest (e.g., significant probes) |
database |
Vectors corresponding to the database set of interest with associated meta data as an attribute to each element. |
universe |
Vector of probes in the universe set containing all of |
alternative |
greater or two.sided (default: greater) the probes to be considered in the test. (Default: NULL) |
A DataFrame with the estimate/statistic, p-value, and name of test for the given results.
Convenient function for testing enrichment of gene linkage
testEnrichmentGene(query, platform = NULL, silent = FALSE, ...)
testEnrichmentGene(query, platform = NULL, silent = FALSE, ...)
query |
probe set of interest |
platform |
string corresponding to the type of platform to use. Either MM285, EPIC, HM450, or HM27. If it is not provided, it will be inferred from the query set probe IDs. |
silent |
whether to output message |
... |
addition argument provided to testEnrichment |
A data frame containing features corresponding to the test estimate, p-value, and type of test etc.
query <- c("cg04707299", "cg13380562", "cg00480749") testEnrichment(query, platform = "EPIC")
query <- c("cg04707299", "cg13380562", "cg00480749") testEnrichment(query, platform = "EPIC")
estimate represent enrichment score and negative estimate indicate a test for depletion
testEnrichmentSEA( query, databases, platform = NULL, silent = FALSE, precise = FALSE, prepPlot = FALSE )
testEnrichmentSEA( query, databases, platform = NULL, silent = FALSE, precise = FALSE, prepPlot = FALSE )
query |
query, if numerical, expect categorical database, if categorical expect numerical database |
databases |
database, numerical or categorical, but needs to be different from query |
platform |
EPIC, MM285, ..., infer if not given |
silent |
suppress message (default: FALSE) |
precise |
whether to compute precise p-value (up to numerical limit) of interest. |
prepPlot |
return the raw enrichment scores and presence vectors for plotting |
A DataFrame with the estimate/statistic, p-value, and name of test for the given results.
query <- KYCG_getDBs("KYCG.MM285.designGroup")[["TSS"]] res <- testEnrichmentSEA(query, "MM285.seqContextN")
query <- KYCG_getDBs("KYCG.MM285.designGroup")[["TSS"]] res <- testEnrichmentSEA(query, "MM285.seqContextN")
testEnrichmentSpearman uses the Spearman statistical test to estimate the association between two continuous variables.
testEnrichmentSpearman(query, database)
testEnrichmentSpearman(query, database)
query |
Vector of probes of interest (e.g., significant probes) |
database |
List of vectors corresponding to the database set of interest with associated meta data as an attribute to each element. |
A DataFrame with the estimate/statistic, p-value, and name of test for the given results.
The function takes one single SigDF
and computes total
intensity of all the in-band measurements by summing methylated and
unmethylated alleles. This function outputs a single numeric for the mean.
totalIntensities(sdf, mask = FALSE)
totalIntensities(sdf, mask = FALSE)
sdf |
a |
mask |
whether to mask probes using mask column |
a vector of M+U signal for each probe
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') intensities <- totalIntensities(sdf)
sesameDataCache() # if not done yet sdf <- sesameDataGet('EPIC.1.SigDF') intensities <- totalIntensities(sdf)
Estimate the fraction of the 2nd component in a 2-component mixture
twoCompsEst2( pop1, pop2, target, use.ave = TRUE, diff_1m2u = NULL, diff_1u2m = NULL )
twoCompsEst2( pop1, pop2, target, use.ave = TRUE, diff_1m2u = NULL, diff_1u2m = NULL )
pop1 |
Reference methylation level matrix for population 1 |
pop2 |
Reference methylation level matrix for population 2 |
target |
Target methylation level matrix to be analyzed |
use.ave |
use population average in selecting differentially methylated probes |
diff_1m2u |
A vector of differentially methylated probes (methylated in population 1 but unmethylated in population 2) |
diff_1u2m |
A vector of differentially methylated probes (unmethylated in population 1 but methylated in population 2) |
Estimate of the 2nd component in the 2-component mixture
also sets attr(,"species")
updateSigDF(sdf, species = NULL, strain = NULL, addr = NULL, verbose = FALSE)
updateSigDF(sdf, species = NULL, strain = NULL, addr = NULL, verbose = FALSE)
sdf |
a |
species |
the species the sample is considered to be |
strain |
the strain the sample is considered to be |
addr |
species-specific address species, optional |
verbose |
print more messages |
a SigDF
with updated color channel and mask
sdf <- sesameDataGet('Mammal40.1.SigDF') sdf_mouse <- updateSigDF(sdf, species="mus_musculus")
sdf <- sesameDataGet('Mammal40.1.SigDF') sdf_mouse <- updateSigDF(sdf, species="mus_musculus")
Visualize the beta value in heatmaps for a given gene. The function takes a gene name which is taken from the UCSC refGene. It searches all the transcripts for the given gene and optionally extend the span by certain number of base pairs. The function also takes a beta value matrix with sample names on the columns and probe names on the rows. The function can also work on different genome builds (default to hg38, can be hg19).
visualizeGene( gene_name, betas, platform = NULL, genome = NULL, upstream = 2000, dwstream = 2000, ... )
visualizeGene( gene_name, betas, platform = NULL, genome = NULL, upstream = 2000, dwstream = 2000, ... )
gene_name |
gene name |
betas |
beta value matrix (row: probes, column: samples) |
platform |
HM450, EPIC, or MM285 (default) |
genome |
hg19, hg38, or mm10 (default) |
upstream |
distance to extend upstream |
dwstream |
distance to extend downstream |
... |
additional options, see visualizeRegion, assemble_plots |
None
betas <- sesameDataGet('HM450.76.TCGA.matched')$betas visualizeGene('ADA', betas, 'HM450')
betas <- sesameDataGet('HM450.76.TCGA.matched')$betas visualizeGene('ADA', betas, 'HM450')
Visualize the beta value in heatmaps for the genomic region containing specified probes. The function works only if specified probes can be spanned by a single genomic region. The region can cover more probes than specified. Hence the plotting heatmap may encompass more probes. The function takes as input a string vector of probe IDs (cg/ch/rs-numbers). if draw is FALSE, the function returns the subset beta value matrix otherwise it returns the grid graphics object.
visualizeProbes( probeNames, betas, platform = NULL, genome = NULL, upstream = 1000, dwstream = 1000, ... )
visualizeProbes( probeNames, betas, platform = NULL, genome = NULL, upstream = 1000, dwstream = 1000, ... )
probeNames |
probe names |
betas |
beta value matrix (row: probes, column: samples) |
platform |
HM450, EPIC or MM285 (default) |
genome |
hg19, hg38 or mm10 (default) |
upstream |
distance to extend upstream |
dwstream |
distance to extend downstream |
... |
additional options, see visualizeRegion and assemble_plots |
None
betas <- sesameDataGet('HM450.76.TCGA.matched')$betas visualizeProbes(c('cg22316575', 'cg16084772', 'cg20622019'), betas, 'HM450')
betas <- sesameDataGet('HM450.76.TCGA.matched')$betas visualizeProbes(c('cg22316575', 'cg16084772', 'cg20622019'), betas, 'HM450')
The function takes a genomic coordinate (chromosome, start and end) and a beta value matrix (probes on the row and samples on the column). It plots the beta values as a heatmap for all probes falling into the genomic region. If 'draw=TRUE' the function returns the plotted grid graphics object. Otherwise, the selected beta value matrix is returned. 'cluster.samples=TRUE/FALSE' controls whether hierarchical clustering is applied to the subset beta value matrix.
visualizeRegion( chrm, beg, end, betas, platform = NULL, genome = NULL, draw = TRUE, cluster.samples = FALSE, na.rm = FALSE, nprobes.max = 1000, txn.types = "protein_coding", txn.font.size = 6, ... )
visualizeRegion( chrm, beg, end, betas, platform = NULL, genome = NULL, draw = TRUE, cluster.samples = FALSE, na.rm = FALSE, nprobes.max = 1000, txn.types = "protein_coding", txn.font.size = 6, ... )
chrm |
chromosome |
beg |
begin of the region |
end |
end of the region |
betas |
beta value matrix (row: probes, column: samples) |
platform |
EPIC, HM450, or MM285 |
genome |
hg38, mm10, ..., will infer if not given. For additional mapping, download the GRanges object from http://zwdzwd.github.io/InfiniumAnnotation and provide the following argument ..., genome = sesameAnno_buildManifestGRanges("downloaded_file"),... to this function. |
draw |
draw figure or return betas |
cluster.samples |
whether to cluster samples |
na.rm |
remove probes with all NA. |
nprobes.max |
maximum number of probes to plot |
txn.types |
default to protein_coding, use NULL for all |
txn.font.size |
transcript name font size |
... |
additional options, see assemble_plots |
graphics or a matrix containing the captured beta values
betas <- sesameDataGet('HM450.76.TCGA.matched')$betas visualizeRegion('chr20', 44648623, 44652152, betas, 'HM450')
betas <- sesameDataGet('HM450.76.TCGA.matched')$betas visualizeRegion('chr20', 44648623, 44652152, betas, 'HM450')
The function takes a CNSegment
object obtained from cnSegmentation
and plot the bin signals and segments (as horizontal lines).
visualizeSegments(seg, to.plot = NULL, genes.to.label = NULL)
visualizeSegments(seg, to.plot = NULL, genes.to.label = NULL)
seg |
a |
to.plot |
chromosome to plot (by default plot all chromosomes) |
genes.to.label |
gene(s) to label |
require ggplot2, scales
plot graphics
sesameDataCache() ## Not run: sdfs <- sesameDataGet('EPICv2.8.SigDF') sdf <- sdfs[["K562_206909630040_R01C01"]] seg <- cnSegmentation(sdf) seg <- cnSegmentation(sdf, return.probe.signals=TRUE) visualizeSegments(seg) visualizeSegments(seg, to.plot=c("chr9","chr22")) visualizeSegments(seg, genes.to.label=c("ABL1","BCR")) ## End(Not run) sesameDataGet_resetEnv()
sesameDataCache() ## Not run: sdfs <- sesameDataGet('EPICv2.8.SigDF') sdf <- sdfs[["K562_206909630040_R01C01"]] seg <- cnSegmentation(sdf) seg <- cnSegmentation(sdf, return.probe.signals=TRUE) visualizeSegments(seg) visualizeSegments(seg, to.plot=c("chr9","chr22")) visualizeSegments(seg, genes.to.label=c("ABL1","BCR")) ## End(Not run) sesameDataGet_resetEnv()