Title: | Robust outlier identification for DNA methylation data |
---|---|
Description: | The package includes several statistical outlier detection methods for detecting epimutations in DNA methylation data. The methods included in the package are MANOVA, multivariate linear models, isolation forest, robust Mahalanobis distance, quantile and beta. The methods compare a case sample with a suspected disease against a reference panel (composed of healthy individuals) to identify epimutations in the given case sample. It also contains functions to annotate and visualize the identified epimutations. |
Authors: | Dolors Pelegri-Siso [aut, cre] , Juan R. Gonzalez [aut] , Carlos Ruiz-Arenas [aut] , Carles Hernandez-Ferrer [aut] , Leire Abarrategui [aut] |
Maintainer: | Dolors Pelegri-Siso <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.11.0 |
Built: | 2024-11-19 04:07:17 UTC |
Source: | https://github.com/bioc/epimutacions |
Add ENSEMBL regulatory regions to epimutations
add_ensemble_regulatory(epimutations, build = "37")
epimutations |
a data frame object containing
the result from |
build |
the build used to define epimutations coordinates.
By default, it is |
The function returns a data frame object containing the results of epimutations or epimutations_one_leave_out with some additional variables describing regulatory elements from ENSEMBL.
Note that a single epimutation might overlap with more than one regulatory region. In that case, the different regulatory regions are separated by ///.
ensembl_reg_id: Region identifier from ENSEMBL
ensembl_reg_coordinates: Coordinates for the ENSEMBL regulatory regions
ensembl_reg_type: Type of regulatory region
ensembl_reg_tissues: Activity of the regulatory region per tissue. The different activation states are separated by /
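As a minimal sketch of how these collapsed columns could be unpacked in base R: the identifiers and activity labels below are made-up placeholders rather than real ENSEMBL output, and the exact content of each field is an assumption based only on the separators described above.

```r
# Hypothetical annotated values for one epimutation overlapping two regions
ensembl_reg_id <- "ENSR_example_1///ENSR_example_2"
ensembl_reg_tissues <- "ACTIVE/POISED/ACTIVE///REPRESSED/ACTIVE/ACTIVE"

# Split on the region separator first (fixed = TRUE avoids regex parsing)
region_ids <- strsplit(ensembl_reg_id, "///", fixed = TRUE)[[1]]
region_tissues <- strsplit(ensembl_reg_tissues, "///", fixed = TRUE)[[1]]

# Then split each region's activity string on the per-tissue separator
tissue_states <- strsplit(region_tissues, "/", fixed = TRUE)
names(tissue_states) <- region_ids
```

Splitting on /// before / matters, because a plain split on / would also break the region separator apart.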
This function annotates a differentially methylated region
annotate_cpg( data, db, split = ",", epi_col = "cpg_ids", gene_col = "GencodeBasicV12_NAME", feat_col = "Regulatory_Feature_Group", relat_col = "Relation_to_Island", build = "37", omim = TRUE )
data |
DataFrame-like object. |
db |
a character string specifying the
Database to use for annotation.
E.g: |
split |
a character string containing the separator for CpG ids.
Default |
epi_col |
CpG ids; these should be row names in the database. |
gene_col |
column name from where to extract gene names.
Default: |
feat_col |
column name from where to extract CpG feature groups.
Default: |
relat_col |
column name from where
to extract relation to island info.
Default: |
build |
The build for bioMart. Default |
omim |
a boolean, if TRUE will annotate OMIMs as well. Takes a bit longer. Default TRUE. |
The function returns the annotated DataFrame-like object.
Information about close genes and regulatory elements for epimutations.
annotate_epimutations( epi_results, db = "IlluminaHumanMethylationEPICanno.ilm10b2.hg19", build = "37", ... )
epi_results |
a data frame object containing the output
from |
db |
a character string containing the Illumina annotation package used to annotate the CpGs. |
build |
a character string containing the genomic
build where the epimutations are mapped.
The default is GRCh37 ( |
... |
Further arguments passed to |
The function returns the
input object epi_results
with additional columns containing the information about
the genes or overlapping regulatory features.
See annotate_cpg and add_ensemble_regulatory for an in-depth description of these variables.
data(res.epi.manova)
# Annotate the epimutations
# anno_results <- annotate_epimutations(res.epi.manova)
The function obtains the beta values corresponding to the CpGs in the DMRs.
betas_from_bump(bump, fd, betas)
bump |
the result from bumphunter. |
fd |
a data frame containing the genomic ranges for each CpG. |
betas |
a matrix containing the beta values for all CpGs in each sample. |
The function returns a data frame containing the beta values for each sample and CpG in the DMR.
Computes the beta values, population mean and 1, 1.5, and 2 standard deviations from the mean of the distribution necessary to plot the epimutations.
betas_sd_mean(gr)
gr |
a GRanges object obtained from create_GRanges_class function. |
The function returns a list containing the melted beta values, the population mean and 1, 1.5, and 2 standard deviations from the mean of the distribution.
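The mean and standard-deviation bands described above can be sketched in base R as follows. This is an illustration on simulated beta values, not the package's implementation; the matrix layout, the object names and the choice to compute the statistics per CpG across all columns are assumptions.

```r
set.seed(11)
# Simulated beta values: 4 CpGs (rows) x 10 reference samples (columns)
betas <- matrix(rnorm(4 * 10, mean = 0.5, sd = 0.05), nrow = 4,
                dimnames = list(paste0("cg", 1:4), paste0("ctrl", 1:10)))

# Per-CpG population mean and standard deviation
mu <- rowMeans(betas)
sdev <- apply(betas, 1, sd)

# Bands at 1, 1.5 and 2 standard deviations around the mean
k <- c(1, 1.5, 2)
lower <- sapply(k, function(m) mu - m * sdev)
upper <- sapply(k, function(m) mu + m * sdev)
colnames(lower) <- paste0("lower_", k, "sd")
colnames(upper) <- paste0("upper_", k, "sd")

bands <- data.frame(cpg = rownames(betas), mean = mu, lower, upper)
```

Each row of bands then holds the values needed to draw the shaded regions around the mean line for one CpG.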
Sets common column names in a given data frame containing the CpGs genomic ranges or a DMR (result of epimutations or epimutations_one_leave_out function).
cols_names(x, cpg_ids_col = FALSE)
x |
a data frame containing the genomic ranges or a DMR (a row of the results of epimutations or epimutations_one_leave_out function). |
cpg_ids_col |
a boolean, if TRUE the input data frame contains the CpGs names column. |
The function returns a data frame with the standardized column names needed to carry out the analysis.
This function makes a GRanges object from a GenomicRatioSet.
create_GRanges_class(methy, cpg_ids)
methy |
a |
cpg_ids |
a character string specifying the name of the CpGs in the DMR of interest. |
The function returns a GRanges object containing the beta values and the genomic ranges of the CpGs of interest.
The epi_beta method models the DNA methylation data using a beta distribution. First, the beta distribution parameters of the reference population are precomputed and passed to the method. Then, we compute the probability of observing the methylation values of the case from the reference beta distribution. CpGs with p-values smaller than a threshold pvalue_threshold and with a methylation difference with the mean reference methylation higher than diff_threshold are defined as outlier CpGs. Finally, epimutations are defined as a group of contiguous outlier CpGs.
epi_beta( beta_params, beta_mean, betas_case, case, controls, betas, annot, pvalue_threshold, diff_threshold, min_cpgs = 3, maxGap )
beta_params |
matrix with the parameters of the reference beta distributions for each CpG in the dataset. |
beta_mean |
beta values mean. |
betas_case |
matrix with the methylation values for a case. |
case |
case sample name. |
controls |
control samples names. |
betas |
a matrix containing the beta values for all samples. |
annot |
annotation of the CpGs. |
pvalue_threshold |
p-value threshold; CpGs with p-values smaller than this value are considered outliers. |
diff_threshold |
minimum methylation difference between the CpG and the mean methylation to consider a position an outlier. |
min_cpgs |
minimum number of CpGs to consider an epimutation. |
maxGap |
maximum distance between two contiguous CpGs to combine them into an epimutation. |
The function returns a data frame with the candidate regions to be epimutations.
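The steps described for the beta method can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the method-of-moments fit, the two-sided tail probability and the helper name fit_beta_mom are choices made here and may differ from what the package does, and the grouping step ignores genomic distance (maxGap).

```r
set.seed(1)
# Simulated reference panel: 50 controls (rows) x 5 CpGs (columns)
ref <- matrix(rbeta(250, shape1 = 20, shape2 = 80), nrow = 50)

# Hypothetical helper: method-of-moments estimate of beta parameters
fit_beta_mom <- function(x) {
  m <- mean(x)
  v <- var(x)
  k <- m * (1 - m) / v - 1
  c(shape1 = m * k, shape2 = (1 - m) * k)
}
params <- apply(ref, 2, fit_beta_mom)  # 2 x 5 matrix of parameters

# Case methylation values for the same 5 CpGs
case <- c(0.21, 0.60, 0.65, 0.62, 0.19)

# Two-sided tail probability of the case value under each reference beta
p_lower <- pbeta(case, params["shape1", ], params["shape2", ])
p_upper <- pbeta(case, params["shape1", ], params["shape2", ],
                 lower.tail = FALSE)
pvals <- 2 * pmin(p_lower, p_upper)

# Outlier CpGs: small p-value and large difference from the reference mean
diffs <- abs(case - colMeans(ref))
outlier <- pvals < 1e-06 & diffs > 0.1

# Runs of at least min_cpgs contiguous outlier CpGs
runs <- rle(outlier)
has_epimutation <- any(runs$values & runs$lengths >= 3)
```

The thresholds 1e-06 and 0.1 mirror the defaults listed for the beta method in epi_parameters.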
This function identifies regions with CpGs being outliers using the isolation forest approach (isolation.forest).
epi_iForest(mixture, case_id, ntrees)
mixture |
beta values matrix. Samples in columns and CpGs in rows. |
case_id |
a character string specifying the name of the case sample. |
ntrees |
number of binary trees to build for the model.
Default is |
The function returns the outlier score for the given case sample.
This function identifies regions with CpGs being outliers using the Minimum Covariance Determinant (MCD) estimator (covMcd) to compute the Mahalanobis distance.
epi_mahdist(mixture, nsamp = c("best", "exact", "deterministic"))
mixture |
beta values matrix. Samples in columns and CpGs in rows. |
nsamp |
the number of subsets used for initial estimates in the MCD.
It can be set as:
|
The implementation of the method here is based on the discussion in this thread of Cross Validated
The function returns the computed Robust Mahalanobis distance.
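A robust Mahalanobis distance based on the MCD estimator can be sketched as below, using covMcd from the robustbase package and mahalanobis from base R. The simulated data and the chi-square cutoff are assumptions added for illustration; the source does not state how the package turns the distance into an outlier call.

```r
library(robustbase)

set.seed(42)
# Simulated data: 30 samples (rows) x 3 CpGs (columns); note that covMcd
# expects observations in rows, so a CpGs-by-samples matrix needs t()
betas <- matrix(rnorm(30 * 3, mean = 0.5, sd = 0.02), nrow = 30)
betas[30, ] <- c(0.8, 0.8, 0.8)  # the last sample plays the case

# Robust location and scatter from the Minimum Covariance Determinant
mcd <- covMcd(betas, nsamp = "deterministic")

# Robust distance of every sample to the robust center
rd <- sqrt(mahalanobis(betas, center = mcd$center, cov = mcd$cov))

# A common convention (an assumption here): chi-square based cutoff
cutoff <- sqrt(qchisq(0.975, df = ncol(betas)))
flagged <- which(rd > cutoff)
```

Because the MCD estimates are computed from the most concentrated subset of samples, the extreme case does not inflate the covariance and its distance stays large.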
This function identifies regions with CpGs being outliers using the MANOVA approach.
epi_manova(mixture, model, case_id)
mixture |
beta values matrix. Samples in columns and CpGs in rows. |
model |
design (or model) matrix. |
case_id |
a character string specifying the name of the case sample. |
The function returns the F statistic, Pillai and P value.
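The three returned quantities can be illustrated with base R's manova on simulated data. This is a sketch, not the package's code: the sample names, the case/control factor used in place of the model matrix, and the size of the simulated shift are all assumptions.

```r
set.seed(7)
# Simulated beta values: 3 CpGs (rows) x 11 samples (columns)
mixture <- matrix(rnorm(3 * 11, mean = 0.5, sd = 0.03), nrow = 3,
                  dimnames = list(paste0("cg", 1:3), paste0("S", 1:11)))
case_id <- "S11"
mixture[, case_id] <- mixture[, case_id] + 0.4  # shift the case sample

# Case/control indicator standing in for the design matrix
status <- factor(ifelse(colnames(mixture) == case_id, "case", "control"))

# manova expects samples in rows, so the matrix is transposed
fit <- manova(t(mixture) ~ status)
stats <- summary(fit, test = "Pillai")$stats

# Approximate F statistic, Pillai trace and p-value for the status term
result <- stats["status", c("approx F", "Pillai", "Pr(>F)")]
```

With a single case sample the test has few residual degrees of freedom, so the number of CpGs in a region must stay below the number of control samples for the fit to work.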
Identifies CpGs with outlier methylation values using a Multivariate Linear Model.
epi_mlm(mixture, model)
mixture |
beta values matrix. Samples in columns and CpGs in rows. |
model |
design (or model) matrix. |
The function returns the F statistic, R2 test statistic and Pillai.
epimutations and epimutations_one_leave_out functions
Allows the user to set the values of the parameters used by the functions epimutations and epimutations_one_leave_out.
epi_parameters( manova = list(pvalue_cutoff = 0.05), mlm = list(pvalue_cutoff = 0.05), iForest = list(outlier_score_cutoff = 0.7, ntrees = 100), mahdist = list(nsamp = "deterministic"), quantile = list(window_sz = 1000, offset_abs = 0.15, qsup = 0.995, qinf = 0.005), beta = list(pvalue_cutoff = 1e-06, diff_threshold = 0.1) )
manova, mlm, iForest, mahdist, quantile, beta |
method selected in the function epimutations. |
pvalue_cutoff |
the threshold p value to select
which CpG regions are outliers in |
outlier_score_cutoff |
The outlier score
threshold to identify outliers CpGs in
isolation forest ( |
ntrees |
number of binary trees to build for the model built by
isolation forest ( |
nsamp |
the number of subsets used for initial
estimates in the Minimum Covariance Determinant
which is used to compute the Robust Mahalanobis
distance ( |
window_sz |
the maximum distance between
CpGs to be considered in the same DMR.
This parameter is used in |
qsup, qinf, offset_abs |
The upper and lower quantiles (threshold)
to consider a CpG an outlier when using |
diff_threshold |
Minimum methylation difference between the CpG and the mean methylation to consider a position an outlier. |
Invoking epi_parameters() with no arguments returns a list with the default values.
The function returns a list of all set parameters for each method used in the epimutations and epimutations_one_leave_out functions.
# Default set of parameters
epi_parameters()
# Change the p-value cutoff for the manova method
epi_parameters(manova = list("pvalue_cutoff" = 0.01))
The epi_preprocess function reads the Illumina methylation sample sheet for case samples and merges them with the RGChannelSet reference panel. The final dataset is normalized using minfi package preprocess methods.
epi_preprocess( cases_dir, reference_panel, pattern = "csv$", normalize = "raw", norm_param = norm_parameters(), verbose = FALSE )
cases_dir |
the base directory from which the search is started. |
reference_panel |
an RGChannelSet object containing the reference panel (controls) samples. |
pattern |
What pattern is used to identify a sample sheet file. |
normalize |
a character string
specifying the selected preprocess method.
For more information see Details or
minfi package user's Guide.
It can be set as: |
norm_param |
the parameters for each preprocessing method. See the function norm_parameters. |
verbose |
logical. If TRUE additional details about the procedure will be provided to the user. The default is FALSE. |
The epi_preprocess function reads the Illumina methylation sample sheet for case samples and merges them with the RGChannelSet reference panel. The final dataset is normalized using different minfi package preprocess methods:
"raw": preprocessRaw
"illumina": preprocessIllumina
"swan": preprocessSWAN
"quantile": preprocessQuantile
"noob": preprocessNoob
"funnorm": preprocessFunnorm
The epi_preprocess function returns a GenomicRatioSet object containing case and control (reference panel) samples.
# The reference panel for this example is available in the
# epimutacionsData (ExperimentHub) package
library(ExperimentHub)
eh <- ExperimentHub()
query(eh, c("epimutacionsData"))
reference_panel <- eh[["EH6691"]]
cases_dir <- system.file("extdata", package = "epimutacionsData")
# Preprocessing
epi_preprocess( cases_dir, reference_panel, pattern = "SampleSheet.csv")
Identifies CpGs with outlier methylation values using a sliding window approach to compare individual methylation profiles of a single case sample against all other samples from the reference panel (controls).
epi_quantile( case, fd, bctr_pmin, bctr_pmax, controls, betas, window_sz = 1000, N = 3, offset_abs = 0.15 )
case |
beta values for a single case (data.frame). The samples as single column and CpGs in rows (named). |
fd |
feature description as data.frame having at least chromosome and position as columns and CpGs in rows (named). |
bctr_pmin |
Beta value observed at 0.01 quantile in controls. A beta value has to be lower or equal to this value to be considered an epimutation. |
bctr_pmax |
Beta value observed at 0.99 quantile in controls. A beta value has to be higher or equal to this value to be considered an epimutation. |
controls |
control samples names. |
betas |
a matrix containing the beta values for all samples. |
window_sz |
Maximum distance between a pair of CpGs to define a region of CpGs as an epimutation (default: 1000). |
N |
Minimum number of CpGs, separated by a maximum of window_sz bases, to define an epimutation (default: 3). |
offset_abs |
Extra enforcement defining an epimutation based on beta values at 0.005 and 0.995 quantiles (default: 0.15). |
The function returns a data frame with the candidate regions to be epimutations.
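The quantile approach can be sketched in base R on simulated values. The way offset_abs is combined with the control quantiles, and the rule that a new cluster starts when two consecutive flagged CpGs are more than window_sz apart, are assumptions made for this illustration and may not match the package's exact logic.

```r
set.seed(3)
# Simulated control beta values: 5 CpGs (rows) x 20 controls (columns)
controls <- matrix(runif(5 * 20, min = 0.45, max = 0.55), nrow = 5,
                   dimnames = list(paste0("cg", 1:5), NULL))
pos <- c(100, 300, 700, 5000, 5200)  # genomic positions of the CpGs
case <- c(cg1 = 0.90, cg2 = 0.88, cg3 = 0.91, cg4 = 0.50, cg5 = 0.52)

# Control quantiles per CpG (0.01 and 0.99, as described for bctr_pmin/pmax)
bctr_pmin <- apply(controls, 1, quantile, probs = 0.01)
bctr_pmax <- apply(controls, 1, quantile, probs = 0.99)

window_sz <- 1000
N <- 3
offset_abs <- 0.15

# Assumed rule: the case must exceed the control range by offset_abs
flagged <- case >= bctr_pmax + offset_abs | case <= bctr_pmin - offset_abs

# Group flagged CpGs; a gap larger than window_sz starts a new cluster
fpos <- pos[flagged]
cluster <- cumsum(c(TRUE, diff(fpos) > window_sz))
sizes <- table(cluster)
candidate_clusters <- names(sizes)[sizes >= N]
```

Here the first three CpGs are flagged and lie within 1000 bases of each other, so they form a single candidate region of three CpGs.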
The function identifies differentially methylated regions in a case sample by comparing it against a control panel.
epimutations( case_samples, control_panel, method = "manova", chr = NULL, start = NULL, end = NULL, epi_params = epi_parameters(), maxGap = 1000, bump_cutoff = 0.1, min_cpg = 3, verbose = TRUE )
case_samples |
a GenomicRatioSet object containing the case samples. See the constructor function GenomicRatioSet, makeGenomicRatioSetFromMatrix. |
control_panel |
a GenomicRatioSet object containing the control panel (reference samples). |
method |
a character string naming the
outlier detection method to be used.
This can be set as: |
chr |
a character string containing the sequence
names to be analysed. The default value is |
start |
an integer specifying the start position.
The default value is |
end |
an integer specifying the end position.
The default value is |
epi_params |
the parameters for each method. See the function epi_parameters. |
maxGap |
the maximum location gap used in bumphunter method. |
bump_cutoff |
a numeric value. Values of the estimate of the genomic profile above the cutoff or below the negative of the cutoff will be used as candidate regions. |
min_cpg |
an integer specifying the minimum CpGs number in a DMR. |
verbose |
logical. If TRUE additional details about the procedure will be provided to the user. The default is TRUE. |
The function compares a case sample against a control panel to identify epimutations in the given sample. First, the DMRs are identified using the bumphunter approach. After that, CpGs in those DMRs are tested in order to detect regions with CpGs being outliers. For that, different outlier detection methods can be selected:
Multivariate Analysis of Variance ("manova"): manova
Multivariate Linear Model ("mlm")
Isolation Forest ("iForest"): isolation.forest
Robust Mahalanobis Distance ("mahdist"): covMcd
Quantile distribution ("quantile")
Beta ("beta")
We defined candidate epimutation regions (found in candRegsGR) based on the 450K array design. As CpGs are not equally distributed along the genome, only CpGs closer to other CpGs can form an epimutation. More information can be found in candRegsGR documentation.
The function returns an object of class tibble containing the outlier regions. The results are composed of the following columns:
epi_id: systematic name for each epimutation identified. It provides the name of the used anomaly detection method.
sample: the name of the sample containing the epimutation.
chromosome, start and end: indicate the location of the epimutation.
sz: the window's size of the event.
cpg_n: the number of CpGs in the epimutation.
cpg_ids: the names of CpGs in the epimutation.
outlier_score:
For method manova it provides the approximation to F-test and the Pillai score, separated by /.
For method mlm it provides the approximation to F-test and the R2 of the model, separated by /.
For method iForest it provides the magnitude of the outlier score.
For method beta it provides the mean outlier p-value.
For methods quantile and mahdist it is filled with NA.
outlier_direction: indicates the direction of the outlier with "hypomethylation" and "hypermethylation".
For manova, mlm, iForest, and mahdist it is computed from the values obtained from bumphunter.
For quantile it is computed from the location of the sample in the reference distribution (left vs. right outlier).
For method beta it returns NA.
pvalue:
For methods manova, mlm, and iForest it provides the p-value obtained from the model.
For methods quantile, mahdist and beta it is filled with NA.
adj_pvalue: for methods with p-value (manova and mlm), the adjusted p-value with Benjamini-Hochberg based on the total number of regions detected by Bumphunter.
epi_region_id: Name of the epimutation region as defined in candRegsGR.
CRE: cREs (cis-Regulatory Elements) as defined by ENCODE overlapping the epimutation region. Different cREs are separated by ;.
CRE_type: Type of cREs (cis-Regulatory Elements) as defined by ENCODE. Different types are separated by , and different cREs are separated by ;.
data(GRset)
# Find epimutations in GSM2562701 sample of GRset dataset
case_samples <- GRset[,11]
control_panel <- GRset[,1:10]
epimutations(case_samples, control_panel, method = "manova")
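Two of the result columns lend themselves to a short base R illustration: the Benjamini-Hochberg adjustment behind adj_pvalue and the /-separated outlier_score. The values below are made up for the sketch and do not come from the package; it also assumes every detected region has a p-value, so the default n of p.adjust equals the number of regions.

```r
# Hypothetical p-values for four candidate regions
pvalue <- c(0.001, 0.02, 0.03, 0.2)

# Benjamini-Hochberg adjustment over all regions
adj_pvalue <- p.adjust(pvalue, method = "BH")

# Hypothetical manova-style outlier_score strings ("F/Pillai")
outlier_score <- c("120.5/0.98", "8.1/0.71")

# Split each score into its two numeric components
parts <- do.call(rbind, strsplit(outlier_score, "/", fixed = TRUE))
scores <- data.frame(approx_F = as.numeric(parts[, 1]),
                     pillai = as.numeric(parts[, 2]))
```

If only a subset of the regions carries a p-value, the n argument of p.adjust can be set to the total number of regions so that the correction still accounts for all of them.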
This function is similar to epimutations, with the particularity that when there is more than one case sample, the remaining case samples are included as controls.
epimutations_one_leave_out( methy, method = "manova", epi_params = epi_parameters(), BPPARAM = BiocParallel::SerialParam(), verbose = TRUE, ... )
methy |
a GenomicRatioSet object containing the samples for the analysis. See the constructor function GenomicRatioSet, makeGenomicRatioSetFromMatrix. |
method |
a character string naming the
outlier detection method to be used.
This can be set as:
|
epi_params |
the parameters for each method. See the function epi_parameters. |
BPPARAM |
( |
verbose |
logical. If TRUE additional details about the procedure will be provided to the user. The default is TRUE. |
... |
Further parameters passed to |
The function compares a case sample against a control panel to identify epimutations in the given sample. First, the DMRs are identified using the bumphunter approach. After that, CpGs in those DMRs are tested in order to detect regions with CpGs being outliers. For that, different anomaly detection methods can be selected:
Multivariate Analysis of Variance ("manova"): manova
Multivariate Linear Model ("mlm")
Isolation Forest ("iForest"): isolation.forest
Robust Mahalanobis Distance ("mahdist"): covMcd
Barbosa ("barbosa")
The function returns an object of class tibble containing the outlier regions. The results are composed of the following columns:
epi_id: the name of the anomaly detection method that has been used to detect the epimutation.
sample: the name of the sample where the epimutation was found.
chromosome, start and end: indicate the location of the epimutation.
sz: the number of base pairs in the region.
cpg_n: number of CpGs in the region.
cpg_ids: differentially methylated CpGs names.
outlier_score:
For method manova it provides the approximation to F-test and the Pillai score, separated by /.
For method mlm it provides the approximation to F-test and the R2 of the model, separated by /.
For method iForest it provides the magnitude of the outlier score.
For methods barbosa and mahdist it is filled with NA.
outlier_significance:
For methods manova, mlm, and iForest it provides the p-value obtained from the model.
For methods barbosa and mahdist it is filled with NA.
outlier_direction: indicates the direction of the outlier with "hypomethylation" and "hypermethylation".
For manova, mlm, iForest, and mahdist it is computed from the values obtained from bumphunter.
For barbosa it is computed from the location of the sample in the reference distribution (left vs. right outlier).
data(GRset)
manova_result <- epimutations_one_leave_out(GRset, method = "manova")
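The leave-one-out design described for this function can be sketched with plain sample names: each sample takes a turn as the case and all the others act as controls. The sample names are hypothetical and the sketch only builds the splits; it does not call the package.

```r
# Hypothetical sample names
samples <- paste0("S", 1:4)

# One split per sample: that sample is the case, the rest are controls
splits <- lapply(samples, function(s) {
  list(case = s, controls = setdiff(samples, s))
})
names(splits) <- samples

# Number of controls available in each split
n_controls <- vapply(splits, function(x) length(x$controls), integer(1))
```

With a GenomicRatioSet the same idea would amount to subsetting columns, in the way the epimutations example does with GRset[,11] and GRset[,1:10].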
Load candidate regions to be epimutations from the epimutacionsData package in ExperimentHub.
get_candRegsGR()
The function returns a GRanges object containing the candidate regions.
This function queries for ENSEMBL regulatory features and collapses them to return a single record.
get_ENSEMBL_data(chromosome, start, end, mart)
chromosome |
Chromosome of the region |
start |
Start of the region |
end |
End of the region |
mart |
|
data.frame of one row with the ENSEMBL regulatory regions overlapping the genomic coordinate.
Model methylation as a beta distribution
getBetaParams(x)
x |
Matrix of methylation expressed as a beta. CpGs are in columns and samples in rows. |
Beta distribution.
A small GenomicRatioSet object to use in the function examples, containing 10 control samples and a case sample.
data(GRset)
A GenomicRatioSet object with 4243 CpGs and 11 variables
data(GRset)
This function collapses the activity status of a given ENSEMBL regulatory element in different tissues. Notice that tissues identified as inactive will not be reported.
merge_records(tab)
tab |
Results from |
data.frame
of one row after collapsing the
Fits a multivariate linear model and computes test statistics and asymptotic P-values for predictors in a non-parametric manner.
mlm( formula, data, transform = "none", contrasts = NULL, subset = NULL, fit = FALSE )
formula |
object of class " |
data |
an optional data frame,
list or environment (or object coercible
by |
transform |
transformation
of the response variables: " |
contrasts |
an optional list.
See |
subset |
subset of predictors for which
summary statistics will be
reported. Note that this is different
from the " |
fit |
logical. If |
A Y matrix is obtained after transforming (optionally) and centering the original response variables. Then, the multivariate fit obtained by lm can be used to compute sums of squares, pseudo-F statistics and asymptotic P-values for the terms specified by the formula in a non-parametric manner.
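The centering, multivariate fit and sums of squares mentioned above can be sketched with base R's lm on simulated data. This is a simplified illustration with a single predictor: summing squares over all response columns is an assumption about how the quantities are pooled, and the asymptotic P-value computation is not reproduced here.

```r
set.seed(5)
n <- 40
group <- factor(rep(c("a", "b"), each = n / 2))

# Two simulated response variables; the first depends on the group
Y <- cbind(y1 = rnorm(n) + (group == "b"), y2 = rnorm(n))

# Center the responses (no further transformation)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Multivariate fit: lm accepts a matrix response
fit <- lm(Yc ~ group)

# Sums of squares pooled over both response columns
ss_total <- sum(Yc^2)
ss_resid <- sum(residuals(fit)^2)
ss_model <- sum(fitted(fit)^2)

# Pseudo-F statistic and R-squared for the group term
df_model <- nlevels(group) - 1
df_resid <- n - nlevels(group)
pseudo_F <- (ss_model / df_model) / (ss_resid / df_resid)
r2 <- ss_model / ss_total
```

Because the responses are centered and the model includes an intercept, the model and residual sums of squares add up to the total.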
mlm returns an object of class "MLM", a list containing:
call |
the matched call. |
aov.tab |
ANOVA table with Df, Sum Sq, Mean Sq, F values, partial R-squared and P-values. |
precision |
the precision in P-value computation. |
transform |
the transformation applied to the response variables. |
na.omit |
incomplete cases removed
(see |
fit |
if |
Diego Garrido-Martín
The norm_parameters function allows the user to set the values of the parameters used by the function epi_preprocess.
norm_parameters( illumina = list(bg.correct = TRUE, normalize = c("controls", "no"), reference = 1), quantile = list(fixOutliers = TRUE, removeBadSamples = FALSE, badSampleCutoff = 10.5, quantileNormalize = TRUE, stratified = TRUE, mergeManifest = FALSE, sex = NULL), noob = list(offset = 15, dyeCorr = TRUE, dyeMethod = c("single", "reference")), funnorm = list(nPCs = 2, sex = NULL, bgCorr = TRUE, dyeCorr = TRUE, keepCN = FALSE) )
illumina, quantile, noob, funnorm |
preprocess method selected in the function epi_preprocess. |
bg.correct |
logical. If TRUE background
correction will be performed in |
normalize |
logical. If TRUE control
normalization will be performed in
|
reference |
numeric.
The reference array for control normalization in
|
fixOutliers |
logical. If TRUE low outlier
Meth and Unmeth signals will be fixed
in |
removeBadSamples |
logical. If TRUE bad samples will be removed. |
badSampleCutoff |
a numeric specifying
the cutoff to label samples
as 'bad' in |
quantileNormalize |
logical. If TRUE quantile
normalization will be performed in
|
stratified |
logical.
If TRUE quantile normalization will be performed
within region strata in |
mergeManifest |
logical. If TRUE the
information in the associated manifest
package will be merged into the output
object in |
offset |
a numeric specifying an offset for
the normexp background correction
in |
dyeCorr |
logical. Dye correction will
be done in |
dyeMethod |
specify the dye
bias correction to be done, single sample
approach or a reference array in |
nPCs |
numeric specifying
the number of principal components
from the control probes PCA in
|
sex |
an optional numeric vector
containing the sex of the samples in
|
bgCorr |
logical.
If TRUE NOOB background correction will be done prior
to functional normalization.
in |
keepCN |
logical. If TRUE copy number estimates
will be kept in |
Invoking norm_parameters() with no arguments returns a list with the default values for each normalization parameter.
The function returns a list of all set parameters for each normalization method used in epi_preprocess.
# Default set of parameters
norm_parameters()
# Disable background correction for the illumina method
norm_parameters(illumina = list("bg.correct" = FALSE))
This function plots a given epimutation and UCSC annotations for the specified genomic region.
plot_epimutations( dmr, methy, genome = "hg19", genes_annot = FALSE, regulation = FALSE, from = NULL, to = NULL )
dmr |
epimutation obtained as a result of epimutations function. |
methy |
a GenomicRatioSet object containing the information of control and case samples used for the analysis in the epimutations function. See the constructor function GenomicRatioSet, makeGenomicRatioSetFromMatrix. |
genome |
a character string
specifying the genome of reference.
It can be set as |
genes_annot |
a boolean. If TRUE gene annotations are plotted. Default is FALSE. |
regulation |
a boolean.
If TRUE UCSC annotations
for CpG Islands, H3K27Ac, H3K4Me3
and H3K27Me3 are plotted. The default is FALSE.
The running process when |
from, to |
scalar, specifying the
range of genomic coordinates
for the plot of gene annotation region.
If |
The tracks are plotted vertically. Each track is separated by different background colour and a section title. The colours and titles are preset and cannot be set by the user.
Note that to see the UCSC annotations you may need to select a larger genomic region.
The function returns a plot divided into the following parts:
ggplot graph including the individual with the epimutation in red, the control samples in dashed black lines and population mean in blue. Grey shaded regions indicate 1, 1.5 and 2 standard deviations from the mean of the distribution.
UCSC gene annotations for the specified genomic region (if genes_annot == TRUE)
UCSC annotations for CpG Islands, H3K27Ac, H3K4Me3 and H3K27Me3 (if regulation == TRUE)
data(GRset)
data(res.epi.manova)
plot_epimutations(res.epi.manova[1,], GRset)
Process data from ENSEMBL to combine results from the same regulatory elements in a unique record.
process_ENSEMBL_results(ensembl_res)
ensembl_res |
Results from |
data.frame of one row after collapsing the input ENSEMBL regulatory regions.
Creates a data frame containing the genomic regions, statistics and direction for the DMRs.
res_iForest(bump, sts, outlier_score_cutoff)
bump |
a DMR obtained from bumphunter (i.e. a row from bumphunter method result). |
sts |
the outlier score from epi_iForest function results. |
outlier_score_cutoff |
numeric specifying the outlier score cut off |
The function returns a data frame containing the following information for each DMR:
genomic ranges
DMR base pairs
number and name of CpGs in DMR
statistics:
Outlier score
Outlier significance
Outlier direction
Sample name
For more information about the output see epimutations.
Creates a data frame containing the genomic regions, statistics and direction for the DMRs.
res_mahdist(case, bump, outliers)
case |
a character string specifying the case sample name. |
bump |
a DMR obtained from bumphunter (i.e. a row from bumphunter method result). |
outliers |
the robust distance computed by epi_mahdist function results. |
The function returns a data frame containing the following information for each DMR:
genomic ranges
DMR base pairs
number and name of CpGs in DMR
statistics:
Outlier score
Outlier significance
Outlier direction
Sample name
For more information about the output see epimutations.
Creates a data frame containing the genomic regions, statistics and direction for the DMRs.
res_manova(bump, sts)
bump |
a DMR obtained from bumphunter (i.e. a row from bumphunter method result). |
sts |
F statistic, Pillai and P value from epi_manova function results. |
The function returns a data frame containing the following information for each DMR:
genomic ranges
DMR base pairs
number and name of CpGs in DMR
statistics:
Outlier score
Outlier significance
Outlier direction
Sample name
For more information about the output see epimutations.
Creates a data frame containing the genomic regions, statistics and direction for the DMRs.
res_mlm(bump, sts)
bump |
a DMR obtained from bumphunter (i.e. a row from bumphunter method result). |
sts |
the F statistic, R2 test statistic and Pillai obtained as a result of epi_mlm function. |
The function returns a data frame containing the following information for each DMR:
genomic ranges
DMR base pairs
number and name of CpGs in DMR
statistics:
Outlier score
Outlier significance
Outlier direction
Sample name
For more information about the output see epimutations.
A data frame containing the results of the epimutations function using the "manova" method for the GRset dataset. For more information see the example of the epimutations function.
data(res.epi.manova)
A data frame with 16 variables and 6 epimutations.
data(res.epi.manova)
UCSC gene annotations for a given genome assembly.
UCSC_annotation(genome = "hg19")
genome |
genome assembly. Can be set as:
|
The function returns gene annotations for the specified genome assembly.
UCSC annotations for CpG Islands, H3K27Ac and H3K4Me3 for a given genome assembly and genomic coordinates.
UCSC_regulation(genome, chr, from, to)
genome |
genome assembly. Can be set as:
|
chr |
a character string containing the sequence names to be analysed. |
from, to |
scalar, specifying the range of genomic coordinates.
Note that |
UCSC_regulation returns a list containing CpG Islands, H3K27Ac and H3K4Me3 tracks.