Title: | Fast and efficient summarization of generic bedGraph files from Bisufite sequencing |
---|---|
Description: | Bedgraph files generated by Bisulfite pipelines often come in various flavors. Critical downstream step requires summarization of these files into methylation/coverage matrices. This step of data aggregation is done by Methrix, including many other useful downstream functions. |
Authors: | Anand Mayakonda [aut, cre] , Reka Toth [aut] , Rajbir Batra [ctb], Clarissa Feuerstein-Akgöz [ctb], Joschka Hey [ctb], Maximilian Schönung [ctb], Pavlo Lutsik [ctb] |
Maintainer: | Anand Mayakonda <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.21.0 |
Built: | 2024-10-30 08:52:29 UTC |
Source: | https://github.com/bioc/methrix |
Combine methrix objects
combine_methrix(m1, m2, by = c("row", "col"))
combine_methrix(m1, m2, by = c("row", "col"))
m1 |
Frist |
m2 |
Second |
by |
The direction of combine. 'column' (cbind) combines samples with same regions, 'row' combines different regions, e.g. different chromosomes. |
Takes two methrix
objects and combines them row- or column-wise
An object of class methrix
Converts HDF5 methrix object to standard in-memory object.
convert_HDF5_methrix(m = NULL)
convert_HDF5_methrix(m = NULL)
m |
An object of class |
Takes a methrix
object and returns with the same object with in-memory assay slots.
An object of class methrix
data(methrix_data) m2 <- convert_methrix(m=methrix_data) m <- convert_HDF5_methrix(m=m2)
data(methrix_data) m2 <- convert_methrix(m=methrix_data) m <- convert_HDF5_methrix(m=m2)
Converts an in-memory object to an on-disk HDF5 object.
convert_methrix(m = NULL)
convert_methrix(m = NULL)
m |
An object of class |
Takes a methrix
object and returns with the same object with delayed array assay slots
with HDF5 backend. Might take long time!
An object of class methrix
, HDF5 format
data(methrix_data) m2 <- convert_methrix(m=methrix_data)
data(methrix_data) m2 <- convert_methrix(m=methrix_data)
Filter matrices by coverage
coverage_filter( m, cov_thr = 1, min_samples = 1, prop_samples = 0, group = NULL, n_chunks = 1, n_cores = 1 )
coverage_filter( m, cov_thr = 1, min_samples = 1, prop_samples = 0, group = NULL, n_chunks = 1, n_cores = 1 )
m |
|
cov_thr |
minimum coverage required to call a loci covered |
min_samples |
Minimum number of samples that should have a loci with coverage >= |
prop_samples |
Minimum proportion of samples that should have a loci with coverage >= |
group |
a column name from sample annotation that defines groups. In this case, the number of min_samples will be tested group-wise. |
n_chunks |
Number of chunks to split the |
n_cores |
Number of parallel instances. |
Takes methrix
object and filters CpGs based on coverage statistics
An object of class methrix
data('methrix_data') #keep only CpGs which are covered by at-least 1 read across 3 samples coverage_filter(m = methrix_data, cov_thr = 1, min_samples = 3)
data('methrix_data') #keep only CpGs which are covered by at-least 1 read across 3 samples coverage_filter(m = methrix_data, cov_thr = 1, min_samples = 3)
Extracts all CpGs from a genome
extract_CPGs(ref_genome = NULL)
extract_CPGs(ref_genome = NULL)
ref_genome |
BSgenome object or name of the installed BSgenome package. Example: BSgenome.Hsapiens.UCSC.hg19 |
a list of data.table containing number of CpG's and contig lengths
## Not run: hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19') ## End(Not run)
## Not run: hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19') ## End(Not run)
Extract methylation or coverage matrices
get_matrix(m, type = "M", add_loci = FALSE, in_granges = FALSE)
get_matrix(m, type = "M", add_loci = FALSE, in_granges = FALSE)
m |
|
type |
can be |
add_loci |
Default FALSE. If TRUE adds CpG position info to the matrix and returns as a data.table |
in_granges |
Do you want the outcome in |
Takes methrix
object and returns user specified methylation
or coverage
matrix
Coverage or Methylation matrix
data('methrix_data') #Get methylation matrix get_matrix(m = methrix_data, type = 'M') #Get methylation matrix along with loci get_matrix(m = methrix_data, type = 'M', add_loci = TRUE) #' #Get methylation data as a GRanges object get_matrix(m = methrix_data, type = 'M', add_loci = TRUE, in_granges=TRUE)
data('methrix_data') #Get methylation matrix get_matrix(m = methrix_data, type = 'M') #Get methylation matrix along with loci get_matrix(m = methrix_data, type = 'M', add_loci = TRUE) #' #Get methylation data as a GRanges object get_matrix(m = methrix_data, type = 'M', add_loci = TRUE, in_granges=TRUE)
Extract and summarize methylation or coverage info by regions of interest
get_region_summary( m, regions = NULL, type = "M", how = "mean", overlap_type = "within", na_rm = TRUE, elementMetadata.col = NULL, verbose = TRUE, n_chunks = 1, n_cores = 1 )
get_region_summary( m, regions = NULL, type = "M", how = "mean", overlap_type = "within", na_rm = TRUE, elementMetadata.col = NULL, verbose = TRUE, n_chunks = 1, n_cores = 1 )
m |
|
regions |
genomic regions to be summarized. Could be a data.table with 3 columns (chr, start, end) or a |
type |
matrix which needs to be summarized. Coule be ‘M', 'C'. Default ’M' |
how |
mathematical function by which regions should be summarized. Can be one of the following: mean, sum, max, min. Default 'mean' |
overlap_type |
defines the type of the overlap of the CpG sites with the target region. Default value is 'within'. For detailed description,
see the |
na_rm |
Remove NA's? Default |
elementMetadata.col |
columns in |
verbose |
Default TRUE |
n_chunks |
Number of chunks to split the |
n_cores |
Number of parallel instances. |
Takes methrix
object and summarizes regions
a coverage or methylation matrix
data('methrix_data') get_region_summary(m = methrix_data, regions = data.table(chr = 'chr21', start = 27867971, end = 27868103), type = 'M', how = 'mean')
data('methrix_data') get_region_summary(m = methrix_data, regions = data.table(chr = 'chr21', start = 27867971, end = 27868103), type = 'M', how = 'mean')
Estimate descriptive statistics
get_stats(m, per_chr = TRUE)
get_stats(m, per_chr = TRUE)
m |
|
per_chr |
Estimate stats per chromosome. Default TRUE |
Calculate descriptive statistics
data.table of summary stats
data('methrix_data') get_stats(methrix_data)
data('methrix_data') get_stats(methrix_data)
Loads HDF5 methrix object
load_HDF5_methrix(dir = NULL, ...)
load_HDF5_methrix(dir = NULL, ...)
dir |
The directory to read in from. Default NULL |
... |
Parameters to pass to loadHDF5SummarizedExperiment |
Takes directory with a previously saved HDF5Array format methrix
object and loads it
An object of class methrix
data('methrix_data') methrix_data_h5 <- convert_methrix(m=methrix_data) target_dir = paste0(getwd(), '/temp1/') save_HDF5_methrix(methrix_data_h5, dir = target_dir, replace = TRUE) load_HDF5_methrix(target_dir)
data('methrix_data') methrix_data_h5 <- convert_methrix(m=methrix_data) target_dir = paste0(getwd(), '/temp1/') save_HDF5_methrix(methrix_data_h5, dir = target_dir, replace = TRUE) load_HDF5_methrix(target_dir)
Masks too high or too low coverage
mask_methrix(m, low_count = NULL, high_quantile = 0.99, n_cores = 1)
mask_methrix(m, low_count = NULL, high_quantile = 0.99, n_cores = 1)
m |
|
low_count |
The minimal coverage allowed. Everything below, will get masked. Default = NULL, nothing gets masked. |
high_quantile |
The quantile limit of coverage. Quantiles are calculated for each sample and everything that belongs to a higher quantile than the defined will be masked. Default = 0.99. |
n_cores |
Number of parallel instances. Can only be used if |
Takes methrix
object and masks sites with too high or too low coverage
by putting NA for coverage and beta value. The sites will remain in the object.
An object of class methrix
data('methrix_data') mask_methrix(m = methrix_data, low_count = 5, high_quantile = 0.99 )
data('methrix_data') mask_methrix(m = methrix_data, low_count = 5, high_quantile = 0.99 )
This is a subset of original 'bsseqData' converted to 'methrix' containing Whole-genome bisulfite sequencing data (WGBS) for colon cancer on chromosome 21 and 22.
data('methrix_data')
data('methrix_data')
An object of class 'methrix'
Hansen, K. D. et al. (2011) Increased methylation variation in epigenetic domains across cancer types. Nature Genetics 43, 768-775.
data('methrix_data') methrix_data
data('methrix_data') methrix_data
Principal Component Analysis
methrix_pca( m, var = "top", top_var = 1000, ranges = NULL, pheno = NULL, do_plot = TRUE, n_pc = 2 )
methrix_pca( m, var = "top", top_var = 1000, ranges = NULL, pheno = NULL, do_plot = TRUE, n_pc = 2 )
m |
Input |
var |
Choose between random CpG sites ('rand') or most variable CpGs ('top'). |
top_var |
Number of variable CpGs to use. Default 1000 Set it to NULL to use all CpGs (which is not recommended due to memory requirements). This option is mutually exclusive with |
ranges |
genomic regions to be summarized. Could be a data.table with 3 columns (chr, start, end) or a |
pheno |
Column name of colData(m). Default NULL. Will be used as a factor to color different groups |
do_plot |
Should a plot be generated? |
n_pc |
Default 2. |
PCA results
data('methrix_data') methrix_pca(methrix_data, do_plot = FALSE)
data('methrix_data') methrix_pca(methrix_data, do_plot = FALSE)
Creates a detailed interative html summary report from Methrix object. If the directory contains required files (from previous run), it directly proceeds to generate html report.
methrix_report( meth, output_dir = NULL, recal_stats = FALSE, plot_beta_dist = TRUE, beta_nCpG = 10000, prefix = NULL, n_thr = 4 )
methrix_report( meth, output_dir = NULL, recal_stats = FALSE, plot_beta_dist = TRUE, beta_nCpG = 10000, prefix = NULL, n_thr = 4 )
meth |
|
output_dir |
Output directory name where the files should be saved. If |
recal_stats |
Whether summary statistics should be recalculated? If you are using subsetted methrix object set this to TRUE. |
plot_beta_dist |
Default TRUE. Can be time consuming. |
beta_nCpG |
Number of CpGs rto use for estimating beta value distribution. Default 10000 |
prefix |
If provided, the name of the report and the intermediate files will start with the prefix. |
n_thr |
Default 4. Only used if |
an interactive html report
## Not run: data('methrix_data') methrix::methrix_report(meth = methrix_data) ## End(Not run)
## Not run: data('methrix_data') methrix::methrix_report(meth = methrix_data) ## End(Not run)
S4 class Methrix
assays
A list of tw omatrices containing 'Methylation' and 'Coverage' information
elementMetadata
A DataFrame describing rows in correspoding assay matrices.
colData
genome: the name of the BSgenome that was used to extract CpGs, isHDF5: is it stored in HDF5 Array format
metadata
a list of meta data associated with the assays
NAMES
NULL
methrix
to bsseq
objectConvert methrix
to bsseq
object
methrix2bsseq(m)
methrix2bsseq(m)
m |
|
Takes methrix
object and returns a bsseq
object
An object of class bsseq
## Not run: data('methrix_data') methrix2bsseq(m = methrix_data) ## End(Not run)
## Not run: data('methrix_data') methrix2bsseq(m = methrix_data) ## End(Not run)
Order mathrix object by SD
order_by_sd(m)
order_by_sd(m)
m |
|
Takes methrix
object and reorganizes the data by standard deviation
An object of class methrix
data('methrix_data') order_by_sd(m = methrix_data)
data('methrix_data') order_by_sd(m = methrix_data)
Coverage QC Plots
plot_coverage( m, type = c("hist", "dens"), pheno = NULL, perGroup = FALSE, lim = 100, size.lim = 1e+06, col_palette = "RdYlGn" )
plot_coverage( m, type = c("hist", "dens"), pheno = NULL, perGroup = FALSE, lim = 100, size.lim = 1e+06, col_palette = "RdYlGn" )
m |
Input |
type |
Choose between 'hist' (histogram) or 'dens' (density plot). |
pheno |
Column name of colData(m). Will be used as a factor to color different groups in the plot. |
perGroup |
Color the plots in a sample-wise manner? |
lim |
Maximum coverage value to be plotted. |
size.lim |
The maximum number of observarions (sites*samples) to use. If the dataset is larger that this, random sites will be selected from the genome. |
col_palette |
Name of the RColorBrewer palette to use for plotting. |
ggplot2 object
data('methrix_data') plot_coverage(m = methrix_data)
data('methrix_data') plot_coverage(m = methrix_data)
-ValuesDensity Plot of -Values
plot_density( m, ranges = NULL, n_cpgs = 25000, pheno = NULL, col_palette = "RdYlGn" )
plot_density( m, ranges = NULL, n_cpgs = 25000, pheno = NULL, col_palette = "RdYlGn" )
m |
Input |
ranges |
genomic regions to be summarized. Could be a data.table with 3 columns (chr, start, end) or a |
n_cpgs |
Use these many random CpGs for plotting. Default 25000. Set it to |
pheno |
Column name of colData(m). Will be used as a factor to color different groups in the violin plot. |
col_palette |
Name of the RColorBrewer palette to use for plotting. |
ggplot2 object
data('methrix_data') plot_density(m = methrix_data)
data('methrix_data') plot_density(m = methrix_data)
Plot PCA results
plot_pca( pca_res, m = NULL, col_anno = NULL, shape_anno = NULL, pc_x = "PC1", pc_y = "PC2", show_labels = FALSE )
plot_pca( pca_res, m = NULL, col_anno = NULL, shape_anno = NULL, pc_x = "PC1", pc_y = "PC2", show_labels = FALSE )
pca_res |
Results from |
m |
optinal methrix object. Default NULL |
col_anno |
Column name of colData(m). Default NULL. Will be used as a factor to color different groups. Required |
shape_anno |
Column name of colData(m). Default NULL. Will be used as a factor to shape different groups. Required |
pc_x |
Default 'PC1' |
pc_y |
Default 'PC2' |
show_labels |
Default FLASE |
ggplot2 object
data('methrix_data') mpc = methrix_pca(methrix_data, do_plot = FALSE) plot_pca(mpc)
data('methrix_data') mpc = methrix_pca(methrix_data, do_plot = FALSE) plot_pca(mpc)
Plot descriptive statistics
plot_stats( plot_dat, what = "M", stat = "mean", ignore_chr = NULL, samples = NULL, n_col = NULL, n_row = NULL )
plot_stats( plot_dat, what = "M", stat = "mean", ignore_chr = NULL, samples = NULL, n_col = NULL, n_row = NULL )
plot_dat |
results from |
what |
Can be |
stat |
Can be |
ignore_chr |
Chromsomes to ignore. Default |
samples |
Use only these samples. Default |
n_col |
number of columns. Passed to 'facet_wrap' |
n_row |
number of rows. Passed to 'facet_wrap' |
plot descriptive statistics results from get_stats
ggplot2 object
data('methrix_data') gs = get_stats(methrix_data) plot_stats(gs)
data('methrix_data') gs = get_stats(methrix_data) plot_stats(gs)
-ValuesViolin Plot for -Values
plot_violin( m, ranges = NULL, n_cpgs = 25000, pheno = NULL, col_palette = "RdYlGn" )
plot_violin( m, ranges = NULL, n_cpgs = 25000, pheno = NULL, col_palette = "RdYlGn" )
m |
Input |
ranges |
genomic regions to be summarized. Could be a data.table with 3 columns (chr, start, end) or a |
n_cpgs |
Use these many random CpGs for plotting. Default 25000. Set it to |
pheno |
Column name of colData(m). Will be used as a factor to color different groups in the violin plot. |
col_palette |
Name of the RColorBrewer palette to use for plotting. |
ggplot2 object
data('methrix_data') plot_violin(m = methrix_data)
data('methrix_data') plot_violin(m = methrix_data)
Versatile BedGraph reader.
read_bedgraphs( files = NULL, pipeline = NULL, zero_based = TRUE, stranded = FALSE, collapse_strands = FALSE, ref_cpgs = NULL, ref_build = NULL, contigs = NULL, vect = FALSE, vect_batch_size = NULL, coldata = NULL, chr_idx = NULL, start_idx = NULL, end_idx = NULL, beta_idx = NULL, M_idx = NULL, U_idx = NULL, strand_idx = NULL, cov_idx = NULL, synced_coordinates = FALSE, n_threads = 1, h5 = FALSE, h5_dir = NULL, h5temp = NULL, verbose = TRUE )
read_bedgraphs( files = NULL, pipeline = NULL, zero_based = TRUE, stranded = FALSE, collapse_strands = FALSE, ref_cpgs = NULL, ref_build = NULL, contigs = NULL, vect = FALSE, vect_batch_size = NULL, coldata = NULL, chr_idx = NULL, start_idx = NULL, end_idx = NULL, beta_idx = NULL, M_idx = NULL, U_idx = NULL, strand_idx = NULL, cov_idx = NULL, synced_coordinates = FALSE, n_threads = 1, h5 = FALSE, h5_dir = NULL, h5temp = NULL, verbose = TRUE )
files |
bedgraph files. |
pipeline |
Default NULL. Currently supports "Bismark_cov", "MethylDackel", "MethylcTools", "BisSNP", "BSseeker2_CGmap" If not known use idx arguments for manual column assignments. |
zero_based |
Are bedgraph regions zero based ? Default TRUE |
stranded |
Default FALSE |
collapse_strands |
If TRUE collapses CpGs on different crick strand into watson. Deafult FALSE |
ref_cpgs |
BSgenome object, or name of the installed BSgenome package, or an output from |
ref_build |
reference genome for bedgraphs. Default NULL. Only used for additional details. Doesnt affect in any way. |
contigs |
contigs to restrict genomic CpGs to. Default all autosomes and allosomes - ignoring extra contigs. |
vect |
To use vectorized code. Default FALSE. Set to TRUE if you don't have large number of BedGraph files. |
vect_batch_size |
Default NULL. Process samples in batches. Applicable only when vect = TRUE |
coldata |
An optional DataFrame describing the samples. Row names, if present, become the column names of the matrix. If NULL, then a DataFrame will be created with basename of files used as the row names. |
chr_idx |
column index for chromosome in bedgraph files |
start_idx |
column index for start position in bedgraph files |
end_idx |
column index for end position in bedgraph files |
beta_idx |
column index for beta values in bedgraph files |
M_idx |
column index for read counts supporting Methylation in bedgraph files |
U_idx |
column index for read counts supporting Un-methylation in bedgraph files |
strand_idx |
column index for strand information in bedgraph files |
cov_idx |
column index for total-coverage in bedgraph files |
synced_coordinates |
Are the start and end coordinates of a stranded bedgraph are synchronized between + and - strands? Possible values: FALSE (default), TRUE if the start coordinates are the start coordinates of the C on the plus strand. |
n_threads |
number of threads to use. Default 1. Be-careful - there is a linear increase in memory usage with number of threads. This option is does not work with Windows OS. |
h5 |
Should the coverage and methylation matrices be stored as 'HDF5Array' |
h5_dir |
directory to store H5 based object |
h5temp |
temporary directory to store hdf5 |
verbose |
Be little chatty ? Default TRUE. |
Reads BedGraph files and generates methylation and coverage matrices. Optionally arrays can be serialized as on-disk HDFS5 arrays.
An object of class methrix
## Not run: bdg_files = list.files(path = system.file('extdata', package = 'methrix'), pattern = '*\\.bedGraph\\.gz$', full.names = TRUE) hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19') meth = methrix::read_bedgraphs( files = bdg_files, ref_cpgs = hg19_cpgs, chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4, stranded = FALSE, zero_based = FALSE, collapse_strands = FALSE) ## End(Not run)
## Not run: bdg_files = list.files(path = system.file('extdata', package = 'methrix'), pattern = '*\\.bedGraph\\.gz$', full.names = TRUE) hg19_cpgs = methrix::extract_CPGs(ref_genome = 'BSgenome.Hsapiens.UCSC.hg19') meth = methrix::read_bedgraphs( files = bdg_files, ref_cpgs = hg19_cpgs, chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4, stranded = FALSE, zero_based = FALSE, collapse_strands = FALSE) ## End(Not run)
Filter matrices by region
region_filter(m, regions, type = "within")
region_filter(m, regions, type = "within")
m |
|
regions |
genomic regions to filter-out. Could be a data.table with 3 columns (chr, start, end) or a |
type |
defines the type of the overlap of the CpG sites with the target regions. Default value is 'within'. For detailed description,
see the |
Takes methrix
object and filters CpGs based on supplied regions in data.table or GRanges format
An object of class methrix
data('methrix_data') region_filter(m = methrix_data, regions = data.table(chr = 'chr21', start = 27867971, end = 27868103))
data('methrix_data') region_filter(m = methrix_data, regions = data.table(chr = 'chr21', start = 27867971, end = 27868103))
Removes CpG sites from the object if they overlap with common SNPs
remove_snps( m, populations = NULL, maf_threshold = 0.01, reduce_filtering = FALSE, forced = FALSE, keep = FALSE, n_chunks = 1, n_cores = 1 )
remove_snps( m, populations = NULL, maf_threshold = 0.01, reduce_filtering = FALSE, forced = FALSE, keep = FALSE, n_chunks = 1, n_cores = 1 )
m |
|
populations |
Populations to use. Default is all. |
maf_threshold |
The frequency threshold, above which the SNPs will be removed. Default is 0.01 |
reduce_filtering |
If TRUE, the SNPs with a MAF < 0.1 will be evaluated and only the highly variable ones will be removed. Default FALSE. |
forced |
the reduce_filtering is not recommended with less than 10 samples, but can be forced. Default is FALSE. |
keep |
Do you want to keep the sites that were filtered out? In this case, the function will return with a list of wo methrix objects. |
n_chunks |
Number of chunks to split the |
n_cores |
Number of parallel instances. Can only be used if input data is in HDF5 format. |
Takes methrix
object and removes common SNPs. SNPs overlapping with a CpG site and have a minor allele frequency (MAF)
above a threshold in any of the populations used will be selected and the corresponding CpG sites will be removed from the methrix
object.
With the reduce_filtering option, SNPs with MAP < 0.1 will be further evaluated. If they show low variance in the dataset, there is probably no genotype
variability in the population, therefore the corresponding CpG site won't be removed. Please keep in mind that variance thresholds are
methrix object or a list of methrix objects
data('methrix_data') remove_snps(m = methrix_data, maf_threshold=0.01)
data('methrix_data') remove_snps(m = methrix_data, maf_threshold=0.01)
Remove loci that are uncovered across all samples
remove_uncovered(m)
remove_uncovered(m)
m |
|
Takes methrix
object and removes loci that are uncovered across all samples
An object of class methrix
data('methrix_data') remove_uncovered(m = methrix_data)
data('methrix_data') remove_uncovered(m = methrix_data)
Saves HDF5 methrix object
save_HDF5_methrix(m = NULL, dir = NULL, replace = FALSE, ...)
save_HDF5_methrix(m = NULL, dir = NULL, replace = FALSE, ...)
m |
|
dir |
The directory to use. Created, if not existing. Default NULL |
replace |
Should it overwrite the pre-existing data? FALSE by default. |
... |
Parameters to pass to saveHDF5SummarizedExperiment |
Takes methrix
object and saves it
Nothing
data('methrix_data') methrix_data_h5 <- convert_methrix(m=methrix_data) target_dir = paste0(getwd(), '/temp/') save_HDF5_methrix(methrix_data_h5, dir = target_dir, replace = TRUE)
data('methrix_data') methrix_data_h5 <- convert_methrix(m=methrix_data) target_dir = paste0(getwd(), '/temp/') save_HDF5_methrix(methrix_data_h5, dir = target_dir, replace = TRUE)
methrix
object based on given conditions.Subsets methrix
object based on given conditions.
subset_methrix( m, regions = NULL, contigs = NULL, samples = NULL, overlap_type = "within" )
subset_methrix( m, regions = NULL, contigs = NULL, samples = NULL, overlap_type = "within" )
m |
|
regions |
genomic regions to subset by. Could be a data.table with 3 columns (chr, start, end) or a |
contigs |
chromosome names to subset by |
samples |
sample names to subset by |
overlap_type |
defines the type of the overlap of the CpG sites with the target region. Default value is 'within'. For detailed description,
see the |
Takes methrix
object and filters CpGs based on coverage statistics
An object of class methrix
data('methrix_data') #Subset to chromosome 1 subset_methrix(methrix_data, contigs = 'chr21')
data('methrix_data') #Subset to chromosome 1 subset_methrix(methrix_data, contigs = 'chr21')
Writes bedGraphs from methrix object
write_bedgraphs( m, output_dir = NULL, rm_NA = TRUE, force = FALSE, n_thr = 4, compress = TRUE, SeqStyle = "UCSC", multiBed = NULL, metilene = FALSE, phenoCol = NULL, add_coverage = FALSE )
write_bedgraphs( m, output_dir = NULL, rm_NA = TRUE, force = FALSE, n_thr = 4, compress = TRUE, SeqStyle = "UCSC", multiBed = NULL, metilene = FALSE, phenoCol = NULL, add_coverage = FALSE )
m |
|
output_dir |
Output directory name where the files should be saved.
If |
rm_NA |
remove NAs |
force |
forces to create files if they are existing |
n_thr |
Default 4. |
compress |
Whether to compress the output. Default TRUE |
SeqStyle |
Default 'UCSC' with 'chr' prefix. |
multiBed |
Default NULL. If provided a filename, a single bedGraph file with all samples included is generated. |
metilene |
Default FALSE. If TRUE outputs bedgraphs in 'metilene' format that can be directly used for DMR calling with 'metilene'. This option works only when |
phenoCol |
Default NULL. 'condition' column from colData. Only applicable if |
add_coverage |
Should the output file contain information on coverage? Default FALSE |
writes bedgraph files to output
data('methrix_data') write_bedgraphs(m = methrix_data, output_dir = './temp') #Export to metline format for DMR calling with metline write_bedgraphs(m = methrix_data, output_dir = "./temp", rm_NA = FALSE, metilene = TRUE,multiBed = "metline_ip", phenoCol = "Condition")
data('methrix_data') write_bedgraphs(m = methrix_data, output_dir = './temp') #Export to metline format for DMR calling with metline write_bedgraphs(m = methrix_data, output_dir = "./temp", rm_NA = FALSE, metilene = TRUE,multiBed = "metline_ip", phenoCol = "Condition")
Exports methrix object as bigWigs
write_bigwigs(m, output_dir = getwd(), samp_names = NULL)
write_bigwigs(m, output_dir = getwd(), samp_names = NULL)
m |
|
output_dir |
Output directory name where the files should be saved. Default getwd() |
samp_names |
sample names to export |
data('methrix_data') write_bigwigs(m = methrix_data, output_dir = './temp')
data('methrix_data') write_bigwigs(m = methrix_data, output_dir = './temp')