Title: | Finds DAMEs - Differential Allelicly MEthylated regions |
---|---|
Description: | 'DAMEfinder' offers functionality for taking methtuple or bismark outputs to calculate ASM scores and compute DAMEs. It also offers nice visualization of methyl-circle plots. |
Authors: | Stephany Orjuela [aut, cre] , Dania Machlab [aut], Mark Robinson [aut] |
Maintainer: | Stephany Orjuela <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.19.0 |
Built: | 2024-10-30 05:23:19 UTC |
Source: | https://github.com/bioc/DAMEfinder |
This function takes in a list of samples resulting from the read_tuples function and returns a SummarizedExperiment of Allele-Specific Methylation (ASM) scores, where each row is a tuple and each column is a sample.
calc_asm( sampleList, beta = 0.5, a = 0.2, transform = modulus_sqrt, coverage = 5, verbose = TRUE )
calc_asm( sampleList, beta = 0.5, a = 0.2, transform = modulus_sqrt, coverage = 5, verbose = TRUE )
sampleList |
List of samples returned from |
beta |
The beta parameter used to calculate the weight in the ASM score.
|
a |
The distance from 0.5 allowed, where 0.5 is a perfect MM:UU balance for a tuple. In the default mode this value is set to 0.2, and we account for the instances where the balance is between 0.3 and 0.7. |
transform |
Transform the calculated tuple ASM scores. We use the modulus square root function which outputs the square root, while preserving the original sign. |
coverage |
Remove tuples with total reads below coverage. Default = 5. |
verbose |
If the function should be verbose. Default = TRUE. |
Calculates ASM score for a list of samples in the output format of the result of read_tuples This functions uses the following other functions: process, calcScore, calcWeight.
A SummarizedExperiment
of ASM scores where the rows are all
the tuples and the columns the sample names.
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) tuple_files <- list.files(DATA_PATH_DIR, '.tsv.gz') tuple_files <- get_data_path(tuple_files) ASM <- read_tuples(tuple_files, c('CRC1', 'NORM1')) ASMscore <- calc_asm(ASM)
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) tuple_files <- list.files(DATA_PATH_DIR, '.tsv.gz') tuple_files <- get_data_path(tuple_files) ASM <- read_tuples(tuple_files, c('CRC1', 'NORM1')) ASMscore <- calc_asm(ASM)
Combines all the GRangeslist
generated in extract_bams
into a RangedSummarizedExperiment
object, and calculates
SNP-based allele-specific methylation.
calc_derivedasm(sampleList, cores = 1, verbose = TRUE)
calc_derivedasm(sampleList, cores = 1, verbose = TRUE)
sampleList |
List of samples returned from |
cores |
Number of cores to thread. |
verbose |
If the function should be verbose. |
RangedSummarizedExperiment
containing in assays:
der.ASM: matrix with SNP-based ASM
snp.table: Matrix with SNP associated to the CpG site.
ref.cov: Coverage of the 'reference' allele.
alt.cov: Coevarage of the 'alternative' allele.
ref.meth: Methylated reads from the 'reference' allele.
alt.meth: Methylated reads from the 'alternative' allele.
data(extractbams_output) derASM <- calc_derivedasm(extractbams_output[c(1,2)], cores = 1, verbose = FALSE)
data(extractbams_output) derASM <- calc_derivedasm(extractbams_output[c(1,2)], cores = 1, verbose = FALSE)
This function calculates the log odds ratio for a CpG tuple:
(MM*UU)/(UM*MU)
, where 'M' stands for methylated and 'U' for
unmethylated. 'MM' reflects the count for instances the CpG pair is
methylated at both positions. The higher the MM and UU counts for that CpG
pair, the higher the log odds ratio.
calc_logodds(s, eps = 1)
calc_logodds(s, eps = 1)
s |
A data frame that contains the MM,UU,UM, and MU counts for each CpG
tuple for a particular sample. It is the resulting object of the
|
eps |
Count added to each of the MM,UU,UM and MU counts to avoid dividing by zero for example. The default is set to 1. |
The same object is returned with an additional column for the log odds ratio.
This function calculates the ASM score for every tuple in a given sample. The
ASM score is a multiplication of the log odds ratio by a weight that reflects
the extent of allele-specific methylation. This weight is obtained with the
calc_weight
function.
calc_score(df, beta = 0.5, a = 0.2)
calc_score(df, beta = 0.5, a = 0.2)
df |
data frame of a sample containing all information per tuple (MM,UU,UM and MU counts, as well as the log odds ratio per tuple) needed for the ASM score. |
beta |
parameter for the |
a |
parameter for the calc_weight function. The weight will be the probability that the MM/(MM+UU) ratio lies between 0.5-a and 0.5+a. |
This function returns an allele-specific methylation (ASM) score for every given tuple in a sample. The ASM score is a product of the log odds ratio and a weight reflecting a measure of allele-specificity using the MM and UU counts.
The same object with an additional column for the ASM score.
This function calculates a weight which reflects MM to UU balance, where M
stands for methylated and U for unmethylated. Given the MM and UU counts for
a particular tuple, the weight is obtained using the link{pbeta}
function.
calc_weight(MM, UU, beta = 0.5, a = 0.2)
calc_weight(MM, UU, beta = 0.5, a = 0.2)
MM |
The read counts for where pos1 and pos2 of the tuple were both methylated. |
UU |
The read counts for where pos1 and pos2 of the tuple were both unmethylated. |
beta |
parameter for the beta distribution. In B(alpha,beta), we set alpha=beta=0.5 by default. |
a |
parameter for how far from 0.5 we go as a measure of allele-specific methylation. The weight is the probability that the MM:(MM+UU) ratio is between 0.5-a and 0.5+a. The default is set to 0.2. |
For a given tuple with MM and UU counts, the weight that reflects allele-scpecificity is calculated as follows:
Prior:
where
and
.
represents our prior belief which is that
tuples are either fully methylated or fully unmethylated, rather than
allele-specifically methylated which is a much rarer event.
Likelihood:
where x is our observation (the MM and UU counts).
Posterior:
where . This posterior also follows a beta
distribution
A number that reflects allele-specificity given MM and UU counts for a CpG pair. This is used as a weight that is multiplied by the log odds ratio to give the final ASM score of that tuple.
#calc_weight(MM=50, UU=50) #0.9999716
#calc_weight(MM=20, UU=60) #0.1646916
Plot score tracks
dame_track( dame, window = 0, positions = 0, derASM = NULL, ASM = NULL, colvec = NULL, plotSNP = FALSE )
dame_track( dame, window = 0, positions = 0, derASM = NULL, ASM = NULL, colvec = NULL, plotSNP = FALSE )
dame |
GRanges object containing a region of interest, or detected with find_dames |
window |
Number of CpG sites outside (up or down-stream) of the DAME should be plotted. Default = 0. |
positions |
Number of bp sites outside (up or down-stream) of the DAME should be plotted. Default = 0. |
derASM |
SummarizedExperiment object obtained from calc_derivedasm (Filtering should be done by the user) |
ASM |
SummarizedExperiment object obtained from calc_asm (Filtering should be done by the user) |
colvec |
Vector of colors (mainly useful for the SNP plot, because I add it with cowplot, so I don't export a ggplot, optional) |
plotSNP |
whether to add the SNP track, only if derASM is specified. Default = FALSE |
Plot
library(GenomicRanges) DAME <- GRanges(19, IRanges(306443,310272)) data('readtuples_output') ASM <- calc_asm(readtuples_output) SummarizedExperiment::colData(ASM)$group <- c(rep('CRC',3),rep('NORM',2)) SummarizedExperiment::colData(ASM)$samples <- colnames(ASM) dame_track(dame = DAME, ASM = ASM)
library(GenomicRanges) DAME <- GRanges(19, IRanges(306443,310272)) data('readtuples_output') ASM <- calc_asm(readtuples_output) SummarizedExperiment::colData(ASM)$group <- c(rep('CRC',3),rep('NORM',2)) SummarizedExperiment::colData(ASM)$samples <- colnames(ASM) dame_track(dame = DAME, ASM = ASM)
Plot means per group of score tracks.
dame_track_mean( dame, window = 0, positions = 0, derASM = NULL, ASM = NULL, colvec = NULL )
dame_track_mean( dame, window = 0, positions = 0, derASM = NULL, ASM = NULL, colvec = NULL )
dame |
GRanges object containing a region of interest, or detected with find_dames |
window |
Number of CpG sites outside (up or down-stream) of the DAME should be plotted. Default = 0. |
positions |
Number of bp sites outside (up or down-stream) of the DAME should be plotted. Default = 0. |
derASM |
SummarizedExperiment object obtained from calc_derivedasm (Filtering should be done by the user) |
ASM |
SummarizedExperiment object obtained from calc_asm (Filtering should be done by the user) |
colvec |
Vector of colors (mainly useful for the SNP plot, because I add it with cowplot, so I don't export a ggplot, optional) |
Plot
library(GenomicRanges) DAME <- GRanges(19, IRanges(306443,310272)) data('readtuples_output') ASM <- calc_asm(readtuples_output) SummarizedExperiment::colData(ASM)$group <- c(rep('CRC',3),rep('NORM',2)) SummarizedExperiment::colData(ASM)$samples <- colnames(ASM) dame_track_mean(dame = DAME, ASM = ASM)
library(GenomicRanges) DAME <- GRanges(19, IRanges(306443,310272)) data('readtuples_output') ASM <- calc_asm(readtuples_output) SummarizedExperiment::colData(ASM)$group <- c(rep('CRC',3),rep('NORM',2)) SummarizedExperiment::colData(ASM)$samples <- colnames(ASM) dame_track_mean(dame = DAME, ASM = ASM)
The package allows the user to extract an ASM score in two ways: either from
a bismark
bam file(s) and VCF file(s), or from the output from
methtuple
. Either way the final output is a list of regions with
diferential allele-specific methylated between groups of samples of interest.
The package also provides functions to visualize ASM at the read level or the
score level
calc_asm
extracts ASM for pairs of CpG
sites from a methtuple file, calc_derivedasm
extracts ASM at each
CpG site linked to a SNP from the VCF file. Both functions generate a
RangedSummarizedExperiment
, which is the input for the main function
find_dames
, that generates a data.frame
with regions
exhibiting differential ASM between a number of samples.
Stephany Orjuela [email protected]
Dania Machlab
Mark D Robinson [email protected]
The function takes a bam (from bismark) and vcf file for each sample. For each SNP contained in the vcfile it calculates the proportion of methylated reads for each CpG site at each allele. At the end it returns (saves to working directory) a GRanges list, where each GRanges contains all the CpG sites overlapping the reads containing a specific SNP.
extract_bams( bamFiles, vcfFiles, sampleNames, referenceFile, coverage = 4, cores = 1, verbose = TRUE )
extract_bams( bamFiles, vcfFiles, sampleNames, referenceFile, coverage = 4, cores = 1, verbose = TRUE )
bamFiles |
List of bam files. |
vcfFiles |
List of vcf files. |
sampleNames |
Names of files in the list. |
referenceFile |
fasta file used to generate the bam files. Or
|
coverage |
Minimum number of reads covering a CpG site on each allele. Default = 2. |
cores |
Number of cores to use. See package parallel for description of core. Default = 1. |
verbose |
Default = TRUE |
A list of GRanges for each sample. Each list is saved in a separate .rds file.
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) bamFiles <- get_data_path('NORM1_chr19_trim.bam') vcfFiles <- get_data_path('NORM1.chr19.trim.vcf') sampleNames <- 'NORM1' #referenceFile suppressPackageStartupMessages({library(BSgenome.Hsapiens.UCSC.hg19)}) genome <- BSgenome.Hsapiens.UCSC.hg19 seqnames(genome) <- gsub("chr","",seqnames(genome)) dna <- DNAStringSet(genome[[19]], use.names = TRUE) names(dna) <- 19 GRanges_list <- extract_bams(bamFiles, vcfFiles, sampleNames, dna)
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) bamFiles <- get_data_path('NORM1_chr19_trim.bam') vcfFiles <- get_data_path('NORM1.chr19.trim.vcf') sampleNames <- 'NORM1' #referenceFile suppressPackageStartupMessages({library(BSgenome.Hsapiens.UCSC.hg19)}) genome <- BSgenome.Hsapiens.UCSC.hg19 seqnames(genome) <- gsub("chr","",seqnames(genome)) dna <- DNAStringSet(genome[[19]], use.names = TRUE) names(dna) <- 19 GRanges_list <- extract_bams(bamFiles, vcfFiles, sampleNames, dna)
4 Patients from a previous study (Parker et al, 2018.) with colorectal cancer were sequenced and the normal and cancerous tissue of each patient was obtained. The data includes a subset of chromosome 19.
extractbams_output
extractbams_output
A large list with 8 elements. Each element is a list of
GRanges
for each sample. Each GRanges in the list includes the
location of the CpG sites contained in the reads for each SNP. The GRanges
metadata table contains:
cov.ref
Number of reads of "reference" allele in that SNP
cov.alt
Number of reads of "alternative" allele in that SNP
meth.ref
Number of methylated reads of "reference" allele in that SNP
cov.ref
Number of methylated reads of "alternative" allele in that SNP
snp
The SNP containing the reads
For further details, see https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6949/ sample names in in ArrayExpress do not necessarily match names given here!
This function finds Differential Allele-specific MEthylated regions (DAMEs).
It uses the regionFinder
function from bumphunter
, and
asigns p-values either empirically or using the Simes method.
find_dames( sa, design, coef = 2, contrast = NULL, smooth = TRUE, Q = 0.5, pvalAssign = "simes", maxGap = 20, verbose = TRUE, maxPerms = 10, method = "ls", trend = FALSE, ... )
find_dames( sa, design, coef = 2, contrast = NULL, smooth = TRUE, Q = 0.5, pvalAssign = "simes", maxGap = 20, verbose = TRUE, maxPerms = 10, method = "ls", trend = FALSE, ... )
sa |
A |
design |
A design matrix created with |
coef |
Column in |
contrast |
a contrast matrix, generated with
|
smooth |
Whether smoothing should be applied to the t-Statistics. Default = TRUE. |
Q |
The percentile set to get a cutoff value K. K is the value on the
Qth quantile of the absolute values of the given (smoothed) t-statistics.
Only necessary if |
pvalAssign |
Choose method to assign pvalues, either 'simes' (default)
or 'empirical'. This second one performs |
maxGap |
Maximum gap between CpGs in a cluster (in bp). NOTE: Regions can be as small as 1 bp. Default = 20. |
verbose |
If the function should be verbose. Default = TRUE. |
maxPerms |
Maximum possible permutations generated. Only necessary if
|
method |
The method to be used in limma's |
trend |
Passed to |
... |
Arguments passed to |
The simes method has higher power to detect DAMEs, but the consistency in
signal across a region is better controlled with the empirical method, since
it uses regionFinder
and getSegments
to find regions with
t-statistics above a cuttof (controled with parameter Q
), whereas
with the 'simes' option, we initially detects clusters of CpG sites/tuples,
and then test if at least 1 differential site/tuple is present in the
cluster.
We recommend trying out different maxGap
and Q
parameters,
since the size and the effect-size of obtained DAMEs change with these
parameters.
A data frame of detected DAMEs ordered by the p-value. Each row
is a DAME and the following information is provided in the columns
(some column names change depending on the pvalAssign
choice):
chr: on which chromosome the DAME is found.
start: The start position of the DAME.
end: The end position of the DAME.
pvalSimes: p-value calculated with the Simes method.
pvalEmp: Empirical p-value obtained from permuting covariate of interest.
sumTstat: Sum of t-stats per segment/cluster.
meanTstat: Mean of t-stats per segment/cluster.
segmentL: Size of segmented cluster (from getSegments
).
clusterL: Size of original cluster (from clusterMaker
).
FDR: Adjusted p-value using the method of Benjamini, Hochberg. (from
p.adjust
).
numup: Number of sites with ASM increase in cluster (only for Simes).
numdown: Number of sites with ASM decrease in cluster (only for Simes).
data(readtuples_output) ASM <- calc_asm(readtuples_output) grp <- factor(c(rep('CRC',3),rep('NORM',2)), levels = c('NORM', 'CRC')) mod <- model.matrix(~grp) dames <- find_dames(ASM, mod, verbose = FALSE)
data(readtuples_output) ASM <- calc_asm(readtuples_output) grp <- factor(c(rep('CRC',3),rep('NORM',2)), levels = c('NORM', 'CRC')) mod <- model.matrix(~grp) dames <- find_dames(ASM, mod, verbose = FALSE)
This function calculates a moderated t-Statistic per site or tuple using
limma
's lmFit
and eBayes
functions. It
then smoothes the obtained t-Statistics using bumphunter
's
smoother
function.
get_tstats( sa, design, contrast = NULL, method = "ls", trend = FALSE, smooth = FALSE, maxGap = 20, coef = 2, verbose = TRUE, filter = TRUE, ... )
get_tstats( sa, design, contrast = NULL, method = "ls", trend = FALSE, smooth = FALSE, maxGap = 20, coef = 2, verbose = TRUE, filter = TRUE, ... )
sa |
A SummarizedExperiment containing ASM values where each row and column correspond to a tuple/site and sample respectively. |
design |
a design matrix created with |
contrast |
a contrast matrix, generated with
|
method |
The method to be used in limma's |
trend |
Passed to |
smooth |
Whether smoothing should be applied to the t-Statistics. Default = FALSE. If TRUE, wherever smoothing is not possible, the un-smoothed t-stat is used instead. |
maxGap |
The maximum allowed gap between genomic positions for clustering of genomic regions to be used in smoothing. Default = 20. |
coef |
Column in model.matrix specifying the parameter to estimate.
Default = 2. If |
verbose |
Set verbose. Default = TRUE. |
filter |
Remove empty tstats. Default = TRUE. |
... |
Arguments passed to |
The smoothing is done on genomic clusters consisting of CpGs that are close
to each other. In the case of tuples, the midpoint of the two genomic
positions in each tuple is used as the genomic position of that tuple, to
perform the smoothing.The function takes a RangedSummarizedExperiment
generated by calc_derivedasm
or calc_asm
containing ASM across samples, and the index of control and treatment
samples.
A vector of t-Statistics within the
RangedSummarizedExperiment
.
data(readtuples_output) ASM <- calc_asm(readtuples_output) grp <- factor(c(rep('CRC',3),rep('NORM',2)), levels = c('NORM', 'CRC')) mod <- model.matrix(~grp) tstats <- get_tstats(ASM, mod)
data(readtuples_output) ASM <- calc_asm(readtuples_output) grp <- factor(c(rep('CRC',3),rep('NORM',2)), levels = c('NORM', 'CRC')) mod <- model.matrix(~grp) tstats <- get_tstats(ASM, mod)
Draws CpG site methylation status as points, in reads containing a specific SNP. Generates one plot per bam file.
methyl_circle_plot( snp, vcfFile, bamFile, refFile, build = "hg19", dame = NULL, letterSize = 2.5, pointSize = 3, sampleName = "sample1", cpgsite = NULL, sampleReads = FALSE, numReads = 20 )
methyl_circle_plot( snp, vcfFile, bamFile, refFile, build = "hg19", dame = NULL, letterSize = 2.5, pointSize = 3, sampleName = "sample1", cpgsite = NULL, sampleReads = FALSE, numReads = 20 )
snp |
GRanges object containing SNP location. |
vcfFile |
vcf file. |
bamFile |
bismark bam file path. |
refFile |
fasta reference file path. Or |
build |
genome build used. default = "hg19" |
dame |
(optional) GRanges object containing a region to plot. |
letterSize |
Size of alleles drawn in plot. Default = 2.5. |
pointSize |
Size of methylation circles. Default = 3. |
sampleName |
FIX?: this is to save the vcf file to not generate it every time you run the function. |
cpgsite |
(optional) GRanges object containing a single CpG site location of interest. |
sampleReads |
Whether a subset of reads should be plotted. Default = FALSE. |
numReads |
Number of reads to plot per allele, if sampleReads is TRUE. Default = 20 |
Plot
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) bam_files <- get_data_path('NORM1_chr19_trim.bam') vcf_files <- get_data_path('NORM1.chr19.trim.vcf') sample_names <- 'NORM1' #reference_file suppressPackageStartupMessages({library(BSgenome.Hsapiens.UCSC.hg19)}) genome <- BSgenome.Hsapiens.UCSC.hg19 seqnames(genome) <- gsub("chr","",seqnames(genome)) dna <- DNAStringSet(genome[[19]], use.names = TRUE) names(dna) <- 19 snp <- GenomicRanges::GRanges(19, IRanges::IRanges(292082, width = 1)) methyl_circle_plot(snp = snp, vcfFile = vcf_files, bamFile = bam_files, refFile = dna, sampleName = sample_names)
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) bam_files <- get_data_path('NORM1_chr19_trim.bam') vcf_files <- get_data_path('NORM1.chr19.trim.vcf') sample_names <- 'NORM1' #reference_file suppressPackageStartupMessages({library(BSgenome.Hsapiens.UCSC.hg19)}) genome <- BSgenome.Hsapiens.UCSC.hg19 seqnames(genome) <- gsub("chr","",seqnames(genome)) dna <- DNAStringSet(genome[[19]], use.names = TRUE) names(dna) <- 19 snp <- GenomicRanges::GRanges(19, IRanges::IRanges(292082, width = 1)) methyl_circle_plot(snp = snp, vcfFile = vcf_files, bamFile = bam_files, refFile = dna, sampleName = sample_names)
Draws CpG site methylation status as points, in reads containing a specific CpG site. Generates one plot per bam file.
methyl_circle_plotCpG( cpgsite = cpgsite, bamFile = bamFile, pointSize = 3, refFile = refFile, dame = NULL, order = FALSE, sampleName = NULL, sampleReads = FALSE, numReads = 20 )
methyl_circle_plotCpG( cpgsite = cpgsite, bamFile = bamFile, pointSize = 3, refFile = refFile, dame = NULL, order = FALSE, sampleName = NULL, sampleReads = FALSE, numReads = 20 )
cpgsite |
GRanges object containing a single CpG site location of interest |
bamFile |
bismark bam file path |
pointSize |
Size of methylation circles. Default = 3. |
refFile |
fasta reference file path |
dame |
(optional) GRanges object containing a region to plot |
order |
Whether reads should be sorted by methylation status. Default= False. |
sampleName |
Plot title. |
sampleReads |
Whether a subset of reads should be plotted. Default = FALSE. |
numReads |
Number of reads to plot, if sampleReads is TRUE. Default = 20 |
Plot
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) bam_files <- get_data_path('NORM1_chr19_trim.bam') sample_names <- 'NORM1' #reference_file suppressPackageStartupMessages({library(BSgenome.Hsapiens.UCSC.hg19)}) genome <- BSgenome.Hsapiens.UCSC.hg19 seqnames(genome) <- gsub("chr","",seqnames(genome)) dna <- DNAStringSet(genome[[19]], use.names = TRUE) names(dna) <- 19 cpg <- GenomicRanges::GRanges(19, IRanges::IRanges(292082, width = 1)) methyl_circle_plotCpG(cpgsite = cpg, bamFile = bam_files, refFile = dna)
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) bam_files <- get_data_path('NORM1_chr19_trim.bam') sample_names <- 'NORM1' #reference_file suppressPackageStartupMessages({library(BSgenome.Hsapiens.UCSC.hg19)}) genome <- BSgenome.Hsapiens.UCSC.hg19 seqnames(genome) <- gsub("chr","",seqnames(genome)) dna <- DNAStringSet(genome[[19]], use.names = TRUE) names(dna) <- 19 cpg <- GenomicRanges::GRanges(19, IRanges::IRanges(292082, width = 1)) methyl_circle_plotCpG(cpgsite = cpg, bamFile = bam_files, refFile = dna)
Same as plotMDS
, except for an arc-sine transformation of the
methylation proportions.
methyl_MDS_plot(x, group, top = 1000, coverage = 5, adj = 0.02, pointSize = 4)
methyl_MDS_plot(x, group, top = 1000, coverage = 5, adj = 0.02, pointSize = 4)
x |
|
group |
Vector of group or any other labels, same length as number of samples. |
top |
Number of top CpG sites used to calculate pairwise distances. |
coverage |
Minimum number of reads covering a CpG site on each allele. Default = 5. |
adj |
Text adjustment in y-axis. Default = 0.2. |
pointSize |
Default = 4. |
Two-dimensional MDS plot.
data(readtuples_output) ASM <- calc_asm(readtuples_output) grp <- factor(c(rep('CRC',3),rep('NORM',2)), levels = c('NORM', 'CRC')) methyl_MDS_plot(ASM, grp)
data(readtuples_output) ASM <- calc_asm(readtuples_output) grp <- factor(c(rep('CRC',3),rep('NORM',2)), levels = c('NORM', 'CRC')) methyl_MDS_plot(ASM, grp)
Function to calculate signed square root (aka modulus square root).
modulus_sqrt(values)
modulus_sqrt(values)
values |
Vector or matrix of ASM scores where each column is a sample. These values are transformed with a square root transformation that (doesn't) preserve the sign. |
Vector or matrix of transformed scores.
This function reads in a list of files obtained from the methtuple tool. It filters out tuples based on the set minimum coverage (min_cov) and the maximum allowed distance (maxGap) between two genomic positions in a tuple.
read_tuples(files, sampleNames, minCoverage = 2, maxGap = 20, verbose = TRUE)
read_tuples(files, sampleNames, minCoverage = 2, maxGap = 20, verbose = TRUE)
files |
List of methtuple files. |
sampleNames |
Names of files in the list. |
minCoverage |
The minimum coverage per tuple. Tuples with a coverage < minCoverage are filtered out. Default = 2. |
maxGap |
The maximum allowed distance between two positions in a tuple. Only distances that are <= maxGap are kept. Default = 150 base pairs. |
verbose |
If the function should be verbose. |
A list of data frames, where each data frame corresponds to one file.
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) tuple_files <- list.files(DATA_PATH_DIR, '.tsv.gz') tuple_files <- get_data_path(tuple_files) ASM <- read_tuples(tuple_files, c('CRC1', 'NORM1'))
DATA_PATH_DIR <- system.file('extdata', '.', package = 'DAMEfinder') get_data_path <- function(file_name) file.path(DATA_PATH_DIR, file_name) tuple_files <- list.files(DATA_PATH_DIR, '.tsv.gz') tuple_files <- get_data_path(tuple_files) ASM <- read_tuples(tuple_files, c('CRC1', 'NORM1'))
3 Patients from a previous study (Parker et al, 2018.) with colorectal cancer were sequenced and the normal and cancerous tissue of each patient was obtained. The data includes a subset of chromosome 19. Here one normal sample is not included.
readtuples_output
readtuples_output
A large list with 5 elements. Each element is a tibble
with
the coordinates of the pairs of CpG sites (tuples). Rest of the tibble
contains:
MM
Number of reads with both CpG sites methylated
MU
Number of reads with first CpG site methylated
UM
Number of reads with second CpG site methylated
UU
Number of reads with both CpG sites unmethylated
cov
Coverage, total reads at tuple
inter_dist
Distance in bp between CpG sites
For further details, see https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6949/ sample names in in ArrayExpress do not necessarily match names given here!