Package 'spiky' reference manual

Title:	Spike-in calibration for cell-free MeDIP
Description:	spiky implements methods and model generation for cfMeDIP (cell-free methylated DNA immunoprecipitation) with spike-in controls. CfMeDIP is an enrichment protocol which avoids destructive conversion of scarce template, making it ideal as a "liquid biopsy," but creating certain challenges in comparing results across specimens, subjects, and experiments. The use of synthetic spike-in standard oligos allows diagnostics performed with cfMeDIP to quantitatively compare samples across subjects, experiments, and time points in both relative and absolute terms.
Authors:	Samantha Wilson [aut], Lauren Harmon [aut], Tim Triche [aut, cre]
Maintainer:	Tim Triche <[email protected]>
License:	GPL-2
Version:	1.13.0
Built:	2025-03-19 04:21:37 UTC
Source:	https://github.com/bioc/spiky

decode fragment identifiers for spike-in standards

Description

given a vector of fragment identifiers like ⁠160_2_35⁠ or ⁠80b_1C_35G-2⁠, encoded typically as lengthInBp_numberOfCpGs_GCpercent, and optionally a database of spike-in sequences corresponding to those fragments, add those columns to the source data (along with, if present in the database, other metadata such as standard concentrations, GC fraction, etc.) and return i an updated DataFrame.

Usage

add_frag_info(x, frag_grp = "frag_grp", spike = NULL)
add_frag_info(x, frag_grp = "frag_grp", spike = NULL)

Arguments

`x`	data.frame with a column of spike information (see above)
`frag_grp`	column name for the spike contig information (`frag_grp`)
`spike`	optional database of spike-in properties (none)

Value

     the data.frame x, augmented with metadata columns

Examples

data(spike_cram_counts) 
data(spike, package="spiky") 
spike <- subset(spike, methylated == 1)
add_frag_info(spike_cram_counts, spike=spike)

data(spike_cram_counts) 
data(spike, package="spiky") 
spike <- subset(spike, methylated == 1)
add_frag_info(spike_cram_counts, spike=spike)

create a tiled representation of a genome from the BAM/CRAM file

Description

This function replaces a bedtools call: bedtools intersect -wao -a fragments.bed -b hg38_300bp_windows.bed > data.bed

Usage

bam_to_bins(x, width = 300, param = NULL, which = IRangesList(), ...)
bam_to_bins(x, width = 300, param = NULL, which = IRangesList(), ...)

Arguments

`x`	a BAM or CRAM filename (or a BamFile object)
`width`	the width of the bins to tile (default is 300)
`param`	optional ScanBamParam (whence we attempt to extract `which`)
`which`	an optional GRanges restricting the bins to certain locations
`...`	additional arguments to pass on to seqinfo_from_header

Details

The idea is to skip the BED creation step for most runs, and just do it once. In order to count reads in bins, we need bins. In order to have bins, we need to know how long the chromosomes are. In order to have a BAM or CRAM file, we need to have those same lengths. This function takes advantage of all of the above to create binned ranges. Note that a very recent branch of Rsamtools is required for CRAM file bins.

Value

   a GRangesList with y-base-pair-wide bins tiled across it

Examples

library(Rsamtools) 
fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bam_to_bins(fl) 

library(Rsamtools) 
fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bam_to_bins(fl)

Binned estimation of picomoles of DNA present in cfMeDIP assays

Description

Given the results of model_glm_pmol and predict_pmol, adjust the predictions to reflect picomoles of captured DNA overlapping a given bin in the genome.

Usage

bin_pmol(x)
bin_pmol(x)

Arguments

`x`	results from predict_pmol (a data.frame or GRanges)

Value

 the same object, but with a column `adjusted_pred_con`

Examples

data(spike, package="spiky")
data(spike_res, package="spiky")
data(genomic_res,package="spiky")
fit <- model_glm_pmol(covg_to_df(spike_res, spike=spike),spike=spike)
pred <- predict_pmol(fit, genomic_res, ret="df")
bin_pmol(pred)

data(spike, package="spiky")
data(spike_res, package="spiky")
data(genomic_res,package="spiky")
fit <- model_glm_pmol(covg_to_df(spike_res, spike=spike),spike=spike)
pred <- predict_pmol(fit, genomic_res, ret="df")
bin_pmol(pred)

Convert Pairs to GRanges

Description

Convert Pairs to GRanges

Usage

convertPairedGRtoGR(pairs)
convertPairedGRtoGR(pairs)

Arguments

pairs

the Pairs object

Value

       a GRanges

reshape `scan_spiked_bam` results into data.frames for model_glm_pmol

Description

reshape scan_spiked_bam results into data.frames for model_glm_pmol

Usage

covg_to_df(spike_gr, spike, meth = TRUE, ID = NULL)
covg_to_df(spike_gr, spike, meth = TRUE, ID = NULL)

Arguments

`spike_gr`	GRanges of spike contigs (e.g. output object from scan_spiked_bam, scan_spike_contigs, or scan_spike_bedpe)
`spike`	spike database (as from data(spike, package="spiky"))
`meth`	only keep methylated spike reads? (TRUE; if FALSE, sum both)
`ID`	an identifier for this sample, if running several (autogenerate)

Value

 a data.frame with columns 'frag_grp', 'id', and 'read_count'

Examples

data(spike, package="spiky")
data(spike_res, package="spiky")
subsetted <- covg_to_df(spike_res, spike=spike, meth=TRUE)
summed <- covg_to_df(spike_res, spike=spike, meth=FALSE)
round((summed$read_count - subsetted$read_count) / summed$read_count, 3)

data(spike, package="spiky")
data(spike_res, package="spiky")
subsetted <- covg_to_df(spike_res, spike=spike, meth=TRUE)
summed <- covg_to_df(spike_res, spike=spike, meth=FALSE)
round((summed$read_count - subsetted$read_count) / summed$read_count, 3)

spike-in counts for two samples, as a wide data.frame

Description

A data.frame with spike-in results from control samples in the manuscript. This maps 1:1 onto spike_read_counts using reshape2::melt.

Usage

data(dedup)
data(dedup)

Format

A data.frame object with

frag_grp: the encoded spike contig name: basepairs_CpGs_GCpercent
read_count_6547: read coverage for this spike in sample 6547
read_count_6548: read coverage for this spike in sample 6548

Source

This data was created using inst/script/loadDedup.R

find spike-in seqlevels in an object `x`, where !is.null(seqinfo(x))

Description

Find the spike-like contigs in a BAM with both natural and spiked contigs. This started out as glue in some other functions and got refactored out.

Usage

find_spike_contigs(x, spike)
find_spike_contigs(x, spike)

Arguments

`x`	something with seqlevels
`spike`	a DataFrame with spike-in information

Details

The indices have an attribute "mappings", which is a character vector such that attr(find_spike_contigs(x), "mappings") == standardized for all contig names in the CRAM/BAM/whatever, and standardized is the rowname in spike that corresponds to the original contig name.

Value

   indices of which contigs in seqlevels(x) are spike-in contigs

Examples

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
si <- seqinfo_from_header(sb)
data(spike, package="spiky")
find_spike_contigs(si, spike=spike)

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
si <- seqinfo_from_header(sb)
data(spike, package="spiky")
find_spike_contigs(si, spike=spike)

various mitochondrial genomes sometimes used as endogenous spike-ins

Description

A DataFrame with species, genome, accession, and sequence for GenBank mitochondrial genome depositions. No concentration provided; add if needed.

Usage

data(genbank_mito)
data(genbank_mito)

Format

A DataFrame object with

species: the species whence the record came, as a character string
genome: the genome assembly whence the mtDNA, as a character string
accession: the genbank accession, as a character string
sequence: genome sequence, as a DNAStringSet

Source

www.ncbi.nlm.nih.gov/genbank/

for CRAM files, a FASTA reference is required to decode; this builds that

Description

A FASTA reference is not always needed, so long as .crai indices are available for all contigs in the CRAM. See spike_counts for a fast and convenient alternative that extracts spike coverage from index stats. However, spike_counts has its own issues, and it's better to use fragments.

Usage

generate_spike_fasta(bam, spike, assembly = NULL, fa = "spike_contigs.fa")
generate_spike_fasta(bam, spike, assembly = NULL, fa = "spike_contigs.fa")

Arguments

`bam`	a BAM or CRAM file, hopefully with an index
`spike`	the spike contig database (mandatory as of 0.9.99)
`assembly`	optional BSgenome or seqinfo with reference contigs (NULL)
`fa`	the filename for the resulting FASTA ("spikes.fa")

Details

If the contigs in a CRAM have even slightly different names from those in the reference, decoding will fail. In some cases there are multiple names for a given contig (which raises the question of whether to condense them), and thus the same reference sequence decodes multiple contig names.

This function generates an appropriate spike reference for a BAM or CRAM, using BAM/CRAM headers to figure out which references are used for which.

At the moment, CRAM support in Rsamtools only exists in the GitHub branch:

BiocManager::install("Bioconductor/Rsamtools@cram")

Using other versions of Rsamtools will yield an error on CRAM files.

Note that for merged genomic + spike reference BAMs/CRAMs, this function will only attempt to generate a FASTA for the spike contigs, not reference. If your reference contigs are screwed up, talk to your sequencing people, and keep better track of the FASTA reference against which you compress!

Value

     invisibly, a DNAStringSet as exported to `fa`

Examples


library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
outFasta <- paste(system.file("extdata", package="spiky", mustWork=TRUE),"/spike_contigs.fa",sep="")
show(generate_spike_fasta(sb, spike=spike,fa=outFasta))

library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
outFasta <- paste(system.file("extdata", package="spiky", mustWork=TRUE),"/spike_contigs.fa",sep="")
show(generate_spike_fasta(sb, spike=spike,fa=outFasta))

A Granges object with genomic coverage from chr21q22, binned every 300bp for the genomic contigs then averaged across the bin. (In other words, the default output of scan_genomic_contigs or scan_genomic_bedpe, restricted to a small enough set of genomic regions to be practical for examples.) This represents what most users will want to generate from their own genomic BAMs or BEDPEs, and is used repeatedly in downstream examples throughout the package.

Description

A Granges object with genomic coverage from chr21q22, binned every 300bp for the genomic contigs then averaged across the bin. (In other words, the default output of scan_genomic_contigs or scan_genomic_bedpe, restricted to a small enough set of genomic regions to be practical for examples.) This represents what most users will want to generate from their own genomic BAMs or BEDPEs, and is used repeatedly in downstream examples throughout the package.

Usage

data(genomic_res)
data(genomic_res)

Format

A GRanges of coverage results with one metadata column, coverage

Source

Generated using scan_genomic_bedpe or scan_genomic_contigs on an example bedpe or bam containing chr21q22 contigs.

refactored out of rename_spikes and rename_spike_seqlevels

Description

A common task between generate_spike_fasta, rename_spikes, and rename_spike_seqlevels is to determine what the largest common subset of characters between existing contig names and stored standardized contigs might be. This function eases that task.

Usage

get_base_name(contig_names, sep = "_")
get_base_name(contig_names, sep = "_")

Arguments

`contig_names`	the names of contigs
`sep`	separator character in contig names ("_")

Value

               a vector of elements 1:3 from each contig name

Examples

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
bh <- scanBamHeader(BamFile(sb))
orig_contigs <- names(bh$targets)
get_base_name(orig_contigs)

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
bh <- scanBamHeader(BamFile(sb))
orig_contigs <- names(bh$targets)
get_base_name(orig_contigs)

tabulate read coverage in predefined bins

Description

refactored out of scan_spiked_bam

Usage

get_binned_coverage(bins, covg)
get_binned_coverage(bins, covg)

Arguments

`bins`	the GRanges with bins
`covg`	the coverage result (an RleList)

Value

   a GRanges of summarized coverage

Examples

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                   mustWork=TRUE)
data(spike,package="spiky")
si <- seqinfo_from_header(sb)
genome(si) <- "spike"
mgr <- get_merged_gr(si,spike=spike)
fl <- scanBamFlag(isDuplicate=FALSE, isPaired=TRUE, isProperPair=TRUE)
bp <- ScanBamParam(flag=fl)
bamMapqFilter(bp) <- 20

covg <- get_spiked_coverage(sb, bp=bp, gr=mgr)
get_binned_coverage(bins=GRanges(), covg=covg)

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                   mustWork=TRUE)
data(spike,package="spiky")
si <- seqinfo_from_header(sb)
genome(si) <- "spike"
mgr <- get_merged_gr(si,spike=spike)
fl <- scanBamFlag(isDuplicate=FALSE, isPaired=TRUE, isProperPair=TRUE)
bp <- ScanBamParam(flag=fl)
bamMapqFilter(bp) <- 20

covg <- get_spiked_coverage(sb, bp=bp, gr=mgr)
get_binned_coverage(bins=GRanges(), covg=covg)

get a GRanges of (by default, standard) chromosomes from seqinfo

Description

refactored from scan_spiked_bam to clarify information flow

Usage

get_merged_gr(si, spike, standard = TRUE)
get_merged_gr(si, spike, standard = TRUE)

Arguments

`si`	seqinfo, usually from a BAM/CRAM file with spike contigs
`spike`	database of spike-in standard sequence features (spike)
`standard`	trim to standard chromosomes? (TRUE)

Details

By default, get_merged_gr will return a GRanges with "standardized" genomic and spike contig names (i.e. genomic chr1-22, X, Y, M, and the canonical spike names in data(spike, package="spiky")).

The constraint to "standard" chromosomes on genomic contigs can be removed by setting standard to FALSE in the function arguments.

Value

     GRanges with two genomes: the organism assembly and "spike"

Examples

sb <- system.file("extdata", "example.spike.bam", package="spiky", 
                  mustWork=TRUE) 
si <- seqinfo_from_header(sb) 
genome(si) <- "spike" # no genomic contigs
data(spike, package="spiky")
get_merged_gr(si, spike=spike) # note canonicalized spikes

sb <- system.file("extdata", "example.spike.bam", package="spiky", 
                  mustWork=TRUE) 
si <- seqinfo_from_header(sb) 
genome(si) <- "spike" # no genomic contigs
data(spike, package="spiky")
get_merged_gr(si, spike=spike) # note canonicalized spikes

get the (max, median, or mean) coverage for spike-in contigs from a BAM/CRAM

Description

get the (max, median, or mean) coverage for spike-in contigs from a BAM/CRAM

Usage

get_spike_depth(covg, spike_gr = NULL, spike = NULL, how = c("max", "mean"))
get_spike_depth(covg, spike_gr = NULL, spike = NULL, how = c("max", "mean"))

Arguments

`covg`	the coverage RleList
`spike_gr`	the spike-in GRanges (default: figure out from seqinfo)
`spike`	information about the spikes (default: load `spike`)
`how`	how to summarize the per-spike coverage (max)

Value

     a GRanges with summarized coverage and features for each

Examples

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
data(spike, package="spiky")
si <- seqinfo_from_header(sb)
genome(si) <- "spike"
mgr <- get_merged_gr(si,spike=spike)

fl <- scanBamFlag(isDuplicate=FALSE, isPaired=TRUE, isProperPair=TRUE)
bp <- ScanBamParam(flag=fl)
bamMapqFilter(bp) <- 20

covg <- get_spiked_coverage(sb, bp=bp, gr=mgr)
get_spike_depth(covg, spike_gr=mgr, spike=spike)

sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
data(spike, package="spiky")
si <- seqinfo_from_header(sb)
genome(si) <- "spike"
mgr <- get_merged_gr(si,spike=spike)

fl <- scanBamFlag(isDuplicate=FALSE, isPaired=TRUE, isProperPair=TRUE)
bp <- ScanBamParam(flag=fl)
bamMapqFilter(bp) <- 20

covg <- get_spiked_coverage(sb, bp=bp, gr=mgr)
get_spike_depth(covg, spike_gr=mgr, spike=spike)

tabulate coverage across assembly and spike contig subset in natural order

Description

FIXME: this is wicked slow, ask Herve if a faster version exists

Usage

get_spiked_coverage(bf, bp, gr)
get_spiked_coverage(bf, bp, gr)

Arguments

`bf`	the BamFile object
`bp`	the ScanBamParam object
`gr`	the GRanges with sorted seqlevels

Details

Refactored from scan_spiked_bam, this is a very simple wrapper

Value

a list of Rles

Examples

sb <- system.file("extdata", "example.spike.bam", package="spiky", 
                  mustWork=TRUE)
si <- seqinfo_from_header(sb) 
genome(si) <- "spike"
data(spike, package="spiky")
mgr <- get_merged_gr(si, spike=spike) # note canonicalized spikes

fl <- scanBamFlag(isDuplicate=FALSE, isPaired=TRUE, isProperPair=TRUE)
bp <- ScanBamParam(flag=fl)
bamMapqFilter(bp) <- 20
get_spiked_coverage(sb, bp=bp, gr=mgr)

sb <- system.file("extdata", "example.spike.bam", package="spiky", 
                  mustWork=TRUE)
si <- seqinfo_from_header(sb) 
genome(si) <- "spike"
data(spike, package="spiky")
mgr <- get_merged_gr(si, spike=spike) # note canonicalized spikes

fl <- scanBamFlag(isDuplicate=FALSE, isPaired=TRUE, isProperPair=TRUE)
bp <- ScanBamParam(flag=fl)
bamMapqFilter(bp) <- 20
get_spiked_coverage(sb, bp=bp, gr=mgr)

simple contig kmer comparisons

Description

simple contig kmer comparisons

Usage

kmax(km, normalize = TRUE)
kmax(km, normalize = TRUE)

Arguments

`km`	kmer summary
`normalize`	normalize (divide by row sums)? (TRUE)

Value

       the most common kmers for each contig, across all contigs

Examples



data(genbank_mito, package="spiky")
mtk6 <- kmers(genbank_mito, k=6)
rownames(mtk6) <- paste0(rownames(mtk6), "_MT")
kmax(mtk6)

data(phage, package="spiky")
phk6 <- kmers(phage, k=6)
kmax(phk6, normalize=FALSE)

stopifnot(identical(colnames(phk6), colnames(mtk6)))
k6 <- rbind(mtk6, phk6)
kmax(k6)

data(genbank_mito, package="spiky")
mtk6 <- kmers(genbank_mito, k=6)
rownames(mtk6) <- paste0(rownames(mtk6), "_MT")
kmax(mtk6)

data(phage, package="spiky")
phk6 <- kmers(phage, k=6)
kmax(phk6, normalize=FALSE)

stopifnot(identical(colnames(phk6), colnames(mtk6)))
k6 <- rbind(mtk6, phk6)
kmax(k6)

oligonucleotideFrequency, but less letters and more convenient.

Description

oligonucleotideFrequency, but less letters and more convenient.

Usage

kmers(x, k = 6)
kmers(x, k = 6)

Arguments

`x`	BSgenome, DFrame with `sequence` column, or DNAStringSet
`k`	the length of the kmers (default is 6)

Details

The companion kmax function finds the maximum frequency kmer for each contig and plots all of them together for comparison purposes.

Value

       a matrix of contigs (rows) by kmer frequencies (columns)

Examples


data(genbank_mito, package="spiky") 
mtk6 <- kmers(genbank_mito, k=6)
kmax(mtk6)

data(phage, package="spiky") 
phk6 <- kmers(phage, k=6)
kmax(phk6)

data(genbank_mito, package="spiky") 
mtk6 <- kmers(genbank_mito, k=6)
kmax(mtk6)

data(phage, package="spiky") 
phk6 <- kmers(phage, k=6)
kmax(phk6)

compute methylation specificity for spike-in standards

Description

In a cfMeDIP experiment, the yield of methylated fragments should be >95% (ideally 98-99%) due to the nature of the assay.

Usage

methylation_specificity(spike_gr, spike)
methylation_specificity(spike_gr, spike)

Arguments

`spike_gr`	GRanges of spike contigs (e.g. output object from scan_spiked_bam, scan_spike_contigs, or scan_spike_bedpe)
`spike`	spike contig database, if needed (e.g. data(spike))

Value

         list with median and mean coverage across spike contigs

Examples

data(genomic_res)
data(spike_res)
data(spike, package="spiky")
methylation_specificity(spike_res, spike=spike)

data(genomic_res)
data(spike_res)
data(spike, package="spiky")
methylation_specificity(spike_res, spike=spike)

Build a Bayesian additive model from spike-ins to correct bias in *-seq

Description

Build a Bayesian additive model from spike-ins to correct bias in *-seq

Usage

model_bam_standards(x, conc = NULL, fm = NULL, ...)
model_bam_standards(x, conc = NULL, fm = NULL, ...)

Arguments

`x`	data with assorted feature information (GCfrac, CpGs, etc)
`conc`	concentration for each spike (must be provided!)
`fm`	model formula (conc ~ read_count + fraglen + GCfrac + CpGs_3)
`...`	other arguments to pass to `bamlss`

Value

   the model fit for the data

Examples


library(bamlss)
data(spike_cram_counts,package="spiky")
data(spike,package="spiky")
scc <- add_frag_info(spike_cram_counts, spike=spike)
scc$conc <- scc$conc * 0.9 # adjust for dilution
scc$CpGs_3 <- scc$CpGs ^ (1/3)
fit0 <- model_bam_standards(scc,
                            fm=conc ~ read_count + fraglen)
fit1 <- model_bam_standards(scc,
                            fm=conc ~ read_count + fraglen + GCfrac + CpGs_3)
DIC(fit0, fit1)

library(bamlss)
data(spike_cram_counts,package="spiky")
data(spike,package="spiky")
scc <- add_frag_info(spike_cram_counts, spike=spike)
scc$conc <- scc$conc * 0.9 # adjust for dilution
scc$CpGs_3 <- scc$CpGs ^ (1/3)
fit0 <- model_bam_standards(scc,
                            fm=conc ~ read_count + fraglen)
fit1 <- model_bam_standards(scc,
                            fm=conc ~ read_count + fraglen + GCfrac + CpGs_3)
DIC(fit0, fit1)

Build a generalized linear model from spike-ins to correct bias in cfMeDIP

Description

formerly '2020_model_glm_fmol'. Note that everything in x can be had from a BAM/CRAM with spike contigs named as frag_grp (len_CpGs_GC) in the index and in fact that is what scan_spiked_bam now does.

Usage

model_glm_pmol(x, spike, conc = NULL, ...)
model_glm_pmol(x, spike, conc = NULL, ...)

Arguments

`x`	data w/frag_grp, id, and read_count; or scan_spiked_bam result
`spike`	spike database, e.g. `data(spike, package='spiky')`
`conc`	concentration for each spike (will be referenced if NULL)
`...`	other arguments to pass to `glm` (e.g. `family`)

Value

   the model fit for the data

Examples


data(spike, package="spiky")

data(spike_read_counts, package="spiky")
fit1 <- model_glm_pmol(spike_read_counts, spike=spike)

data(spike_res) # scan_spiked_bam result
fit2 <- model_glm_pmol(spike_res, spike=spike)

data(spike, package="spiky")

data(spike_read_counts, package="spiky")
fit1 <- model_glm_pmol(spike_read_counts, spike=spike)

data(spike_res) # scan_spiked_bam result
fit2 <- model_glm_pmol(spike_res, spike=spike)

parse out the forward and reverse UMIs and contig for a BED/BAM

Description

parse out the forward and reverse UMIs and contig for a BED/BAM

Usage

parse_spike_UMI(UMI, pos = NULL, seqs = NULL)
parse_spike_UMI(UMI, pos = NULL, seqs = NULL)

Arguments

`UMI`	a vector of UMIs
`pos`	optional vector of positions (else all are set to 1)
`seqs`	optional vector of read sequences (else widths default to 96)

Value

a GRanges

lambda and phiX phage sequences, sometimes used as spike-ins

Description

A DataFrame with sequence, methylated, CpGs, GCfrac, and OECpG for phages

Usage

data(phage)
data(phage)

Format

A DataFrame object with

sequence: genome sequence, as a DNAStringSet
methylated: whether CpGs are methylated, as an integer
CpGs: the number of CpGs in the phage genome, as an integer
GCfrac: the GC fraction of the phage genome, as a numeric
OECpG: the observed / expected CpG fraction, as a numeric

Source

www.ncbi.nlm.nih.gov/genbank/

predict picomoles of DNA from a fit and read counts (coverage)

Description

FIXME: this could be made MUCH faster by precomputing CpG/GC stats per bin

Usage

predict_pmol(
  fit,
  genomic_gr,
  bsgenome = NULL,
  ret = c("gr", "df"),
  slide = FALSE
)
predict_pmol(
  fit,
  genomic_gr,
  bsgenome = NULL,
  ret = c("gr", "df"),
  slide = FALSE
)

Arguments

`fit`	result of model_glm_pmol
`genomic_gr`	the genomic data / new data
`bsgenome`	BSgenome name (if null, will guess from genomic_gr)
`ret`	return a data.frame ("df") or GRanges ("gr")? ("gr")
`slide`	compute a sliding window estimate for GCfrac (1/3 width)?

Details

Using GRanges as the return value is (perhaps counterintuitively) much faster than the data.frame, since the sequence of the bins gets converted from a BSgenome representation to characters in the latter (it is implied by the bin start, stop, and genome when left as a GRanges).

Value

object with read count, fraglen, GC%, CpG**(1/3), and concentration

Examples


data(spike_res)
data(genomic_res)
data(spike, package="spiky")
fit <- model_glm_pmol(covg_to_df(spike_res, spike=spike),spike=spike)
preddf <- predict_pmol(fit, genomic_res, ret="df")
pred <- predict_pmol(fit, genomic_res, ret="gr")
bin_pmol(pred)

data(spike_res)
data(genomic_res)
data(spike, package="spiky")
fit <- model_glm_pmol(covg_to_df(spike_res, spike=spike),spike=spike)
preddf <- predict_pmol(fit, genomic_res, ret="df")
pred <- predict_pmol(fit, genomic_res, ret="gr")
bin_pmol(pred)

QC, QA, and processing for a new spike database

Description

Sequence feature verification: never trust anyone, least of all yourself.

Usage

process_spikes(fasta, methylated = 0, ...)
process_spikes(fasta, methylated = 0, ...)

Arguments

`fasta`	fasta file (or GRanges or DataFrame) w/spike sequences
`methylated`	whether CpGs in each are methylated (0 or 1, default 0)
`...`	additional arguments, e.g. kernels (currently unused)

Details

GCfrac is the GC content of spikes as a proportion instead of a percent. OECpG is (observed/expected) CpGs (expectation is 25% of GC dinucleotides).

Value

         a DataFrame suitable for downstream processing

Examples


data(spike)
spikes <- system.file("extdata", "spikes.fa", package="spiky", mustWork=TRUE)
spikemeth <- spike$methylated
process_spikes(spikes, spikemeth)

data(phage)
phages <- system.file("extdata", "phages.fa", package="spiky", mustWork=TRUE)
identical(process_spikes(phage), phage)
identical(phage, process_spikes(phage))

data(genbank_mito)
(mt <- process_spikes(genbank_mito)) # see also genbank_mito.R
gb_mito <- system.file("extdata", "genbank_mito.R", package="spiky")


data(spike)
spikes <- system.file("extdata", "spikes.fa", package="spiky", mustWork=TRUE)
spikemeth <- spike$methylated
process_spikes(spikes, spikemeth)

data(phage)
phages <- system.file("extdata", "phages.fa", package="spiky", mustWork=TRUE)
identical(process_spikes(phage), phage)
identical(phage, process_spikes(phage))

data(genbank_mito)
(mt <- process_spikes(genbank_mito)) # see also genbank_mito.R
gb_mito <- system.file("extdata", "genbank_mito.R", package="spiky")

read a BEDPE file into Pairs of GRanges (as if a GAlignmentPairs or similar)

Description

read a BEDPE file into Pairs of GRanges (as if a GAlignmentPairs or similar)

Usage

read_bedpe(
  x,
  ...,
  stranded = FALSE,
  fraglen = TRUE,
  optional = FALSE,
  keep = FALSE
)
read_bedpe(
  x,
  ...,
  stranded = FALSE,
  fraglen = TRUE,
  optional = FALSE,
  keep = FALSE
)

Arguments

`x`	a Tabixed BEDPE file, or a TabixFile of one
`...`	additional arguments to pass to scanTabix internally
`stranded`	Is the data stranded? (FALSE)
`fraglen`	compute the fragment length? (TRUE)
`optional`	scan the optional columns (name, score, strand1)? (FALSE)
`keep`	keep additional columns? (FALSE)

Details

    BEDPE import in R is a shambles. This is a bandaid on a GSW.
             See the \href{https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format}{BEDPE format definition} for full details.
             In short, for a pair of ranges 1 and 2, we have fields
             chrom1, start1, end1, chrom2, start2, end2, and (optionally)
             name, score, strand1, strand2, plus any other user defined
             fields that may be included (these are not yet supported
             by read_bedpe). For example, two valid BEDPE lines are:

             chr1  100   200   chr5  5000  5100  bedpe_example1  30
             chr9  900  5000   chr9  3000  3800  bedpe_example2  99  +  -

Value

     a Pairs of GRanges, perhaps with $score or $fraglen

Examples

## Not run: 
  bedpe <- "GSM5067076_2020_A64_bedpe.bed.gz"
  WT1_hg38 <- GRanges("chr11", IRanges(32387775, 32435564), "-")
  read_bedpe(bedpe, param=WT1_hg38)

## End(Not run)

## Not run: 
  bedpe <- "GSM5067076_2020_A64_bedpe.bed.gz"
  WT1_hg38 <- GRanges("chr11", IRanges(32387775, 32435564), "-")
  read_bedpe(bedpe, param=WT1_hg38)

## End(Not run)

for spike-in contigs in GRanges, match to standardized spike seqlevels

Description

This function is essentially the opposite of rename_spikes, except that it works well on GRanges/GAlignments from or for merged genome+spike BAMs. If spike contigs are found, it will assign genome='spike' to those, while changing the seqlevels to standardized names that match rownames(spike).

Usage

rename_spike_seqlevels(x, spike = NULL)
rename_spike_seqlevels(x, spike = NULL)

Arguments

`x`	something with seqlevels (GRanges, GAlignments, Seqinfo...)
`spike`	a DataFrame where spike$sequence is a DNAStringSet (or NULL)

Value

     x, but with standardized spike seqlevels and genomes

for BAM/CRAM files with renamed contigs, we need to rename `spike` rows

Description

This function does that.

Usage

rename_spikes(x, spike)
rename_spikes(x, spike)

Arguments

`x`	a BAM/CRAM file, hopefully with an index
`spike`	a DataFrame where spike$sequence is a DNAStringSet

Value

     a DataFrame with renamed contigs (rows)

Scan genomic BEDPE

Description

Scan genomic BEDPE

Usage

scan_genomic_bedpe(
  bedpe,
  bin = TRUE,
  binwidth = 300L,
  bins = NULL,
  standard = TRUE,
  genome = "hg38"
)
scan_genomic_bedpe(
  bedpe,
  bin = TRUE,
  binwidth = 300L,
  bins = NULL,
  standard = TRUE,
  genome = "hg38"
)

Arguments

`bedpe`	the BEDPE file path, or output from read_bedpe()
`bin`	Bin reads? (TRUE)
`binwidth`	width of the bins for chromosomal tiling (300)
`bins`	a pre-tiled GRanges for binning coverage (NULL)
`standard`	restrict non-spike contigs to "standard" chromosomes? (TRUE)
`genome`	Name of genome (default hg38)

Value

     a GRanges with coverage

Examples


fl <- system.file("extdata", "example_chr21_bedpe.bed.gz", package="spiky",mustWork=TRUE)
scan_genomic_bedpe(fl) # will warn user about spike contigs


fl <- system.file("extdata", "example_chr21_bedpe.bed.gz", package="spiky",mustWork=TRUE)
scan_genomic_bedpe(fl) # will warn user about spike contigs

scan genomic contigs in a BAM/CRAM file

Description

The default workflow for spiky is roughly as follows:

Usage

scan_genomic_contigs(
  bam,
  spike,
  param = NULL,
  bin = TRUE,
  binwidth = 300L,
  bins = NULL,
  standard = TRUE,
  genome = "hg38",
  ...
)
scan_genomic_contigs(
  bam,
  spike,
  param = NULL,
  bin = TRUE,
  binwidth = 300L,
  bins = NULL,
  standard = TRUE,
  genome = "hg38",
  ...
)

Arguments

`bam`	the BAM or CRAM filename, or a vector of them
`spike`	the spike-in reference database (e.g. data(spike))
`param`	a ScanBamParam object specifying which reads to count (NULL)
`bin`	Bin reads? (TRUE)
`binwidth`	width of the bins for chromosomal tiling (300)
`bins`	a pre-tiled GRanges for binning coverage (NULL)
`standard`	restrict non-spike contigs to "standard" chromosomes? (TRUE)
`genome`	Name of genome (default hg38)
`...`	additional arguments to pass to scanBamFlag()

Details

Identify and quantify the spike-in contigs in an experiment.
Fit a model for sequence-based abundance artifacts using the spike-ins.
Quantify raw fragment abundance on genomic contigs, and adjust per step 2.

scan_genomic_contigs addresses the first half of step 3. The assumption is that anything which isn't a spike contig, is a genomic contig. This isn't necessarily true, so the user can also supply a ScanBamParam object for the param argument and restrict scanning to whatever contigs they wish, which also allows for non-default MAPQ, pairing, and quality filters.

If multiple BAM or CRAM filenames are provided, all indices will be checked before attempting to run through any of the files.

Value

     a CompressedGRangesList with bin- and spike-level coverage

Examples


library(Rsamtools)
data(spike, package="spiky")

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
scan_genomic_contigs(fl, spike=spike,standard=FALSE) # will warn user about spike contigs

sb <- system.file("extdata", "example_chr21.bam", package="spiky",
                  mustWork=TRUE)
scan_genomic_contigs(sb, spike=spike) # will warn user about genomic contigs

library(Rsamtools)
data(spike, package="spiky")

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
scan_genomic_contigs(fl, spike=spike,standard=FALSE) # will warn user about spike contigs

sb <- system.file("extdata", "example_chr21.bam", package="spiky",
                  mustWork=TRUE)
scan_genomic_contigs(sb, spike=spike) # will warn user about genomic contigs

tabulate methylation specificity for multiple spike-in BAM/CRAM files

Description

Methylation specificity is here defined as methylated_spike_covg/spike_covg

Usage

scan_methylation_specificity(files, spike, sep = "_")
scan_methylation_specificity(files, spike, sep = "_")

Arguments

`files`	a vector of BAM/CRAM file names
`spike`	a spike-in database
`sep`	the separator for spike-in contig names ("_")

Value

   a matrix with columns "mean" and "median"

Examples

data(spike) 
library(GenomicRanges)
sb <- system.file("extdata", "example.spike.bam", package="spiky", 
                  mustWork=TRUE)
scan_methylation_specificity(sb, spike=spike)

data(spike) 
library(GenomicRanges)
sb <- system.file("extdata", "example.spike.bam", package="spiky", 
                  mustWork=TRUE)
scan_methylation_specificity(sb, spike=spike)

Scan spikes BEDPE

Description

Scan spikes BEDPE

Usage

scan_spike_bedpe(bedpe, spike, how = "max")
scan_spike_bedpe(bedpe, spike, how = "max")

Arguments

`bedpe`	the BEDPE file path, or output from read_bedpe()
`spike`	information about the spikes (default: load `spike`)
`how`	how to summarize the per-spike coverage (max)

Value

     a GRanges with coverage

Examples

data(spike, package="spiky")
fl <- system.file("extdata", "example_spike_bedpe.bed.gz", package="spiky",mustWork=TRUE)
scan_spike_bedpe(fl,spike=spike) # will warn user about spike contigs

data(spike, package="spiky")
fl <- system.file("extdata", "example_spike_bedpe.bed.gz", package="spiky",mustWork=TRUE)
scan_spike_bedpe(fl,spike=spike) # will warn user about spike contigs

pretty much what it says: scan spike contigs from a BAM or CRAM file

Description

default workflow is

Usage

scan_spike_contigs(bam, spike, how = "max", param = NULL, mc.cores = 16, ...)
scan_spike_contigs(bam, spike, how = "max", param = NULL, mc.cores = 16, ...)

Arguments

`bam`	the BAM or CRAM filename, or a vector of such filenames
`spike`	the spike-in reference database (e.g. data(spike))
`how`	how to summarize the per-spike coverage (max)
`param`	a ScanBamParam object, or NULL (will default to MAPQ=20 etc)
`mc.cores`	Number of cores to run on (default 16)
`...`	additional arguments to pass to scanBamFlag()

Details

scan spike contigs and count fragments per contig or per bin.
fit the appropriate model for adjusting genomic contigs based on spikes.
scan and adjust binned fragment tallies along genomic contigs per above.

scan_spike_contigs implements step 1.

If multiple BAM or CRAM filenames are provided, all indices will be checked before attempting to run through any of the files.

Value

     a CompressedGRangesList with bin- and spike-level coverage

Examples

library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE) # switch to a CRAM
res <- scan_spike_contigs(sb, spike=spike) # use default ScanBamParam
summary(res)

library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE) # switch to a CRAM
res <- scan_spike_contigs(sb, spike=spike) # use default ScanBamParam
summary(res)

run spike_counts on BAM/CRAM files and shape the results for model_glm_pmol

Description

Typically one will want to fit a correction model to multiple samples. This function eases this task by merging the output of spike_counts into a data.frame that model_glm_pmol can directly fit.

Usage

scan_spike_counts(files, spike, methylated = 1, sep = "_")
scan_spike_counts(files, spike, methylated = 1, sep = "_")

Arguments

`files`	a vector of BAM/CRAM file names
`spike`	a spike-in database
`methylated`	a logical (0/1) to include only methylated fragments
`sep`	the separator for spike-in contig names ("_")

Value

   a data.frame with columns "frag_grp", "id", and "read_count"

Examples

data(spike)
library(GenomicRanges)
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
scan_spike_counts(sb, spike=spike)
fit <- model_glm_pmol(scan_spike_counts(sb, spike=spike),spike=spike)

data(spike)
library(GenomicRanges)
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
scan_spike_counts(sb, spike=spike)
fit <- model_glm_pmol(scan_spike_counts(sb, spike=spike),spike=spike)

pretty much what it says: scan standard chroms + spike contigs from a BAM

Description

Note: behind the scenes, this is being refactored into scan_spike_contigs and scan_genomic_contigs. Once that is done, perhaps before release, the default workflow will switch to

Usage

scan_spiked_bam(
  bam,
  spike,
  mapq = 20,
  binwidth = 300L,
  bins = NULL,
  how = c("max", "mean"),
  dupe = FALSE,
  paired = TRUE,
  standard = TRUE,
  ...
)
scan_spiked_bam(
  bam,
  spike,
  mapq = 20,
  binwidth = 300L,
  bins = NULL,
  how = c("max", "mean"),
  dupe = FALSE,
  paired = TRUE,
  standard = TRUE,
  ...
)

Arguments

`bam`	the BAM file
`spike`	the spike-in reference database (e.g. data(spike))
`mapq`	minimum mapq value to count a pair (20)
`binwidth`	width of the bins for chromosomal tiling (300)
`bins`	a pre-tiled GRanges for binning coverage (NULL)
`how`	how to record spike read coverage (max or mean)? (max)
`dupe`	unique (FALSE), duplicte (TRUE), or all (NA) reads? (FALSE)
`paired`	restrict coverage to that from properly paired reads? (TRUE)
`standard`	restrict non-spike contigs to "standard" chromosomes? (TRUE)
`...`	additional arguments to pass to scanBamFlag()

Details

scan spike contigs and count fragments per contig or per bin.
fit the appropriate model for adjusting genomic contigs based on spikes.
scan and adjust binned fragment tallies along genomic contigs per above.

This approach decouples binning schemes from model generation (using spikes) and model-based adjustment (using genomic fragment counts), decreasing code complexity while increasing the opportunities for caching & parallelization.

For a more realistic example (not run), one might do something like:

data(spike, package="spiky"); bam <- "2021_ctl.hg38_withSpikes.bam"; ssb_res <- scan_spiked_bam(bam, mapq=20, spike=spike);

An extract from the resulting ssb_res object is available via

data(ssb_res, package="spiky");

The full ssb_res is a GRangesList object with 300bp-binned coverage on the standard (chr1-22, chrX, chrY, chrM) chromosomes (as determined by the GenomeInfoDb::standardChromosomes() function against the assembly defined in the BAM or CRAM file, by default; if desired, a user can scan all genomic contigs by setting standard=FALSE when calling the function). By default, the mean base-level coverage of genomic bins is reported, and the maximum spike-level coverage is reported, though this can also be adjusted as needed. The results then inform the reliability of measurements from replicate samples in multiple labs, as well as the adjusted quantitative coverage in each bin once the absolute quantity of captured cell-free methylated DNA has been fit by model_glm_pmol and predict_pmol. In some sense, this function converts BAMs/CRAMs into usable data structures for high-throughput standardized cfMeDIP experiments.

The data extract used in other examples is the same as the full version, with the sole difference being that genomic bins are limited to chr22.

Value

     a CompressedGRangesList with bin- and spike-level coverage

Examples

library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
res <- scan_spiked_bam(sb, spike=spike, bins=GRanges())
summary(res$spikes$coverage)

library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
res <- scan_spiked_bam(sb, spike=spike, bins=GRanges())
summary(res$spikes$coverage)

create seqinfo (and thus a standard chromosome filter) from a BAM header

Description

create seqinfo (and thus a standard chromosome filter) from a BAM header

Usage

seqinfo_from_header(x, gen = NA, std = FALSE, ret = c("si", "gr"))
seqinfo_from_header(x, gen = NA, std = FALSE, ret = c("si", "gr"))

Arguments

`x`	the BAM file or its header
`gen`	genome of the BAM file, if known (NULL; autodetect)
`std`	standard chromosomes only? (FALSE; will be empty if spikes)
`ret`	return Seqinfo ("si", the default) or GRanges ("gr")? ("si")

Details

Setting std=TRUE on a spike-in BAM will produce an empty result.

Value

   Seqinfo object or GRanges (or `as(seqinfo, "GRanges")`)

Examples


  library(Rsamtools)
  fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

  hdr <- scanBamHeader(BamFile(fl))
  si <- seqinfo_from_header(hdr)
  gr <- seqinfo_from_header(fl, ret="gr")
  stopifnot(identical(gr, as(si, "GRanges")))

  std_si <- seqinfo_from_header(fl, std=TRUE)
  seqlevels(std_si)

  # for comparison with below
  data(spike, package="spiky")
  spike

  sp <- system.file("extdata", "example.spike.bam", package="spiky")
  sp_gr <- seqinfo_from_header(sp, ret="gr")
  sp_gr

library(Rsamtools)
  fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)

  hdr <- scanBamHeader(BamFile(fl))
  si <- seqinfo_from_header(hdr)
  gr <- seqinfo_from_header(fl, ret="gr")
  stopifnot(identical(gr, as(si, "GRanges")))

  std_si <- seqinfo_from_header(fl, std=TRUE)
  seqlevels(std_si)

  # for comparison with below
  data(spike, package="spiky")
  spike

  sp <- system.file("extdata", "example.spike.bam", package="spiky")
  sp_gr <- seqinfo_from_header(sp, ret="gr")
  sp_gr

spike-in contig properties for Sam's cfMeDIP spikes

Description

A DataFrame with sequence, concentration, and other properties of Sam's synthetic cfMeDIP spike-in controls. The row names redudantly encode some of these properties, such as the number of CpGs in the spike-in sequence.

Usage

data(spike)
data(spike)

Format

A DataFrame object with

sequence: contig sequence, as a DNAStringSet
methylated: are the CpGs in this spike-in methylated? 0 or 1
CpGs: number of CpG dinucleotides in the spike, from 1 to 16
fmol: femtomolar concentration of the spike-in for standard mix
molmass: molar mass of spike-in sequence

Source

https://doi.org/10.1101/2021.02.12.430289

Bland-Altman plot for cfMeDIP spike standards

Description

Bland-Altman plot for cfMeDIP spike standards

Usage

spike_bland_altman_plot(fit)
spike_bland_altman_plot(fit)

Arguments

fit

a model fit, from predict_pmol (?)

Value

             a ggplot2 object

Examples


data(spike_res)
data(spike, package="spiky")
fit <- model_glm_pmol(covg_to_df(spike_res, spike=spike),spike=spike)
ba_plot <- spike_bland_altman_plot(fit)

data(spike_res)
data(spike, package="spiky")
fit <- model_glm_pmol(covg_to_df(spike_res, spike=spike),spike=spike)
ba_plot <- spike_bland_altman_plot(fit)

use the index of a spiked BAM/CRAM file for spike contig coverage

Description

It dawned on me one day that we don't even have to bother reading the file if we have an index for a spiked BAM/CRAM result, since any fragments that map properly to the spike contigs are generated from synthetic templates. This function takes an index and a spike database (usually a DataFrame) as inputs and provides a rough coverage estimate over "rehabilitated" contig names (i.e., canonicalized contigs mapping to the database) as its output.

Usage

spike_counts(
  bam,
  spike,
  sep = "_",
  ref = "spike",
  verbose = FALSE,
  dump_idx = FALSE
)
spike_counts(
  bam,
  spike,
  sep = "_",
  ref = "spike",
  verbose = FALSE,
  dump_idx = FALSE
)

Arguments

`bam`	the BAM or CRAM file (MUST HAVE AN INDEX)
`spike`	a data.frame, DataFrame, or similar with spikes
`sep`	separator character in contig names ("_")
`ref`	reference name for spike genome ("spike")
`verbose`	be verbose? (FALSE)
`dump_idx`	dump the renamed idxstats to aggregate? (FALSE)

Details

The argument spike has no default since we are attempting to refactor the spike-in databases into their own data packages and allow more general use.

Value

               a GRanges of spike-in contig read counts

Examples

data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE) 
spike_counts(sb, spike=spike)

data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE) 
spike_counts(sb, spike=spike)

spike-in counts, as a long data.frame

Description

A data.frame with spike-in results from CRAM files (generated from scan_spike_counts(CRAMs, spike=spike))

Usage

data(spike_cram_counts)
data(spike_cram_counts)

Format

A data.frame object with

frag_grp: the encoded spike contig name: basepairs_CpGs_GCpercent
id: subject from whom cfMeDIP spike reads (column 3) were counted
read_count: read coverage for this spike in this subject (column 2)

Source

Generated from scan_spike_counts(CRAMs, spike=spike) using example CRAMs containing spike contigs

spike-in counts, as a long data.frame

Description

A data.frame with spike-in results from control samples in the manuscript. This maps 1:1 onto dedup using reshape2::melt.

Usage

data(spike_read_counts)
data(spike_read_counts)

Format

A data.frame object with

frag_grp: the encoded spike contig name: basepairs_CpGs_GCpercent
id: subject from whom cfMeDIP spike reads (column 3) were counted
read_count: read coverage for this spike in this subject (column 2)

Source

This data was created using inst/script/loadDedup.R

A Granges object with spike-in sequence coverage, and summarized for each spike contig as (the default) `max` coverage. (In other words, the default output of scan_spike_contigs or scan_spike_bedpe) This represents what most users will want to generate from their own spike-in BAMs or BEDPEs, and is used repeatedly in downstream examples throughout the package.

Description

A Granges object with spike-in sequence coverage, and summarized for each spike contig as (the default) max coverage. (In other words, the default output of scan_spike_contigs or scan_spike_bedpe) This represents what most users will want to generate from their own spike-in BAMs or BEDPEs, and is used repeatedly in downstream examples throughout the package.

Usage

data(spike_res)
data(spike_res)

Format

A GRanges of coverage results with one metadata column, coverage

Source

Generated using scan_spike_bedpe or scan_spike_contigs on an example bedpe or bam containing spike contigs.

A handful of methods that I've always felt were missing

Description

Particularly, simple methods to plot coverage results.

Usage

## S4 method for signature 'Rle,ANY'
plot(x, y, ...)

## S4 method for signature 'SimpleRleList,ANY'
plot(x, y, ...)
## S4 method for signature 'Rle,ANY'
plot(x, y, ...)

## S4 method for signature 'SimpleRleList,ANY'
plot(x, y, ...)

Arguments

`x`	an Rle or RleList, usually
`y`	not usedan Rle or RleList, usually
`...`	other params such as `ylim` passed to `barplot`

Details

selectMethod("plot", "Rle") and also selectMethod("plot", "RleList") too.

Value

             invisibly, the plot details

scan_spiked_bam results from a merged cfMeDIP CRAM file (chr22 and spikes)

Description

A CompressedGRangesList object with genomic (chr22) and spikes coverage, binned every 300bp for the genomic contigs then averaged across the bin, and summarized for each spike contig as (the default) max coverage. (In other words, the default output of scan_spiked_bam, restricted to a small enough set of genomic regions to be practical for examples.) This represents what most users will want to generate from their own merged BAMs or CRAMs, and is used repeatedly in downstream examples throughout the package.

Usage

data(ssb_res)
data(ssb_res)

Format

A CompressedGRangesList of coverage results, containing

genomic: a GRanges with one metadata column, coverage
spikes: a GRanges with one metadata column, coverage

Source

Generated using scan_spiked_bam on an example bam containing chr22 and spike contigs.

a test GRanges with UMI'ed genomic sequences used as controls

Description

Sources and overlap widths of various read sequences in a test CRAM.

Usage

data(testGR)
data(testGR)

Format

A GRanges object with an mcols() DataFrame containing

UMI1: the unique molecular identifier on the forward read
UMI2: the unique molecular identifier on the reverse read
seq: the sequence of the fragment
name: the name of the fragment
score: whether the fragment passes filters (always 1)

Source

Generated using inst/script/loadTest.R

Tile the assembly-based contigs of a merged assembly/spike GRanges.

Description

refactored out of scan_spiked_bam for more explicit information flow

Usage

tile_bins(gr, binwidth = 300L)
tile_bins(gr, binwidth = 300L)

Arguments

`gr`	the GRanges
`binwidth`	bin width to tile (default is 300)

Value

       a GRanges of bins

Examples


bam <- system.file("extdata", "ex1.bam", package="Rsamtools",
                   mustWork=TRUE)
gr <- as(seqinfo_from_header(bam), "GRanges")
genome(gr) <- "notspike"
tile_bins(gr)


bam <- system.file("extdata", "ex1.bam", package="Rsamtools",
                   mustWork=TRUE)
gr <- as(seqinfo_from_header(bam), "GRanges")
genome(gr) <- "notspike"
tile_bins(gr)

Package 'spiky'

Help Index

decode fragment identifiers for spike-in standards

Description

Usage

Arguments

Value

Examples

create a tiled representation of a genome from the BAM/CRAM file

Description

Usage

Arguments

Details

Value

See Also

Examples

Binned estimation of picomoles of DNA present in cfMeDIP assays

Description

Usage

Arguments

Value

See Also

Examples

Convert Pairs to GRanges

Description

Usage

Arguments

Value

reshape scan_spiked_bam results into data.frames for model_glm_pmol

Description

Usage

Arguments

Value

See Also

Examples

spike-in counts for two samples, as a wide data.frame

Description

Usage

Format

Source

find spike-in seqlevels in an object x, where !is.null(seqinfo(x))

Description

Usage

Arguments

Details

Value

See Also

Examples

various mitochondrial genomes sometimes used as endogenous spike-ins

Description

Usage

Format

Source

for CRAM files, a FASTA reference is required to decode; this builds that

Description

Usage

Arguments

Details

Value

See Also

Examples

Description

Usage

Format

Source

refactored out of rename_spikes and rename_spike_seqlevels

Description

Usage

Arguments

Value

Examples

tabulate read coverage in predefined bins

Description

Usage

Arguments

Value

See Also

Examples

get a GRanges of (by default, standard) chromosomes from seqinfo

Description

reshape `scan_spiked_bam` results into data.frames for model_glm_pmol

find spike-in seqlevels in an object `x`, where !is.null(seqinfo(x))