| Title: | Automated analysis of CRISPR experiments |
|---|---|
| Description: | `amplican` performs alignment of the amplicon reads, normalizes gathered data, calculates multiple statistics (e.g. cut rates, frameshifts) and presents results in form of aggregated reports. Data and statistics can be broken down by experiments, barcodes, user defined groups, guides and amplicons allowing for quick identification of potential problems. |
| Authors: | Kornel Labun [aut], Eivind Valen [cph, cre] |
| Maintainer: | Eivind Valen <[email protected]> |
| License: | GPL-3 |
| Version: | 1.35.0 |
| Built: | 2026-05-29 08:49:08 UTC |
| Source: | https://github.com/bioc/amplican |
Main goals:
Flexible pipeline for analysis of the CRISPR Mi-Seq or Hi-Seq data.
Compatible with GRanges and data.table style.
Precise quantification of mutation rates.
Prepare automatic reports as .Rmd files that are flexible and open for manipulation.
Provide specialized plots for deletions, insertions, mismatches, variants, heterogeneity of the reads.
To learn more about amplican, start with the vignettes:
browseVignettes(package = "amplican")
Maintainer: Eivind Valen [email protected] [copyright holder]
Authors:
Kornel Labun [email protected]
Useful links:
Usefull and needed for barcode reports.
amplican_print_reads(forward, reverse)amplican_print_reads(forward, reverse)
forward |
(character or vector of characters) Forward reads. |
reverse |
(character or vector of characters) Will be reverse complemented before alignment. |
Vector with alignments ready to be printed.
# load example data unassigned_file <- system.file('extdata', 'results', 'alignments', 'unassigned_reads.csv', package = 'amplican') unassigned <- data.table::setDF(data.table::fread(unassigned_file)) # sort by frequency unassigned <- unassigned[order(unassigned$BarcodeFrequency, decreasing = TRUE), ] # print alignment of most frequent unassigned reads cat(amplican_print_reads(unassigned[1, 'Forward'], unassigned[1, 'Reverse']), sep = "\n")# load example data unassigned_file <- system.file('extdata', 'results', 'alignments', 'unassigned_reads.csv', package = 'amplican') unassigned <- data.table::setDF(data.table::fread(unassigned_file)) # sort by frequency unassigned <- unassigned[order(unassigned$BarcodeFrequency, decreasing = TRUE), ] # print alignment of most frequent unassigned reads cat(amplican_print_reads(unassigned[1, 'Forward'], unassigned[1, 'Reverse']), sep = "\n")
amplicanAlign takes a configuration files, fastq reads and output
directory to prepare alignments and summary. It uses global Needleman-Wunsch
algorithm with parameters optimized for CRISPR experiment. After alignments,
object of AlignmentsExperimentSet is returned that allows for
coercion into GRanges (plus is for forward and minus for reverse reads).
It is also possible to output alignments in other, additional formats.
amplicanAlign( config, fastq_folder, use_parallel = FALSE, average_quality = 30, min_quality = 20, filter_n = FALSE, batch_size = 1e+06, scoring_matrix = pwalign::nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = FALSE, type = "DNA"), gap_opening = 25, gap_extension = 0, fastqfiles = 0.5, primer_mismatch = 0, donor_mismatch = 3, donor_strict = FALSE, temp_folder = NULL, sample = 0, seed = 0 )amplicanAlign( config, fastq_folder, use_parallel = FALSE, average_quality = 30, min_quality = 20, filter_n = FALSE, batch_size = 1e+06, scoring_matrix = pwalign::nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = FALSE, type = "DNA"), gap_opening = 25, gap_extension = 0, fastqfiles = 0.5, primer_mismatch = 0, donor_mismatch = 3, donor_strict = FALSE, temp_folder = NULL, sample = 0, seed = 0 )
config |
(string) The path to your configuration file. For example:
|
fastq_folder |
(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file. |
use_parallel |
(boolean) Set to TRUE, if you have registered multicore back-end. |
average_quality |
(numeric) The FASTQ file have a quality for each
nucleotide, depending on sequencing technology there exist many formats.
This package uses |
min_quality |
(numeric) Similar as in average_quality, but depicts
the minimum quality for ALL nucleotides in given read. If one of nucleotides
has quality BELLOW |
filter_n |
(boolean) Whether to filter out reads containing N base. |
batch_size |
(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files. |
scoring_matrix |
(matrix) Default is 'NUC44'. Pass desired matrix using
|
gap_opening |
(numeric) The opening gap score. |
gap_extension |
(numeric) The gap extension score. |
fastqfiles |
(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options:
|
primer_mismatch |
(numeric) Decide how many mismatches are allowed
during primer matching of the reads, that groups reads by experiments.
When |
donor_mismatch |
(numeric) Maximum number of width-1 events (single-base
mismatches or single-base indels) that are allowed to overlap the
donor-vs-amplicon event positions. Only events whose coordinates fall within
the donor-event window are counted; mismatches or indels elsewhere in the read
are invisible to this threshold. The higher the value the more permissive the
HDR calling. Set to 0 to require the donor region to match perfectly (not
recommended in practice due to sequencing error rate). Only used when a donor
template is provided and |
donor_strict |
(logical) Applies the strict event-presence algorithm for
HDR detection via |
temp_folder |
Folder where we will save temporary files (not deleted after succesfull run) |
sample |
(numeric) if user specifies 'sample' > 0, we will sample only 'sample' reads instead of reading full file, then we will process as normal. |
seed |
(numeric) random seed used for sampling, only used when 'sample' > 0. |
(AlignmentsExperimentSet) Check AlignmentsExperimentSet
class for details. You can use lookupAlignment to examine
alignments visually.
Other analysis steps:
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
# path to example config file config <- system.file("extdata", "config.csv", package = "amplican") # path to example fastq files fastq_folder <- system.file("extdata", package = "amplican") aln <- amplicanAlign(config, fastq_folder) aln# path to example config file config <- system.file("extdata", "config.csv", package = "amplican") # path to example fastq files fastq_folder <- system.file("extdata", package = "amplican") aln <- amplicanAlign(config, fastq_folder) aln
When forward and reverse reads are in agreement on the events (eg. deletion)
amplicanConsensus will mark forward event as TRUE indicating that he
represents consensus.
In cases where forward and reverse read agree only partially, for example,
they share the same start of the deletion, but they have different end
amplicanConsensus will pick the version of
read with higher alignment score, in situation where both of the reads
overlap expected cut site, otherwise both events will be rejected and marked
FALSE. When there are events only on one of the strands they will be
rejected.
amplicanConsensus(aln, cfgT, overlaps = "overlaps", promiscuous = TRUE)amplicanConsensus(aln, cfgT, overlaps = "overlaps", promiscuous = TRUE)
aln |
(data.frame) Contains relevant events in GRanges style. |
cfgT |
(data.frame) Should be table containing at least positions of primers in the amplicons and their identifiers |
overlaps |
(character) Specifies which metadata column of |
promiscuous |
(boolean) Allows to relax consensus rules. When TRUE will allow Indels that are not confirmed by the other strand (when both are used). |
In situation where you have only forward or only reverse reads don't use this function and assign all TRUE to all of your events.
Consensus out of the forward + reverse reads is required for
amplicanSummary, and amplicanConsensus requires
amplicanOverlap.
(bolean vector) Where TRUE means that given event represents consensus out of forward and reverse reads.
Other analysis steps:
amplicanAlign(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
file_path <- system.file("test_data", "test_aln.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("test_data", "test_cfg.csv", package = "amplican")) all(aln$consensus == amplicanConsensus(aln, cfgT))file_path <- system.file("test_data", "test_aln.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("test_data", "test_cfg.csv", package = "amplican")) all(aln$consensus == amplicanConsensus(aln, cfgT))
Very often alignments return deletions that are not real deletions, but
rather artifact of incomplete reads eg.:
ACTGAAAAA------- <- this "deletion" should be filtered ACTG----ACTGACTG
We call them Events Overlapping Primers and filter them together with reads that are potentially PRIMER DIMERS. This filter will also remove all events coming from reads with low alignment score - potential Off-targets.
amplicanFilter(aln, cfgT, PRIMER_DIMER)amplicanFilter(aln, cfgT, PRIMER_DIMER)
aln |
(data.frame) Should contain events from alignments in GRanges style with columns eg. seqnames, width, start, end. |
cfgT |
(data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon. |
PRIMER_DIMER |
(numeric) Value specifying buffer for PRIMER DIMER
detection. For a given read it will be recognized as PRIMER DIMER when
alignment will introduce gap of size bigger than: |
(aln) Reduced by events classified as PRIMER DIMER or overlapping primers.
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) amplicanFilter(aln, cfgT, 30)file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) amplicanFilter(aln, cfgT, 30)
Translate coordinates of GRanges events so that they can be
relative to the amplicon. As point zero we assume first left sided UPPER case
letter in the
amplicon. Be weary that events for amplicons without expected cut sites are
filtered. Don't use this function, if you don't have expected cut sites
specified and don't use any of the metaplots.
amplicanMap(aln, cfgT)amplicanMap(aln, cfgT)
aln |
(data.frame) List of events to map to the relative coordinates. |
cfgT |
(data.frame) config table |
(GRanges) Same as events, but the coordinates are
relative to the expected cut sites.
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
# example config config <- read.csv(system.file("extdata", "config.csv", package = "amplican")) # example events events <- read.csv(system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican")) # make events relative to the UPPER case amplicanMap(events, config)# example config config <- read.csv(system.file("extdata", "config.csv", package = "amplican")) # example events events <- read.csv(system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican")) # make events relative to the UPPER case amplicanMap(events, config)
This function can adjust events for small differences between
known annotations (amplicon sequences) and real DNA of the strain that
was sequenced. Events from the control are grouped by add and
their frequencies are calculated in respect to number of total reads in that
groups. In next step events from the control are filtered according to
min_freq, all events below are treated as sequencing errors and
rejected. Finally, all events that can be found in treatment group that find
their exact match (by non skipped columns) in control group are removed.
All events from control group are returned back.
amplicanNormalize( aln, cfgT, add = c("guideRNA", "Group"), skip = c("counts", "score", "seqnames", "read_id", "strand", "overlaps", "consensus"), min_freq = 0.01 )amplicanNormalize( aln, cfgT, add = c("guideRNA", "Group"), skip = c("counts", "score", "seqnames", "read_id", "strand", "overlaps", "consensus"), min_freq = 0.01 )
aln |
(data.frame) Contains events from alignments. |
cfgT |
(data.frame) Config table with information about experiments. |
add |
(character vector or NULL) Columns from cfgT that should be included
in event table for normalization matching. Defaults to c("guideRNA", "Group")
, which means that only those events created by the same guideRNA in the same
Group will be removed if found in Control. Pass |
skip |
(character vector) Specifies which columns of aln to skip. |
min_freq |
(numeric) All events from control group below this frequency will be not included in filtering. Use this to filter out background noise and sequencing errors. |
(data.frame) Same as aln, but events are normalized. Events from Control are not changed. Additionally columns from add are added to the data.frame.
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
aln <- data.frame(seqnames = 1:5, start = 1, end = 2, width = 2, counts = 101:105) cfgT <- data.frame(ID = 1:5, guideRNA = rep("ACTG", 5), Reads_Filtered = c(2, 2, 3, 3, 4), Group = c("A", "A", "B", "B", "B"), Control = c(TRUE, FALSE, TRUE, FALSE, FALSE)) # all events are same as in the control group, therefore are filtered out # events from control groups stay amplicanNormalize(aln, cfgT) # events that are different from control group are preserved aln[2, "start"] <- 3 amplicanNormalize(aln, cfgT)aln <- data.frame(seqnames = 1:5, start = 1, end = 2, width = 2, counts = 101:105) cfgT <- data.frame(ID = 1:5, guideRNA = rep("ACTG", 5), Reads_Filtered = c(2, 2, 3, 3, 4), Group = c("A", "A", "B", "B", "B"), Control = c(TRUE, FALSE, TRUE, FALSE, FALSE)) # all events are same as in the control group, therefore are filtered out # events from control groups stay amplicanNormalize(aln, cfgT) # events that are different from control group are preserved aln[2, "start"] <- 3 amplicanNormalize(aln, cfgT)
To determine which deletions, insertions and mismatches (events) are probably created by CRISPR we check whether they overlap expected cut sites. Expected cut sites should be specified in UPPER CASE letters in the amplicon sequences.
amplicanOverlap(aln, cfgT, cut_buffer = 5, relative = FALSE)amplicanOverlap(aln, cfgT, cut_buffer = 5, relative = FALSE)
aln |
(data.frame) Contains relevant events in GRanges style. |
cfgT |
(data.frame) Contains amplicon sequences. |
cut_buffer |
(numeric) Number of bases that should expand 5' and 3' of the specified expected cut sites. |
relative |
(boolean) Sets whether events are relative to the position of the target site. |
(bolean vector) Where TRUE means that given event overlaps cut site.
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
file_path <- system.file("test_data", "test_aln.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("test_data", "test_cfg.csv", package = "amplican")) all(aln$overlaps == amplicanOverlap(aln, cfgT))file_path <- system.file("test_data", "test_aln.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("test_data", "test_cfg.csv", package = "amplican")) all(aln$overlaps == amplicanOverlap(aln, cfgT))
amplicanPipeline is convenient wrapper around all functionality of the
package with the most robust settings. It will generate all results in the
result_folder and also knit prepared reports into 'reports' folder.
amplicanPipeline( config, fastq_folder, results_folder, knit_reports = TRUE, write_alignments_format = "None", average_quality = 30, min_quality = 0, filter_n = FALSE, batch_size = 1e+07, use_parallel = FALSE, scoring_matrix = pwalign::nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = FALSE, type = "DNA"), gap_opening = 25, gap_extension = 0, fastqfiles = 0.5, primer_mismatch = 2, donor_mismatch = 3, donor_strict = FALSE, PRIMER_DIMER = 30, event_filter = TRUE, cut_buffer = 5, promiscuous_consensus = TRUE, normalize = c("guideRNA", "Group"), min_freq = min_freq_default, continue = TRUE, sample = 0, seed = 0 )amplicanPipeline( config, fastq_folder, results_folder, knit_reports = TRUE, write_alignments_format = "None", average_quality = 30, min_quality = 0, filter_n = FALSE, batch_size = 1e+07, use_parallel = FALSE, scoring_matrix = pwalign::nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = FALSE, type = "DNA"), gap_opening = 25, gap_extension = 0, fastqfiles = 0.5, primer_mismatch = 2, donor_mismatch = 3, donor_strict = FALSE, PRIMER_DIMER = 30, event_filter = TRUE, cut_buffer = 5, promiscuous_consensus = TRUE, normalize = c("guideRNA", "Group"), min_freq = min_freq_default, continue = TRUE, sample = 0, seed = 0 )
config |
(string) The path to your configuration file. For example:
|
fastq_folder |
(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file. |
results_folder |
(string) Where do you want to store results? The package will create files in that folder so make sure you have writing permissions. |
knit_reports |
(boolean) whether function should "knit" all reports automatically for you (it is time consuming, be patient), when false reports will be prepared, but not knitted |
write_alignments_format |
(character vector) Whether
|
average_quality |
(numeric) The FASTQ file have a quality for each
nucleotide, depending on sequencing technology there exist many formats.
This package uses |
min_quality |
(numeric) Similar as in average_quality, but depicts
the minimum quality for ALL nucleotides in given read. If one of nucleotides
has quality BELLOW |
filter_n |
(boolean) Whether to filter out reads containing N base. |
batch_size |
(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files. |
use_parallel |
(boolean) Set to TRUE, if you have registered multicore back-end. |
scoring_matrix |
(matrix) Default is 'NUC44'. Pass desired matrix using
|
gap_opening |
(numeric) The opening gap score. |
gap_extension |
(numeric) The gap extension score. |
fastqfiles |
(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options:
|
primer_mismatch |
(numeric) Decide how many mismatches are allowed
during primer matching of the reads, that groups reads by experiments.
When |
donor_mismatch |
(numeric) Maximum number of width-1 events (single-base
mismatches or single-base indels) that are allowed to overlap the
donor-vs-amplicon event positions. Only events whose coordinates fall within
the donor-event window are counted; mismatches or indels elsewhere in the read
are invisible to this threshold. The higher the value the more permissive the
HDR calling. Set to 0 to require the donor region to match perfectly (not
recommended in practice due to sequencing error rate). Only used when a donor
template is provided and |
donor_strict |
(logical) Applies the strict event-presence algorithm for
HDR detection via |
PRIMER_DIMER |
(numeric) Value specifying buffer for PRIMER DIMER
detection. For a given read it will be recognized as PRIMER DIMER when
alignment will introduce gap of size bigger than: |
event_filter |
(logical) Whether detection of offtarget reads, should be enabled. |
cut_buffer |
The number of bases by which extend expected cut sites (specified as UPPER case letters in the amplicon) in 5' and 3' directions. |
promiscuous_consensus |
(boolean) Whether rules of
|
normalize |
(character vector or NULL) If column 'Control' in config table
has all FALSE/0 values then normalization is skipped. Otherwise,
normalization is strict, which means events that are
found in 'Control' TRUE group will be removed in 'Control' FALSE group.
This parameter by default uses columns 'guideRNA' and 'Group' to impose
additional restrictions on normalized events eg. only events created by the
same 'guideRNA' in the same 'Group' will be normalized. Pass |
min_freq |
(numeric) All events below this frequency are treated as
sequencing errors and rejected. This parameter is used during normalization
through |
continue |
(boolean) Default TRUE, decides whether to continue failed ampliCan runs. In case of FALSE, all contents in 'results' folder will be removed. |
sample |
(numeric) if user specifies 'sample' > 0, we will sample only 'sample' reads instead of reading full file, then we will process as normal. |
seed |
(numeric) random seed used for sampling, only used when 'sample' > 0. |
(invisible) results_folder path
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipelineConservative(),
amplicanReport(),
amplicanSummarize()
# path to example config file config <- system.file("extdata", "config.csv", package = "amplican") # path to example fastq files fastq_folder <- system.file("extdata", package = "amplican") # output folder results_folder <- tempdir() # full analysis, not knitting files automatically amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)# path to example config file config <- system.file("extdata", "config.csv", package = "amplican") # path to example fastq files fastq_folder <- system.file("extdata", package = "amplican") # output folder results_folder <- tempdir() # full analysis, not knitting files automatically amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)
amplicanPipelineIndexHopping is identical as amplicanPipeline except that
default min_freq threshold is set to 0.15. Setting this threshold
higher will decrease risks of inadequate normalization in cases of potential
Index Hopping, potentially decreasing precision of true editing rate calling.
Index Hopping can be mitigated with use of unique dual indexing pooling
combinations. However, in cases when you might expect Index Hopping to occur
you should use this function instead of amplicanPipeline.
amplicanPipelineConservative( config, fastq_folder, results_folder, knit_reports = TRUE, write_alignments_format = "None", average_quality = 30, min_quality = 0, filter_n = FALSE, batch_size = 1e+07, use_parallel = FALSE, scoring_matrix = pwalign::nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = FALSE, type = "DNA"), gap_opening = 25, gap_extension = 0, fastqfiles = 0.5, primer_mismatch = 2, donor_mismatch = 3, donor_strict = FALSE, PRIMER_DIMER = 30, event_filter = TRUE, cut_buffer = 5, promiscuous_consensus = TRUE, normalize = c("guideRNA", "Group"), min_freq = min_freq_default, continue = TRUE, sample = 0, seed = 0 )amplicanPipelineConservative( config, fastq_folder, results_folder, knit_reports = TRUE, write_alignments_format = "None", average_quality = 30, min_quality = 0, filter_n = FALSE, batch_size = 1e+07, use_parallel = FALSE, scoring_matrix = pwalign::nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = FALSE, type = "DNA"), gap_opening = 25, gap_extension = 0, fastqfiles = 0.5, primer_mismatch = 2, donor_mismatch = 3, donor_strict = FALSE, PRIMER_DIMER = 30, event_filter = TRUE, cut_buffer = 5, promiscuous_consensus = TRUE, normalize = c("guideRNA", "Group"), min_freq = min_freq_default, continue = TRUE, sample = 0, seed = 0 )
config |
(string) The path to your configuration file. For example:
|
fastq_folder |
(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file. |
results_folder |
(string) Where do you want to store results? The package will create files in that folder so make sure you have writing permissions. |
knit_reports |
(boolean) whether function should "knit" all reports automatically for you (it is time consuming, be patient), when false reports will be prepared, but not knitted |
write_alignments_format |
(character vector) Whether
|
average_quality |
(numeric) The FASTQ file have a quality for each
nucleotide, depending on sequencing technology there exist many formats.
This package uses |
min_quality |
(numeric) Similar as in average_quality, but depicts
the minimum quality for ALL nucleotides in given read. If one of nucleotides
has quality BELLOW |
filter_n |
(boolean) Whether to filter out reads containing N base. |
batch_size |
(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files. |
use_parallel |
(boolean) Set to TRUE, if you have registered multicore back-end. |
scoring_matrix |
(matrix) Default is 'NUC44'. Pass desired matrix using
|
gap_opening |
(numeric) The opening gap score. |
gap_extension |
(numeric) The gap extension score. |
fastqfiles |
(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options:
|
primer_mismatch |
(numeric) Decide how many mismatches are allowed
during primer matching of the reads, that groups reads by experiments.
When |
donor_mismatch |
(numeric) Maximum number of width-1 events (single-base
mismatches or single-base indels) that are allowed to overlap the
donor-vs-amplicon event positions. Only events whose coordinates fall within
the donor-event window are counted; mismatches or indels elsewhere in the read
are invisible to this threshold. The higher the value the more permissive the
HDR calling. Set to 0 to require the donor region to match perfectly (not
recommended in practice due to sequencing error rate). Only used when a donor
template is provided and |
donor_strict |
(logical) Applies the strict event-presence algorithm for
HDR detection via |
PRIMER_DIMER |
(numeric) Value specifying buffer for PRIMER DIMER
detection. For a given read it will be recognized as PRIMER DIMER when
alignment will introduce gap of size bigger than: |
event_filter |
(logical) Whether detection of offtarget reads, should be enabled. |
cut_buffer |
The number of bases by which extend expected cut sites (specified as UPPER case letters in the amplicon) in 5' and 3' directions. |
promiscuous_consensus |
(boolean) Whether rules of
|
normalize |
(character vector or NULL) If column 'Control' in config table
has all FALSE/0 values then normalization is skipped. Otherwise,
normalization is strict, which means events that are
found in 'Control' TRUE group will be removed in 'Control' FALSE group.
This parameter by default uses columns 'guideRNA' and 'Group' to impose
additional restrictions on normalized events eg. only events created by the
same 'guideRNA' in the same 'Group' will be normalized. Pass |
min_freq |
(numeric) All events below this frequency are treated as
sequencing errors and rejected. This parameter is used during normalization
through |
continue |
(boolean) Default TRUE, decides whether to continue failed ampliCan runs. In case of FALSE, all contents in 'results' folder will be removed. |
sample |
(numeric) if user specifies 'sample' > 0, we will sample only 'sample' reads instead of reading full file, then we will process as normal. |
seed |
(numeric) random seed used for sampling, only used when 'sample' > 0. |
result_folder and also knit prepared reports into 'reports' folder.
(invisible) results_folder path
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanReport(),
amplicanSummarize()
amplicanReport takes a configuration file, fastq reads and output directory to prepare summaries as an editable .Rmd file. You can specify whether you want to make summaries based on ID, Barcode, Group or even guideRNA and Amplicon. This function automatically knits all reports after creation. If you want to postpone knitting and edit reports, use .Rmd templates to create your own version of reports instead of this function.
amplicanReport( results_folder, levels = c("id", "barcode", "group", "guide", "amplicon", "summary"), report_files = c("id_report", "barcode_report", "group_report", "guide_report", "amplicon_report", "index"), cut_buffer = 5, xlab_spacing = 4, top = 5, knit_reports = TRUE )amplicanReport( results_folder, levels = c("id", "barcode", "group", "guide", "amplicon", "summary"), report_files = c("id_report", "barcode_report", "group_report", "guide_report", "amplicon_report", "index"), cut_buffer = 5, xlab_spacing = 4, top = 5, knit_reports = TRUE )
results_folder |
(string) Folder containing results from the
|
levels |
(vector) Possible values are: "id", "barcode", "group", "guide", "amplicon", "summary". You can also input more than one value eg. c("id", "barcode") will create two separate reports for each level. |
report_files |
(vector) You can supply your own names of the files. For each of the levels there has to be one file name. Files are created in current working directory by default. |
cut_buffer |
(numeric) Default 5. A number of bases that is used around the specified cut site. |
xlab_spacing |
(numeric) Default is 4. Spacing of the ticks on the x axis of plots. |
top |
(numeric) Default is 5. How many of the top most frequent unassigned reads to report? It is only relevant when you used forward and reverse reads. We align them to each other as we could not specify correct amplicon. |
knit_reports |
(boolean) Whether to knit reports automatically. |
(string) Path to the folder with results.
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanSummarize()
results_folder <- tempdir() amplicanReport(results_folder, report_files = file.path(results_folder, c("id_report", "barcode_report", "group_report", "guide_report", "amplicon_report", "index")), knit_reports = FALSE)results_folder <- tempdir() amplicanReport(results_folder, report_files = file.path(results_folder, c("id_report", "barcode_report", "group_report", "guide_report", "amplicon_report", "index")), knit_reports = FALSE)
Before using this function make sure events are filtered to represent
consensus with amplicanConsensus, if you use both forward and
reverse reads. If you want to calculate metrics over expected cut site,
filter events using amplicanOverlap.
amplicanSummarize(aln, cfgT)amplicanSummarize(aln, cfgT)
aln |
(data.frame) Contains events from the alignments. |
cfgT |
(data.frame) Config file with the experiments details. |
Adds columns to cfgT:
Count of reads identified as Homology Directed Repair events.
Count of reads containing at least one deletion.
Count of reads containing at least one insertion.
Count of reads with any edit (insertion, deletion, or HDR).
Count of reads with a frameshift (net indel length is not a multiple of 3).
(data.frame) As cfgT, but with extra columns.
Other analysis steps:
amplicanAlign(),
amplicanConsensus(),
amplicanFilter(),
amplicanMap(),
amplicanNormalize(),
amplicanOverlap(),
amplicanPipeline(),
amplicanPipelineConservative(),
amplicanReport()
file_path <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) amplicanSummarize(aln, cfgT)file_path <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) amplicanSummarize(aln, cfgT)
Transform extended CIGAR strings into GRanges representation
with events of deletions, insertions and mismatches.
cigarsToEvents( cigars, aln_pos_start, query_seq, ref, read_id, mapq, seqnames, strands, counts )cigarsToEvents( cigars, aln_pos_start, query_seq, ref, read_id, mapq, seqnames, strands, counts )
cigars |
(character) Extended CIGARS. |
aln_pos_start |
(integer) Pos of CIGARS. |
query_seq |
(character) Aligned query sequences. |
ref |
(character) Reference sequences used for alignment. |
read_id |
(numeric) Read id for assignment for each of the CIGARS. |
mapq |
(numeric) Maping scores. |
seqnames |
(character) Names of the sequences, potentially ids of the reference sequences. |
strands |
(character) Strands to assign. |
counts |
(integer) Vector of cigar counts, if data collapsed. |
(GRanges) Same as events.
Generate all combinations along string seq swapping m
characters at a time with letters defined in dictionary letters.
Allows, for instance, to create a list of possible primers with two
mismatches.
comb_along(seq, m = 2, letters = c("A", "C", "T", "G"))comb_along(seq, m = 2, letters = c("A", "C", "T", "G"))
seq |
(character) input character to permutate |
m |
(integer) number of elements to permutate at each step |
letters |
(character vector) dictionary source for combinations of elements |
(character vector) all unique combinations of permutated string
comb_along("AC") comb_along("AAA", 1) comb_along("AAA") comb_along("AAA", 3) comb_along("AAAAAAAAAA")comb_along("AC") comb_along("AAA", 1) comb_along("AAA") comb_along("AAA", 3) comb_along("AAAAAAAAAA")
Very often alignments return deletions that are not real deletions, but
rather artifact of incomplete reads eg.:
ACTGAAAAA------- <- this "deletion" should be filtered ACTG----ACTGACTG
findEOP(aln, cfgT)findEOP(aln, cfgT)
aln |
(data.frame) Should contain events from alignments in GRanges style with columns eg. seqnames, width, start, end. |
cfgT |
(data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon. |
(logical vector) where TRUE indicates events that are overlapping primers
Other filters:
findLQR(),
findPD()
file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) findEOP(aln, cfgT)file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) findEOP(aln, cfgT)
Will try to detect off-targets and low quality alignments (outliers). It
tries k-means clustering on normalized number of events per read and read
alignment score. If there are 3 clusters (decided based on silhouette
criterion) cluster with high event count and low alignment score will be
marked for filtering. When there is less than 1000
scores in aln it will filter nothing.
findLQR(aln)findLQR(aln)
aln |
(data.frame) Should contain events from alignments in GRanges style with columns eg. seqnames, width, start, end, score. |
(logical vector) where TRUE indicates events that are potential off-targets or low quality alignments.
Other filters:
findEOP(),
findPD()
file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) aln <- aln[seqnames == "ID_1"] # for first experiment findLQR(aln)file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) aln <- aln[seqnames == "ID_1"] # for first experiment findLQR(aln)
Use to filter reads that are most likely PRIMER DIMERS.
findPD(aln, cfgT, PRIMER_DIMER = 30)findPD(aln, cfgT, PRIMER_DIMER = 30)
aln |
(data.frame) Should contain events from alignments in
|
cfgT |
(data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon. |
PRIMER_DIMER |
(numeric) Value specifying buffer for PRIMER DIMER
detection. For a given read it will be recognized as PRIMER DIMER when
alignment will introduce gap of size bigger than: |
(logical) Where TRUE indicates event classified as PRIMER DIMER
Other filters:
findEOP(),
findLQR()
file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) findPD(aln, cfgT)file_path <- system.file("extdata", "results", "alignments", "raw_events.csv", package = "amplican") aln <- data.table::fread(file_path) cfgT <- data.table::fread( system.file("extdata", "results", "config_summary.csv", package = "amplican")) findPD(aln, cfgT)
This set of functionality is copied from ggforce package due to dependency issues on Bioconductor and is used internally (not exported) only. This set of geoms makes it possible to connect points creating either quadratic or cubic beziers. bezier works by calculating points along the bezier and connecting these to draw the curve.
stat_bezier( mapping = NULL, data = NULL, geom = "path", position = "identity", na.rm = FALSE, show.legend = NA, n = 100, inherit.aes = TRUE, ... ) geom_bezier( mapping = NULL, data = NULL, stat = "bezier", position = "identity", arrow = NULL, lineend = "butt", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, n = 100, ... )stat_bezier( mapping = NULL, data = NULL, geom = "path", position = "identity", na.rm = FALSE, show.legend = NA, n = 100, inherit.aes = TRUE, ... ) geom_bezier( mapping = NULL, data = NULL, stat = "bezier", position = "identity", arrow = NULL, lineend = "butt", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, n = 100, ... )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
geom |
The geometric object to use to display the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
n |
The number of points to create for each segment |
inherit.aes |
If |
... |
Other arguments passed on to
|
stat |
The statistical transformation to use on the data for this layer.
When using a
|
arrow |
Arrow specification, as created by |
lineend |
Line end style (round, butt, square). |
Input data is understood as a sequence of data points the first being the start point, then followed by one or two control points and then the end point. More than 4 and less than 3 points per group will throw an error.
geom_link, geom_link2 and geom_lin0 understand the following aesthetics (required aesthetics are in bold):
- **x** - **y** - color - linewidth - linetype - alpha - lineend
The interpolated point coordinates
The progression along the interpolation mapped between 0 and 1
Thomas Lin Pedersen
beziers <- data.frame( x = c(1, 2, 3, 4, 4, 6, 6), y = c(0, 2, 0, 0, 2, 2, 0), type = rep(c('cubic', 'quadratic'), c(3, 4)), point = c('end', 'control', 'end', 'end', 'control', 'control', 'end') ) help_lines <- data.frame( x = c(1, 3, 4, 6), xend = c(2, 2, 4, 6), y = 0, yend = 2 ) ggplot2::ggplot() + ggplot2::geom_segment( ggplot2::aes(x = x, xend = xend, y = y, yend = yend), data = help_lines, arrow = ggplot2::arrow(length = ggplot2::unit(c(0, 0, 0.5, 0.5), 'cm')), colour = 'grey') + amplican:::geom_bezier(ggplot2::aes(x= x, y = y, group = type, linetype = type), data = beziers) + ggplot2::geom_point(ggplot2::aes(x = x, y = y, colour = point), data = beziers)beziers <- data.frame( x = c(1, 2, 3, 4, 4, 6, 6), y = c(0, 2, 0, 0, 2, 2, 0), type = rep(c('cubic', 'quadratic'), c(3, 4)), point = c('end', 'control', 'end', 'end', 'control', 'control', 'end') ) help_lines <- data.frame( x = c(1, 3, 4, 6), xend = c(2, 2, 4, 6), y = 0, yend = 2 ) ggplot2::ggplot() + ggplot2::geom_segment( ggplot2::aes(x = x, xend = xend, y = y, yend = yend), data = help_lines, arrow = ggplot2::arrow(length = ggplot2::unit(c(0, 0, 0.5, 0.5), 'cm')), colour = 'grey') + amplican:::geom_bezier(ggplot2::aes(x= x, y = y, group = type, linetype = type), data = beziers) + ggplot2::geom_point(ggplot2::aes(x = x, y = y, colour = point), data = beziers)
Transforms aligned strings into GRanges representation with events of deletions, insertions and mismatches. Subject should come from one amplicon sequence, after alignment to many sequences (patterns).
getEvents( pattern, subject, scores, ID = "NA", ampl_shift = 1L, ampl_start = 1L, strand_info = "+" )getEvents( pattern, subject, scores, ID = "NA", ampl_shift = 1L, ampl_start = 1L, strand_info = "+" )
pattern |
(character) Aligned pattern. |
subject |
(character) Aligned subject. |
scores |
(integer) Alignment scores of the pattern and subject. |
ID |
(character) Will be used as seqnames of output GRanges. |
ampl_shift |
(numeric) Possible shift of the amplicons. |
ampl_start |
(numeric) Real amplicon starts.
|
strand_info |
(character) Strands to assign. |
(GRanges) Same as events.
This is the strict counterpart to is_hdr. A read is marked HDR
if and only if every event that distinguishes the donor from the amplicon
is present verbatim (same start, end, width,
originally, replacement, and type) in that read's
consensus events.
is_hdr_strict( aln, cfgT, scoring_matrix, gap_opening = 25, gap_extension = 0, donor_mismatch = Inf, cut_buffer = 5 )is_hdr_strict( aln, cfgT, scoring_matrix, gap_opening = 25, gap_extension = 0, donor_mismatch = Inf, cut_buffer = 5 )
aln |
(data.table) Consensus-filtered, shifted, and normalised events
table (must contain columns |
cfgT |
(data.table or data.frame) Config table with at least columns
|
scoring_matrix |
Substitution matrix passed to
|
gap_opening |
(numeric) Gap-opening penalty. Default 25. |
gap_extension |
(numeric) Gap-extension penalty. Default 0. |
donor_mismatch |
(numeric) Maximum number of extra consensus events
(beyond the required donor events) allowed within the amplicon UPPERCASE
window. Set to |
cut_buffer |
(numeric) Number of bases to expand the UPPERCASE window
on each side. Same semantics as in |
Key behaviours:
Only rows where consensus == TRUE are used to match donor events.
However, the readType flag is written back to all rows
sharing the same read_id (including non-consensus rows).
When donor_mismatch = Inf (default), additional events in the
read (noise mismatches, extra indels) do not disqualify a
read — only the absence of a required donor event does.
When donor_mismatch is finite (e.g. 0), extra consensus events
within the amplicon UPPERCASE window (expanded by
cut_buffer) are counted. If more than donor_mismatch
extra events fall in that window, the read is rejected. Events
outside the window (e.g. near primers) are ignored.
A different substitution at the same position as a donor event (e.g.
the donor has G->A but the read has G->C) is rejected because the
replacement column differs in the inner-join merge.
If the donor and amplicon are identical (no events), or if no read
events match any donor event, the function returns aln unchanged.
(data.table) Same as aln on entry, but readType is set
to TRUE for every row whose read_id contains all donor events
and at most donor_mismatch additional events in the window.
is_hdr for the permissive, score-based alternative.
This function finds a primer that can be partially truncated at the 5' end. It handles both full matches inside the read and partial matches at the beginning of the read.
locate_pr_start(reads, primer, m = 3, min_overlap = ceiling(nchar(primer)/2))locate_pr_start(reads, primer, m = 3, min_overlap = ceiling(nchar(primer)/2))
reads |
A character vector of DNA sequences. |
primer |
A single character string for the primer sequence. |
m |
The maximum number of allowed mismatches. |
min_overlap |
The minimum number of base pairs the primer must overlap with the read. A good value is often around half the primer length. |
A numeric vector of the EARLIEST start position for a valid match, with NA for no match.
This function plots deletions in relation to the amplicons for given
selection vector that groups values by given config group.
All reads should
already be converted to their relative position to their respective amplicon
using amplicanMap.
Top plot is for the forward reads and bottom plot is for reverse reads.
metaplot_deletions(alnmt, config, group, selection, over = "overlaps")metaplot_deletions(alnmt, config, group, selection, over = "overlaps")
alnmt |
(data.frame) Loaded alignment information from events_filtered_shifted_normalized.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
group |
(string) Name of the column from the config file to use for grouping. Events are subselected based on this column and values from selection. |
selection |
(string or vector of strings) Values from config column specified in group argument. |
over |
(string) Specify which column contains overlaps with
expected cut sites generated by |
(deletions metaplot) ggplot2 object of deletions metaplot
Other specialized plots:
metaplot_insertions(),
metaplot_mismatches(),
plot_cuts(),
plot_deletions(),
plot_heterogeneity(),
plot_insertions(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) metaplot_deletions(alignments[alignments$consensus, ], config, "Group", "Betty")#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) metaplot_deletions(alignments[alignments$consensus, ], config, "Group", "Betty")
This function plots insertions in relation to the amplicons for given
selection vector that groups values by given config group.
All reads should
already be converted to their relative position to their respective amplicon
using amplicanMap.
Top plot is for the forward reads and bottom plot is for reverse reads.
metaplot_insertions(alnmt, config, group, selection)metaplot_insertions(alnmt, config, group, selection)
alnmt |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
group |
(string) Name of the column from the config file to use for grouping. Events are subselected based on this column and values from selection. |
selection |
(string or vector of strings) Values from config column specified in group argument. |
(insertions metaplot) ggplot2 object of insertions metaplot
Other specialized plots:
metaplot_deletions(),
metaplot_mismatches(),
plot_cuts(),
plot_deletions(),
plot_heterogeneity(),
plot_insertions(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) metaplot_insertions(alignments[alignments$consensus, ], config, "Group", "Betty")#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) metaplot_insertions(alignments[alignments$consensus, ], config, "Group", "Betty")
Plots mismatches in relation to the amplicons for given
selection vector that groups values by given config group.
All reads should
already be converted to their relative position to their respective amplicon
using amplicanMap.
Zero position on new coordinates is the most left UPPER case letter of
the respective amplicon. This function filters out all alignment events
that have amplicons without UPPER case defined.
Top plot is for the forward reads and bottom plot is for reverse reads.
metaplot_mismatches(alnmt, config, group, selection)metaplot_mismatches(alnmt, config, group, selection)
alnmt |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
group |
(string) Name of the column from the config file to use for grouping. Events are subselected based on this column and values from selection. |
selection |
(string or vector of strings) Values from config column specified in group argument. |
(mismatches metaplot) ggplot2 object of mismatches metaplot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
plot_cuts(),
plot_deletions(),
plot_heterogeneity(),
plot_insertions(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) metaplot_mismatches(alignments, config, "Group", "Betty")#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) metaplot_mismatches(alignments, config, "Group", "Betty")
Parse EMBOSS needle (or needleall) "pair" format into GRanges representation with events of deletions, insertions and mismatches. Make sure that each file corresponds to single subject (single amplicon). Assumes that bottom sequence "-bsequence" corresponds to the "subject" and full sequence alignment is returned.
pairToEvents(file, ID = "NA", strand_info = "+")pairToEvents(file, ID = "NA", strand_info = "+")
file |
(character) File path. |
ID |
(character) ID of the experiment, will be used as seqnames of the reutner ranges. |
strand_info |
(character) Strand to assign. |
(GRanges) Same as events.
This function plots cuts in relation to the amplicon with distinction for each ID.
plot_cuts(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)plot_cuts(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
alignments |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
id |
(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot. |
cut_buffer |
(numeric) Default is 5, you should specify the same as used in the analysis. |
xlab_spacing |
(numeric) Spacing of the x axis labels. Default is 4. |
(cuts plot) gtable object of cuts plot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
metaplot_mismatches(),
plot_deletions(),
plot_heterogeneity(),
plot_insertions(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) plot_cuts(alignments[alignments$consensus & alignments$overlaps, ], config, c('ID_1','ID_3'))#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) plot_cuts(alignments[alignments$consensus & alignments$overlaps, ], config, c('ID_1','ID_3'))
This function plots deletions in relation to the amplicon, assumes events are relative to the expected cut site. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.
plot_deletions( alignments, config, id, cut_buffer = 5, xlab_spacing = 4, over = "overlaps" )plot_deletions( alignments, config, id, cut_buffer = 5, xlab_spacing = 4, over = "overlaps" )
alignments |
(data.frame) Loaded alignment information from alignments.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
id |
(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot. |
cut_buffer |
(numeric) Default is 5, you should specify the same as used in the analysis. |
xlab_spacing |
(numeric) Spacing of the x axis labels. Default is 4. |
over |
(string) Specify which columns contains overlaps with
expected cut sites generated by |
(deletions plot) gtable object of deletions plot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
metaplot_mismatches(),
plot_cuts(),
plot_heterogeneity(),
plot_insertions(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) p <- plot_deletions(alignments[alignments$consensus, ], config, c('ID_1','ID_3'))#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) p <- plot_deletions(alignments[alignments$consensus, ], config, c('ID_1','ID_3'))
Helper function to calculate figure height based on number of elements to plot for automating sizes of figures in knited reports.
plot_height(x)plot_height(x)
x |
(numeric) number of elements to fit onto height axis |
(numeric) In inches
plot_height(20)plot_height(20)
This function creates stacked barplot explaining reads heterogeneity. It groups reads by user defined levels and measures how unique are reads in this level. Uniqueness of reads is simplified to the bins and colored according to the color gradient. Default color black indicates very high heterogeneity of the reads. The more yellow (default) the more similar are reads and less heterogeneous.
plot_heterogeneity( alignments, config, level = "ID", colors = c("#000000", "#F0E442"), bins = c(0, 5, seq(10, 100, 10)) )plot_heterogeneity( alignments, config, level = "ID", colors = c("#000000", "#F0E442"), bins = c(0, 5, seq(10, 100, 10)) )
alignments |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
level |
(string) Name of the column from config file specifying levels to group by. |
colors |
(html colors vector) Two colours for gradient, eg. c('#000000', '#F0E442'). |
bins |
(numeric vector) Numeric vector from 0 to 100 specifying bins eg. c(0, 5, seq(10, 100, 10)). |
(heterogeneity plot) ggplot2 object of heterogeneity plot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
metaplot_mismatches(),
plot_cuts(),
plot_deletions(),
plot_insertions(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) plot_heterogeneity(alignments[alignments$consensus, ], config)#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) plot_heterogeneity(alignments[alignments$consensus, ], config)
This function plots insertions in relation to the amplicon. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.
plot_insertions(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)plot_insertions(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
alignments |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
id |
(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot. |
cut_buffer |
(numeric) Default is 5, you should specify the same as used in the analysis. |
xlab_spacing |
(numeric) Spacing of the x axis labels. Default is 4. |
(insertions plot) gtable object of insertions plot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
metaplot_mismatches(),
plot_cuts(),
plot_deletions(),
plot_heterogeneity(),
plot_mismatches(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) p <- plot_insertions(alignments, config, c('ID_1','ID_3'))#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) p <- plot_insertions(alignments, config, c('ID_1','ID_3'))
Plots mismatches in relation to the amplicon, assumes your reads are relative to the respective amplicon sequences predicted cut sites. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.
plot_mismatches(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)plot_mismatches(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
alignments |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
id |
(string or vector of strings) Name of the ID column from config file or name of multiple IDs, if it is possible to group them. They have to have the same amplicon, amplicons on the reverse strand will be reverse complemented to match forward strand amplicons. |
cut_buffer |
(numeric) Default is 5, you should specify the same as used in the analysis. |
xlab_spacing |
(numeric) Spacing of the x axis labels. Default is 4. |
(mismatches plot) gtable object of mismatches plot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
metaplot_mismatches(),
plot_cuts(),
plot_deletions(),
plot_heterogeneity(),
plot_insertions(),
plot_variants()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) id <- c('ID_1', 'ID_3'); cut_buffer = 5; xlab_spacing = 4; p <- plot_mismatches(alignments, config, c('ID_1', 'ID_3')) ggplot2::ggsave("~/test.png", p, width = 25, units = "in")#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) id <- c('ID_1', 'ID_3'); cut_buffer = 5; xlab_spacing = 4; p <- plot_mismatches(alignments, config, c('ID_1', 'ID_3')) ggplot2::ggsave("~/test.png", p, width = 25, units = "in")
This function plots variants in relation to the amplicon. Shows sequences of top mutants without aggregating on deletions, insertions and mismatches.
plot_variants( alignments, config, id, cut_buffer = 5, top = 10, annot = if (amplican::get_seq(config, id, "Donor") == "") "cov" else NA, summary_plot = amplican::get_seq(config, id, "Donor") == "", frameshift = amplican::get_seq(config, id, "Donor") == "" )plot_variants( alignments, config, id, cut_buffer = 5, top = 10, annot = if (amplican::get_seq(config, id, "Donor") == "") "cov" else NA, summary_plot = amplican::get_seq(config, id, "Donor") == "", frameshift = amplican::get_seq(config, id, "Donor") == "" )
alignments |
(data.frame) Loaded alignment information from alignments_events.csv file. |
config |
(data.frame) Loaded table from config_summary.csv file. |
id |
(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot. If Donor is available we will try to add the first donor and mark it on the plot. |
cut_buffer |
(numeric) Default is 5, you should specify the same as used in the analysis. |
top |
(numeric) Specify number of most frequent reads to plot. By
default it is 10. Check |
annot |
("codon" or "cov" or NA) What to display for annotation top plot. When NA will not display anything, also not display total summary. Codon plot is all reading frames for a given window, and the default "cov" is coverage of all indels and mismatches over a given window. |
summary_plot |
(boolean) Whether small summary plot in the upper right corner should be displayed. Top bar summarizes total reads with frameshift (F), reads with Edits without Frameshift (Edits) and reads without Edits (Match). |
frameshift |
(boolean) Whether to include Frameshift column in the table. annot on | off |
Top plot shows all six possible frames for given amplicon. Amino acids are
colored as follows:
| Small nonpolar | G, A, S, T | Orange | ||
| Hydrophobic | C, V, I, L, P, F, Y, M, W | Green | ||
| Polar | N, Q, H | Magenta | ||
| Negatively charged | D, E | Red | ||
| Positively charged | K, R | Blue | ||
| Other | eg. *, U, + | Grey |
Variant plot shows amplicon reference, UPPER letters which were the basis for window selection are highlighted with dashed white box (guideRNA). Black triangles are reflecting insertion points. Dashed letters indicate deletions. Table associated with variant plot represents:
Frequency of given read in experiment. Variants are ordered by frequency value.
Represents raw count of this variant reads in experiment.
Sum of deletion and insertion widths of events overlapping presented window. Green background indicates frameshift.
(variant plot) gtable object of variants plot
Other specialized plots:
metaplot_deletions(),
metaplot_insertions(),
metaplot_mismatches(),
plot_cuts(),
plot_deletions(),
plot_heterogeneity(),
plot_insertions(),
plot_mismatches()
#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) alignments <- alignments[alignments$consensus & alignments$overlaps, ] p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ], config, c('ID_1','ID_3')) # with Donor we dont plot summary and the annot, summary plot and frameshift p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ], config, c('ID_5'))#example config config <- read.csv(system.file("extdata", "results", "config_summary.csv", package = "amplican")) #example alignments results alignments_file <- system.file("extdata", "results", "alignments", "events_filtered_shifted_normalized.csv", package = "amplican") alignments <- read.csv(alignments_file) alignments <- alignments[alignments$consensus & alignments$overlaps, ] p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ], config, c('ID_1','ID_3')) # with Donor we dont plot summary and the annot, summary plot and frameshift p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ], config, c('ID_5'))
This function generates a waffle chart, which is a grid of squares showing parts of a whole. It's a simple replacement for the deprecated 'waffle' package for basic use cases.
waffle(parts, rows = 10, title = NULL, colors = NULL, legend_pos = "bottom")waffle(parts, rows = 10, title = NULL, colors = NULL, legend_pos = "bottom")
parts |
A named numeric vector where names are the categories and values are the number of squares to be plotted for each category. |
rows |
The number of rows in the waffle grid. The number of columns will be calculated based on the total number of squares. |
title |
A string for the plot title. |
colors |
A character vector of colors to use for the different categories. The number of colors should match the number of categories in 'parts'. |
legend_pos |
The position of the legend (e.g., "bottom", "right", "none"). |
A ggplot object representing the waffle chart.
read_q_per <- c("Good Reads\n100 (80%)" = 80, "Bad Reads\n25 (20%)" = 20) waffle( parts = read_q_per, rows = 10, title = "Quality of all reads", colors = c('#E69F00', '#000000') )read_q_per <- c("Good Reads\n100 (80%)" = 80, "Bad Reads\n25 (20%)" = 20) waffle( parts = read_q_per, rows = 10, title = "Quality of all reads", colors = c('#E69F00', '#000000') )