Title: | Automated analysis of CRISPR experiments |
---|---|
Description: | `amplican` performs alignment of the amplicon reads, normalizes gathered data, calculates multiple statistics (e.g. cut rates, frameshifts) and presents results in form of aggregated reports. Data and statistics can be broken down by experiments, barcodes, user defined groups, guides and amplicons allowing for quick identification of potential problems. |
Authors: | Kornel Labun [aut], Eivind Valen [cph, cre] |
Maintainer: | Eivind Valen <[email protected]> |
License: | GPL-3 |
Version: | 1.29.0 |
Built: | 2024-11-02 06:00:32 UTC |
Source: | https://github.com/bioc/amplican |
Main goals:
Flexible pipeline for analysis of CRISPR MiSeq or HiSeq data.
Compatibility with GRanges and data.table style.
Precise quantification of mutation rates.
Automatic reports as .Rmd files that are flexible and open for manipulation.
Specialized plots for deletions, insertions, mismatches, variants and heterogeneity of the reads.
To learn more about amplican, start with the vignettes:
browseVignettes(package = "amplican")
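A minimal end-to-end run on the example data shipped with the package (a sketch based on the amplicanPipeline example further down in this reference; the paths below point at the package's own extdata):
library(amplican)
config <- system.file("extdata", "config.csv", package = "amplican")
fastq_folder <- system.file("extdata", package = "amplican")
results_folder <- tempdir()
# run the full pipeline without knitting reports
amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)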
amplican_print_reads prepares alignments of forward and reverse reads for printing; useful and needed for barcode reports.
amplican_print_reads(forward, reverse)
forward | (character or vector of characters) Forward reads. |
reverse | (character or vector of characters) Reverse reads; will be reverse complemented before alignment. |
Vector with alignments ready to be printed.
# load example data
unassigned_file <- system.file('extdata', 'results', 'alignments',
                               'unassigned_reads.csv', package = 'amplican')
unassigned <- data.table::setDF(data.table::fread(unassigned_file))
# sort by frequency
unassigned <- unassigned[order(unassigned$BarcodeFrequency, decreasing = TRUE), ]
# print alignment of most frequent unassigned reads
cat(amplican_print_reads(unassigned[1, 'Forward'], unassigned[1, 'Reverse']),
    sep = "\n")
amplicanAlign takes a configuration file, FASTQ reads and an output directory and prepares alignments and a summary. It uses the global Needleman-Wunsch algorithm with parameters optimized for CRISPR experiments. After alignment, an AlignmentsExperimentSet object is returned that can be coerced into GRanges (plus strand for forward reads, minus for reverse reads). It is also possible to output alignments in other, additional formats.
amplicanAlign(
  config,
  fastq_folder,
  use_parallel = FALSE,
  average_quality = 30,
  min_quality = 20,
  filter_n = FALSE,
  batch_size = 1e+06,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5,
    mismatch = -4, baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 0,
  donor_mismatch = 3,
  donor_strict = FALSE
)
config | (string) The path to your configuration file. An example config file ships with the package: system.file("extdata", "config.csv", package = "amplican"). |
fastq_folder | (string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as the config file. |
use_parallel | (boolean) Set to TRUE if you have registered a multicore back-end. |
average_quality | (numeric) FASTQ files store a quality score for each nucleotide; depending on the sequencing technology many encodings exist. Reads with average quality below this threshold are filtered out. |
min_quality | (numeric) Similar to average_quality, but depicts the minimum quality required for ALL nucleotides in a given read. If any nucleotide has quality below this threshold, the read is filtered out. |
filter_n | (boolean) Whether to filter out reads containing the N base. |
batch_size | (numeric) How many reads to analyze at a time. Needed for filtering of large FASTQ files. |
scoring_matrix | (matrix) Default is 'NUC44'. Pass a custom matrix using e.g. Biostrings::nucleotideSubstitutionMatrix(). |
gap_opening | (numeric) The opening gap score. |
gap_extension | (numeric) The gap extension score. |
fastqfiles | (numeric) Normally you want to use both FASTQ files, but in some special cases you may want to use only the forward file or only the reverse file. |
primer_mismatch | (numeric) How many mismatches are allowed during primer matching of the reads, which groups reads by experiment. |
donor_mismatch | (numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning to the donor template. Only used when a donor template is specified. The higher the value, the less strict the algorithm is in accepting a read as HDR. Set to 0 if only perfect alignments to the donor template should be marked as HDR; this is not advised due to sequencer error rates. |
donor_strict | (logical) Applies a stricter algorithm for HDR detection: only reads that contain all of the donor events count as HDR. Tolerates 'donor_mismatch' levels of noise, but no indels are allowed. Use this when your reads should span the whole window of donor events. May be more time consuming. |
(AlignmentsExperimentSet) Check the AlignmentsExperimentSet class for details. You can use lookupAlignment to examine alignments visually.
Other analysis steps: amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
aln <- amplicanAlign(config, fastq_folder)
aln
When forward and reverse reads agree on an event (e.g. a deletion), amplicanConsensus marks the forward event as TRUE, indicating that it represents the consensus. When forward and reverse reads agree only partially, for example they share the same start of a deletion but have different ends, amplicanConsensus picks the version from the read with the higher alignment score, provided both reads overlap the expected cut site; otherwise both events are rejected and marked FALSE. Events supported by only one of the strands are rejected.
amplicanConsensus(aln, cfgT, overlaps = "overlaps", promiscuous = TRUE)
aln | (data.frame) Contains relevant events in GRanges style. |
cfgT | (data.frame) Should be a table containing at least the positions of primers in the amplicons and their identifiers. |
overlaps | (character) Specifies which metadata column of aln indicates overlaps with expected cut sites (as computed by amplicanOverlap()). |
promiscuous | (boolean) Relaxes the consensus rules. When TRUE, indels that are not confirmed by the other strand are allowed (when both strands are used). |
If you have only forward or only reverse reads, don't use this function; instead assign TRUE to all of your events.
Consensus out of the forward + reverse reads is required for amplicanSummarize, and amplicanConsensus requires amplicanOverlap.
(boolean vector) TRUE means that the given event represents the consensus of forward and reverse reads.
Other analysis steps: amplicanAlign(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
file_path <- system.file("test_data", "test_aln.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("test_data", "test_cfg.csv", package = "amplican"))
all(aln$consensus == amplicanConsensus(aln, cfgT))
Very often alignments contain deletions that are not real deletions but rather artifacts of incomplete reads, e.g.:
ACTGAAAAA-------    <- this "deletion" should be filtered
ACTG----ACTGACTG
We call them Events Overlapping Primers and filter them together with reads that are potentially PRIMER DIMERS. This filter will also remove all events coming from reads with a low alignment score (potential off-targets).
amplicanFilter(aln, cfgT, PRIMER_DIMER)
aln | (data.frame) Should contain events from alignments in GRanges style with columns e.g. seqnames, width, start, end. |
cfgT | (data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon. |
PRIMER_DIMER | (numeric) Buffer for PRIMER DIMER detection. A given read is recognized as a PRIMER DIMER when the alignment introduces a gap larger than the amplicon length minus the combined primer lengths and this buffer. |
(aln) Same as aln, but reduced by events classified as PRIMER DIMER or overlapping primers.
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv", package = "amplican"))
amplicanFilter(aln, cfgT, 30)
Translates coordinates of GRanges events so that they are relative to the amplicon. Point zero is the first (leftmost) UPPER-case letter in the amplicon. Be wary that events for amplicons without expected cut sites are filtered out. Don't use this function if you have no expected cut sites specified and don't use any of the metaplots.
amplicanMap(aln, cfgT)
aln | (data.frame) List of events to map to the relative coordinates. |
cfgT | (data.frame) Config table. |
(GRanges) Same as events, but the coordinates are relative to the expected cut sites.
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
# example config
config <- read.csv(system.file("extdata", "config.csv", package = "amplican"))
# example events
events <- read.csv(system.file("extdata", "results", "alignments",
                               "raw_events.csv", package = "amplican"))
# make events relative to the UPPER case
amplicanMap(events, config)
This function can adjust events for small differences between known annotations (amplicon sequences) and the real DNA of the strain that was sequenced. Events from the control are grouped by add and their frequencies are calculated with respect to the total number of reads in those groups. Next, events from the control are filtered according to min_freq; all events below this threshold are treated as sequencing errors and rejected. Finally, all events in the treatment group that find an exact match (by the non-skipped columns) in the control group are removed. All events from the control group are returned unchanged.
amplicanNormalize(
  aln,
  cfgT,
  add = c("guideRNA", "Group"),
  skip = c("counts", "score", "seqnames", "read_id", "strand", "overlaps", "consensus"),
  min_freq = 0.01
)
aln | (data.frame) Contains events from alignments. |
cfgT | (data.frame) Config table with information about experiments. |
add | (character vector) Columns from cfgT that should be included in the event table for normalization matching. Defaults to c("guideRNA", "Group"), which means that only events created by the same guideRNA in the same Group will be removed if found in the Control. |
skip | (character vector) Specifies which columns of aln to skip. |
min_freq | (numeric) Events from the control group below this frequency will not be included in filtering. Use this to filter out background noise and sequencing errors. |
(data.frame) Same as aln, but events are normalized. Events from the Control are not changed. Additionally, columns from add are added to the data.frame.
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
aln <- data.frame(seqnames = 1:5, start = 1, end = 2, width = 2,
                  counts = 101:105)
cfgT <- data.frame(ID = 1:5, guideRNA = rep("ACTG", 5),
                   Reads_Filtered = c(2, 2, 3, 3, 4),
                   Group = c("A", "A", "B", "B", "B"),
                   Control = c(TRUE, FALSE, TRUE, FALSE, FALSE))
# all events are the same as in the control group, therefore they are filtered out
# events from control groups stay
amplicanNormalize(aln, cfgT)
# events that are different from the control group are preserved
aln[2, "start"] <- 3
amplicanNormalize(aln, cfgT)
To determine which deletions, insertions and mismatches (events) are probably created by CRISPR, we check whether they overlap expected cut sites. Expected cut sites should be specified in UPPER CASE letters in the amplicon sequences.
amplicanOverlap(aln, cfgT, cut_buffer = 5, relative = FALSE)
aln | (data.frame) Contains relevant events in GRanges style. |
cfgT | (data.frame) Contains amplicon sequences. |
cut_buffer | (numeric) Number of bases by which to expand the specified expected cut sites towards 5' and 3'. |
relative | (boolean) Sets whether events are relative to the position of the target site. |
(boolean vector) TRUE means that the given event overlaps a cut site.
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
file_path <- system.file("test_data", "test_aln.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("test_data", "test_cfg.csv", package = "amplican"))
all(aln$overlaps == amplicanOverlap(aln, cfgT))
amplicanPipeline is a convenient wrapper around all functionality of the package with the most robust settings. It will generate all results in the results_folder and also knit prepared reports into the 'reports' folder.
amplicanPipeline(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5,
    mismatch = -4, baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)
config | (string) The path to your configuration file. An example config file ships with the package: system.file("extdata", "config.csv", package = "amplican"). |
fastq_folder | (string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as the config file. |
results_folder | (string) Where do you want to store the results? The package will create files in that folder, so make sure you have write permissions. |
knit_reports | (boolean) Whether the function should "knit" all reports automatically for you (this is time consuming, be patient); when FALSE, reports will be prepared but not knitted. |
write_alignments_format | (character vector) Whether, and in which format, alignments should be written to files (default "None"). |
average_quality | (numeric) FASTQ files store a quality score for each nucleotide; depending on the sequencing technology many encodings exist. Reads with average quality below this threshold are filtered out. |
min_quality | (numeric) Similar to average_quality, but depicts the minimum quality required for ALL nucleotides in a given read. If any nucleotide has quality below this threshold, the read is filtered out. |
filter_n | (boolean) Whether to filter out reads containing the N base. |
batch_size | (numeric) How many reads to analyze at a time. Needed for filtering of large FASTQ files. |
use_parallel | (boolean) Set to TRUE if you have registered a multicore back-end. |
scoring_matrix | (matrix) Default is 'NUC44'. Pass a custom matrix using e.g. Biostrings::nucleotideSubstitutionMatrix(). |
gap_opening | (numeric) The opening gap score. |
gap_extension | (numeric) The gap extension score. |
fastqfiles | (numeric) Normally you want to use both FASTQ files, but in some special cases you may want to use only the forward file or only the reverse file. |
primer_mismatch | (numeric) How many mismatches are allowed during primer matching of the reads, which groups reads by experiment. |
donor_mismatch | (numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning to the donor template. Only used when a donor template is specified. The higher the value, the less strict the algorithm is in accepting a read as HDR. Set to 0 if only perfect alignments to the donor template should be marked as HDR; this is not advised due to sequencer error rates. |
donor_strict | (logical) Applies a stricter algorithm for HDR detection: only reads that contain all of the donor events count as HDR. Tolerates 'donor_mismatch' levels of noise, but no indels are allowed. Use this when your reads should span the whole window of donor events. May be more time consuming. |
PRIMER_DIMER | (numeric) Buffer for PRIMER DIMER detection. A given read is recognized as a PRIMER DIMER when the alignment introduces a gap larger than the amplicon length minus the combined primer lengths and this buffer. |
event_filter | (logical) Whether detection of off-target reads should be enabled. |
cut_buffer | The number of bases by which to extend expected cut sites (specified as UPPER-case letters in the amplicon) in the 5' and 3' directions. |
promiscuous_consensus | (boolean) Whether the relaxed (promiscuous) rules of amplicanConsensus should be used. |
normalize | (character vector) If the 'Control' column in the config table has all FALSE/0 values then normalization is skipped. Otherwise, normalization is strict: events found in the 'Control' TRUE group will be removed from the 'Control' FALSE group. By default this parameter uses the columns 'guideRNA' and 'Group' to impose additional restrictions on normalized events, e.g. only events created by the same 'guideRNA' in the same 'Group' will be normalized. |
min_freq | (numeric) All events below this frequency are treated as sequencing errors and rejected. This parameter is used during normalization through amplicanNormalize(). |
continue | (boolean) Default TRUE; decides whether to continue failed ampliCan runs. When FALSE, all contents of the 'results' folder will be removed. |
(invisible) results_folder path
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipelineConservative(), amplicanReport(), amplicanSummarize()
# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
# output folder
results_folder <- tempdir()
# full analysis, not knitting files automatically
amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)
amplicanPipelineConservative is identical to amplicanPipeline except that the default min_freq threshold is set to 0.15. Setting this threshold higher decreases the risk of inadequate normalization in cases of potential Index Hopping, at the cost of potentially decreasing the precision of true editing-rate calling. Index Hopping can be mitigated with the use of unique dual indexing pooling combinations. However, in cases where you expect Index Hopping to occur, you should use this function instead of amplicanPipeline.
amplicanPipelineConservative(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5,
    mismatch = -4, baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)
config | (string) The path to your configuration file. An example config file ships with the package: system.file("extdata", "config.csv", package = "amplican"). |
fastq_folder | (string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as the config file. |
results_folder | (string) Where do you want to store the results? The package will create files in that folder, so make sure you have write permissions. |
knit_reports | (boolean) Whether the function should "knit" all reports automatically for you (this is time consuming, be patient); when FALSE, reports will be prepared but not knitted. |
write_alignments_format | (character vector) Whether, and in which format, alignments should be written to files (default "None"). |
average_quality | (numeric) FASTQ files store a quality score for each nucleotide; depending on the sequencing technology many encodings exist. Reads with average quality below this threshold are filtered out. |
min_quality | (numeric) Similar to average_quality, but depicts the minimum quality required for ALL nucleotides in a given read. If any nucleotide has quality below this threshold, the read is filtered out. |
filter_n | (boolean) Whether to filter out reads containing the N base. |
batch_size | (numeric) How many reads to analyze at a time. Needed for filtering of large FASTQ files. |
use_parallel | (boolean) Set to TRUE if you have registered a multicore back-end. |
scoring_matrix | (matrix) Default is 'NUC44'. Pass a custom matrix using e.g. Biostrings::nucleotideSubstitutionMatrix(). |
gap_opening | (numeric) The opening gap score. |
gap_extension | (numeric) The gap extension score. |
fastqfiles | (numeric) Normally you want to use both FASTQ files, but in some special cases you may want to use only the forward file or only the reverse file. |
primer_mismatch | (numeric) How many mismatches are allowed during primer matching of the reads, which groups reads by experiment. |
donor_mismatch | (numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning to the donor template. Only used when a donor template is specified. The higher the value, the less strict the algorithm is in accepting a read as HDR. Set to 0 if only perfect alignments to the donor template should be marked as HDR; this is not advised due to sequencer error rates. |
donor_strict | (logical) Applies a stricter algorithm for HDR detection: only reads that contain all of the donor events count as HDR. Tolerates 'donor_mismatch' levels of noise, but no indels are allowed. Use this when your reads should span the whole window of donor events. May be more time consuming. |
PRIMER_DIMER | (numeric) Buffer for PRIMER DIMER detection. A given read is recognized as a PRIMER DIMER when the alignment introduces a gap larger than the amplicon length minus the combined primer lengths and this buffer. |
event_filter | (logical) Whether detection of off-target reads should be enabled. |
cut_buffer | The number of bases by which to extend expected cut sites (specified as UPPER-case letters in the amplicon) in the 5' and 3' directions. |
promiscuous_consensus | (boolean) Whether the relaxed (promiscuous) rules of amplicanConsensus should be used. |
normalize | (character vector) If the 'Control' column in the config table has all FALSE/0 values then normalization is skipped. Otherwise, normalization is strict: events found in the 'Control' TRUE group will be removed from the 'Control' FALSE group. By default this parameter uses the columns 'guideRNA' and 'Group' to impose additional restrictions on normalized events, e.g. only events created by the same 'guideRNA' in the same 'Group' will be normalized. |
min_freq | (numeric) All events below this frequency are treated as sequencing errors and rejected. This parameter is used during normalization through amplicanNormalize(). |
continue | (boolean) Default TRUE; decides whether to continue failed ampliCan runs. When FALSE, all contents of the 'results' folder will be removed. |
It will generate all results in the results_folder and also knit prepared reports into the 'reports' folder.
(invisible) results_folder path
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanReport(), amplicanSummarize()
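This help page ships no example of its own; below is a sketch mirroring the amplicanPipeline example, assuming the same example data shipped with the package is appropriate here:
config <- system.file("extdata", "config.csv", package = "amplican")
fastq_folder <- system.file("extdata", package = "amplican")
results_folder <- tempdir()
# same call as amplicanPipeline, but with the conservative min_freq default
amplicanPipelineConservative(config, fastq_folder, results_folder,
                             knit_reports = FALSE)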
amplicanReport takes a configuration file, FASTQ reads and an output directory to prepare summaries as editable .Rmd files. You can specify whether you want to make summaries based on ID, Barcode, Group, or even guideRNA and Amplicon. This function automatically knits all reports after creation. If you want to postpone knitting and edit the reports, use the .Rmd templates to create your own version of the reports instead of this function.
amplicanReport(
  results_folder,
  levels = c("id", "barcode", "group", "guide", "amplicon", "summary"),
  report_files = c("id_report", "barcode_report", "group_report",
                   "guide_report", "amplicon_report", "index"),
  cut_buffer = 5,
  xlab_spacing = 4,
  top = 5,
  knit_reports = TRUE
)
results_folder | (string) Folder containing results from the earlier analysis steps. |
levels | (vector) Possible values are: "id", "barcode", "group", "guide", "amplicon", "summary". You can also input more than one value, e.g. c("id", "barcode") will create two separate reports, one for each level. |
report_files | (vector) You can supply your own names for the files. For each of the levels there has to be one file name. Files are created in the current working directory by default. |
cut_buffer | (numeric) Default 5. Number of bases used around the specified cut site. |
xlab_spacing | (numeric) Default is 4. Spacing of the ticks on the x axis of plots. |
top | (numeric) Default is 5. How many of the most frequent unassigned reads to report. Only relevant when both forward and reverse reads are used; we align them to each other because the correct amplicon could not be specified. |
knit_reports | (boolean) Whether to knit reports automatically. |
(string) Path to the folder with results.
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanSummarize()
results_folder <- tempdir()
amplicanReport(results_folder,
               report_files = file.path(results_folder,
                                        c("id_report", "barcode_report",
                                          "group_report", "guide_report",
                                          "amplicon_report", "index")),
               knit_reports = FALSE)
Before using this function, make sure events are filtered to represent consensus with amplicanConsensus if you use both forward and reverse reads. If you want to calculate metrics over the expected cut site, filter events using amplicanOverlap.
amplicanSummarize(aln, cfgT)
aln | (data.frame) Contains events from the alignments. |
cfgT | (data.frame) Config file with the experiment details. |
Adds columns to cfgT:
ReadsCut - Count of reads with deletions overlapping the expected cut site.
Reads_Frameshifted - Count of reads with frameshifts overlapping the expected cut site.
(data.frame) As cfgT, but with extra columns.
Other analysis steps: amplicanAlign(), amplicanConsensus(), amplicanFilter(), amplicanMap(), amplicanNormalize(), amplicanOverlap(), amplicanPipeline(), amplicanPipelineConservative(), amplicanReport()
file_path <- system.file("extdata", "results", "alignments",
                         "events_filtered_shifted_normalized.csv",
                         package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv", package = "amplican"))
amplicanSummarize(aln, cfgT)
Transforms extended CIGAR strings into a GRanges representation with events of deletions, insertions and mismatches.
cigarsToEvents(
  cigars,
  aln_pos_start,
  query_seq,
  ref,
  read_id,
  mapq,
  seqnames,
  strands,
  counts
)
cigars | (character) Extended CIGARs. |
aln_pos_start | (integer) Alignment start positions of the CIGARs. |
query_seq | (character) Aligned query sequences. |
ref | (character) Reference sequences used for alignment. |
read_id | (numeric) Read id assigned to each of the CIGARs. |
mapq | (numeric) Mapping scores. |
seqnames | (character) Names of the sequences, potentially ids of the reference sequences. |
strands | (character) Strands to assign. |
counts | (integer) Vector of CIGAR counts, if the data is collapsed. |
(GRanges) Same as events.
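No example ships with this help page; the following is a hypothetical sketch of a call with invented values (it assumes extended CIGAR operations such as "=" and "D" are accepted), not a run from the original documentation:
# one read with a 2-bp deletion relative to the reference (reference positions 5-6)
cigarsToEvents(
  cigars        = "4=2D4=",
  aln_pos_start = 1L,
  query_seq     = "ACGTACGT",
  ref           = "ACGTTTACGT",
  read_id       = 1,
  mapq          = 60,
  seqnames      = "ID_1",
  strands       = "+",
  counts        = 1L
)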
Generates all combinations along a string seq, swapping m characters at a time with letters defined in the dictionary letters. This allows, for instance, creating a list of possible primers with two mismatches.
comb_along(seq, m = 2, letters = c("A", "C", "T", "G"))
seq | (character) Input string to permute. |
m | (integer) Number of positions to permute at a time. |
letters | (character vector) Dictionary source for combinations of elements. |
(character vector) All unique combinations of the permuted string.
comb_along("AC")
comb_along("AAA", 1)
comb_along("AAA")
comb_along("AAA", 3)
comb_along("AAAAAAAAAA")
Very often alignments contain deletions that are not real deletions but rather artifacts of incomplete reads, e.g.:
ACTGAAAAA-------    <- this "deletion" should be filtered
ACTG----ACTGACTG
findEOP(aln, cfgT)
aln | (data.frame) Should contain events from alignments in GRanges style with columns e.g. seqnames, width, start, end. |
cfgT | (data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon. |
(logical vector) TRUE indicates events that overlap primers.
Other filters: findLQR(), findPD()
file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv", package = "amplican"))
findEOP(aln, cfgT)
Tries to detect off-targets and low-quality alignments (outliers). It applies k-means clustering on the normalized number of events per read and the read alignment score. If there are 3 clusters (decided based on the silhouette criterion), the cluster with a high event count and low alignment score is marked for filtering. When there are fewer than 1000 scores in aln, nothing is filtered.
findLQR(aln)
aln | (data.frame) Should contain events from alignments in GRanges style with columns e.g. seqnames, width, start, end, score. |
(logical vector) TRUE indicates events that are potential off-targets or low-quality alignments.
Other filters: findEOP(), findPD()
file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
aln <- aln[seqnames == "ID_1"] # for first experiment
findLQR(aln)
Use to filter reads that are most likely PRIMER DIMERS.
findPD(aln, cfgT, PRIMER_DIMER = 30)
aln | (data.frame) Should contain events from alignments in GRanges style with columns e.g. seqnames, width, start, end. |
cfgT | (data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon. |
PRIMER_DIMER | (numeric) Buffer for PRIMER DIMER detection. A given read is recognized as a PRIMER DIMER when the alignment introduces a gap larger than the amplicon length minus the combined primer lengths and this buffer. |
(logical vector) TRUE indicates events classified as PRIMER DIMER.
Other filters: findEOP(), findLQR()
file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv", package = "amplican"))
findPD(aln, cfgT)
This set of functionality is copied from the ggforce package due to dependency issues on Bioconductor and is used only internally (not exported). This set of geoms makes it possible to connect points, creating either quadratic or cubic beziers. bezier works by calculating points along the bezier and connecting these to draw the curve.
stat_bezier(
  mapping = NULL,
  data = NULL,
  geom = "path",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  n = 100,
  inherit.aes = TRUE,
  ...
)
geom_bezier(
  mapping = NULL,
  data = NULL,
  stat = "bezier",
  position = "identity",
  arrow = NULL,
  lineend = "butt",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  n = 100,
  ...
)
mapping | Set of aesthetic mappings created by aes(). |
data | The data to be displayed in this layer. There are three options: if NULL (the default), the data is inherited from the plot; a data.frame (or other object) overrides the plot data; a function is applied to the plot data. |
geom | The geometric object to use to display the data for this layer. |
position | A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and improve the display. |
na.rm | If FALSE (the default), missing values are removed with a warning; if TRUE, missing values are silently removed. |
show.legend | logical. Should this layer be included in the legends? NA (the default) includes it if any aesthetics are mapped. |
n | The number of points to create for each segment. |
inherit.aes | If FALSE, overrides the default aesthetics rather than combining with them. |
... | Other arguments passed on to layer(). |
stat | The statistical transformation to use on the data for this layer. |
arrow | Arrow specification, as created by grid::arrow(). |
lineend | Line end style (round, butt, square). |
Input data is understood as a sequence of data points: the first is the start point, followed by one or two control points and then the end point. More than 4 or fewer than 3 points per group will throw an error.
geom_link, geom_link2 and geom_link0 understand the following aesthetics: x, y (required), color, linewidth, linetype, alpha and lineend.
Computed variables: x, y (the interpolated point coordinates) and index (the progression along the interpolation, mapped between 0 and 1).
Author: Thomas Lin Pedersen
beziers <- data.frame(
  x = c(1, 2, 3, 4, 4, 6, 6),
  y = c(0, 2, 0, 0, 2, 2, 0),
  type = rep(c('cubic', 'quadratic'), c(3, 4)),
  point = c('end', 'control', 'end', 'end', 'control', 'control', 'end')
)
help_lines <- data.frame(
  x = c(1, 3, 4, 6),
  xend = c(2, 2, 4, 6),
  y = 0,
  yend = 2
)
ggplot2::ggplot() +
  ggplot2::geom_segment(
    ggplot2::aes(x = x, xend = xend, y = y, yend = yend),
    data = help_lines,
    arrow = ggplot2::arrow(length = ggplot2::unit(c(0, 0, 0.5, 0.5), 'cm')),
    colour = 'grey') +
  amplican:::geom_bezier(ggplot2::aes(x = x, y = y, group = type,
                                      linetype = type),
                         data = beziers) +
  ggplot2::geom_point(ggplot2::aes(x = x, y = y, colour = point),
                      data = beziers)
Transforms aligned strings into GRanges representation with events of deletions, insertions and mismatches. Subject should come from one amplicon sequence, after alignment to many sequences (patterns).
getEvents(
  pattern,
  subject,
  scores,
  ID = "NA",
  ampl_shift = 1L,
  ampl_start = 1L,
  strand_info = "+"
)
pattern | (character) Aligned pattern. |
subject | (character) Aligned subject. |
scores | (integer) Alignment scores of the pattern and subject. |
ID | (character) Will be used as seqnames of the output GRanges. |
ampl_shift | (numeric) Possible shift of the amplicons. |
ampl_start | (numeric) Real amplicon starts. |
strand_info | (character) Strands to assign. |
(GRanges) Same as events.
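No example ships with this help page; the following is a hypothetical sketch with invented values, showing a single aligned pattern/subject pair where the read carries a 2-bp deletion relative to the amplicon:
getEvents(
  pattern = "ACGT--ACGT",   # aligned read with a gap (deletion)
  subject = "ACGTTTACGT",   # aligned amplicon
  scores  = 10L,
  ID      = "ID_1"
)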
This is strict detection, as compared to 'is_hdr', which was designed to be less specific and allow for all kinds of donors. This method requires that you have exactly the same events (mismatches, insertions, deletions) as the difference between the amplicon and donor sequences. It ignores everything else, so other mismatches, small indels etc. are allowed here as noise for a valid HDR.
is_hdr_strict(aln, cfgT, scoring_matrix, gap_opening = 25, gap_extension = 0)
aln | (data.table) Events that already contain the consensus column; they are also shifted and normalized. |
cfgT | (data.table) Config data.table with columns for amplicon and donor. |
scoring_matrix | (scoring matrix) |
gap_opening | (integer) |
gap_extension | (integer) |
(aln) Same as aln on entry, but readType is updated to TRUE when a read is recognized as HDR.
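No example ships with this help page; below is a sketch reusing the example results shipped with the package. It assumes the shipped events_filtered_shifted_normalized.csv and config_summary.csv carry the columns this function expects (e.g. consensus, readType, Amplicon, Donor):
aln <- data.table::fread(system.file("extdata", "results", "alignments",
  "events_filtered_shifted_normalized.csv", package = "amplican"))
cfgT <- data.table::fread(system.file("extdata", "results",
  "config_summary.csv", package = "amplican"))
# same scoring scheme as the pipeline defaults
sm <- Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
                                               baseOnly = FALSE, type = "DNA")
aln <- is_hdr_strict(aln, cfgT, sm)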
This function plots deletions in relation to the amplicons for a given selection vector that groups values by the given config group. All reads should already be converted to positions relative to their respective amplicons using amplicanMap. The top plot is for forward reads and the bottom plot is for reverse reads.
metaplot_deletions(alnmt, config, group, selection, over = "overlaps")
alnmt | (data.frame) Loaded alignment information from the events_filtered_shifted_normalized.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
group | (string) Name of the column from the config file to use for grouping. Events are subselected based on this column and the values from selection. |
selection | (string or vector of strings) Values from the config column specified in the group argument. |
over | (string) Specify which column contains overlaps with expected cut sites, as generated by amplicanOverlap(). |
(deletions metaplot) ggplot2 object of deletions metaplot
Other specialized plots: metaplot_insertions(), metaplot_mismatches(), plot_cuts(), plot_deletions(), plot_heterogeneity(), plot_insertions(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_deletions(alignments[alignments$consensus, ], config, "Group", "Betty")
This function plots insertions in relation to the amplicons for a given selection vector that groups values by the given config group. All reads should already be converted to positions relative to their respective amplicons using amplicanMap. The top plot is for forward reads and the bottom plot is for reverse reads.
metaplot_insertions(alnmt, config, group, selection)
alnmt | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
group | (string) Name of the column from the config file to use for grouping. Events are subselected based on this column and the values from selection. |
selection | (string or vector of strings) Values from the config column specified in the group argument. |
(insertions metaplot) ggplot2 object of insertions metaplot
Other specialized plots: metaplot_deletions(), metaplot_mismatches(), plot_cuts(), plot_deletions(), plot_heterogeneity(), plot_insertions(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_insertions(alignments[alignments$consensus, ], config, "Group", "Betty")
Plots mismatches in relation to the amplicons for a given selection vector that groups values by the given config group. All reads should already be converted to positions relative to their respective amplicons using amplicanMap. The zero position in the new coordinates is the leftmost UPPER-case letter of the respective amplicon. This function filters out all alignment events whose amplicons have no UPPER case defined. The top plot is for forward reads and the bottom plot is for reverse reads.
metaplot_mismatches(alnmt, config, group, selection)
alnmt | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
group | (string) Name of the column from the config file to use for grouping. Events are subselected based on this column and the values from selection. |
selection | (string or vector of strings) Values from the config column specified in the group argument. |
(mismatches metaplot) ggplot2 object of mismatches metaplot
Other specialized plots: metaplot_deletions(), metaplot_insertions(), plot_cuts(), plot_deletions(), plot_heterogeneity(), plot_insertions(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_mismatches(alignments, config, "Group", "Betty")
Parses EMBOSS needle (or needleall) "pair" format into a GRanges representation with events of deletions, insertions and mismatches. Make sure that each file corresponds to a single subject (single amplicon). Assumes that the bottom sequence "-bsequence" corresponds to the "subject" and that the full sequence alignment is returned.
pairToEvents(file, ID = "NA", strand_info = "+")
file | (character) File path. |
ID | (character) ID of the experiment; will be used as seqnames of the returned ranges. |
strand_info | (character) Strand to assign. |
(GRanges) Same as events.
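No example ships with this help page; a hypothetical sketch follows, where "alignment.pair" is a placeholder path to an EMBOSS needle "pair" output file produced elsewhere:
events <- pairToEvents("alignment.pair", ID = "ID_1", strand_info = "+")
events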
This function plots cuts in relation to the amplicon, with a distinction for each ID.
plot_cuts(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
alignments | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
id | (string or vector of strings) ID from the ID column of the config file, or multiple IDs if it is possible to group them. The first amplicon will be used as the basis for the plot. |
cut_buffer | (numeric) Default is 5; you should specify the same value as used in the analysis. |
xlab_spacing | (numeric) Spacing of the x-axis labels. Default is 4. |
(cuts plot) gtable object of cuts plot
Other specialized plots: metaplot_deletions(), metaplot_insertions(), metaplot_mismatches(), plot_deletions(), plot_heterogeneity(), plot_insertions(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
plot_cuts(alignments[alignments$consensus & alignments$overlaps, ], config,
          c('ID_1', 'ID_3'))
This function plots deletions in relation to the amplicon and assumes events are relative to the expected cut site. The top plot is for forward reads, the middle one shows the amplicon sequence, and the bottom plot is for reverse reads.
plot_deletions(
  alignments,
  config,
  id,
  cut_buffer = 5,
  xlab_spacing = 4,
  over = "overlaps"
)
alignments | (data.frame) Loaded alignment information from the alignments.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
id | (string or vector of strings) ID from the ID column of the config file, or multiple IDs if it is possible to group them. The first amplicon will be used as the basis for the plot. |
cut_buffer | (numeric) Default is 5; you should specify the same value as used in the analysis. |
xlab_spacing | (numeric) Spacing of the x-axis labels. Default is 4. |
over | (string) Specify which column contains overlaps with expected cut sites, as generated by amplicanOverlap(). |
(deletions plot) gtable object of deletions plot
Other specialized plots: metaplot_deletions(), metaplot_insertions(), metaplot_mismatches(), plot_cuts(), plot_heterogeneity(), plot_insertions(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
p <- plot_deletions(alignments[alignments$consensus, ], config,
                    c('ID_1', 'ID_3'))
Helper function to calculate figure height based on the number of elements to plot, for automating figure sizes in knitted reports.
plot_height(x)
x | (numeric) Number of elements to fit onto the height axis. |
(numeric) In inches
plot_height(20)
This function creates a stacked barplot explaining read heterogeneity. It groups reads by user-defined levels and measures how unique the reads are within each level. Uniqueness of reads is simplified into bins and colored according to a color gradient. The default color black indicates very high heterogeneity of the reads; the more yellow (default), the more similar and less heterogeneous the reads.
plot_heterogeneity(
  alignments,
  config,
  level = "ID",
  colors = c("#000000", "#F0E442"),
  bins = c(0, 5, seq(10, 100, 10))
)
alignments | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
level | (string) Name of the column from the config file specifying the levels to group by. |
colors | (html colors vector) Two colours for the gradient, e.g. c('#000000', '#F0E442'). |
bins | (numeric vector) Numeric vector from 0 to 100 specifying bins, e.g. c(0, 5, seq(10, 100, 10)). |
(heterogeneity plot) ggplot2 object of heterogeneity plot
Other specialized plots: metaplot_deletions(), metaplot_insertions(), metaplot_mismatches(), plot_cuts(), plot_deletions(), plot_insertions(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
plot_heterogeneity(alignments[alignments$consensus, ], config)
This function plots insertions in relation to the amplicon. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.
plot_insertions(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
alignments | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
id | (string or vector of strings) ID from the ID column of the config file, or multiple IDs if it is possible to group them. The first amplicon will be used as the basis for the plot. |
cut_buffer | (numeric) Default is 5; you should specify the same value as used in the analysis. |
xlab_spacing | (numeric) Spacing of the x-axis labels. Default is 4. |
(insertions plot) gtable object of insertions plot
Other specialized plots: metaplot_deletions(), metaplot_insertions(), metaplot_mismatches(), plot_cuts(), plot_deletions(), plot_heterogeneity(), plot_mismatches(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
p <- plot_insertions(alignments, config, c('ID_1', 'ID_3'))
Plots mismatches in relation to the amplicon; assumes your reads are relative to the predicted cut sites of the respective amplicon sequences. The top plot is for forward reads, the middle one shows the amplicon sequence, and the bottom plot is for reverse reads.
plot_mismatches(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
alignments | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
id | (string or vector of strings) One ID from the ID column of the config file, or multiple IDs if it is possible to group them. They have to have the same amplicon; amplicons on the reverse strand will be reverse complemented to match forward-strand amplicons. |
cut_buffer | (numeric) Default is 5; you should specify the same value as used in the analysis. |
xlab_spacing | (numeric) Spacing of the x-axis labels. Default is 4. |
(mismatches plot) gtable object of mismatches plot
Other specialized plots: metaplot_deletions(), metaplot_insertions(), metaplot_mismatches(), plot_cuts(), plot_deletions(), plot_heterogeneity(), plot_insertions(), plot_variants()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
id <- c('ID_1', 'ID_3'); cut_buffer = 5; xlab_spacing = 4
p <- plot_mismatches(alignments, config, c('ID_1', 'ID_3'))
ggplot2::ggsave("~/test.png", p, width = 25, units = "in")
This function plots variants in relation to the amplicon. Shows sequences of top mutants without aggregating on deletions, insertions and mismatches.
plot_variants(
  alignments,
  config,
  id,
  cut_buffer = 5,
  top = 10,
  annot = if (amplican:::get_seq(config, id, "Donor") == "") "cov" else NA,
  summary_plot = amplican:::get_seq(config, id, "Donor") == "",
  frameshift = amplican:::get_seq(config, id, "Donor") == ""
)
alignments | (data.frame) Loaded alignment information from the alignments_events.csv file. |
config | (data.frame) Loaded table from the config_summary.csv file. |
id | (string or vector of strings) ID from the ID column of the config file, or multiple IDs if it is possible to group them. The first amplicon will be used as the basis for the plot. If a Donor is available, we will try to add the first donor and mark it on the plot. |
cut_buffer | (numeric) Default is 5; you should specify the same value as used in the analysis. |
top | (numeric) Number of the most frequent reads to plot. Default is 10. |
annot | ("codon" or "cov" or NA) What to display in the annotation top plot. When NA, nothing is displayed and the total summary is also omitted. The codon plot shows all reading frames for a given window, and the default "cov" is the coverage of all indels and mismatches over a given window. |
summary_plot | (boolean) Whether the small summary plot in the upper right corner should be displayed. The top bar summarizes total reads with frameshift (F), reads with edits without frameshift (Edits) and reads without edits (Match). |
frameshift | (boolean) Whether to include the Frameshift column in the table. |
Top plot shows all six possible frames for the given amplicon. Amino acids are colored as follows:
Small nonpolar | G, A, S, T | Orange |
Hydrophobic | C, V, I, L, P, F, Y, M, W | Green |
Polar | N, Q, H | Magenta |
Negatively charged | D, E | Red |
Positively charged | K, R | Blue |
Other | eg. *, U, + | Grey |
Variant plot shows the amplicon reference; the UPPER-case letters, which were the basis for window selection, are highlighted with a dashed white box (guideRNA). Black triangles reflect insertion points. Dashed letters indicate deletions. The table associated with the variant plot contains:
Freq - Frequency of the given read in the experiment. Variants are ordered by frequency.
Count - Raw count of reads with this variant in the experiment.
F - Sum of deletion and insertion widths of events overlapping the presented window. A green background indicates a frameshift.
(variant plot) gtable object of variants plot
This function is inspired by plotAlignments.
Other specialized plots: metaplot_deletions(), metaplot_insertions(), metaplot_mismatches(), plot_cuts(), plot_deletions(), plot_heterogeneity(), plot_insertions(), plot_mismatches()
# example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
# example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
alignments <- alignments[alignments$consensus & alignments$overlaps, ]
p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ],
                   config, c('ID_1', 'ID_3'))
# with Donor we don't plot the annot, summary plot and frameshift
p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ],
                   config, c('ID_5'))