Package 'amplican' reference manual

Title:	Automated analysis of CRISPR experiments
Description:	`amplican` performs alignment of the amplicon reads, normalizes gathered data, calculates multiple statistics (e.g. cut rates, frameshifts) and presents results in form of aggregated reports. Data and statistics can be broken down by experiments, barcodes, user defined groups, guides and amplicons allowing for quick identification of potential problems.
Authors:	Kornel Labun [aut], Eivind Valen [cph, cre]
Maintainer:	Eivind Valen <[email protected]>
License:	GPL-3
Version:	1.29.0
Built:	2025-01-31 05:42:07 UTC
Source:	https://github.com/bioc/amplican

Automated analysis of CRISPR experiments.

Description

Main goals:

Flexible pipeline for analysis of the CRISPR Mi-Seq or Hi-Seq data.
Compatible with GRanges and data.table style.
Precise quantification of mutation rates.
Prepare automatic reports as .Rmd files that are flexible and open for manipulation.
Provide specialized plots for deletions, insertions, mismatches, variants, heterogeneity of the reads.

Details

To learn more about amplican, start with the vignettes: browseVignettes(package = "amplican")

Author(s)

Maintainer: Eivind Valen [email protected] [copyright holder]

Authors:

Kornel Labun [email protected]

Pretty print forward and reverse reads aligned to each other.

Description

Usefull and needed for barcode reports.

Usage

amplican_print_reads(forward, reverse)
amplican_print_reads(forward, reverse)

Arguments

`forward`	(character or vector of characters) Forward reads.
`reverse`	(character or vector of characters) Will be reverse complemented before alignment.

Value

Vector with alignments ready to be printed.

Examples

# load example data
unassigned_file <- system.file('extdata', 'results',  'alignments',
                               'unassigned_reads.csv', package = 'amplican')
unassigned <- data.table::setDF(data.table::fread(unassigned_file))
# sort by frequency
unassigned <- unassigned[order(unassigned$BarcodeFrequency,
                               decreasing = TRUE), ]
# print alignment of most frequent unassigned reads
cat(amplican_print_reads(unassigned[1, 'Forward'],
                         unassigned[1, 'Reverse']),
          sep = "\n")

# load example data
unassigned_file <- system.file('extdata', 'results',  'alignments',
                               'unassigned_reads.csv', package = 'amplican')
unassigned <- data.table::setDF(data.table::fread(unassigned_file))
# sort by frequency
unassigned <- unassigned[order(unassigned$BarcodeFrequency,
                               decreasing = TRUE), ]
# print alignment of most frequent unassigned reads
cat(amplican_print_reads(unassigned[1, 'Forward'],
                         unassigned[1, 'Reverse']),
          sep = "\n")

Align reads to amplicons.

Description

amplicanAlign takes a configuration files, fastq reads and output directory to prepare alignments and summary. It uses global Needleman-Wunsch algorithm with parameters optimized for CRISPR experiment. After alignments, object of AlignmentsExperimentSet is returned that allows for coercion into GRanges (plus is for forward and minus for reverse reads). It is also possible to output alignments in other, additional formats.

Usage

amplicanAlign(
  config,
  fastq_folder,
  use_parallel = FALSE,
  average_quality = 30,
  min_quality = 20,
  filter_n = FALSE,
  batch_size = 1e+06,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 0,
  donor_mismatch = 3,
  donor_strict = FALSE
)
amplicanAlign(
  config,
  fastq_folder,
  use_parallel = FALSE,
  average_quality = 30,
  min_quality = 20,
  filter_n = FALSE,
  batch_size = 1e+06,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 0,
  donor_mismatch = 3,
  donor_strict = FALSE
)

Arguments

`config`	(string) The path to your configuration file. For example: `system.file("extdata", "config.txt", package = "amplican")`. Configuration file can contain additional columns, but first 11 columns have to follow the example config specification.
`fastq_folder`	(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file.
`use_parallel`	(boolean) Set to TRUE, if you have registered multicore back-end.
`average_quality`	(numeric) The FASTQ file have a quality for each nucleotide, depending on sequencing technology there exist many formats. This package uses `readFastq` to parse the reads. If the average quality of the reads fall below value of `average_quality` then sequence is filtered. Default is 0.
`min_quality`	(numeric) Similar as in average_quality, but depicts the minimum quality for ALL nucleotides in given read. If one of nucleotides has quality BELLOW `min_quality`, then the sequence is filtered. Default is 20.
`filter_n`	(boolean) Whether to filter out reads containing N base.
`batch_size`	(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files.
`scoring_matrix`	(matrix) Default is 'NUC44'. Pass desired matrix using `nucleotideSubstitutionMatrix`.
`gap_opening`	(numeric) The opening gap score.
`gap_extension`	(numeric) The gap extension score.
`fastqfiles`	(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options: 0 Use both FASTQ files. 0.5 Use both FASTQ files, but only for one of the reads (forward or reverse) is required to have primer perfectly matched to sequence - eg. use when reverse reads are trimmed of primers, but forward reads have forward primer in the sequence. 1 Use only the forward FASTQ file. 2 Use only the reverse FASTQ file.
`primer_mismatch`	(numeric) Decide how many mismatches are allowed during primer matching of the reads, that groups reads by experiments. When `primer_mismatch = 0` no mismatches are allowed, which can increase number of unasssigned read.
`donor_mismatch`	(numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning toward the donor template. This parameter is only used when donor template is specified. The higher the parameter the less strict will be algorithm accepting read as HDR. Set to 0 if only perfect alignments to the donor template marked as HDR, unadvised due to error rate of the sequencers.
`donor_strict`	(logical) Applies more strict algorithm for HDR detection. Only these reads that have all of the donor events will count as HDR. Tolerates 'donor_mismatch' level of noise, but no indels are allowed. Use this when your reads should span over the whole window of the donor events. Might be more time consuming.

Value

(AlignmentsExperimentSet) Check AlignmentsExperimentSet class for details. You can use lookupAlignment to examine alignments visually.

Examples

# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
aln <- amplicanAlign(config, fastq_folder)
aln

# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
aln <- amplicanAlign(config, fastq_folder)
aln

Extract consensus out of forward and reverse events.

Description

When forward and reverse reads are in agreement on the events (eg. deletion) amplicanConsensus will mark forward event as TRUE indicating that he represents consensus. In cases where forward and reverse read agree only partially, for example, they share the same start of the deletion, but they have different end amplicanConsensus will pick the version of read with higher alignment score, in situation where both of the reads overlap expected cut site, otherwise both events will be rejected and marked FALSE. When there are events only on one of the strands they will be rejected.

Usage

amplicanConsensus(aln, cfgT, overlaps = "overlaps", promiscuous = TRUE)
amplicanConsensus(aln, cfgT, overlaps = "overlaps", promiscuous = TRUE)

Arguments

`aln`	(data.frame) Contains relevant events in GRanges style.
`cfgT`	(data.frame) Should be table containing at least positions of primers in the amplicons and their identifiers
`overlaps`	(character) Specifies which metadata column of `aln` indicates which events are overlapping expected cut site.
`promiscuous`	(boolean) Allows to relax consensus rules. When TRUE will allow Indels that are not confirmed by the other strand (when both are used).

Details

In situation where you have only forward or only reverse reads don't use this function and assign all TRUE to all of your events.

Consensus out of the forward + reverse reads is required for amplicanSummary, and amplicanConsensus requires amplicanOverlap.

Value

(bolean vector) Where TRUE means that given event represents consensus out of forward and reverse reads.

Examples

file_path <- system.file("test_data", "test_aln.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("test_data", "test_cfg.csv", package = "amplican"))
all(aln$consensus == amplicanConsensus(aln, cfgT))

file_path <- system.file("test_data", "test_aln.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("test_data", "test_cfg.csv", package = "amplican"))
all(aln$consensus == amplicanConsensus(aln, cfgT))

Filter Events Overlapping Primers, PRIMER DIMERS and Low Alignment Score Events.

Description

Very often alignments return deletions that are not real deletions, but rather artifact of incomplete reads eg.:

ACTGAAAAA------- <- this "deletion" should be filtered
ACTG----ACTGACTG

We call them Events Overlapping Primers and filter them together with reads that are potentially PRIMER DIMERS. This filter will also remove all events coming from reads with low alignment score - potential Off-targets.

Usage

amplicanFilter(aln, cfgT, PRIMER_DIMER)
amplicanFilter(aln, cfgT, PRIMER_DIMER)

Arguments

`aln`	(data.frame) Should contain events from alignments in GRanges style with columns eg. seqnames, width, start, end.
`cfgT`	(data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon.
`PRIMER_DIMER`	(numeric) Value specifying buffer for PRIMER DIMER detection. For a given read it will be recognized as PRIMER DIMER when alignment will introduce gap of size bigger than: `length of amplicon - (lengths of PRIMERS + PRIMER_DIMER value)`

Value

(aln) Reduced by events classified as PRIMER DIMER or overlapping primers.

Examples

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
amplicanFilter(aln, cfgT, 30)

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
amplicanFilter(aln, cfgT, 30)

Map events to their respective relative coordinates specified with UPPER case.

Description

Translate coordinates of GRanges events so that they can be relative to the amplicon. As point zero we assume first left sided UPPER case letter in the amplicon. Be weary that events for amplicons without expected cut sites are filtered. Don't use this function, if you don't have expected cut sites specified and don't use any of the metaplots.

Usage

amplicanMap(aln, cfgT)
amplicanMap(aln, cfgT)

Arguments

`aln`	(data.frame) List of events to map to the relative coordinates.
`cfgT`	(data.frame) config table

Value

(GRanges) Same as events, but the coordinates are relative to the expected cut sites.

Examples

# example config
config <- read.csv(system.file("extdata", "config.csv",
                   package = "amplican"))
# example events
events <- read.csv(system.file("extdata", "results", "alignments",
                   "raw_events.csv", package = "amplican"))
# make events relative to the UPPER case
amplicanMap(events, config)

# example config
config <- read.csv(system.file("extdata", "config.csv",
                   package = "amplican"))
# example events
events <- read.csv(system.file("extdata", "results", "alignments",
                   "raw_events.csv", package = "amplican"))
# make events relative to the UPPER case
amplicanMap(events, config)

Remove events that can be found in Controls.

Description

This function can adjust events for small differences between known annotations (amplicon sequences) and real DNA of the strain that was sequenced. Events from the control are grouped by add and their frequencies are calculated in respect to number of total reads in that groups. In next step events from the control are filtered according to min_freq, all events below are treated as sequencing errors and rejected. Finally, all events that can be found in treatment group that find their exact match (by non skipped columns) in control group are removed. All events from control group are returned back.

Usage

amplicanNormalize(
  aln,
  cfgT,
  add = c("guideRNA", "Group"),
  skip = c("counts", "score", "seqnames", "read_id", "strand", "overlaps", "consensus"),
  min_freq = 0.01
)
amplicanNormalize(
  aln,
  cfgT,
  add = c("guideRNA", "Group"),
  skip = c("counts", "score", "seqnames", "read_id", "strand", "overlaps", "consensus"),
  min_freq = 0.01
)

Arguments

`aln`	(data.frame) Contains events from alignments.
`cfgT`	(data.frame) Config table with information about experiments.
`add`	(character vector) Columns from cfgT that should be included in event table for normalization matching. Defaults to c("guideRNA", "Group") , which means that only those events created by the same guideRNA in the same Group will be removed if found in Control.
`skip`	(character vector) Specifies which columns of aln to skip.
`min_freq`	(numeric) All events from control group below this frequency will be not included in filtering. Use this to filter out background noise and sequencing errors.

Value

(data.frame) Same as aln, but events are normalized. Events from Control are not changed. Additionally columns from add are added to the data.frame.

Examples

aln <- data.frame(seqnames = 1:5, start = 1, end = 2, width = 2,
                  counts = 101:105)
cfgT <- data.frame(ID = 1:5, guideRNA = rep("ACTG", 5),
                   Reads_Filtered = c(2, 2, 3, 3, 4),
                   Group = c("A", "A", "B", "B", "B"),
                   Control = c(TRUE, FALSE, TRUE, FALSE, FALSE))
# all events are same as in the control group, therefore are filtered out
# events from control groups stay
amplicanNormalize(aln, cfgT)
# events that are different from control group are preserved
aln[2, "start"] <- 3
amplicanNormalize(aln, cfgT)

aln <- data.frame(seqnames = 1:5, start = 1, end = 2, width = 2,
                  counts = 101:105)
cfgT <- data.frame(ID = 1:5, guideRNA = rep("ACTG", 5),
                   Reads_Filtered = c(2, 2, 3, 3, 4),
                   Group = c("A", "A", "B", "B", "B"),
                   Control = c(TRUE, FALSE, TRUE, FALSE, FALSE))
# all events are same as in the control group, therefore are filtered out
# events from control groups stay
amplicanNormalize(aln, cfgT)
# events that are different from control group are preserved
aln[2, "start"] <- 3
amplicanNormalize(aln, cfgT)

Check which events overlap expected cut sites.

Description

To determine which deletions, insertions and mismatches (events) are probably created by CRISPR we check whether they overlap expected cut sites. Expected cut sites should be specified in UPPER CASE letters in the amplicon sequences.

Usage

amplicanOverlap(aln, cfgT, cut_buffer = 5, relative = FALSE)
amplicanOverlap(aln, cfgT, cut_buffer = 5, relative = FALSE)

Arguments

`aln`	(data.frame) Contains relevant events in GRanges style.
`cfgT`	(data.frame) Contains amplicon sequences.
`cut_buffer`	(numeric) Number of bases that should expand 5' and 3' of the specified expected cut sites.
`relative`	(boolean) Sets whether events are relative to the position of the target site.

Value

(bolean vector) Where TRUE means that given event overlaps cut site.

Examples

file_path <- system.file("test_data", "test_aln.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("test_data", "test_cfg.csv", package = "amplican"))
all(aln$overlaps == amplicanOverlap(aln, cfgT))

file_path <- system.file("test_data", "test_aln.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("test_data", "test_cfg.csv", package = "amplican"))
all(aln$overlaps == amplicanOverlap(aln, cfgT))

Wraps main package functionality into one function.

Description

amplicanPipeline is convenient wrapper around all functionality of the package with the most robust settings. It will generate all results in the result_folder and also knit prepared reports into 'reports' folder.

Usage

amplicanPipeline(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)
amplicanPipeline(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)

Arguments

`config`	(string) The path to your configuration file. For example: `system.file("extdata", "config.txt", package = "amplican")`. Configuration file can contain additional columns, but first 11 columns have to follow the example config specification.
`fastq_folder`	(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file.
`results_folder`	(string) Where do you want to store results? The package will create files in that folder so make sure you have writing permissions.
`knit_reports`	(boolean) whether function should "knit" all reports automatically for you (it is time consuming, be patient), when false reports will be prepared, but not knitted
`write_alignments_format`	(character vector) Whether `amplicanPipeline` should write alignments results to separate files. Alignments are also always saved as .rds object of `AlignmentsExperimentSet` class. Possible options are: "fasta" outputs alignments in fasta format where header indicates experiment ID, read id and number of reads "txt" simple format, read information followed by forward read and amplicon sequence followed by reverse read with its amplicon sequence eg.: ID: ID_1 Count: 7 ACTGAAAAA-------- ACTG-----ACTGACTG ------G-ACTG ACTGACTGACTG "None" Don't write any alignments to files. c("fasta", "txt") There are also possible combinations of above formats, pass a vector to get alignments in multiple formats.
`average_quality`	(numeric) The FASTQ file have a quality for each nucleotide, depending on sequencing technology there exist many formats. This package uses `readFastq` to parse the reads. If the average quality of the reads fall below value of `average_quality` then sequence is filtered. Default is 0.
`min_quality`	(numeric) Similar as in average_quality, but depicts the minimum quality for ALL nucleotides in given read. If one of nucleotides has quality BELLOW `min_quality`, then the sequence is filtered. Default is 20.
`filter_n`	(boolean) Whether to filter out reads containing N base.
`batch_size`	(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files.
`use_parallel`	(boolean) Set to TRUE, if you have registered multicore back-end.
`scoring_matrix`	(matrix) Default is 'NUC44'. Pass desired matrix using `nucleotideSubstitutionMatrix`.
`gap_opening`	(numeric) The opening gap score.
`gap_extension`	(numeric) The gap extension score.
`fastqfiles`	(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options: 0 Use both FASTQ files. 0.5 Use both FASTQ files, but only for one of the reads (forward or reverse) is required to have primer perfectly matched to sequence - eg. use when reverse reads are trimmed of primers, but forward reads have forward primer in the sequence. 1 Use only the forward FASTQ file. 2 Use only the reverse FASTQ file.
`primer_mismatch`	(numeric) Decide how many mismatches are allowed during primer matching of the reads, that groups reads by experiments. When `primer_mismatch = 0` no mismatches are allowed, which can increase number of unasssigned read.
`donor_mismatch`	(numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning toward the donor template. This parameter is only used when donor template is specified. The higher the parameter the less strict will be algorithm accepting read as HDR. Set to 0 if only perfect alignments to the donor template marked as HDR, unadvised due to error rate of the sequencers.
`donor_strict`	(logical) Applies more strict algorithm for HDR detection. Only these reads that have all of the donor events will count as HDR. Tolerates 'donor_mismatch' level of noise, but no indels are allowed. Use this when your reads should span over the whole window of the donor events. Might be more time consuming.
`PRIMER_DIMER`	(numeric) Value specifying buffer for PRIMER DIMER detection. For a given read it will be recognized as PRIMER DIMER when alignment will introduce gap of size bigger than: `length of amplicon - (lengths of PRIMERS + PRIMER_DIMER value)`
`event_filter`	(logical) Whether detection of offtarget reads, should be enabled.
`cut_buffer`	The number of bases by which extend expected cut sites (specified as UPPER case letters in the amplicon) in 5' and 3' directions.
`promiscuous_consensus`	(boolean) Whether rules of `amplicanConsensus` should be `promiscuous`. When promiscuous, we allow indels that have no confirmation on the other strand.
`normalize`	(character vector) If column 'Control' in config table has all FALSE/0 values then normalization is skipped. Otherwise, normalization is strict, which means events that are found in 'Control' TRUE group will be removed in 'Control' FALSE group. This parameter by default uses columns 'guideRNA' and 'Group' to impose additional restrictions on normalized events eg. only events created by the same 'guideRNA' in the same 'Group' will be normalized.
`min_freq`	(numeric) All events below this frequency are treated as sequencing errors and rejected. This parameter is used during normalization through `amplicanNormalize`.
`continue`	(boolean) Default TRUE, decides whether to continue failed ampliCan runs. In case of FALSE, all contents in 'results' folder will be removed.

Value

(invisible) results_folder path

Examples

# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
# output folder
results_folder <- tempdir()

#full analysis, not knitting files automatically
amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)

# path to example config file
config <- system.file("extdata", "config.csv", package = "amplican")
# path to example fastq files
fastq_folder <- system.file("extdata", package = "amplican")
# output folder
results_folder <- tempdir()

#full analysis, not knitting files automatically
amplicanPipeline(config, fastq_folder, results_folder, knit_reports = FALSE)

Wraps main package functionality into one function.

Description

amplicanPipelineIndexHopping is identical as amplicanPipeline except that default min_freq threshold is set to 0.15. Setting this threshold higher will decrease risks of inadequate normalization in cases of potential Index Hopping, potentially decreasing precision of true editing rate calling. Index Hopping can be mitigated with use of unique dual indexing pooling combinations. However, in cases when you might expect Index Hopping to occur you should use this function instead of amplicanPipeline.

Usage

amplicanPipelineConservative(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)
amplicanPipelineConservative(
  config,
  fastq_folder,
  results_folder,
  knit_reports = TRUE,
  write_alignments_format = "None",
  average_quality = 30,
  min_quality = 0,
  filter_n = FALSE,
  batch_size = 1e+07,
  use_parallel = FALSE,
  scoring_matrix = Biostrings::nucleotideSubstitutionMatrix(match = 5, mismatch = -4,
    baseOnly = FALSE, type = "DNA"),
  gap_opening = 25,
  gap_extension = 0,
  fastqfiles = 0.5,
  primer_mismatch = 2,
  donor_mismatch = 3,
  donor_strict = FALSE,
  PRIMER_DIMER = 30,
  event_filter = TRUE,
  cut_buffer = 5,
  promiscuous_consensus = TRUE,
  normalize = c("guideRNA", "Group"),
  min_freq = min_freq_default,
  continue = TRUE
)

Arguments

`config`	(string) The path to your configuration file. For example: `system.file("extdata", "config.txt", package = "amplican")`. Configuration file can contain additional columns, but first 11 columns have to follow the example config specification.
`fastq_folder`	(string) Path to FASTQ files. If not specified, FASTQ files should be in the same directory as config file.
`results_folder`	(string) Where do you want to store results? The package will create files in that folder so make sure you have writing permissions.
`knit_reports`	(boolean) whether function should "knit" all reports automatically for you (it is time consuming, be patient), when false reports will be prepared, but not knitted
`write_alignments_format`	(character vector) Whether `amplicanPipeline` should write alignments results to separate files. Alignments are also always saved as .rds object of `AlignmentsExperimentSet` class. Possible options are: "fasta" outputs alignments in fasta format where header indicates experiment ID, read id and number of reads "txt" simple format, read information followed by forward read and amplicon sequence followed by reverse read with its amplicon sequence eg.: ID: ID_1 Count: 7 ACTGAAAAA-------- ACTG-----ACTGACTG ------G-ACTG ACTGACTGACTG "None" Don't write any alignments to files. c("fasta", "txt") There are also possible combinations of above formats, pass a vector to get alignments in multiple formats.
`average_quality`	(numeric) The FASTQ file have a quality for each nucleotide, depending on sequencing technology there exist many formats. This package uses `readFastq` to parse the reads. If the average quality of the reads fall below value of `average_quality` then sequence is filtered. Default is 0.
`min_quality`	(numeric) Similar as in average_quality, but depicts the minimum quality for ALL nucleotides in given read. If one of nucleotides has quality BELLOW `min_quality`, then the sequence is filtered. Default is 20.
`filter_n`	(boolean) Whether to filter out reads containing N base.
`batch_size`	(numeric) How many reads to analyze at a time? Needed for filtering of large fastq files.
`use_parallel`	(boolean) Set to TRUE, if you have registered multicore back-end.
`scoring_matrix`	(matrix) Default is 'NUC44'. Pass desired matrix using `nucleotideSubstitutionMatrix`.
`gap_opening`	(numeric) The opening gap score.
`gap_extension`	(numeric) The gap extension score.
`fastqfiles`	(numeric) Normally you want to use both FASTQ files. But in some special cases, you may want to use only the forward file, or only the reverse file. Possible options: 0 Use both FASTQ files. 0.5 Use both FASTQ files, but only for one of the reads (forward or reverse) is required to have primer perfectly matched to sequence - eg. use when reverse reads are trimmed of primers, but forward reads have forward primer in the sequence. 1 Use only the forward FASTQ file. 2 Use only the reverse FASTQ file.
`primer_mismatch`	(numeric) Decide how many mismatches are allowed during primer matching of the reads, that groups reads by experiments. When `primer_mismatch = 0` no mismatches are allowed, which can increase number of unasssigned read.
`donor_mismatch`	(numeric) How many events of length 1 (mismatches, deletions and insertions of length 1) are allowed when aligning toward the donor template. This parameter is only used when donor template is specified. The higher the parameter the less strict will be algorithm accepting read as HDR. Set to 0 if only perfect alignments to the donor template marked as HDR, unadvised due to error rate of the sequencers.
`donor_strict`	(logical) Applies more strict algorithm for HDR detection. Only these reads that have all of the donor events will count as HDR. Tolerates 'donor_mismatch' level of noise, but no indels are allowed. Use this when your reads should span over the whole window of the donor events. Might be more time consuming.
`PRIMER_DIMER`	(numeric) Value specifying buffer for PRIMER DIMER detection. For a given read it will be recognized as PRIMER DIMER when alignment will introduce gap of size bigger than: `length of amplicon - (lengths of PRIMERS + PRIMER_DIMER value)`
`event_filter`	(logical) Whether detection of offtarget reads, should be enabled.
`cut_buffer`	The number of bases by which extend expected cut sites (specified as UPPER case letters in the amplicon) in 5' and 3' directions.
`promiscuous_consensus`	(boolean) Whether rules of `amplicanConsensus` should be `promiscuous`. When promiscuous, we allow indels that have no confirmation on the other strand.
`normalize`	(character vector) If column 'Control' in config table has all FALSE/0 values then normalization is skipped. Otherwise, normalization is strict, which means events that are found in 'Control' TRUE group will be removed in 'Control' FALSE group. This parameter by default uses columns 'guideRNA' and 'Group' to impose additional restrictions on normalized events eg. only events created by the same 'guideRNA' in the same 'Group' will be normalized.
`min_freq`	(numeric) All events below this frequency are treated as sequencing errors and rejected. This parameter is used during normalization through `amplicanNormalize`.
`continue`	(boolean) Default TRUE, decides whether to continue failed ampliCan runs. In case of FALSE, all contents in 'results' folder will be removed.

Details

result_folder and also knit prepared reports into 'reports' folder.

Value

(invisible) results_folder path

Prepare reports as .Rmd files.

Description

amplicanReport takes a configuration file, fastq reads and output directory to prepare summaries as an editable .Rmd file. You can specify whether you want to make summaries based on ID, Barcode, Group or even guideRNA and Amplicon. This function automatically knits all reports after creation. If you want to postpone knitting and edit reports, use .Rmd templates to create your own version of reports instead of this function.

Usage

amplicanReport(
  results_folder,
  levels = c("id", "barcode", "group", "guide", "amplicon", "summary"),
  report_files = c("id_report", "barcode_report", "group_report", "guide_report",
    "amplicon_report", "index"),
  cut_buffer = 5,
  xlab_spacing = 4,
  top = 5,
  knit_reports = TRUE
)
amplicanReport(
  results_folder,
  levels = c("id", "barcode", "group", "guide", "amplicon", "summary"),
  report_files = c("id_report", "barcode_report", "group_report", "guide_report",
    "amplicon_report", "index"),
  cut_buffer = 5,
  xlab_spacing = 4,
  top = 5,
  knit_reports = TRUE
)

Arguments

`results_folder`	(string) Folder containing results from the `amplicanAlign` function, do not change names of the files.
`levels`	(vector) Possible values are: "id", "barcode", "group", "guide", "amplicon", "summary". You can also input more than one value eg. c("id", "barcode") will create two separate reports for each level.
`report_files`	(vector) You can supply your own names of the files. For each of the levels there has to be one file name. Files are created in current working directory by default.
`cut_buffer`	(numeric) Default 5. A number of bases that is used around the specified cut site.
`xlab_spacing`	(numeric) Default is 4. Spacing of the ticks on the x axis of plots.
`top`	(numeric) Default is 5. How many of the top most frequent unassigned reads to report? It is only relevant when you used forward and reverse reads. We align them to each other as we could not specify correct amplicon.
`knit_reports`	(boolean) Whether to knit reports automatically.

Value

(string) Path to the folder with results.

Examples

results_folder <- tempdir()
amplicanReport(results_folder, report_files = file.path(results_folder,
                                                        c("id_report",
                                                          "barcode_report",
                                                          "group_report",
                                                          "guide_report",
                                                          "amplicon_report",
                                                          "index")),
               knit_reports = FALSE)

results_folder <- tempdir()
amplicanReport(results_folder, report_files = file.path(results_folder,
                                                        c("id_report",
                                                          "barcode_report",
                                                          "group_report",
                                                          "guide_report",
                                                          "amplicon_report",
                                                          "index")),
               knit_reports = FALSE)

Summarize how many reads have frameshift and how many reads have deletions.

Description

Before using this function make sure events are filtered to represent consensus with amplicanConsensus, if you use both forward and reverse reads. If you want to calculate metrics over expected cut site, filter events using amplicanOverlap.

Usage

amplicanSummarize(aln, cfgT)
amplicanSummarize(aln, cfgT)

Arguments

`aln`	(data.frame) Contains events from the alignments.
`cfgT`	(data.frame) Config file with the experiments details.

Details

Adds columns to cfgT:

ReadsCut Count of reads with deletions overlapping expected cut site.
Reads_Frameshifted Count of reads with frameshift overlapping expected cut site.

Value

(data.frame) As cfgT, but with extra columns.

Examples

file_path <- system.file("extdata", "results", "alignments",
                         "events_filtered_shifted_normalized.csv",
                         package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
amplicanSummarize(aln, cfgT)

file_path <- system.file("extdata", "results", "alignments",
                         "events_filtered_shifted_normalized.csv",
                         package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
amplicanSummarize(aln, cfgT)

Transform extended CIGAR strings into GRanges.

Description

Transform extended CIGAR strings into GRanges representation with events of deletions, insertions and mismatches.

Usage

cigarsToEvents(
  cigars,
  aln_pos_start,
  query_seq,
  ref,
  read_id,
  mapq,
  seqnames,
  strands,
  counts
)
cigarsToEvents(
  cigars,
  aln_pos_start,
  query_seq,
  ref,
  read_id,
  mapq,
  seqnames,
  strands,
  counts
)

Arguments

`cigars`	(character) Extended CIGARS.
`aln_pos_start`	(integer) Pos of CIGARS.
`query_seq`	(character) Aligned query sequences.
`ref`	(character) Reference sequences used for alignment.
`read_id`	(numeric) Read id for assignment for each of the CIGARS.
`mapq`	(numeric) Maping scores.
`seqnames`	(character) Names of the sequences, potentially ids of the reference sequences.
`strands`	(character) Strands to assign.
`counts`	(integer) Vector of cigar counts, if data collapsed.

Value

(GRanges) Same as events.

Generate all combinations along string exchanging m characters at a time with dictionary letters.

Description

Generate all combinations along string seq swapping m characters at a time with letters defined in dictionary letters. Allows, for instance, to create a list of possible primers with two mismatches.

Usage

comb_along(seq, m = 2, letters = c("A", "C", "T", "G"))
comb_along(seq, m = 2, letters = c("A", "C", "T", "G"))

Arguments

`seq`	(character) input character to permutate
`m`	(integer) number of elements to permutate at each step
`letters`	(character vector) dictionary source for combinations of elements

Value

(character vector) all unique combinations of permutated string

Examples

comb_along("AC")
comb_along("AAA", 1)
comb_along("AAA")
comb_along("AAA", 3)
comb_along("AAAAAAAAAA")

comb_along("AC")
comb_along("AAA", 1)
comb_along("AAA")
comb_along("AAA", 3)
comb_along("AAAAAAAAAA")

Find Events Overlapping Primers.

Description

Very often alignments return deletions that are not real deletions, but rather artifact of incomplete reads eg.:

ACTGAAAAA------- <- this "deletion" should be filtered
ACTG----ACTGACTG

Usage

findEOP(aln, cfgT)
findEOP(aln, cfgT)

Arguments

`aln`	(data.frame) Should contain events from alignments in GRanges style with columns eg. seqnames, width, start, end.
`cfgT`	(data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon.

Value

(logical vector) where TRUE indicates events that are overlapping primers

Examples

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
findEOP(aln, cfgT)

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
findEOP(aln, cfgT)

Find Off-targets and Fragmented alignments from reads.

Description

Will try to detect off-targets and low quality alignments (outliers). It tries k-means clustering on normalized number of events per read and read alignment score. If there are 3 clusters (decided based on silhouette criterion) cluster with high event count and low alignment score will be marked for filtering. When there is less than 1000 scores in aln it will filter nothing.

Usage

findLQR(aln)
findLQR(aln)

Arguments

aln

(data.frame) Should contain events from alignments in GRanges style with columns eg. seqnames, width, start, end, score.

Value

(logical vector) where TRUE indicates events that are potential off-targets or low quality alignments.

Examples

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
aln <- aln[seqnames == "ID_1"] # for first experiment
findLQR(aln)

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
aln <- aln[seqnames == "ID_1"] # for first experiment
findLQR(aln)

Find PRIMER DIMER reads.

Description

Use to filter reads that are most likely PRIMER DIMERS.

Usage

findPD(aln, cfgT, PRIMER_DIMER = 30)
findPD(aln, cfgT, PRIMER_DIMER = 30)

Arguments

`aln`	(data.frame) Should contain events from alignments in `GRanges` style with columns eg. seqnames, width, start, end.
`cfgT`	(data.frame) Needs columns Forward_Primer, ReversePrimer and Amplicon.
`PRIMER_DIMER`	(numeric) Value specifying buffer for PRIMER DIMER detection. For a given read it will be recognized as PRIMER DIMER when alignment will introduce gap of size bigger than: `length of amplicon - (lengths of PRIMERS + PRIMER_DIMER value)`

Value

(logical) Where TRUE indicates event classified as PRIMER DIMER

Examples

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
findPD(aln, cfgT)

file_path <- system.file("extdata", "results", "alignments",
                         "raw_events.csv", package = "amplican")
aln <- data.table::fread(file_path)
cfgT <- data.table::fread(
  system.file("extdata", "results", "config_summary.csv",
              package = "amplican"))
findPD(aln, cfgT)

Create quadratic or cubic bezier curves [copied from ggforce]

Description

This set of functionality is copied from ggforce package due to dependency issues on Bioconductor and is used internally (not exported) only. This set of geoms makes it possible to connect points creating either quadratic or cubic beziers. bezier works by calculating points along the bezier and connecting these to draw the curve.

Usage

stat_bezier(
  mapping = NULL,
  data = NULL,
  geom = "path",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  n = 100,
  inherit.aes = TRUE,
  ...
)

geom_bezier(
  mapping = NULL,
  data = NULL,
  stat = "bezier",
  position = "identity",
  arrow = NULL,
  lineend = "butt",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  n = 100,
  ...
)
stat_bezier(
  mapping = NULL,
  data = NULL,
  geom = "path",
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  n = 100,
  inherit.aes = TRUE,
  ...
)

geom_bezier(
  mapping = NULL,
  data = NULL,
  stat = "bezier",
  position = "identity",
  arrow = NULL,
  lineend = "butt",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  n = 100,
  ...
)

Arguments

`mapping`	Set of aesthetic mappings created by `aes()`. If specified and `inherit.aes = TRUE` (the default), it is combined with the default mapping at the top level of the plot. You must supply `mapping` if there is no plot mapping.
`data`	The data to be displayed in this layer. There are three options: If `NULL`, the default, the data is inherited from the plot data as specified in the call to `ggplot()`. A `data.frame`, or other object, will override the plot data. All objects will be fortified to produce a data frame. See `fortify()` for which variables will be created. A `function` will be called with a single argument, the plot data. The return value must be a `data.frame`, and will be used as the layer data. A `function` can be created from a `formula` (e.g. `~ head(.x, 10)`).
`geom`	The geometric object to use to display the data for this layer. When using a `⁠stat_*()⁠` function to construct a layer, the `geom` argument can be used to override the default coupling between stats and geoms. The `geom` argument accepts the following: A `Geom` ggproto subclass, for example `GeomPoint`. A string naming the geom. To give the geom as a string, strip the function name of the `geom_` prefix. For example, to use `geom_point()`, give the geom as `"point"`. For more information and other ways to specify the geom, see the layer geom documentation.
`position`	A position adjustment to use on the data for this layer. This can be used in various ways, including to prevent overplotting and improving the display. The `position` argument accepts the following: The result of calling a position function, such as `position_jitter()`. This method allows for passing extra arguments to the position. A string naming the position adjustment. To give the position as a string, strip the function name of the `position_` prefix. For example, to use `position_jitter()`, give the position as `"jitter"`. For more information and other ways to specify the position, see the layer position documentation.
`na.rm`	If `FALSE`, the default, missing values are removed with a warning. If `TRUE`, missing values are silently removed.
`show.legend`	logical. Should this layer be included in the legends? `NA`, the default, includes if any aesthetics are mapped. `FALSE` never includes, and `TRUE` always includes. It can also be a named logical vector to finely select the aesthetics to display.
`n`	The number of points to create for each segment
`inherit.aes`	If `FALSE`, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. `borders()`.
`...`	Other arguments passed on to `layer()`'s `params` argument. These arguments broadly fall into one of 4 categories below. Notably, further arguments to the `position` argument, or aesthetics that are required can not be passed through `...`. Unknown arguments that are not part of the 4 categories below are ignored. Static aesthetics that are not mapped to a scale, but are at a fixed value and apply to the layer as a whole. For example, `colour = "red"` or `linewidth = 3`. The geom's documentation has an Aesthetics section that lists the available options. The 'required' aesthetics cannot be passed on to the `params`. Please note that while passing unmapped aesthetics as vectors is technically possible, the order and required length is not guaranteed to be parallel to the input data. When constructing a layer using a `⁠stat_()⁠` function, the `...` argument can be used to pass on parameters to the `geom` part of the layer. An example of this is `stat_density(geom = "area", outline.type = "both")`. The geom's documentation lists which parameters it can accept. Inversely, when constructing a layer using a `⁠geom_()⁠` function, the `...` argument can be used to pass on parameters to the `stat` part of the layer. An example of this is `geom_area(stat = "density", adjust = 0.5)`. The stat's documentation lists which parameters it can accept. The `key_glyph` argument of `layer()` may also be passed on through `...`. This can be one of the functions described as key glyphs, to change the display of the layer in the legend.
`stat`	The statistical transformation to use on the data for this layer. When using a `⁠geom_*()⁠` function to construct a layer, the `stat` argument can be used the override the default coupling between geoms and stats. The `stat` argument accepts the following: A `Stat` ggproto subclass, for example `StatCount`. A string naming the stat. To give the stat as a string, strip the function name of the `stat_` prefix. For example, to use `stat_count()`, give the stat as `"count"`. For more information and other ways to specify the stat, see the layer stat documentation.
`arrow`	Arrow specification, as created by `grid::arrow()`.
`lineend`	Line end style (round, butt, square).

Details

Input data is understood as a sequence of data points the first being the start point, then followed by one or two control points and then the end point. More than 4 and less than 3 points per group will throw an error.

Aesthetics

geom_link, geom_link2 and geom_lin0 understand the following aesthetics (required aesthetics are in bold):

- **x** - **y** - color - linewidth - linetype - alpha - lineend

Computed variables

x, y: The interpolated point coordinates
index: The progression along the interpolation mapped between 0 and 1

Author(s)

Thomas Lin Pedersen

Examples

beziers <- data.frame(
    x = c(1, 2, 3, 4, 4, 6, 6),
    y = c(0, 2, 0, 0, 2, 2, 0),
    type = rep(c('cubic', 'quadratic'), c(3, 4)),
    point = c('end', 'control', 'end', 'end', 'control', 'control', 'end')
)
help_lines <- data.frame(
    x = c(1, 3, 4, 6),
    xend = c(2, 2, 4, 6),
    y = 0,
    yend = 2
)
ggplot2::ggplot() + ggplot2::geom_segment(
  ggplot2::aes(x = x, xend = xend, y = y, yend = yend),
  data = help_lines,
  arrow = ggplot2::arrow(length = ggplot2::unit(c(0, 0, 0.5, 0.5), 'cm')),
  colour = 'grey') +
  amplican:::geom_bezier(ggplot2::aes(x= x, y = y, group = type, linetype = type),
              data = beziers) +
  ggplot2::geom_point(ggplot2::aes(x = x, y = y, colour = point), data = beziers)

beziers <- data.frame(
    x = c(1, 2, 3, 4, 4, 6, 6),
    y = c(0, 2, 0, 0, 2, 2, 0),
    type = rep(c('cubic', 'quadratic'), c(3, 4)),
    point = c('end', 'control', 'end', 'end', 'control', 'control', 'end')
)
help_lines <- data.frame(
    x = c(1, 3, 4, 6),
    xend = c(2, 2, 4, 6),
    y = 0,
    yend = 2
)
ggplot2::ggplot() + ggplot2::geom_segment(
  ggplot2::aes(x = x, xend = xend, y = y, yend = yend),
  data = help_lines,
  arrow = ggplot2::arrow(length = ggplot2::unit(c(0, 0, 0.5, 0.5), 'cm')),
  colour = 'grey') +
  amplican:::geom_bezier(ggplot2::aes(x= x, y = y, group = type, linetype = type),
              data = beziers) +
  ggplot2::geom_point(ggplot2::aes(x = x, y = y, colour = point), data = beziers)

Transform aligned strings into GRanges representation of events.

Description

Transforms aligned strings into GRanges representation with events of deletions, insertions and mismatches. Subject should come from one amplicon sequence, after alignment to many sequences (patterns).

Usage

getEvents(
  pattern,
  subject,
  scores,
  ID = "NA",
  ampl_shift = 1L,
  ampl_start = 1L,
  strand_info = "+"
)
getEvents(
  pattern,
  subject,
  scores,
  ID = "NA",
  ampl_shift = 1L,
  ampl_start = 1L,
  strand_info = "+"
)

Arguments

`pattern`	(character) Aligned pattern.
`subject`	(character) Aligned subject.
`scores`	(integer) Alignment scores of the pattern and subject.
`ID`	(character) Will be used as seqnames of output GRanges.
`ampl_shift`	(numeric) Possible shift of the amplicons.
`ampl_start`	(numeric) Real amplicon starts. `pairwiseAlignment` clips alignments, therefore to output GRanges relative to the amplicon sequence (subject) ranges have to be shifted.
`strand_info`	(character) Strands to assign.

Value

(GRanges) Same as events.

Figure out which reads conform to the HDR using the donor.

Description

This is strict detection as compared to 'is_hdr' which was designed to be less specific and allow for all kinds of donors. This method requires that you have exactly the same events (mismatches, insertions, deletions) as the difference between amplicon and donor sequences. It ignores everything else, so other mismatches and small indels etc. as noise are allowed here for valid HDR.

Usage

is_hdr_strict(aln, cfgT, scoring_matrix, gap_opening = 25, gap_extension = 0)
is_hdr_strict(aln, cfgT, scoring_matrix, gap_opening = 25, gap_extension = 0)

Arguments

`aln`	(data.table) This are events that contain already consensus column, they are also shifted and normalized.
`cfgT`	(data.table) Config data.table with columns for amplicon and donor.
`scoring_matrix`	(scoring matrix)
`gap_opening`	(integer)
`gap_extension`	(integer)

Value

(aln) same as aln on entry, but readType is updated to TRUE when read is recognized as HDR

MetaPlots deletions using ggplot2.

Description

This function plots deletions in relation to the amplicons for given selection vector that groups values by given config group. All reads should already be converted to their relative position to their respective amplicon using amplicanMap. Top plot is for the forward reads and bottom plot is for reverse reads.

Usage

metaplot_deletions(alnmt, config, group, selection, over = "overlaps")
metaplot_deletions(alnmt, config, group, selection, over = "overlaps")

Arguments

`alnmt`	(data.frame) Loaded alignment information from events_filtered_shifted_normalized.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`group`	(string) Name of the column from the config file to use for grouping. Events are subselected based on this column and values from selection.
`selection`	(string or vector of strings) Values from config column specified in group argument.
`over`	(string) Specify which column contains overlaps with expected cut sites generated by `amplicanOverlap`

Value

(deletions metaplot) ggplot2 object of deletions metaplot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_deletions(alignments[alignments$consensus, ],
                   config, "Group", "Betty")

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_deletions(alignments[alignments$consensus, ],
                   config, "Group", "Betty")

MetaPlots insertions using ggplot2.

Description

This function plots insertions in relation to the amplicons for given selection vector that groups values by given config group. All reads should already be converted to their relative position to their respective amplicon using amplicanMap. Top plot is for the forward reads and bottom plot is for reverse reads.

Usage

metaplot_insertions(alnmt, config, group, selection)
metaplot_insertions(alnmt, config, group, selection)

Arguments

`alnmt`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`group`	(string) Name of the column from the config file to use for grouping. Events are subselected based on this column and values from selection.
`selection`	(string or vector of strings) Values from config column specified in group argument.

Value

(insertions metaplot) ggplot2 object of insertions metaplot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_insertions(alignments[alignments$consensus, ], config,
                    "Group", "Betty")

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_insertions(alignments[alignments$consensus, ], config,
                    "Group", "Betty")

MetaPlots mismatches using ggplot2.

Description

Plots mismatches in relation to the amplicons for given selection vector that groups values by given config group. All reads should already be converted to their relative position to their respective amplicon using amplicanMap. Zero position on new coordinates is the most left UPPER case letter of the respective amplicon. This function filters out all alignment events that have amplicons without UPPER case defined. Top plot is for the forward reads and bottom plot is for reverse reads.

Usage

metaplot_mismatches(alnmt, config, group, selection)
metaplot_mismatches(alnmt, config, group, selection)

Arguments

`alnmt`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`group`	(string) Name of the column from the config file to use for grouping. Events are subselected based on this column and values from selection.
`selection`	(string or vector of strings) Values from config column specified in group argument.

Value

(mismatches metaplot) ggplot2 object of mismatches metaplot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_mismatches(alignments,
                    config, "Group", "Betty")

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
metaplot_mismatches(alignments,
                    config, "Group", "Betty")

Read "pair" format of EMBOSS needle into GRanges as events.

Description

Parse EMBOSS needle (or needleall) "pair" format into GRanges representation with events of deletions, insertions and mismatches. Make sure that each file corresponds to single subject (single amplicon). Assumes that bottom sequence "-bsequence" corresponds to the "subject" and full sequence alignment is returned.

Usage

pairToEvents(file, ID = "NA", strand_info = "+")
pairToEvents(file, ID = "NA", strand_info = "+")

Arguments

`file`	(character) File path.
`ID`	(character) ID of the experiment, will be used as seqnames of the reutner ranges.
`strand_info`	(character) Strand to assign.

Value

(GRanges) Same as events.

Plots cuts using ggplot2.

Description

This function plots cuts in relation to the amplicon with distinction for each ID.

Usage

plot_cuts(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
plot_cuts(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)

Arguments

`alignments`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`id`	(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot.
`cut_buffer`	(numeric) Default is 5, you should specify the same as used in the analysis.
`xlab_spacing`	(numeric) Spacing of the x axis labels. Default is 4.

Value

(cuts plot) gtable object of cuts plot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
plot_cuts(alignments[alignments$consensus & alignments$overlaps, ],
          config, c('ID_1','ID_3'))

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
plot_cuts(alignments[alignments$consensus & alignments$overlaps, ],
          config, c('ID_1','ID_3'))

Plots deletions using ggplot2.

Description

This function plots deletions in relation to the amplicon, assumes events are relative to the expected cut site. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.

Usage

plot_deletions(
  alignments,
  config,
  id,
  cut_buffer = 5,
  xlab_spacing = 4,
  over = "overlaps"
)
plot_deletions(
  alignments,
  config,
  id,
  cut_buffer = 5,
  xlab_spacing = 4,
  over = "overlaps"
)

Arguments

`alignments`	(data.frame) Loaded alignment information from alignments.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`id`	(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot.
`cut_buffer`	(numeric) Default is 5, you should specify the same as used in the analysis.
`xlab_spacing`	(numeric) Spacing of the x axis labels. Default is 4.
`over`	(string) Specify which columns contains overlaps with expected cut sites generated by `amplicanOverlap`

Value

(deletions plot) gtable object of deletions plot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
p <- plot_deletions(alignments[alignments$consensus, ],
                    config, c('ID_1','ID_3'))

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
p <- plot_deletions(alignments[alignments$consensus, ],
                    config, c('ID_1','ID_3'))

Get figure height in inches for number of elements on y axis.

Description

Helper function to calculate figure height based on number of elements to plot for automating sizes of figures in knited reports.

Usage

plot_height(x)
plot_height(x)

Arguments

`x`	(numeric) number of elements to fit onto height axis

Value

(numeric) In inches

Examples

plot_height(20)

plot_height(20)

Plots heterogeneity of the reads using ggplot2.

Description

This function creates stacked barplot explaining reads heterogeneity. It groups reads by user defined levels and measures how unique are reads in this level. Uniqueness of reads is simplified to the bins and colored according to the color gradient. Default color black indicates very high heterogeneity of the reads. The more yellow (default) the more similar are reads and less heterogeneous.

Usage

plot_heterogeneity(
  alignments,
  config,
  level = "ID",
  colors = c("#000000", "#F0E442"),
  bins = c(0, 5, seq(10, 100, 10))
)
plot_heterogeneity(
  alignments,
  config,
  level = "ID",
  colors = c("#000000", "#F0E442"),
  bins = c(0, 5, seq(10, 100, 10))
)

Arguments

`alignments`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`level`	(string) Name of the column from config file specifying levels to group by.
`colors`	(html colors vector) Two colours for gradient, eg. c('#000000', '#F0E442').
`bins`	(numeric vector) Numeric vector from 0 to 100 specifying bins eg. c(0, 5, seq(10, 100, 10)).

Value

(heterogeneity plot) ggplot2 object of heterogeneity plot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
plot_heterogeneity(alignments[alignments$consensus, ], config)

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
plot_heterogeneity(alignments[alignments$consensus, ], config)

Plots insertions using ggplot2.

Description

This function plots insertions in relation to the amplicon. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.

Usage

plot_insertions(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
plot_insertions(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)

Arguments

`alignments`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`id`	(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot.
`cut_buffer`	(numeric) Default is 5, you should specify the same as used in the analysis.
`xlab_spacing`	(numeric) Spacing of the x axis labels. Default is 4.

Value

(insertions plot) gtable object of insertions plot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
p <- plot_insertions(alignments, config, c('ID_1','ID_3'))

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
p <- plot_insertions(alignments, config, c('ID_1','ID_3'))

Plots mismatches using ggplot2.

Description

Plots mismatches in relation to the amplicon, assumes your reads are relative to the respective amplicon sequences predicted cut sites. Top plot is for the forward reads, middle one shows amplicon sequence, and bottom plot is for reverse reads.

Usage

plot_mismatches(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)
plot_mismatches(alignments, config, id, cut_buffer = 5, xlab_spacing = 4)

Arguments

`alignments`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`id`	(string or vector of strings) Name of the ID column from config file or name of multiple IDs, if it is possible to group them. They have to have the same amplicon, amplicons on the reverse strand will be reverse complemented to match forward strand amplicons.
`cut_buffer`	(numeric) Default is 5, you should specify the same as used in the analysis.
`xlab_spacing`	(numeric) Spacing of the x axis labels. Default is 4.

Value

(mismatches plot) gtable object of mismatches plot

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
id <- c('ID_1', 'ID_3'); cut_buffer = 5; xlab_spacing = 4;
p <- plot_mismatches(alignments, config, c('ID_1', 'ID_3'))
ggplot2::ggsave("~/test.png", p, width = 25, units = "in")

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)
id <- c('ID_1', 'ID_3'); cut_buffer = 5; xlab_spacing = 4;
p <- plot_mismatches(alignments, config, c('ID_1', 'ID_3'))
ggplot2::ggsave("~/test.png", p, width = 25, units = "in")

Plots most frequent variants using ggplot2.

Description

This function plots variants in relation to the amplicon. Shows sequences of top mutants without aggregating on deletions, insertions and mismatches.

Usage

plot_variants(
  alignments,
  config,
  id,
  cut_buffer = 5,
  top = 10,
  annot = if (amplican:::get_seq(config, id, "Donor") == "") "cov" else NA,
  summary_plot = amplican:::get_seq(config, id, "Donor") == "",
  frameshift = amplican:::get_seq(config, id, "Donor") == ""
)
plot_variants(
  alignments,
  config,
  id,
  cut_buffer = 5,
  top = 10,
  annot = if (amplican:::get_seq(config, id, "Donor") == "") "cov" else NA,
  summary_plot = amplican:::get_seq(config, id, "Donor") == "",
  frameshift = amplican:::get_seq(config, id, "Donor") == ""
)

Arguments

`alignments`	(data.frame) Loaded alignment information from alignments_events.csv file.
`config`	(data.frame) Loaded table from config_summary.csv file.
`id`	(string or vector of strings) Name of the ID column from config file or name of multiple IDs if it is possible to group them. First amplicon will be used as the basis for plot. If Donor is available we will try to add the first donor and mark it on the plot.
`cut_buffer`	(numeric) Default is 5, you should specify the same as used in the analysis.
`top`	(numeric) Specify number of most frequent reads to plot. By default it is 10. Check `plot_heterogeneity` to see how many reads will be enough to give good overview of your variants.
`annot`	("codon" or "cov" or NA) What to display for annotation top plot. When NA will not display anything, also not display total summary. Codon plot is all reading frames for a given window, and the default "cov" is coverage of all indels and mismatches over a given window.
`summary_plot`	(boolean) Whether small summary plot in the upper right corner should be displayed. Top bar summarizes total reads with frameshift (F), reads with Edits without Frameshift (Edits) and reads without Edits (Match).
`frameshift`	(boolean) Whether to include Frameshift column in the table. annot on \| off

Details

Top plot shows all six possible frames for given amplicon. Amino acids are colored as follows:

Small nonpolar	G, A, S, T	Orange
Hydrophobic	C, V, I, L, P, F, Y, M, W	Green
Polar	N, Q, H	Magenta
Negatively charged	D, E	Red
Positively charged	K, R	Blue
Other	eg. *, U, +	Grey

Variant plot shows amplicon reference, UPPER letters which were the basis for window selection are highlighted with dashed white box (guideRNA). Black triangles are reflecting insertion points. Dashed letters indicate deletions. Table associated with variant plot represents:

Freq - Frequency of given read in experiment. Variants are ordered by frequency value.
Count - Represents raw count of this variant reads in experiment.
F - Sum of deletion and insertion widths of events overlapping presented window. Green background indicates frameshift.

Value

(variant plot) gtable object of variants plot

Note

This function is inspired by plotAlignments.

Examples

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)

alignments <- alignments[alignments$consensus & alignments$overlaps, ]
p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ],
                   config, c('ID_1','ID_3'))
# with Donor we dont plot summary and the annot, summary plot and frameshift
p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ],
                   config, c('ID_5'))

#example config
config <- read.csv(system.file("extdata", "results", "config_summary.csv",
                               package = "amplican"))
#example alignments results
alignments_file <- system.file("extdata", "results", "alignments",
                               "events_filtered_shifted_normalized.csv",
                               package = "amplican")
alignments <- read.csv(alignments_file)

alignments <- alignments[alignments$consensus & alignments$overlaps, ]
p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ],
                   config, c('ID_1','ID_3'))
# with Donor we dont plot summary and the annot, summary plot and frameshift
p <- plot_variants(alignments[alignments$consensus & alignments$overlaps, ],
                   config, c('ID_5'))

Package 'amplican'

Help Index

Automated analysis of CRISPR experiments.

Description

Details

Author(s)

See Also

Pretty print forward and reverse reads aligned to each other.

Description

Usage

Arguments

Value

Examples

Align reads to amplicons.

Description

Usage

Arguments

Value

See Also

Examples

Extract consensus out of forward and reverse events.

Description

Usage

Arguments

Details

Value

See Also

Examples

Filter Events Overlapping Primers, PRIMER DIMERS and Low Alignment Score Events.

Description

Usage

Arguments

Value

See Also

Examples

Map events to their respective relative coordinates specified with UPPER case.

Description

Usage

Arguments

Value

See Also

Examples

Remove events that can be found in Controls.

Description

Usage

Arguments

Value

See Also

Examples

Check which events overlap expected cut sites.

Description

Usage

Arguments

Value

See Also

Examples

Wraps main package functionality into one function.

Description

Usage

Arguments

Value

See Also

Examples

Wraps main package functionality into one function.

Description

Usage

Arguments

Details

Value

See Also

Prepare reports as .Rmd files.

Description

Usage

Arguments

Value

See Also

Examples

Summarize how many reads have frameshift and how many reads have deletions.

Description

Usage