sangeranalyseR is an R package that provides fast, flexible, and reproducible workflows for assembling Sanger sequencing data into contigs. It is a free, open-source alternative to Geneious, CodonCode Aligner, and Phred-Phrap-Consed. The full reference manual is on the ReadTheDocs site; this vignette focuses on the recipes most users actually need: how to call the constructors, what each parameter does, and how to interpret the output.
The package is built around three S4 classes that form a containment hierarchy:
SangerAlignment          ← a set of contigs aligned to each other
└── SangerContig         ← one assembled contig (forward + reverse reads)
    └── SangerRead       ← one ABIF or FASTA read
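The containment can be pictured with a toy S4 hierarchy. This is only a minimal sketch: the class names (ToyRead, ToyContig, ToyAlignment) are hypothetical and the slots are heavily simplified. Only the slot names forwardReadList, reverseReadList, contigList and primarySeq are borrowed from the accessors used later in this vignette.

```r
library(methods)

# Hypothetical toy classes; names and slots are illustrative only.
setClass("ToyRead",
         representation(readName = "character", primarySeq = "character"))
setClass("ToyContig",
         representation(contigName = "character",
                        forwardReadList = "list",
                        reverseReadList = "list"))
setClass("ToyAlignment", representation(contigList = "list"))

fwd_read <- new("ToyRead", readName = "Achl_ACHLO006-09_1_F.ab1",
                primarySeq = "ACGT")
rev_read <- new("ToyRead", readName = "Achl_ACHLO006-09_2_R.ab1",
                primarySeq = "ACGT")
contig <- new("ToyContig", contigName = "Achl_ACHLO006-09",
              forwardReadList = list(fwd_read),
              reverseReadList = list(rev_read))
alignment <- new("ToyAlignment", contigList = list(contig))

# Walk down the hierarchy: alignment -> contig -> read.
first_read <- alignment@contigList[[1]]@forwardReadList[[1]]
first_read@readName
```

The same `@contigList[[i]]@forwardReadList[[j]]` pattern is how individual reads are reached from an assembled alignment in the recipes that follow.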
Every recipe uses the bundled Allolobophora chlorotica
ABIF fixture (8 reads arranged into 4 forward+reverse pairs). The
system.file() call below works from any installed copy of
the package.
ab1_dir <- system.file("extdata", "Allolobophora_chlorotica", "ACHLO",
                       package = "sangeranalyseR")
list.files(ab1_dir, pattern = "\\.ab1$")
## [1] "Achl_ACHLO006-09_1_F.ab1" "Achl_ACHLO006-09_2_R.ab1"
## [3] "Achl_ACHLO007-09_1_F.ab1" "Achl_ACHLO007-09_2_R.ab1"
## [5] "Achl_ACHLO040-09_1_F.ab1" "Achl_ACHLO040-09_2_R.ab1"
## [7] "Achl_ACHLO041-09_1_F.ab1" "Achl_ACHLO041-09_2_R.ab1"
sc <- SangerContig(
  inputSource = "ABIF",
  processMethod = "REGEX",
  ABIF_Directory = ab1_dir,
  contigName = "Achl_ACHLO006-09",
  REGEX_SuffixForward = "_[0-9]*_F\\.ab1$",
  REGEX_SuffixReverse = "_[0-9]*_R\\.ab1$",
  TrimmingMethod = "M1",
  M1TrimmingCutoff = 0.0001
)
sc@objectResults@creationResult # TRUE
length(sc@forwardReadList) # 1 forward read
length(sc@reverseReadList) # 1 reverse read
as.character(sc@contigSeq) # the consensus sequence
When your filenames don’t follow a clean _F.ab1 / _R.ab1 convention, supply a reads,direction,contig CSV that explicitly maps every read:
csv_path <- system.file("extdata", "ab1", "SangerAlignment",
                        "names_conversion.csv", package = "sangeranalyseR")
sa_csv <- SangerAlignment(
  inputSource = "ABIF",
  processMethod = "CSV",
  ABIF_Directory = ab1_dir,
  CSV_NamesConversion = csv_path
)
Phase-15 fix: contig labels in the CSV no longer have to appear as substrings of filenames. The CSV’s reads column drives the lookup directly.
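A mapping file of this shape can be written and checked with base R alone. The sketch below is an illustration under assumptions: the filenames and contig labels are hypothetical (they are not the contents of the bundled names_conversion.csv), and only the three column names reads, direction and contig come from this vignette.

```r
# Hypothetical mapping: filenames that share no contig-name prefix.
mapping <- data.frame(
  reads     = c("plateA_well01.ab1", "plateB_well07.ab1",
                "plateA_well02.ab1", "plateB_well08.ab1"),
  direction = c("F", "R", "F", "R"),
  contig    = c("sample_1", "sample_1", "sample_2", "sample_2")
)
csv_file <- tempfile(fileext = ".csv")
write.csv(mapping, csv_file, row.names = FALSE)

# Read it back as character columns. Without colClasses, a direction column
# holding only "F" would be converted to logical FALSE by read.csv().
m <- read.csv(csv_file, colClasses = "character")

# Group the reads by the contig label they are assigned to.
reads_by_contig <- split(m[, c("reads", "direction")], m$contig)
names(reads_by_contig)
```

Because the grouping is driven by the reads column, the contig labels are free text and need not resemble the filenames.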
Forward-only or reverse-only datasets are common in 16S barcoding and short-read survey pipelines. Pass NULL (or NA_character_) for the missing-direction suffix and set minReadsNum = 1 so each surviving read can become its own contig.
Two trimming algorithms, controlled by
TrimmingMethod:
| Method | Algorithm | Parameters |
|---|---|---|
| "M1" | Modified Mott’s (Phred/Phrap-style cumulative) | M1TrimmingCutoff (probability; default 0.0001) |
| "M2" | Sliding-window mean Phred (Trimmomatic-style) | M2CutoffQualityScore, M2SlidingWindowSize |
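The two ideas in the table can be sketched in base R. These are simplified illustrations and not the package's own trimming code: the function names mott_trim and sliding_window_trim are hypothetical, the Phred vector is made up, and the example uses a cutoff of 0.01 rather than the 0.0001 default purely so the toy scores are easy to follow.

```r
# M1-style sketch: each base scores cutoff - P(error); the best-scoring
# contiguous run of bases is kept.
mott_trim <- function(phred, cutoff = 0.0001) {
  score <- cutoff - 10^(-phred / 10)
  running <- 0
  best <- 0
  run_start <- 1L
  keep <- c(start = 0L, end = 0L)
  for (i in seq_along(score)) {
    running <- running + score[i]
    if (running <= 0) {
      running <- 0            # reset and start a new candidate run
      run_start <- i + 1L
    } else if (running > best) {
      best <- running
      keep <- c(start = run_start, end = i)
    }
  }
  keep
}

# M2-style sketch: keep from the first window whose mean Phred passes up to
# the base just before the first later window whose mean fails.
sliding_window_trim <- function(phred, cutoff = 20, window = 10) {
  n <- length(phred)
  if (n < window) return(c(start = 0L, end = 0L))
  starts <- seq_len(n - window + 1L)
  means <- vapply(starts, function(s) mean(phred[s:(s + window - 1L)]),
                  numeric(1))
  pass <- means >= cutoff
  if (!any(pass)) return(c(start = 0L, end = 0L))
  first <- which(pass)[1]
  fail_after <- which(!pass & starts > first)
  end <- if (length(fail_after)) fail_after[1] - 1L else n
  c(start = first, end = end)
}

# Toy read: 3 poor bases, 10 good bases, 5 poor bases.
phred <- c(rep(5, 3), rep(40, 10), rep(5, 5))
mott_trim(phred, cutoff = 0.01)
sliding_window_trim(phred, cutoff = 20, window = 4)
```

On this toy read the Mott-style sketch keeps bases 4 to 13, while the windowed sketch keeps bases 2 to 12, which shows how window averaging can let a few low-quality bases through at the edges.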
Tighter trimming for noisy data:
sa_strict <- SangerAlignment(
  inputSource = "ABIF",
  processMethod = "REGEX",
  ABIF_Directory = ab1_dir,
  REGEX_SuffixForward = "_[0-9]*_F\\.ab1$",
  REGEX_SuffixReverse = "_[0-9]*_R\\.ab1$",
  TrimmingMethod = "M2",
  M2CutoffQualityScore = 30,
  M2SlidingWindowSize = 15,
  minReadLength = 50 # post-trim length floor
)
Phase-16 added a defensive width filter: any read trimmed to fewer than 2 bp is dropped before alignment and reported with a MIN_READ_LENGTH_DEFENSIVE_DROP warning, so DECIPHER::AlignSeqs never crashes on degenerate inputs.
Forward and reverse reads that overlap poorly can silently produce a consensus full of IUPAC ambiguity codes. Phase-16 adds an opt-in alignment-quality check:
sa_overlap <- SangerAlignment(
  inputSource = "ABIF",
  processMethod = "REGEX",
  ABIF_Directory = ab1_dir,
  REGEX_SuffixForward = "_[0-9]*_F\\.ab1$",
  REGEX_SuffixReverse = "_[0-9]*_R\\.ab1$",
  minOverlapBases = 50L, # warn if any pairwise overlap < 50 bp
  minOverlapFraction = 0.05 # or < 5% of the shorter read
)
When triggered, you’ll see LOW_OVERLAP_WARN in the log alongside the offending read pair.
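What the two thresholds measure can be shown on a pair of already-aligned strings. This is a rough base-R sketch, assuming gaps are written as "-" and that overlap means columns where both reads carry a base. The helper names pairwise_overlap and low_overlap are hypothetical and the package's internal check may differ in detail.

```r
# Count alignment columns where both reads have a base (non-gap overlap),
# and express it as a fraction of the shorter read.
pairwise_overlap <- function(aligned_a, aligned_b) {
  a <- strsplit(aligned_a, "")[[1]]
  b <- strsplit(aligned_b, "")[[1]]
  stopifnot(length(a) == length(b))
  overlap <- sum(a != "-" & b != "-")
  shorter <- min(sum(a != "-"), sum(b != "-"))
  c(bases = overlap, fraction = overlap / shorter)
}

# TRUE when either threshold is violated.
low_overlap <- function(aligned_a, aligned_b,
                        minOverlapBases = 0L, minOverlapFraction = 0) {
  ov <- pairwise_overlap(aligned_a, aligned_b)
  ov[["bases"]] < minOverlapBases || ov[["fraction"]] < minOverlapFraction
}

# Two 10-base reads that share only 4 aligned columns.
fwd_aln <- "ACGTACGTAC------"
rev_aln <- "------GTACGTACGT"
pairwise_overlap(fwd_aln, rev_aln)
low_overlap(fwd_aln, rev_aln, minOverlapBases = 50L, minOverlapFraction = 0.05)
```

With both defaults at zero the check never fires, which matches the opt-in behaviour described above.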
Three modes are exposed via consensusMethod:
| Mode | Behaviour | Per-position quality |
|---|---|---|
| "strict" (default) | DECIPHER’s ConsensusSequence with IUPAC ambiguity codes for disagreements. | not provided |
| "majority" | Per-column plurality vote; ties break alphabetically. No IUPAC codes ever appear in the output. | synthetic Phred |
| "quality_weighted" | Same as majority but votes are weighted by source-read Phred. Alias: qualityAware = TRUE. | mean Phred of agreers |
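The voting rules in the table can be sketched in a few lines of base R. This is an illustration of the idea only, under the assumption that reads are already aligned to equal length with "-" for gaps. The function name majority_consensus is hypothetical and no per-position quality is computed here.

```r
# Per-column vote over aligned reads. With weights = NULL every read counts
# once (plurality vote); with a Phred matrix of the same shape each vote is
# weighted by the quality of the base that cast it.
majority_consensus <- function(aligned, weights = NULL) {
  mat <- do.call(rbind, strsplit(aligned, ""))
  if (is.null(weights)) weights <- matrix(1, nrow(mat), ncol(mat))
  cons <- vapply(seq_len(ncol(mat)), function(j) {
    keep <- mat[, j] != "-"
    if (!any(keep)) return("-")
    # tapply() orders groups alphabetically and which.max() returns the
    # first maximum, so ties resolve to the alphabetically first base.
    votes <- tapply(weights[keep, j], mat[keep, j], sum)
    names(votes)[which.max(votes)]
  }, character(1))
  paste(cons, collapse = "")
}

majority_consensus(c("ACGT", "ACGA", "TCGA"))   # plain plurality vote
majority_consensus(c("AC", "TC"))               # tie in column 1
majority_consensus(c("AC", "TC"),
                   weights = rbind(c(10, 10), c(40, 40)))  # Phred-weighted
```

The last two calls differ only in the weights: the unweighted tie falls to "A", while the higher-quality read wins once Phred weights are supplied.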
sc_majority <- SangerContig(
  inputSource = "ABIF",
  processMethod = "REGEX",
  ABIF_Directory = ab1_dir,
  contigName = "Achl_ACHLO006-09",
  REGEX_SuffixForward = "_[0-9]*_F\\.ab1$",
  REGEX_SuffixReverse = "_[0-9]*_R\\.ab1$",
  consensusMethod = "majority"
)
as.character(sc_majority@contigSeq) # plain ACGT, no IUPAC codes
attr(sc_majority@contigSeq, "qualityScores") # per-position synthetic Phred
ABIF files store per-base trace amplitudes for the four channels (A, C, G, T) and per-base Phred quality scores in the PCON.2 data block. sangeranalyseR re-runs the base-calling step on those raw traces using the MakeBaseCallsInside helper:
- Peak detection (getpeaks).
- Secondary-peak filtering with signalRatioCutoff (default 0.33; secondary peaks below that fraction of the primary peak are dropped).
- Quality scores taken from abifRawData@data$PCON.2 (one entry per detected peak).
Visualize either as a static PDF (chromatogram_overwrite()) or an interactive WebGL widget (chromatogram_plotly(), added in Phase-8):
sr <- sa@contigList[[1]]@forwardReadList[[1]]
chromatogram_plotly(sr, max_points = 8000, showtrim = TRUE)
For very long traces (> 50,000 points) the widget downsamples by uniform stride to keep the browser responsive; the original and rendered point counts are reported via attr(p, "downsample_info").
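Uniform-stride downsampling itself is simple to illustrate. The sketch below is an assumption-laden stand-in, not the widget's code: downsample_stride is a hypothetical helper, the trace is synthetic, and the fields stored in the downsample_info attribute (original, rendered, stride) are invented for the example.

```r
# Keep every k-th point so that at most max_points remain.
downsample_stride <- function(y, max_points = 8000L) {
  n <- length(y)
  if (n == 0L) return(y)
  stride <- max(1, ceiling(n / max_points))
  idx <- seq(1, n, by = stride)
  out <- y[idx]
  # Hypothetical bookkeeping, mirroring the idea of reporting both counts.
  attr(out, "downsample_info") <- list(original = n,
                                       rendered = length(idx),
                                       stride = stride)
  out
}

trace <- sin(seq(0, 200, length.out = 100000))  # synthetic 100,000-point trace
p <- downsample_stride(trace, max_points = 8000L)
attr(p, "downsample_info")
```

A fixed stride is cheap and preserves the x-axis spacing, at the cost of possibly skipping the exact apex of a narrow peak.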
Each SangerRead exposes:
- @primarySeq: the strongest base at each position (DNAString).
- @secondarySeq: the second-strongest base at each position (DNAString).
- @signalRatioCutoff (in @ChromatogramParam): the threshold below which a secondary peak is dropped.
To inspect secondary peaks within a contig alignment, look at sc@secondaryPeakDF; Phase-3 added one-row-per-ambiguous-column reporting.
Re-run base-calling with a tighter (or looser) cutoff:
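The effect of moving the cutoff can be illustrated without the package. The following base-R sketch uses made-up peak amplitudes and a hypothetical helper, call_secondary, to show which secondary peaks survive at different values of signalRatioCutoff. It does not reproduce the package's base-calling step.

```r
# Hypothetical primary and secondary peak amplitudes at five positions.
primary_amp    <- c(1000, 900, 1100, 950, 1000)
secondary_amp  <- c( 100, 400,  300, 320,    0)
secondary_base <- c("C", "T", "G", "A", NA)

# A secondary peak is kept only if it reaches the given fraction of the
# primary peak; otherwise the secondary call is dropped (NA).
call_secondary <- function(primary_amp, secondary_amp, secondary_base,
                           signalRatioCutoff = 0.33) {
  ratio <- secondary_amp / primary_amp
  ifelse(ratio >= signalRatioCutoff, secondary_base, NA_character_)
}

call_secondary(primary_amp, secondary_amp, secondary_base)        # default 0.33
call_secondary(primary_amp, secondary_amp, secondary_base, 0.25)  # lower cutoff
call_secondary(primary_amp, secondary_amp, secondary_base, 0.50)  # higher cutoff
```

Lowering the cutoff admits more secondary calls (useful when looking for heterozygous sites), while raising it suppresses them (useful for noisy traces).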
The interactive Shiny app provides per-read trimming sliders, a contig overview, an alignment browser, and FASTA / HTML report export. Phase-8 also added a lightweight gadget for batch trimming across a whole alignment.
out_dir <- tempdir()
writeFasta(sa, outputDir = out_dir) # SR / SC / SA dispatcher
generateReport(sa, outputDir = out_dir) # HTML report (requires pandoc)
Phase-8 fix: reports now correctly populate the per-frame AA tables under the default lazyAA = TRUE constructor mode (previously the tables silently rendered empty).
Every recipe above is built on three S4 constructors. The full
parameter list is in ?SangerAlignment /
?SangerContig / ?SangerRead; the
most-asked-about groups are summarised below.
| Parameter | What it controls |
|---|---|
| inputSource | "ABIF" (raw chromatograms) or "FASTA" (pre-called sequences). |
| processMethod | "REGEX" (group reads by filename suffix) or "CSV" (explicit reads,direction,contig mapping). |
| ABIF_Directory | Path to the directory of .ab1 files. Required for inputSource = "ABIF". |
| FASTA_File | Path to a single FASTA file. Required for inputSource = "FASTA". |
| REGEX_SuffixForward | A regex matched against forward-read filenames, e.g. "_F\\.ab1$". Pass NULL for reverse-only. |
| REGEX_SuffixReverse | A regex matched against reverse-read filenames, e.g. "_R\\.ab1$". Pass NULL for forward-only. |
| contigName | (SangerContig only) The label / prefix shared by reads in this contig. |
| CSV_NamesConversion | Path to a CSV with three columns: reads, direction (F/R), contig. Required for processMethod = "CSV". |
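How the two suffix regexes partition a directory can be checked with base R before calling a constructor. The sketch below applies the suffixes used throughout this vignette to the eight bundled filenames. It illustrates the grouping idea only and is not the package's internal code: stripping the suffix to obtain the contig label is an assumption made for the example.

```r
files <- c("Achl_ACHLO006-09_1_F.ab1", "Achl_ACHLO006-09_2_R.ab1",
           "Achl_ACHLO007-09_1_F.ab1", "Achl_ACHLO007-09_2_R.ab1",
           "Achl_ACHLO040-09_1_F.ab1", "Achl_ACHLO040-09_2_R.ab1",
           "Achl_ACHLO041-09_1_F.ab1", "Achl_ACHLO041-09_2_R.ab1")
suffix_fwd <- "_[0-9]*_F\\.ab1$"
suffix_rev <- "_[0-9]*_R\\.ab1$"

# Direction comes from whichever suffix matches the filename.
direction <- ifelse(grepl(suffix_fwd, files), "F",
                    ifelse(grepl(suffix_rev, files), "R", NA_character_))

# Stripping the matched suffix leaves the shared contig label.
contig <- sub(suffix_rev, "", sub(suffix_fwd, "", files))

groups <- split(data.frame(file = files, direction = direction), contig)
names(groups)
```

Any file whose direction comes back as NA matches neither suffix, which is a quick way to spot filenames that need the CSV route instead.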
| Parameter | When used | Default | Notes |
|---|---|---|---|
| TrimmingMethod | always | "M1" | "M1" (modified Mott) or "M2" (sliding window). |
| M1TrimmingCutoff | TrimmingMethod = "M1" | 0.0001 | Cumulative probability cutoff. Smaller (tighter) values trim more aggressively. |
| M2CutoffQualityScore | TrimmingMethod = "M2" | 20 | Mean Phred threshold within the sliding window. |
| M2SlidingWindowSize | TrimmingMethod = "M2" | 10 | Width of the sliding window in bp. |
| minReadLength | always | 20L | Reads trimmed to less than this are dropped from the contig. |
| signalRatioCutoff | inputSource = "ABIF" | 0.33 | Secondary peaks below this fraction of the primary peak are dropped. |
| Parameter | Default | Notes |
|---|---|---|
| consensusMethod | "strict" | "strict" (DECIPHER+IUPAC), "majority" (plurality vote, no IUPAC), "quality_weighted" (Phred-weighted). |
| qualityAware | FALSE | Shorthand for consensusMethod = "quality_weighted". |
| minFractionCall | 0.5 | DECIPHER minInformation for "strict" mode. |
| maxFractionLost | 0.5 | DECIPHER threshold for "strict" mode. |
| minOverlapBases | 0L | If > 0, log LOW_OVERLAP_WARN when smallest pairwise non-gap overlap < this. |
| minOverlapFraction | 0.0 | Same in fractional terms (overlap as a fraction of the shorter read). |
| alignSeqsParams | list() | Extra named args forwarded to DECIPHER::AlignSeqs (e.g. list(iterations = 1L, refinements = 1L)). |
| Parameter | Default | Notes |
|---|---|---|
| processorsNum | 1 | Legacy integer worker count. Honoured for backwards compatibility. |
| BPPARAM | NULL | Any BiocParallelParam. Auto-derived from processorsNum if NULL (SerialParam for 1, Multicore/Snow for ≥2). |
| lazyAA | TRUE | Skip eager 3-frame AA translation (Phase-6 default; ~35% wall-time saving). Use primaryAASeqS{1,2,3}() accessors. |
| Symptom | Cause / fix |
|---|---|
| 'qualityPhredScores' length cannot be zero | ABIF has an empty PCON.2 quality block (older 3500/Beckman firmware). Phase-15 fix: synthesises Phred 30 with MISSING_QUALITY_SCORES_WARN. Update to the devel branch. |
| 'REGEX_SuffixReverse' must be character type on forward-only data | Phase-15 fix: pass REGEX_SuffixReverse = NULL (or NA_character_) for forward-only datasets, plus minReadsNum = 1. |
| CONTIG_NUMBER_ZERO_ERROR even though each SangerContig() works individually | Phase-15 fix: the CSV+ABIF aggregator no longer requires contig labels to be substrings of filenames; the reads column drives the lookup. |
| 'x' must be an XStringSet object from writeFasta on a single-read contig | Phase-15 fix: writeFastaSC detects empty alignment and writes a single-record FASTA from @contigSeq. |
| Consensus is full of IUPAC ambiguity codes | Phase-17: try consensusMethod = "majority" or "quality_weighted". Also check pairwise overlap with minOverlapBases = 50L to detect spurious merges. |
| Reports render with empty AA tables | Phase-8 fix: the RMD templates were reading @primaryAASeqS* slots directly under lazyAA = TRUE. Fixed in the devel branch; use primaryAASeqS1/S2/S3() accessors if you customise the templates. |
Please cite the package via:
Kuan-Hao Chao, Kirston Barton, Sarah Palmer, Robert Lanfear (2021). sangeranalyseR: simple and interactive processing of Sanger sequencing data in R. Genome Biology and Evolution. doi:10.1093/gbe/evab028.
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] sangeranalyseR_1.23.0 sangerseqR_1.49.0 stringr_1.6.0
## [4] pwalign_1.9.1 DECIPHER_3.9.0 Biostrings_2.81.2
## [7] Seqinfo_1.3.0 XVector_0.53.0 IRanges_2.47.2
## [10] S4Vectors_0.51.3 BiocGenerics_0.59.6 generics_0.1.4
## [13] BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] ade4_1.7-24 tidyselect_1.2.1 viridisLite_0.4.3
## [4] dplyr_1.2.1 farver_2.1.2 S7_0.2.2
## [7] fastmap_1.2.0 lazyeval_0.2.3 promises_1.5.0
## [10] shinyjs_2.1.1 digest_0.6.39 mime_0.13
## [13] lifecycle_1.0.5 magrittr_2.0.5 compiler_4.6.0
## [16] rlang_1.2.0 sass_0.4.10 tools_4.6.0
## [19] yaml_2.3.12 data.table_1.18.4 excelR_0.4.0
## [22] knitr_1.51 htmlwidgets_1.6.4 RColorBrewer_1.1-3
## [25] BiocParallel_1.47.0 purrr_1.2.2 sys_3.4.3
## [28] shinyWidgets_0.9.1 grid_4.6.0 xtable_1.8-8
## [31] ggplot2_4.0.3 scales_1.4.0 MASS_7.3-65
## [34] cli_3.6.6 rmarkdown_2.31 crayon_1.5.3
## [37] otel_0.2.0 httr_1.4.8 DBI_1.3.0
## [40] ape_5.8-1 cachem_1.1.0 parallel_4.6.0
## [43] BiocManager_1.30.27 vctrs_0.7.3 jsonlite_2.0.0
## [46] seqinr_4.2-44 maketools_1.3.2 plotly_4.12.0
## [49] jquerylib_0.1.4 tidyr_1.3.2 ggdendro_0.2.0
## [52] glue_1.8.1 codetools_0.2-20 DT_0.34.0
## [55] stringi_1.8.7 gtable_0.3.6 later_1.4.8
## [58] shinycssloaders_1.1.0 shinydashboard_0.7.3 tibble_3.3.1
## [61] logger_0.4.2 pillar_1.11.1 htmltools_0.5.9
## [64] R6_2.6.1 evaluate_1.0.5 shiny_1.13.0
## [67] lattice_0.22-9 openxlsx_4.2.8.1 httpuv_1.6.17
## [70] bslib_0.11.0 Rcpp_1.1.1-1.1 zip_2.3.3
## [73] gridExtra_2.3 nlme_3.1-169 xfun_0.58
## [76] buildtools_1.0.0 pkgconfig_2.0.3