crisprBwa
provides two main functions to align short DNA
sequences to a reference genome using the short read aligner
BWA-backtrack (Li and Durbin 2009) and
return the alignments as R objects: runBwa
and
runCrisprBwa
. It utilizes the Bioconductor package
Rbwa
to access the BWA program in a platform-independent
manner. This means that users do not need to install BWA prior to using
crisprBwa
.
The latter function (runCrisprBwa
) is specifically
designed to map and annotate CRISPR guide RNA (gRNA) spacer sequences
using CRISPR nuclease objects and CRISPR genomic arithmetics defined in
the Bioconductor package crisprBase. This
enables a fast and accurate on-target and off-target search of gRNA
spacer sequences for virtually any type of CRISPR nucleases. It also
provides an off-target search engine for our main gRNA design package crisprDesign of
the crisprVerse ecosystem.
See the addSpacerAlignments
function in
crisprDesign
for more details.
This package is supported for macOS and Linux only. Package was developed and tested on R version 4.2.1.
To use runBwa
or runCrisprBwa
, users need
to first build a BWA genome index. For a given genome, this step has to
be done only once. The Rbwa
package conveniently provides
the function bwa_build_index
to build a BWA index from any
custom genome from a FASTA file.
As an example, we build a BWA index for a small portion of the human
chromosome 12 (chr12.fa
file provided in the
crisprBwa
package) and save the index file as
myIndex
to a temporary directory:
library(Rbwa)
fasta <- system.file(package="crisprBwa", "example/chr12.fa")
outdir <- tempdir()
index <- file.path(outdir, "chr12")
Rbwa::bwa_build_index(fasta,
index_prefix=index)
To learn how to create a BWA index for a complete genome or transcriptome, please visit our tutorial page.
runCrisprBwa
As an example, we align 5 spacer sequences (of length 20bp) to the custom genome built above, allowing a maximum of 3 mismatches between the spacer and protospacer sequences.
We specify that the search is for the wildtype Cas9 (SpCas9) nuclease
by providing the CrisprNuclease
object SpCas9
available through the crisprBase
package. The argument
canonical=FALSE
specifies that non-canonical PAM sequences
are also considered (NAG and NGA for SpCas9). The function
getAvailableCrisprNucleases
in crisprBase
returns a character vector of available crisprNuclease
objects found in crisprBase
.
We also need to provide a BSgenome
object corresponding
to the reference genome used for alignment to extract protospacer and
PAM sequences of the target sequences.
## Loading required package: GenomeInfoDb
## Loading required package: BiocGenerics
## Loading required package: generics
##
## Attaching package: 'generics'
## The following objects are masked from 'package:base':
##
## as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
## setequal, union
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
## unsplit, which.max, which.min
## Loading required package: S4Vectors
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: BSgenome
## Loading required package: GenomicRanges
## Loading required package: Biostrings
## Loading required package: XVector
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
## Loading required package: BiocIO
## Loading required package: rtracklayer
##
## Attaching package: 'rtracklayer'
## The following object is masked from 'package:BiocIO':
##
## FileForFormat
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
bsgenome <- BSgenome.Hsapiens.UCSC.hg38
spacers <- c("AGCTGTCCGTGGGGGTCCGC",
"CCCCTGCTGCTGTGCCAGGC",
"ACGAACTGTAAAAGGCTTGG",
"ACGAACTGTAACAGGCTTGG",
"AAGGCCCTCAGAGTAATTAC")
runCrisprBwa(spacers,
bsgenome=bsgenome,
crisprNuclease=crisprNuclease,
n_mismatches=3,
canonical=FALSE,
bwa_index=index)
## [runCrisprBwa] Using BSgenome.Hsapiens.UCSC.hg38
## [runCrisprBwa] Searching for SpCas9 protospacers
## spacer protospacer pam chr pam_site strand
## 1 AAGGCCCTCAGAGTAATTAC AAGGCCCTCAGAGTAATTAC AGA chr12 170636 +
## 2 ACGAACTGTAAAAGGCTTGG ACGAACTGTAAAAGGCTTGG AGG chr12 170815 -
## 3 ACGAACTGTAACAGGCTTGG ACGAACTGTAAAAGGCTTGG AGG chr12 170815 -
## 4 AGCTGTCCGTGGGGGTCCGC AGCTGTCCGTGGGGGTCCGC AGG chr12 170585 +
## 5 CCCCTGCTGCTGTGCCAGGC CCCCTGCTGCTGTGCCAGGC CGG chr12 170609 +
## n_mismatches canonical
## 1 0 FALSE
## 2 0 TRUE
## 3 1 TRUE
## 4 0 TRUE
## 5 0 TRUE
The function runBwa
is similar to
runCrisprBwa
, but does not impose constraints on PAM
sequences. It can be used to search for any short read sequence in a
genome.
Seed-related off-targets caused by mismatch tolerance outside of the
seed region is a well-studied and characterized problem observed in RNA
interference (RNAi) experiments. runBWa
can be used to map
shRNA/siRNA seed sequences to reference genomes to predict putative
off-targets:
## query chr pos strand n_mismatches
## 1 AACGGGGAGATTG chr12 68337 - 2
## 2 AACGGGGAGATTG chr12 1666 - 2
## 3 AACGGGGAGATTG chr12 123863 + 2
## 4 AACGGGGAGATTG chr12 151731 - 2
## 5 AACGGGGAGATTG chr12 110901 + 2
## 6 GTAAGCGGAGTGT chr12 101550 - 2
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.75.0
## [3] rtracklayer_1.67.0 BiocIO_1.17.0
## [5] Biostrings_2.75.1 XVector_0.47.0
## [7] GenomicRanges_1.59.1 GenomeInfoDb_1.43.1
## [9] IRanges_2.41.1 S4Vectors_0.45.2
## [11] BiocGenerics_0.53.3 generics_0.1.3
## [13] crisprBwa_1.11.0 Rbwa_1.11.0
## [15] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.37.0 rjson_0.2.23
## [3] xfun_0.49 bslib_0.8.0
## [5] Biobase_2.67.0 lattice_0.22-6
## [7] tzdb_0.4.0 vctrs_0.6.5
## [9] tools_4.4.2 bitops_1.0-9
## [11] curl_6.0.1 parallel_4.4.2
## [13] fansi_1.0.6 tibble_3.2.1
## [15] pkgconfig_2.0.3 Matrix_1.7-1
## [17] lifecycle_1.0.4 GenomeInfoDbData_1.2.13
## [19] compiler_4.4.2 stringr_1.5.1
## [21] Rsamtools_2.23.0 codetools_0.2-20
## [23] htmltools_0.5.8.1 sys_3.4.3
## [25] buildtools_1.0.0 sass_0.4.9
## [27] RCurl_1.98-1.16 yaml_2.3.10
## [29] pillar_1.9.0 crayon_1.5.3
## [31] jquerylib_0.1.4 BiocParallel_1.41.0
## [33] cachem_1.1.0 DelayedArray_0.33.2
## [35] abind_1.4-8 tidyselect_1.2.1
## [37] digest_0.6.37 stringi_1.8.4
## [39] restfulr_0.0.15 maketools_1.3.1
## [41] fastmap_1.2.0 grid_4.4.2
## [43] cli_3.6.3 SparseArray_1.7.2
## [45] magrittr_2.0.3 S4Arrays_1.7.1
## [47] utf8_1.2.4 XML_3.99-0.17
## [49] readr_2.1.5 UCSC.utils_1.3.0
## [51] bit64_4.5.2 rmarkdown_2.29
## [53] httr_1.4.7 matrixStats_1.4.1
## [55] bit_4.5.0 hms_1.1.3
## [57] evaluate_1.0.1 knitr_1.49
## [59] rlang_1.1.4 glue_1.8.0
## [61] crisprBase_1.11.0 BiocManager_1.30.25
## [63] vroom_1.6.5 jsonlite_1.8.9
## [65] R6_2.5.1 MatrixGenerics_1.19.0
## [67] GenomicAlignments_1.43.0 zlibbioc_1.52.0