crisprBowtie provides two main functions to align short DNA sequences
to a reference genome using the short read aligner Bowtie
(Langmead et al. 2009) and return the alignments as R objects:
runBowtie and runCrisprBowtie. It utilizes the Bioconductor package
Rbowtie to access the Bowtie program in a platform-independent manner.
This means that users do not need to install Bowtie prior to using
crisprBowtie.
The latter function (runCrisprBowtie) is specifically designed to map
and annotate CRISPR guide RNA (gRNA) spacer sequences using CRISPR
nuclease objects and CRISPR genomic arithmetics defined in the
Bioconductor package crisprBase. This enables a fast and accurate
on-target and off-target search of gRNA spacer sequences for virtually
any type of CRISPR nuclease. It also provides the off-target search
engine for crisprDesign, our main gRNA design package in the
crisprVerse ecosystem. See the addSpacerAlignments function in
crisprDesign for more details.
This package is supported on macOS, Linux and Windows machines. The package was developed and tested on R version 4.2.
crisprBowtie can be installed from the Bioconductor devel branch using
the following commands in a fresh R session:
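A minimal sketch of such an installation, assuming the standard BiocManager workflow for Bioconductor devel packages (the exact commands intended by the package authors may differ):

```r
# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Switch this R library to the Bioconductor devel branch,
# then install crisprBowtie from it
BiocManager::install(version = "devel")
BiocManager::install("crisprBowtie")
```

Switching to the devel branch affects every Bioconductor package in the library, so a separate R library or a fresh R installation is a reasonable precaution.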
To use runBowtie or runCrisprBowtie, users need to first build a Bowtie
genome index. For a given genome, this step has to be done only once.
The Rbowtie package conveniently provides the function bowtie_build to
build a Bowtie index from any custom genome from a FASTA file.
As an example, we build a Bowtie index for a small portion of the
human chromosome 1 (chr1.fa file provided in the crisprBowtie package)
and save the index file as myIndex to a temporary directory:
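library(Rbowtie)
# Example FASTA file shipped with the crisprBowtie package
fasta <- file.path(find.package("crisprBowtie"), "example/chr1.fa")
# Write the index files to a temporary directory
tempDir <- tempdir()
Rbowtie::bowtie_build(fasta,
                      outdir=tempDir,
                      force=TRUE,
                      prefix="myIndex")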
To learn how to create a Bowtie index for a complete genome or transcriptome, please visit our tutorial page.
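Before aligning, it can be useful to confirm that an index is actually present on disk. The sketch below assumes the usual Bowtie 1 layout of six files per index (`.1.ebwt`, `.2.ebwt`, `.3.ebwt`, `.4.ebwt`, `.rev.1.ebwt`, `.rev.2.ebwt`, with the `.ebwtl` extension for large indexes). The helper `hasBowtieIndex` is hypothetical and not part of crisprBowtie or Rbowtie; the demonstration uses empty placeholder files rather than a real index.

```r
# Hypothetical helper (not part of crisprBowtie or Rbowtie): checks that
# all six files of a Bowtie 1 index exist for a given prefix.
hasBowtieIndex <- function(prefix) {
    parts <- c("1", "2", "3", "4", "rev.1", "rev.2")
    # Small indexes use the .ebwt extension, large ones .ebwtl
    small <- paste0(prefix, ".", parts, ".ebwt")
    large <- paste0(prefix, ".", parts, ".ebwtl")
    all(file.exists(small)) || all(file.exists(large))
}

# Self-contained demonstration with empty placeholder files
demoDir <- tempfile("indexDemo")
dir.create(demoDir)
demoPrefix <- file.path(demoDir, "myIndex")
before <- hasBowtieIndex(demoPrefix)   # no files yet
invisible(file.create(paste0(demoPrefix, ".",
                             c("1", "2", "3", "4", "rev.1", "rev.2"),
                             ".ebwt")))
after <- hasBowtieIndex(demoPrefix)    # all six placeholder files exist
```

For the index built above, `hasBowtieIndex(file.path(tempDir, "myIndex"))` would be expected to return `TRUE` if the build succeeded.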
runCrisprBowtie
As an example, we align 6 spacer sequences (of length 20bp) to the custom genome built above, allowing a maximum of 3 mismatches between the spacer and protospacer sequences.
We specify that the search is for the wildtype Cas9 (SpCas9) nuclease
by providing the CrisprNuclease object SpCas9 available through the
crisprBase package. The argument canonical=FALSE specifies that
non-canonical PAM sequences are also considered (NAG and NGA for
SpCas9). The function getAvailableCrisprNucleases in crisprBase returns
a character vector of available CrisprNuclease objects found in
crisprBase.
library(crisprBowtie)
# Load the SpCas9 CrisprNuclease object from crisprBase
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
# Six 20bp spacer sequences to align
spacers <- c("TCCGCGGGCGACAATGGCAT",
             "TGATCCCGCGCTCCCCGATG",
             "CCGGGAGCCGGGGCTGGACG",
             "CCACCCTCAGGTGTGCGGCC",
             "CGGAGGGCTGCAGAAAGCCT",
             "GGTGATGGCGCGGGCCGGGC")
# Allow up to 3 mismatches and include non-canonical PAMs
runCrisprBowtie(spacers,
                crisprNuclease=crisprNuclease,
                n_mismatches=3,
                canonical=FALSE,
                bowtie_index=file.path(tempDir, "myIndex"))
## [runCrisprBowtie] Searching for SpCas9 protospacers
## spacer protospacer pam chr pam_site strand
## 1 CCACCCTCAGGTGTGCGGCC CCACCCTCAGGTGTGCGGCC TGG chr1 679 +
## 2 CCGGGAGCCGGGGCTGGACG CCGGGAGCCGGGGCTGGACG GAG chr1 466 +
## 3 CGGAGGGCTGCAGAAAGCCT CGGAGGGCTGCAGAAAGCCT TGG chr1 706 +
## 4 GGTGATGGCGCGGGCCGGGC GGTGATGGCGCGGGCCGGGC CGG chr1 831 +
## 5 TGATCCCGCGCTCCCCGATG TGATCCCGCGCTCCCCGATG CAG chr1 341 +
## n_mismatches canonical
## 1 0 TRUE
## 2 0 FALSE
## 3 0 TRUE
## 4 0 TRUE
## 5 0 FALSE
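The table above can be cross-checked by hand in base R. The sketch below copies the spacer and pam columns from the printed output, recomputes the canonical flag under the assumption that the canonical SpCas9 PAM is NGG, and lists the input spacers that have no reported alignment. The helper `isCanonicalSpCas9` is hypothetical and is not how crisprBowtie or crisprBase implement this check.

```r
# Spacers submitted in the example
spacers <- c("TCCGCGGGCGACAATGGCAT", "TGATCCCGCGCTCCCCGATG",
             "CCGGGAGCCGGGGCTGGACG", "CCACCCTCAGGTGTGCGGCC",
             "CGGAGGGCTGCAGAAAGCCT", "GGTGATGGCGCGGGCCGGGC")

# spacer and pam columns copied from the printed alignment table
aln <- data.frame(
    spacer = c("CCACCCTCAGGTGTGCGGCC", "CCGGGAGCCGGGGCTGGACG",
               "CGGAGGGCTGCAGAAAGCCT", "GGTGATGGCGCGGGCCGGGC",
               "TGATCCCGCGCTCCCCGATG"),
    pam = c("TGG", "GAG", "TGG", "CGG", "CAG"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the PAM matches NGG
isCanonicalSpCas9 <- function(pam) grepl("^[ACGT]GG$", pam)
aln$canonical <- isCanonicalSpCas9(aln$pam)

# Input spacers with no alignment reported at up to 3 mismatches
unaligned <- setdiff(spacers, aln$spacer)
```

Here `aln$canonical` agrees with the canonical column of the output, and `unaligned` contains the one spacer of the six that does not appear in the table.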
The function runBowtie is similar to runCrisprBowtie, but does not
impose constraints on PAM sequences. It can be used to search for any
short read sequence in a genome.
Seed-related off-targets caused by mismatch tolerance outside of the
seed region are a well-studied and characterized problem observed in
RNA interference (RNAi) experiments. runBowtie can be used to map
shRNA/siRNA seed sequences to reference genomes to predict putative
off-targets:
# Two 8-nt seed sequences, aligned with up to 2 mismatches
seeds <- c("GTAAAGGT", "AAGGATTG")
runBowtie(seeds,
          n_mismatches=2,
          bowtie_index=file.path(tempDir, "myIndex"))
## query target chr pos strand n_mismatches
## 1 AAGGATTG AAAGAATG chr1 163 - 2
## 2 AAGGATTG AAGCCTTG chr1 700 + 2
## 3 AAGGATTG AAGGCTTT chr1 699 - 2
## 4 AAGGATTG CAGGCTTG chr1 905 - 2
## 5 GTAAAGGT GGGAAGGT chr1 724 + 2
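The n_mismatches column can be recomputed directly from the query and target columns as a position-by-position comparison (a Hamming distance). The sketch below uses the pairs copied from the table above; `countMismatches` is a hypothetical base R helper, not a crisprBowtie function.

```r
# Hypothetical helper: number of mismatched positions between two
# equal-length sequences
countMismatches <- function(query, target) {
    stopifnot(nchar(query) == nchar(target))
    q <- strsplit(query, "", fixed = TRUE)[[1]]
    t <- strsplit(target, "", fixed = TRUE)[[1]]
    sum(q != t)
}

# Query/target pairs copied from the printed table
hits <- data.frame(
    query  = c("AAGGATTG", "AAGGATTG", "AAGGATTG", "AAGGATTG", "GTAAAGGT"),
    target = c("AAAGAATG", "AAGCCTTG", "AAGGCTTT", "CAGGCTTG", "GGGAAGGT"),
    stringsAsFactors = FALSE
)

# One mismatch count per row
hits$recomputed <- mapply(countMismatches, hits$query, hits$target,
                          USE.NAMES = FALSE)
```

Every recomputed count equals 2, in agreement with the table. The comparison works for rows on either strand, which suggests that the target is reported in the same orientation as the query.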
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] crisprBowtie_1.11.0 Rbowtie_1.47.0 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.37.0 rjson_0.2.23
## [3] xfun_0.49 bslib_0.8.0
## [5] Biobase_2.67.0 lattice_0.22-6
## [7] tzdb_0.4.0 vctrs_0.6.5
## [9] tools_4.4.2 bitops_1.0-9
## [11] generics_0.1.3 stats4_4.4.2
## [13] curl_6.0.1 parallel_4.4.2
## [15] fansi_1.0.6 tibble_3.2.1
## [17] pkgconfig_2.0.3 Matrix_1.7-1
## [19] BSgenome_1.75.0 S4Vectors_0.45.2
## [21] lifecycle_1.0.4 GenomeInfoDbData_1.2.13
## [23] compiler_4.4.2 stringr_1.5.1
## [25] Rsamtools_2.23.1 Biostrings_2.75.1
## [27] codetools_0.2-20 GenomeInfoDb_1.43.2
## [29] htmltools_0.5.8.1 sys_3.4.3
## [31] buildtools_1.0.0 sass_0.4.9
## [33] RCurl_1.98-1.16 yaml_2.3.10
## [35] pillar_1.9.0 crayon_1.5.3
## [37] jquerylib_0.1.4 BiocParallel_1.41.0
## [39] cachem_1.1.0 DelayedArray_0.33.2
## [41] abind_1.4-8 tidyselect_1.2.1
## [43] digest_0.6.37 stringi_1.8.4
## [45] restfulr_0.0.15 maketools_1.3.1
## [47] fastmap_1.2.0 grid_4.4.2
## [49] cli_3.6.3 SparseArray_1.7.2
## [51] magrittr_2.0.3 S4Arrays_1.7.1
## [53] utf8_1.2.4 XML_3.99-0.17
## [55] withr_3.0.2 readr_2.1.5
## [57] UCSC.utils_1.3.0 bit64_4.5.2
## [59] rmarkdown_2.29 XVector_0.47.0
## [61] httr_1.4.7 matrixStats_1.4.1
## [63] bit_4.5.0 hms_1.1.3
## [65] evaluate_1.0.1 knitr_1.49
## [67] GenomicRanges_1.59.1 IRanges_2.41.1
## [69] BiocIO_1.17.1 rtracklayer_1.67.0
## [71] rlang_1.1.4 glue_1.8.0
## [73] crisprBase_1.11.0 BiocManager_1.30.25
## [75] BiocGenerics_0.53.3 vroom_1.6.5
## [77] jsonlite_1.8.9 R6_2.5.1
## [79] MatrixGenerics_1.19.0 GenomicAlignments_1.43.0
## [81] zlibbioc_1.52.0