crisprBowtie: alignment of gRNA spacer sequences using bowtie

Overview of crisprBowtie

crisprBowtie provides two main functions to align short DNA sequences to a reference genome using the short read aligner bowtie (Langmead et al. 2009) and return the alignments as R objects: runBowtie and runCrisprBowtie. It utilizes the Bioconductor package Rbowtie to access the Bowtie program in a platform-independent manner. This means that users do not need to install Bowtie prior to using crisprBowtie.

The latter function (runCrisprBowtie) is specifically designed to map and annotate CRISPR guide RNA (gRNA) spacer sequences using CRISPR nuclease objects and CRISPR genomic arithmetics defined in the Bioconductor package crisprBase. This enables a fast and accurate on-target and off-target search of gRNA spacer sequences for virtually any type of CRISPR nucleases. It also provides an off-target search engine for our main gRNA design package crisprDesign of the crisprVerse ecosystem. See the addSpacerAlignments function in crisprDesign for more details.

Installation and getting started

Software requirements

OS Requirements

This package is supported for macOS, Linux and Windows machines. Package was developed and tested on R version 4.2.

Installation from Bioconductor

crisprBowtie can be installed from from the Bioconductor devel branch using the following commands in a fresh R session:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(version="devel")
BiocManager::install("crisprBowtie")

Building a bowtie index

To use runBowtie or runCrisprBowtie, users need to first build a Bowtie genome index. For a given genome, this step has to be done only once. The Rbowtie package conveniently provides the function bowtie_build to build a Bowtie index from any custom genome from a FASTA file.

As an example, we build a Bowtie index for a small portion of the human chromosome 1 (chr1.fa file provided in the crisprBowtie package) and save the index file as myIndex to a temporary directory:

library(Rbowtie)
fasta <- file.path(find.package("crisprBowtie"), "example/chr1.fa")
tempDir <- tempdir()
Rbowtie::bowtie_build(fasta,
                      outdir=tempDir,
                      force=TRUE,
                      prefix="myIndex")

To learn how to create a Bowtie index for a complete genome or transcriptome, please visit our tutorial page.

Alignment using runCrisprBowtie

As an example, we align 6 spacer sequences (of length 20bp) to the custom genome built above, allowing a maximum of 3 mismatches between the spacer and protospacer sequences.

We specify that the search is for the wildtype Cas9 (SpCas9) nuclease by providing the CrisprNuclease object SpCas9 available through the crisprBase package. The argument canonical=FALSE specifies that non-canonical PAM sequences are also considered (NAG and NGA for SpCas9). The function getAvailableCrisprNucleases in crisprBase returns a character vector of available crisprNuclease objects found in crisprBase.

library(crisprBowtie)
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
spacers <- c("TCCGCGGGCGACAATGGCAT",
             "TGATCCCGCGCTCCCCGATG",
             "CCGGGAGCCGGGGCTGGACG",
             "CCACCCTCAGGTGTGCGGCC",
             "CGGAGGGCTGCAGAAAGCCT",
             "GGTGATGGCGCGGGCCGGGC")
runCrisprBowtie(spacers,
                crisprNuclease=crisprNuclease,
                n_mismatches=3,
                canonical=FALSE,
                bowtie_index=file.path(tempDir, "myIndex"))
## [runCrisprBowtie] Searching for SpCas9 protospacers
##                 spacer          protospacer pam  chr pam_site strand
## 1 CCACCCTCAGGTGTGCGGCC CCACCCTCAGGTGTGCGGCC TGG chr1      679      +
## 2 CCGGGAGCCGGGGCTGGACG CCGGGAGCCGGGGCTGGACG GAG chr1      466      +
## 3 CGGAGGGCTGCAGAAAGCCT CGGAGGGCTGCAGAAAGCCT TGG chr1      706      +
## 4 GGTGATGGCGCGGGCCGGGC GGTGATGGCGCGGGCCGGGC CGG chr1      831      +
## 5 TGATCCCGCGCTCCCCGATG TGATCCCGCGCTCCCCGATG CAG chr1      341      +
##   n_mismatches canonical
## 1            0      TRUE
## 2            0     FALSE
## 3            0      TRUE
## 4            0      TRUE
## 5            0     FALSE

Applications beyond CRISPR

The function runBowtie is similar to runCrisprBowtie, but does not impose constraints on PAM sequences. It can be used to search for any short read sequence in a genome.

Example using RNAi (siRNA design)

Seed-related off-targets caused by mismatch tolerance outside of the seed region is a well-studied and characterized problem observed in RNA interference (RNA) experiments. runBowtie can be used to map shRNA/siRNA seed sequences to reference genomes to predict putative off-targets:

seeds <- c("GTAAAGGT", "AAGGATTG")
runBowtie(seeds,
          n_mismatches=2,
          bowtie_index=file.path(tempDir, "myIndex"))
##      query   target  chr pos strand n_mismatches
## 1 AAGGATTG AAAGAATG chr1 163      -            2
## 2 AAGGATTG AAGCCTTG chr1 700      +            2
## 3 AAGGATTG AAGGCTTT chr1 699      -            2
## 4 AAGGATTG CAGGCTTG chr1 905      -            2
## 5 GTAAAGGT GGGAAGGT chr1 724      +            2

Reproducibility

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] crisprBowtie_1.11.0 Rbowtie_1.45.0      BiocStyle_2.35.0   
## 
## loaded via a namespace (and not attached):
##  [1] SummarizedExperiment_1.35.5 rjson_0.2.23               
##  [3] xfun_0.48                   bslib_0.8.0                
##  [5] Biobase_2.67.0              lattice_0.22-6             
##  [7] tzdb_0.4.0                  vctrs_0.6.5                
##  [9] tools_4.4.1                 bitops_1.0-9               
## [11] stats4_4.4.1                curl_5.2.3                 
## [13] parallel_4.4.1              fansi_1.0.6                
## [15] tibble_3.2.1                pkgconfig_2.0.3            
## [17] Matrix_1.7-1                BSgenome_1.73.1            
## [19] S4Vectors_0.43.2            lifecycle_1.0.4            
## [21] GenomeInfoDbData_1.2.13     compiler_4.4.1             
## [23] stringr_1.5.1               Rsamtools_2.21.2           
## [25] Biostrings_2.75.0           codetools_0.2-20           
## [27] GenomeInfoDb_1.41.2         htmltools_0.5.8.1          
## [29] sys_3.4.3                   buildtools_1.0.0           
## [31] sass_0.4.9                  RCurl_1.98-1.16            
## [33] yaml_2.3.10                 pillar_1.9.0               
## [35] crayon_1.5.3                jquerylib_0.1.4            
## [37] BiocParallel_1.39.0         cachem_1.1.0               
## [39] DelayedArray_0.31.14        abind_1.4-8                
## [41] tidyselect_1.2.1            digest_0.6.37              
## [43] stringi_1.8.4               restfulr_0.0.15            
## [45] maketools_1.3.1             fastmap_1.2.0              
## [47] grid_4.4.1                  cli_3.6.3                  
## [49] SparseArray_1.5.45          magrittr_2.0.3             
## [51] S4Arrays_1.5.11             utf8_1.2.4                 
## [53] XML_3.99-0.17               withr_3.0.2                
## [55] readr_2.1.5                 UCSC.utils_1.1.0           
## [57] bit64_4.5.2                 rmarkdown_2.28             
## [59] XVector_0.45.0              httr_1.4.7                 
## [61] matrixStats_1.4.1           bit_4.5.0                  
## [63] hms_1.1.3                   evaluate_1.0.1             
## [65] knitr_1.48                  GenomicRanges_1.57.2       
## [67] IRanges_2.39.2              BiocIO_1.17.0              
## [69] rtracklayer_1.65.0          rlang_1.1.4                
## [71] glue_1.8.0                  crisprBase_1.9.1           
## [73] BiocManager_1.30.25         BiocGenerics_0.53.0        
## [75] vroom_1.6.5                 jsonlite_1.8.9             
## [77] R6_2.5.1                    MatrixGenerics_1.17.1      
## [79] GenomicAlignments_1.41.0    zlibbioc_1.51.2

References

Langmead, Ben, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. 2009. “Ultrafast and Memory-Efficient Alignment of Short DNA Sequences to the Human Genome.” Genome Biology 10 (3): R25. https://doi.org/10.1186/gb-2009-10-3-r25.