The crisprViz
package enables the graphical
interpretation of GuideSet
objects from the crisprDesign
package by plotting guide RNA (gRNA) cutting locations against their
target gene or other genomic region.
This vignette walks through several use cases that demonstrate the
range of and how to use plotting functions in the crisprViz
package. This vignette also uses our core gRNA design package crisprDesignto
manipulate GuideSet
objects in conjunction with plotting in
the process of gRNA design.
Visit our crisprVerse tutorial page to learn more about how to design gRNAs for different applications.
This package is supported for macOS, Linux and Windows machines. Packages were developed and tested on R version 4.2.
All examples in this vignette will use human genome assembly
hg38
from the BSgenome.Hsapiens.UCSC.hg38
package and gene model coordinates from Ensembl release 104. We begin by
loading the necessary packages.
Suppose we want to design the four best gRNAs using the SpCas9 CRISPR nuclease to knockout the human KRAS gene. To have the greatest impact on gene function we want to prioritize gRNAs that have greater isoform coverage, and target closer to the 5’ end of the CDS.
Let’s load a precomputed GuideSet
object containing all
possible gRNAs targeting the the CDS of KRAS, and a
GRangesList
object describing the gene model for KRAS.
data("krasGuideSet", package="crisprViz")
data("krasGeneModel", package="crisprViz")
length(krasGuideSet) # number of candidate gRNAs
## [1] 52
For how to design such gRNAs, see the crisprDesign
package. Before we plot all of our candidate gRNAs, let’s first generate
a simple plot with a few gRNAs to familiarize ourselves with some plot
components and options.
There are a few things to note here.
targetGene
KRAS is plotted next, using coordinates
from the provided gene model krasGeneModel
, followed by our
spacer subset. The name of each track is given on the left.<-
for reverse strand and ->
for forward
strand.This last point is important: the default plot window is set by the
spacers’ ranges in the input GuideSet
object. We can
manually adjust this window by using the from
,
to
, extend.left
, and extend.right
arguments. Here is the same plot adjusted to show the whole KRAS gene,
which also reveals an additional isoform that is not targeted by any
spacer in this example subset.
from <- min(start(krasGeneModel$transcripts))
to <- max(end(krasGeneModel$transcripts))
plotGuideSet(krasGuideSet[1:4],
geneModel=krasGeneModel,
targetGene="KRAS",
from=from,
to=to,
extend.left=1000,
extend.right=1000)
As calculated above, there are a total of 52 candidate gRNAs
targeting the CDS of KRAS. Including all of them could crowd the plot
space, making it difficult to interpret. To alleviate this we can hide
the gRNA labels by setting the showGuideLabels
argument to
FALSE
.
plotGuideSet(krasGuideSet,
geneModel=krasGeneModel,
targetGene="KRAS",
showGuideLabels=FALSE,
from=from,
to=to,
extend.left=1000,
extend.right=1000)
At the gene level, the plot window is too large to discern details for each spacer target. However, we can see five distinct clusters of spacer target locations that cover the CDS of KRAS. The spacers in the 5’-most cluster (on the reverse strand) target the only coding region of the gene that is expressed by all isoforms, making it an ideal target for our scenario.
We can see which gRNAs target this region by returning
showGuideLabels
to its default value of TRUE
,
and by adjusting the plot window to focus on our exon of interest.
# new window range around target exon
targetExon <- queryTxObject(krasGeneModel,
featureType="cds",
queryColumn="exon_id",
queryValue="ENSE00000936617")
targetExon <- unique(targetExon)
from <- start(targetExon)
to <- end(targetExon)
plotGuideSet(krasGuideSet,
geneModel=krasGeneModel,
targetGene="KRAS",
from=from,
to=to,
extend.left=20,
extend.right=20)
At this resolution we can get a much better idea of spacer location and orientation. In particular, the PAM sequence is visible as a narrow box on the 3’ side of our protospacer sequences. We can also distinctly see which spacer targets overlap each other–it may be best to avoid pairing such spacers in some applications lest they sterically interfere with each other.
If we have many gRNA targets in a smaller window and are not
concerned with overlaps, we can configure the plot to only show the
pam_site
, rather than the entire protospacer and PAM
sequence, by setting pamSiteOnly
to TRUE
.
plotGuideSet(krasGuideSet,
geneModel=krasGeneModel,
targetGene="KRAS",
from=from,
to=to,
extend.left=20,
extend.right=20,
pamSiteOnly=TRUE)
Let’s filter our GuideSet
by the spacer names in the
plot then pass an on-target score column in our GuideSet
to
onTargetScores
to color the spacers according to that
score, with darker blue colors indicating higher scores. Note that for
this plot we need not provide values for from
and
to
, as the plot window adjusts to our filtered
GuideSet
.
For a given CRISPR application, the target region may consist of only several base pairs rather than an exon or entire gene CDS. In these instances it may be important to know exactly where the gRNAs target, and plots of gRNAs must be at a resolution capable of distinguishing individual bases. This is often the case for CRISPR base editor (CRISPRbe) applications, as the editing window for each gRNA is narrow and the results are specific to each target sequence.
In this example, we will zoom in on a few gRNAs targeting the 5’ end
of the human GPR21 gene. We want our plot to include genomic sequence
information so we will set the bsgenome
argument to the
same BSgenome
object we used to create our
GuideSet
.
First, we load the precomputed GuideSet
and gene model
objects for GPR21,
and then plot the gRNAs.
plotGuideSet(gpr21GuideSet,
geneModel=gpr21GeneModel,
targetGene="GPR21",
bsgenome=BSgenome.Hsapiens.UCSC.hg38,
margin=0.3)
The genomic sequence is given at the bottom of the plot as color-coded boxes. The color scheme for describing the nucleotides is given in the biovizBase package. If the plot has sufficient space, it will display nucleotide symbols rather than boxes. We can accomplish this by plotting a narrower range or by increasing the width of our plot space (see “Setting plot size” section).
The plot above was generated with a plot space width of 6 inches; here’s the same plot after we increase the width to 10 inches:
In this scenario we want to increase expression of the human MMP7 gene via CRISPR activation (CRISPRa). We will use the SpCas9 CRISPR nuclease.
The GuideSet
contains candidate gRNAs in the 2kb window
immediately upstream of the TSS of MMP7. We will also use a
GRanges
object containing repeat elements in this
region:
Let’s begin by plotting our GuideSet
, and adding a track
of repeat elements using the annotations
argument. Our
guideSet
also contains SNP annotation, which we would also
prefer our gRNAs to not overlap. To include a SNP annotation track, we
will set includeSNPTrack=TRUE
(default).
from <- min(start(mmp7GuideSet))
to <- max(end(mmp7GuideSet))
plotGuideSet(mmp7GuideSet,
geneModel=mmp7GeneModel,
targetGene="MMP7",
guideStacking="dense",
annotations=list(Repeats=repeats),
pamSiteOnly=TRUE,
from=from,
to=to,
extend.left=600,
extend.right=100,
includeSNPTrack=TRUE)
Some of our candidate gRNAs target repeat elements and likely target a large number of loci in the genome, potentially causing unintended effects, or overlap with SNPs, which can reduce its activity. Let’s remove these gRNAs and regenerate the plot.
filteredGuideSet <- crisprDesign::removeRepeats(mmp7GuideSet,
gr.repeats=repeats)
filteredGuideSet <- filteredGuideSet[!filteredGuideSet$hasSNP]
plotGuideSet(filteredGuideSet,
geneModel=mmp7GeneModel,
targetGene="MMP7",
guideStacking="dense",
annotations=list(Repeats=repeats),
pamSiteOnly=TRUE,
from=from,
to=to,
extend.left=600,
extend.right=100,
includeSNPTrack=TRUE)
Note how removing gRNAs that overlap SNPs from our
GuideSet
also removed the SNP track. To prevent plotting an
empty track, plotGuideSet
will only include a SNPs track if
at least one gRNA includes SNP annotation (i.e. overlaps a SNP).
Conversely, there are specific genomic regions that would be
beneficial to target, such as CAGE peaks and DNase I Hypersensitivity
tracks. We show in the inst\scripts
folder how to obtain
such data from the Bioconductor package AnnotationHub
, but
for the sake of time, we have precomputed those objects and they can be
loaded from the crisprViz
package directly:
We now plot gRNAs alongside with those two tracks:
plotGuideSet(filteredGuideSet,
geneModel=mmp7GeneModel,
targetGene="MMP7",
guideStacking="dense",
annotations=list(CAGE=cage, DNase=dnase),
pamSiteOnly=TRUE,
from=from,
to=to,
extend.left=600,
extend.right=100)
Let’s filter our GuideSet
for guides overlapping the
plotted DNase site then regenerate the plot.
# filter GuideSet for gRNAs overlapping DNase track
overlaps <- findOverlaps(filteredGuideSet, dnase, ignore.strand=TRUE)
finalGuideSet <- filteredGuideSet[queryHits(overlaps)]
plotGuideSet(finalGuideSet,
geneModel=mmp7GeneModel,
targetGene="MMP7",
guideStacking="dense",
annotations=list(CAGE=cage, DNase=dnase),
pamSiteOnly=TRUE,
margin=0.4)
The choice of the CRISPR nuclease can be influenced by the abundance
of PAM sequences recognized by a given nuclease in the target region.
For example, we would expect AT-rich regions to have fewer possible
targets for the SpCas9 nuclease, whose PAM is NGG. In these regions, the
CRISPR nuclease AsCas12a, whose PAM is TTTV, may prove more appropriate.
Given multiple GuideSet
s targeting the same region, we can
compare the gRNAs of each in the same plot using
plotMultipleGuideSets
.
Here, we pass our GuideSet
s targeting an exon in the
human gene LTN1 in a named list. Note that there are no available
options for displaying guide labels or guide stacking, and only the PAM
sites are plotted. We will also add a track to monitor the percent GC
content (using a window roughly the length of our protospacers). Not
surprisingly, this AT-rich region has fewer targets for SpCas9 compared
to AsCas12a. (Note: when plotting several GuideSets you may need to
increase the height of the plot space in order for the track names to
appear on the left side; see “Setting plot size” below.)
We first load the precomputed GuideSet
objects:
Plots with many gene isoforms and/or gRNAs may require more space to render than is allotted by your graphical device’s default settings, resulting in an error. One solution, depending on your graphical device, is offered by the grDevices package.
Here is an example using macOS Quartz device:
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] crisprViz_1.9.0 crisprDesign_1.7.2
## [3] crisprBase_1.9.1 BSgenome.Hsapiens.UCSC.hg38_1.4.5
## [5] BSgenome_1.73.1 rtracklayer_1.65.0
## [7] BiocIO_1.17.0 Biostrings_2.75.0
## [9] XVector_0.45.0 GenomicRanges_1.57.2
## [11] GenomeInfoDb_1.41.2 IRanges_2.39.2
## [13] S4Vectors_0.43.2 BiocGenerics_0.53.0
## [15] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.17.1
## [3] sys_3.4.3 jsonlite_1.8.9
## [5] magrittr_2.0.3 GenomicFeatures_1.57.1
## [7] rmarkdown_2.28 zlibbioc_1.51.2
## [9] vctrs_0.6.5 memoise_2.0.1
## [11] Rsamtools_2.21.2 RCurl_1.98-1.16
## [13] base64enc_0.1-3 htmltools_0.5.8.1
## [15] S4Arrays_1.5.11 progress_1.2.3
## [17] AnnotationHub_3.15.0 curl_5.2.3
## [19] SparseArray_1.5.45 Formula_1.2-5
## [21] sass_0.4.9 bslib_0.8.0
## [23] htmlwidgets_1.6.4 basilisk_1.19.0
## [25] Gviz_1.49.0 httr2_1.0.5
## [27] cachem_1.1.0 buildtools_1.0.0
## [29] GenomicAlignments_1.41.0 lifecycle_1.0.4
## [31] pkgconfig_2.0.3 Matrix_1.7-1
## [33] R6_2.5.1 fastmap_1.2.0
## [35] GenomeInfoDbData_1.2.13 MatrixGenerics_1.17.1
## [37] digest_0.6.37 colorspace_2.1-1
## [39] AnnotationDbi_1.69.0 ExperimentHub_2.13.1
## [41] Hmisc_5.2-0 RSQLite_2.3.7
## [43] filelock_1.0.3 randomForest_4.7-1.2
## [45] fansi_1.0.6 httr_1.4.7
## [47] abind_1.4-8 compiler_4.4.1
## [49] Rbowtie_1.45.0 bit64_4.5.2
## [51] backports_1.5.0 htmlTable_2.4.3
## [53] BiocParallel_1.39.0 DBI_1.2.3
## [55] highr_0.11 biomaRt_2.63.0
## [57] rappdirs_0.3.3 DelayedArray_0.31.14
## [59] rjson_0.2.23 tools_4.4.1
## [61] foreign_0.8-87 crisprScore_1.9.3
## [63] nnet_7.3-19 glue_1.8.0
## [65] restfulr_0.0.15 grid_4.4.1
## [67] checkmate_2.3.2 cluster_2.1.6
## [69] generics_0.1.3 gtable_0.3.6
## [71] tzdb_0.4.0 ensembldb_2.29.1
## [73] data.table_1.16.2 hms_1.1.3
## [75] xml2_1.3.6 utf8_1.2.4
## [77] BiocVersion_3.21.1 pillar_1.9.0
## [79] stringr_1.5.1 dplyr_1.1.4
## [81] BiocFileCache_2.15.0 lattice_0.22-6
## [83] deldir_2.0-4 bit_4.5.0
## [85] crisprBowtie_1.9.0 crisprScoreData_1.9.0
## [87] biovizBase_1.55.0 tidyselect_1.2.1
## [89] maketools_1.3.1 knitr_1.48
## [91] gridExtra_2.3 ProtGenerics_1.37.1
## [93] SummarizedExperiment_1.35.5 xfun_0.48
## [95] Biobase_2.67.0 matrixStats_1.4.1
## [97] stringi_1.8.4 UCSC.utils_1.1.0
## [99] lazyeval_0.2.2 yaml_2.3.10
## [101] evaluate_1.0.1 codetools_0.2-20
## [103] interp_1.1-6 tibble_3.2.1
## [105] BiocManager_1.30.25 cli_3.6.3
## [107] rpart_4.1.23 reticulate_1.39.0
## [109] munsell_0.5.1 jquerylib_0.1.4
## [111] dichromat_2.0-0.1 Rcpp_1.0.13
## [113] dir.expiry_1.13.1 dbplyr_2.5.0
## [115] png_0.1-8 XML_3.99-0.17
## [117] parallel_4.4.1 ggplot2_3.5.1
## [119] readr_2.1.5 blob_1.2.4
## [121] basilisk.utils_1.19.0 prettyunits_1.2.0
## [123] jpeg_0.1-10 latticeExtra_0.6-30
## [125] AnnotationFilter_1.31.0 bitops_1.0-9
## [127] txdbmaker_1.1.2 VariantAnnotation_1.51.2
## [129] scales_1.3.0 crayon_1.5.3
## [131] rlang_1.1.4 KEGGREST_1.45.1