--- title: "miRNA affinity models and the KdModel class" author: - name: Pierre-Luc Germain affiliation: - D-HEST Institute for Neuroscience, ETH - Lab of Statistical Bioinformatics, UZH - name: Michael Soutschek affiliation: Lab of Systems Neuroscience, D-HEST Institute for Neuroscience, ETH - name: Fridolin Gross affiliation: Lab of Systems Neuroscience, D-HEST Institute for Neuroscience, ETH package: scanMiR output: BiocStyle::html_document abstract: | This vignettes introduces the KdModel and KdModelList classes used for storing miRNA 12-mer affinities and predicting the dissociation constant of specific sites. vignette: | %\VignetteIndexEntry{2_Kdmodels} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE} library(BiocStyle) knitr::opts_chunk$set(crop = NULL) ``` # 12-mer dissociation rates [McGeary, Lin et al. (2019)](https://dx.doi.org/10.1126/science.aav1741) used RNA bind-n-seq (RBNS) to empirically determine the affinities (i.e. dissoiation rates) of selected miRNAs towards random 12-nucleotide sequences (termed 12-mers). As expected, bound sequences typically exhibited complementarity to the miRNA seed region (positions 2-8 from the miRNA's 5' end), but the study also revealed non-canonical bindings and the importance of flanking di-nucleotides. Based on these data, the authors developed a model which predicted 12-mer dissociation rates (KD) based on the miRNA sequence. ScanMiR encodes a compressed version of these prediction in the form of a `KdModel` object. The 12-mer is defined as the 8 nucleotides opposite the miRNA's extended seed region plus flanking dinucleotides on either side: ```{r echo=FALSE, out.width="35%", fig.align = 'center'} knitr::include_graphics(system.file('docs', '12mer.png', package = 'scanMiR')) ``` # KdModels The `KdModel` class contains the information concerning the sequence (12-mer) affinity of a given miRNA, and is meant to compress and make easily manipulable the dissociation constants (Kd) predictions from [McGeary, Lin et al. (2019)](https://dx.doi.org/10.1126/science.aav1741). We can take a look at the example `KdModel`: ```{r} library(scanMiR) data(SampleKdModel) SampleKdModel ``` In addition to the information necessary to predict the binding affinity to any given 12-mer sequence, the model contains, minimally, the name and sequence of the miRNA. Since the `KdModel` class extends the list class, any further information can be stored: ```{r} SampleKdModel$myVariable <- "test" ``` An overview of the binding affinities can be obtained with the following plot: ```{r} plotKdModel(SampleKdModel, what="seeds") ``` The plot gives the -log(Kd) values of the top 7-mers (including both canonical and non-canonical sites), with or without the final "A" vis-à-vis the first miRNA nucleotide. To predict the dissociation constant (and binding type, if any) of a given 12-mer sequence, you can use the `assignKdType` function: ```{r} assignKdType("ACGTACGTACGT", SampleKdModel) # or using multiple sequences: assignKdType(c("CTAGCATTAAGT","ACGTACGTACGT"), SampleKdModel) ``` The log_kd column contains log(Kd) values multiplied by 1000 and stored as an integer (which is more economical when dealing with millions of sites). In the example above, `r (lkd <- assignKdType("CTAGCATTAAGT", SampleKdModel)$log_kd)` means `r lkd/1000`, or a dissociation constant of `r exp(lkd/1000)`. The smaller the values, the stronger the relative affinity. ## KdModelLists A `KdModelList` object is simply a collection of `KdModel` objects. We can build one in the following way: ```{r} # we create a copy of the KdModel, and give it a different name: mod2 <- SampleKdModel mod2$name <- "dummy-miRNA" kml <- KdModelList(SampleKdModel, mod2) kml summary(kml) ``` Beyond operations typically performed on a list (e.g. subsetting), some specific slots of the respective KdModels can be accessed, for example: ```{r} conservation(kml) ``` # Creating a KdModel object `KdModel` objects are meant to be created from a table assigning a log_kd values to 12-mer target sequences, as produced by the CNN from McGeary, Lin et al. (2019). For the purpose of example, we create such a dummy table: ```{r} kd <- dummyKdData() head(kd) ``` A `KdModel` object can then be created with: ```{r} mod3 <- getKdModel(kd=kd, mirseq="TTAATGCTAATCGTGATAGGGGTT", name = "my-miRNA") ``` Alternatively, the `kd` argument can also be the path to the output file of the CNN (and if `mirseq` and `name` are in the table, they can be omitted). # Common KdModel collections The [scanMiRData](https://github.com/ETHZ-INS/scanMiRData) package contains `KdModel` collections corresponding to all human, mouse and rat mirbase miRNAs. # Under the hood When calling `getKdModel`, the dissociation constants are stored as an lightweight overfitted linear model, with base KDs coefficients (stored as integers in `object$mer8`) for each 1024 partially-matching 8-mers (i.e. at least 4 consecutive matching nucleotides) to which are added 8-mer-specific coefficients (stored in `object$fl`) that are multiplied with a flanking score generated by the flanking di-nucleotides. The flanking score is calculated based on the di-nucleotide effects experimentally measured by McGeary, Lin et al. (2019). To save space, the actual 8-mer sequences are not stored but generated when needed in a deterministic fashion. The 8-mers can be obtained, in the right order, with the `getSeed8mers` function.

# Session info {.unnumbered} ```{r sessionInfo, echo=FALSE} sessionInfo() ```