Title: | scanMiR |
---|---|
Description: | A set of tools for working with miRNA affinity models (KdModels), efficiently scanning for miRNA binding sites, and predicting target repression. It supports scanning using miRNA seeds, full miRNA sequences (enabling 3' alignment) and KdModels, and includes the prediction of slicing and TDMD sites. Finally, it includes utility and plotting functions (e.g. for the visual representation of miRNA-target alignment). |
Authors: | Pierre-Luc Germain [cre, aut] , Michael Soutschek [aut], Fridolin Gross [aut] |
Maintainer: | Pierre-Luc Germain <[email protected]> |
License: | GPL-3 |
Version: | 1.13.0 |
Built: | 2024-10-31 04:35:19 UTC |
Source: | https://github.com/bioc/scanMiR |
Aggregates miRNA binding sites with log_kd values to predict transcript repression. See the vignette for more detail.
aggregateMatches( m, a = 0.007726, b = 0.5735, c = 0.181, p3 = 0.051, coef_utr = 0, coef_orf = 0, p3.range = c(3L, 8L), keepSiteInfo = TRUE, toInt = FALSE, BP = NULL )
aggregateMatches( m, a = 0.007726, b = 0.5735, c = 0.181, p3 = 0.051, coef_utr = 0, coef_orf = 0, p3.range = c(3L, 8L), keepSiteInfo = TRUE, toInt = FALSE, BP = NULL )
m |
A GRanges or data.frame of matches as returned by 'findSeedMatches'. |
a |
The relative concentration of unbound AGO-miRNA complexes. |
b |
Factor specifying the additional repression by a single bound AGO. |
c |
Penalty for sites that are found within the ORF region. |
p3 |
Factor specifying additional repression due to 3p alignment. |
coef_utr |
Factor specifying additional repression due to UTR length. |
coef_orf |
Factor specifying additional repression due to ORF length. |
p3.range |
Range used for 3p alignment. |
keepSiteInfo |
Logical; whether to return information about site types (default = TRUE). Ignored if 'm' does not contain 'log_kd' values |
toInt |
Logical; whether to convert repression scores to integers (default = FALSE). |
BP |
Pass 'BiocParallel::MulticoreParam(ncores, progressbar=TRUE)' to
enable multithreading. Note that in addition, 'aggregateMatches' uses the
data.table package, which is often set to use multi-threading by
default (which would be multiplied by threads determined by 'BP'). See
|
a data.frame containing aggregated repression values and/or information about the numbers and types of matches
# we create mock RNA sequences and seeds: seqs <- getRandomSeq(n=10) # load sample KdModel data(SampleKdModel) # find matches matches <- findSeedMatches(seqs, SampleKdModel) # aggregate matches aggregateMatches(matches)
# we create mock RNA sequences and seeds: seqs <- getRandomSeq(n=10) # load sample KdModel data(SampleKdModel) # find matches matches <- findSeedMatches(seqs, SampleKdModel) # aggregate matches aggregateMatches(matches)
Assigns a log_kd and match type to a set of matched sequences.
assignKdType(x, mod, mer8 = NULL)
assignKdType(x, mod, mer8 = NULL)
x |
A vector of matched sequences, each of 12 nucleotides |
mod |
An object of class 'KdModel' |
mer8 |
The optional set of 8mers included in the model (for internal use; can be reconstructed from the model). |
A data.frame with one row for each element of 'x', and the columns 'type' and 'log_kd'. To save space, the reported log_kd is multiplied by 1000, rounded and saved as an integer.
data(SampleKdModel) assignKdType(c("CTAGCATTAAGT","ACGTACGTACGT"), SampleKdModel)
data(SampleKdModel) assignKdType(c("CTAGCATTAAGT","ACGTACGTACGT"), SampleKdModel)
conservation
conservation(x)
conservation(x)
x |
A KdModelList, or a KdModel |
A vector of the conservation status for each miRNA
data(SampleKdModel) conservation(SampleKdModel)
data(SampleKdModel) conservation(SampleKdModel)
Create dummy log_kd per 12-mer data
dummyKdData(mod = NULL)
dummyKdData(mod = NULL)
mod |
Optional model from which to create the dummy data |
A data.frame with 12-mers and log_kds
kd <- dummyKdData()
kd <- dummyKdData()
'findSeedMatches' takes a set of sequences and a set of miRNAs (given either
as target seeds, mature miRNA sequences, or a KdModelList
).
findSeedMatches( seqs, seeds, shadow = 0L, onlyCanonical = FALSE, maxLogKd = c(-1, -1.5), keepMatchSeq = FALSE, minDist = 7L, p3.extra = FALSE, p3.params = list(maxMirLoop = 7L, maxTargetLoop = 9L, maxLoopDiff = 4L, mismatch = TRUE, GUwob = TRUE), agg.params = .defaultAggParams(), ret = c("GRanges", "data.frame", "aggregated"), BP = NULL, verbose = NULL, n_seeds = NULL, useTmpFiles = FALSE, keepTmpFiles = FALSE )
findSeedMatches( seqs, seeds, shadow = 0L, onlyCanonical = FALSE, maxLogKd = c(-1, -1.5), keepMatchSeq = FALSE, minDist = 7L, p3.extra = FALSE, p3.params = list(maxMirLoop = 7L, maxTargetLoop = 9L, maxLoopDiff = 4L, mismatch = TRUE, GUwob = TRUE), agg.params = .defaultAggParams(), ret = c("GRanges", "data.frame", "aggregated"), BP = NULL, verbose = NULL, n_seeds = NULL, useTmpFiles = FALSE, keepTmpFiles = FALSE )
seqs |
A character vector or 'DNAStringSet' of DNA sequences in which to look. |
seeds |
A character vector of 7-nt seeds to look for. If RNA, will be reversed and complemented before matching. If DNA, they are assumed to be the target sequence to look for. Alternatively, a list of objects of class 'KdModel' or an object of class 'KdModelList' can be given. |
shadow |
Integer giving the shadow, i.e. the number of nucleotides hidden at the beginning of the sequence (default 0). |
onlyCanonical |
Logical; whether to restrict the search only to canonical binding sites. |
maxLogKd |
Maximum log_kd value to keep. This has a major impact on the number of sites returned, and hence on the memory requirements. Set to Inf to disable (_not_ recommended when running large scans!). |
keepMatchSeq |
Logical; whether to keep the sequence (including flanking dinucleotides) for each seed match (default FALSE). |
minDist |
Integer specifying the minimum distance between matches of the same miRNA (default 7). Closer matches will be reduced to the highest-affinity. To disable the removal of overlapping features, use 'minDist=-Inf'. |
p3.extra |
Logical; whether to keep extra information about 3' alignment. Disable (default) this when running large scans, otherwise you might hit your system's memory limits. |
p3.params |
Named list of parameters for 3' alignment with slots 'maxMirLoop' (integer, default = 7), 'maxTargetLoop' (integer, default = 9), 'maxLoopDiff' (integer, default = 4), 'mismatch' (logical, default = TRUE) and 'GUwob' (logical, default = TRUE). |
agg.params |
A named list with slots 'a', 'b', 'c', 'p3', 'coef_utr', 'coef_orf' and 'keepSiteInfo' indicating the parameters for the aggregation. Ignored if 'ret!="aggregated"'. For further details see documentation of 'aggregateMatches'. |
ret |
The type of data to return, either "GRanges" (default), "data.frame", or "aggregated" (aggregates affinities/sites for each seed-transcript pair). |
BP |
Pass 'BiocParallel::MulticoreParam(ncores, progressbar=TRUE)' to enable multithreading. |
verbose |
Logical; whether to print additional progress messages (default on if not multithreading) |
n_seeds |
Integer; the number of seeds that are processed in parallel to avoid memory issues. |
useTmpFiles |
Logical; whether to write results for single miRNAs in temporary files (ignored when scanning for a single seed). Alternatively, 'useTmpFiles' can be a character vector of length 1 indicating the path to the directory in which to write temporary files. |
keepTmpFiles |
Logical; whether to keep the temporary files at the end of the process; ignored if 'useTmpFiles=FALSE'. Temporary files are removed only upon successful completion of the function, meaning that they will not be deleted in case of errors. |
A GRanges of all matches. If 'seeds' is a 'KdModel' or 'KdModelList', the 'log_kd' column will report the ln(Kd) multiplied by 1000, rounded and saved as an integer. If 'ret!="GRanges', returns a data.frame.
# we create mock RNA sequences and seeds: seqs <- getRandomSeq(n=10) seeds <- c("AAACCAC", "AAACCUU") findSeedMatches(seqs, seeds)
# we create mock RNA sequences and seeds: seqs <- getRandomSeq(n=10) seeds <- c("AAACCAC", "AAACCUU") findSeedMatches(seqs, seeds)
Performs a local alignment of the miRNA 3' sequence (determined by 'mir3p.start') on given the given sequences.
get3pAlignment( seqs, mirseq, mir3p.start = 9L, allow.mismatch = TRUE, maxMirLoop = 7L, maxTargetLoop = 9L, maxLoopDiff = 4L, TGsub = TRUE, siteType = NULL )
get3pAlignment( seqs, mirseq, mir3p.start = 9L, allow.mismatch = TRUE, maxMirLoop = 7L, maxTargetLoop = 9L, maxLoopDiff = 4L, TGsub = TRUE, siteType = NULL )
seqs |
A set of sequences in which to look for 3' matches (i.e. upstream of the seed match) |
mirseq |
The sequence of the mature miRNA |
mir3p.start |
The position in 'mirseq' in which to start looking |
allow.mismatch |
Logical; whether to allow mismatches |
maxMirLoop |
Maximum miRNA loop size |
maxTargetLoop |
Maximum target loop size |
maxLoopDiff |
Maximum size difference between miRNA and target loops |
TGsub |
Logical; whether to allow T/G substitutions. |
siteType |
The optional type of seed-complementarity, as returned by
|
A data.frame with one row for each element of 'seqs', indicating the size of the miRNA bulge, the size of the target mRNA bulge, the number of mismatches at the 3' end, and the partial 3' alignment score (i.e. roughly the number of consecutive matching nucleotides)
get3pAlignment(seqs="NNAGTGTGCCATNN", mirseq="TGGAGTGTGACAATGGTGTTTG")
get3pAlignment(seqs="NNAGTGTGCCATNN", mirseq="TGGAGTGTGACAATGGTGTTTG")
Returns the minimum and maximum 8-mer log-kd values
get8merRange(mod)
get8merRange(mod)
mod |
A 'KdModel' |
A numeric vector of length two
data("SampleKdModel") get8merRange(SampleKdModel)
data("SampleKdModel") get8merRange(SampleKdModel)
getKdModel
getKdModel(kd, mirseq = NULL, name = NULL, conservation = NA_integer_, ...)
getKdModel(kd, mirseq = NULL, name = NULL, conservation = NA_integer_, ...)
kd |
A data.frame containing the log_kd per 12-mer sequence, or the path to a text/csv file containing such a table. Should contain the columns 'log_kd', '12mer' (or 'X12mer'), and eventually 'mirseq' (if the 'mirseq' argument is NULL) and 'mir' (if the 'name' argument is NULL). |
mirseq |
The miRNA (cDNA) sequence. |
name |
The name of the miRNA. |
conservation |
The conservation level of the miRNA. See 'scanMiR:::.conservation_levels()' for possible values. |
... |
Any additional information to be saved with the model. |
An object of class 'KdModel'.
kd <- dummyKdData() mod <- getKdModel(kd=kd, mirseq="TTAATGCTAATCGTGATAGGGGTT", name="my-miRNA")
kd <- dummyKdData() mod <- getKdModel(kd=kd, mirseq="TTAATGCTAATCGTGATAGGGGTT", name="my-miRNA")
Returns all combinations of 'n' elements of 'from'
getKmers(n = 4, from = c("A", "C", "G", "T"))
getKmers(n = 4, from = c("A", "C", "G", "T"))
n |
Number of elements |
from |
Letters sampled |
A character vector
getKmers(3)
getKmers(3)
Given a seed and a set of sequences matching it, returns the type of match.
getMatchTypes(x, seed, checkWobble = TRUE)
getMatchTypes(x, seed, checkWobble = TRUE)
x |
A character vector of short sequences. |
seed |
A 7 or 8 nucleotides string indicating the seed (5' to 3' sequence of the target RNA). If of length 7, an "A" will be appended. |
checkWobble |
Whether to flag wobbled sites |
A factor of match types.
x <- c("AACACTCCAG","GACACTCCGC","GTACTCCAT","ACGTACGTAC") getMatchTypes(x, seed="ACACTCCA")
x <- c("AACACTCCAG","GACACTCCGC","GTACTCCAT","ACGTACGTAC") getMatchTypes(x, seed="ACACTCCA")
Produces a random sequence of the given letters
getRandomSeq(length = 3000, alphabet = c("A", "C", "G", "T"), n = 1)
getRandomSeq(length = 3000, alphabet = c("A", "C", "G", "T"), n = 1)
length |
Length of the sequence |
alphabet |
Letters from which to sample |
n |
The number of sequences to generate |
A character vector of length 1
getRandomSeq(100)
getRandomSeq(100)
Generates all possible 8mers with 4 consecutive and positioned matches to a given seed.
getSeed8mers(seed, addNs = FALSE)
getSeed8mers(seed, addNs = FALSE)
seed |
The miRNA seed (target DNA sequence), a character vector of length 8 (if of length 7, a "A" will be added on the right) |
addNs |
Logical; whether to include 8mers with one flanking N |
A vector of 1024 8mers.
head(getSeed8mers("ACACTCCA"))
head(getSeed8mers("ACACTCCA"))
Methods for the KdModel
class
## S4 method for signature 'KdModel' show(object) ## S4 method for signature 'KdModel' summary(object) ## S4 method for signature 'KdModel' c(x, ...)
## S4 method for signature 'KdModel' show(object) ## S4 method for signature 'KdModel' summary(object) ## S4 method for signature 'KdModel' c(x, ...)
object , x , ...
|
An object of class |
Depends on the method.
data(SampleKdModel) SampleKdModel summary(SampleKdModel)
data(SampleKdModel) SampleKdModel summary(SampleKdModel)
KdModelList
KdModelList(..., description = NULL, makeUnique = FALSE)
KdModelList(..., description = NULL, makeUnique = FALSE)
... |
Any number of |
description |
A description for the collection. |
makeUnique |
Logical; whether to rename models if names are duplicated. |
A KdModelList
data(SampleKdModel) mods <- KdModelList(SampleKdModel, SampleKdModel, makeUnique = TRUE) mods
data(SampleKdModel) mods <- KdModelList(SampleKdModel, SampleKdModel, makeUnique = TRUE) mods
KdModelList
classesMethods for the KdModelList
classes
## S4 method for signature 'KdModelList' summary(object) ## S4 method for signature 'KdModelList,ANY' x[i, j = NULL, ..., drop = TRUE]
## S4 method for signature 'KdModelList' summary(object) ## S4 method for signature 'KdModelList,ANY' x[i, j = NULL, ..., drop = TRUE]
object , x
|
An object of class |
i |
the index of item(s) to select |
j , drop , ...
|
ignored |
Depends on the method.
# create a KdModelList : data(SampleKdModel) kml <- KdModelList( SampleKdModel, SampleKdModel, makeUnique=TRUE ) summary(kml) kml[1] # returns a KdModelList kml[[2]] # returns a KdModel conservation(kml)
# create a KdModelList : data(SampleKdModel) kml <- KdModelList( SampleKdModel, SampleKdModel, makeUnique=TRUE ) summary(kml) kml[1] # returns a KdModelList kml[[2]] # returns a KdModel conservation(kml)
Plots the summary of an affinity model.
plotKdModel(mod, what = c("both", "seeds", "logo"), n = 10)
plotKdModel(mod, what = c("both", "seeds", "logo"), n = 10)
mod |
A 'KdModel' |
what |
Either 'seeds', 'logo', or 'both' (default). |
n |
The number of top 7-mers to plot (when ‘what=’seeds'') |
‘what=’seeds'' plots the -$log(K_d)$ values of the top 'n' 7-mers (including both canonical and non-canonical sites), with or without the final "A" vis-a-vis the first miRNA nucleotide. ‘what=’logo'' plots a 'seqLogo' (requires the [seqLogo]https://bioconductor.org/packages/release/bioc/html/seqLogo.html package) showing the nucleotide-wise information content and preferences for all 12-mers (centered around the seed, oriented in the direction of the target mRNA). 'what="both"' plots both. Note that if the package 'ggseqlogo' is installed, this will be used instead to plot the logo, resulting in more detailed plot annotation.
If 'what="logo"', returns nothing and plots a position weight matrix. Otherwise returns a ggplot.
data(SampleKdModel) plotKdModel(SampleKdModel, what="seeds")
data(SampleKdModel) plotKdModel(SampleKdModel, what="seeds")
Removes elements from a GRanges that overlap (or are within a given distance of) other elements higher up in the list (i.e. assumes that the ranges are sorted in order of priority). The function handles overlaps between more than two ranges by successively removing those that overlap higher-priority ones.
removeOverlappingRanges( x, minDist = 7L, retIndices = FALSE, ignore.strand = FALSE )
removeOverlappingRanges( x, minDist = 7L, retIndices = FALSE, ignore.strand = FALSE )
x |
A GRanges, sorted by (decreasing) importance. |
minDist |
Minimum distance between ranges. |
retIndices |
Logical; whether to return the indices of entries to remove, rather than the filtered GRanges. |
ignore.strand |
Logical. Whether the strand of the input ranges should be ignored or not. |
A filtered GRanges, or an integer vector of indices to be removed if 'retIndices==TRUE'.
library(GenomicRanges) gr <- GRanges(seqnames=rep("A",4), IRanges(start=c(10,25,45,35), width=6)) removeOverlappingRanges(gr, minDist=7)
library(GenomicRanges) gr <- GRanges(seqnames=rep("A",4), IRanges(start=c(10,25,45,35), width=6)) removeOverlappingRanges(gr, minDist=7)
'KdModel' for hsa-miR-155-5p, based on Kd predictions from the CNN of [McGeary, Lin et al. (2019)](https://dx.doi.org/10.1126/science.aav1741).
a 'KdModel' object
data(SampleKdModel) SampleKdModel
data(SampleKdModel) SampleKdModel
An artificial transcript sequence used for examples.
a named character vector of length 1.
viewTargetAlignment
viewTargetAlignment( m, miRNA, seqs = NULL, flagBulgeMatches = FALSE, p3.params = list(), min3pMatch = 3L, hideSingletons = FALSE, UGsub = TRUE, ..., outputType = c("print", "data.frame", "plot", "ggplot") )
viewTargetAlignment( m, miRNA, seqs = NULL, flagBulgeMatches = FALSE, p3.params = list(), min3pMatch = 3L, hideSingletons = FALSE, UGsub = TRUE, ..., outputType = c("print", "data.frame", "plot", "ggplot") )
m |
A GRanges of length 1 giving the information for a given match, as
produced by |
miRNA |
A miRNA sequence, or a |
seqs |
The sequences corresponding to the seqnames of 'm'. Not needed if 'm' contains the target sequences. |
flagBulgeMatches |
Logical; whether to flag matches inside the bulge (default FALSE) |
p3.params |
See |
min3pMatch |
The minimum 3' alignment for any to be plotted |
hideSingletons |
Logical; whether to hide isolated single base-pair matches |
UGsub |
Logical; whether to show U-G matches |
... |
Passed to 'text' if 'outputType="plot"'. |
outputType |
Either 'print' (default, prints to console), 'data.frame', or 'plot'. |
Returns nothing 'outputType="print"'. If 'outputType="data.frame"', returns a data.frame containing the alignment strings; if 'outputType="ggplot"' returns a 'ggplot' object.
data(SampleKdModel) seq <- c(seq1="CGACCCCTATCACGTCCGCAGCATTAAAT") m <- findSeedMatches(seq, SampleKdModel, verbose=FALSE) viewTargetAlignment(m, miRNA=SampleKdModel, seqs=seq)
data(SampleKdModel) seq <- c(seq1="CGACCCCTATCACGTCCGCAGCATTAAAT") m <- findSeedMatches(seq, SampleKdModel, verbose=FALSE) viewTargetAlignment(m, miRNA=SampleKdModel, seqs=seq)