Title: | Data and functions for dealing with microRNAs |
---|---|
Description: | Different data resources for microRNAs and some functions for manipulating them. |
Authors: | R. Gentleman, S. Falcon |
Maintainer: | James F. Reid |
License: | Artistic-2.0 |
Version: | 1.65.0 |
Built: | 2024-10-30 08:48:53 UTC |
Source: | https://github.com/bioc/microRNA |
`get_selfhyb_subseq` finds the longest self-hybridizing subsequences present in RNA or DNA sequences.
```r
get_selfhyb_subseq(seq, minlen, type = c("RNA", "DNA"))
show_selfhyb_counts(L)
show_selfhyb_lengths(L)
```
Argument | Description |
---|---|
`seq` | character vector of RNA or DNA sequences |
`minlen` | an integer specifying the minimum length in bases of the self-hybridizing subsequences. Subsequences shorter than `minlen` are not reported. |
`type` | one of `"RNA"` or `"DNA"` |
`L` | the output of `get_selfhyb_subseq` |
`get_selfhyb_subseq` finds the longest self-hybridizing subsequences of at least the specified minimum length. These are defined as the longest strings that occur both in the input sequence, `seq`, and in its reverse complement.
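The definition above can be illustrated with a small base-R sketch. It is not the package's implementation: the helper names `rev_comp` and `longest_selfhyb` are hypothetical, the input is assumed to be a single upper-case string, and only the subsequences are returned (no `starts` or `rcstarts`). It tries substring lengths from longest to shortest and stops at the first length that has a match in the reverse complement, which is brute force but cheap at microRNA lengths.

```r
# Hypothetical helper: reverse complement of one upper-case RNA or DNA string.
rev_comp <- function(s, type = c("RNA", "DNA")) {
  type <- match.arg(type)
  comp <- if (type == "RNA") chartr("ACGU", "UGCA", s) else chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: longest substrings of s (at least minlen bases)
# that also occur in the reverse complement of s.
longest_selfhyb <- function(s, minlen, type = c("RNA", "DNA")) {
  type <- match.arg(type)
  rc <- rev_comp(s, type)
  n <- nchar(s)
  if (minlen < 1 || minlen > n) return(character(0))
  for (len in n:minlen) {                       # longest lengths first
    subs <- unique(substring(s, 1:(n - len + 1), len:n))
    found <- vapply(subs, grepl, logical(1), x = rc, fixed = TRUE)
    if (any(found)) return(subs[found])         # all hits of this length
  }
  character(0)
}

longest_selfhyb("AAGGGCUU", minlen = 3, type = "RNA")
```

For `"AAGGGCUU"` the reverse complement is `"AAGCCCUU"`, so both `"AAG"` and `"CUU"` are returned, showing that a sequence can have more than one longest self-hybridizing subsequence.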
A list with one element for each sequence in `seq`, named using `names(seq)`.
Each element is itself a list with one element for each longest self-hybridizing subsequence (there can be more than one). Each of these is in turn a list with the components:

Component | Description |
---|---|
`starts` | integer vector giving the character start positions of the self-hybridizing subsequence in the sequence. |
`rcstarts` | integer vector giving the character start positions of the reverse complement of the self-hybridizing subsequence in the sequence. |
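A sketch of how `starts`- and `rcstarts`-style positions could be computed for one subsequence, using base R's `gregexpr` with `fixed = TRUE`. The helper `find_starts` is hypothetical, and the reverse complement is written out by hand to keep the example short. Note that `gregexpr` reports non-overlapping matches only, so overlapping occurrences would be missed.

```r
# Hypothetical helper (not part of the microRNA package): start positions of
# the exact, fixed-string matches of `pattern` in the single string `s`.
find_starts <- function(pattern, s) {
  m <- gregexpr(pattern, s, fixed = TRUE)[[1]]
  if (m[1] == -1L) integer(0) else as.integer(m)  # -1 means no match
}

s <- "AAGGGCUU"
sub <- "AAG"     # a longest self-hybridizing subsequence of s
sub_rc <- "CUU"  # its RNA reverse complement, written out by hand
positions <- list(starts = find_starts(sub, s), rcstarts = find_starts(sub_rc, s))
positions        # starts = 1, rcstarts = 6
```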
Seth Falcon
```r
library(microRNA)

seqs = c(a = "UGAGGUAGUAGGUUGUAUAGUU",
         b = "UGAGGUAGUAGGUUGUGUGGUU",
         c = "UGAGGUAGUAGGUUGUAUGGUU")
ans = get_selfhyb_subseq(seqs, minlen = 3, type = "RNA")
length(ans)
ans[["a"]]
show_selfhyb_counts(ans)
show_selfhyb_lengths(ans)
```
A set of human microRNA sequences.
```r
data(hsSeqs)
```
A character vector.
Each sequence represents a different mature human microRNA.
http://microrna.sanger.ac.uk/sequences/index.shtml
miRBase: microRNA sequences, targets and gene nomenclature. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. NAR, 2006, 34, Database Issue, D140-D144
The microRNA Registry. Griffiths-Jones S. NAR, 2004, 32, Database Issue, D109-D111
```r
data(hsSeqs)
```
A set of human microRNA names and their corresponding known targets, given as Ensembl transcript IDs.
```r
data(hsTargets)
```
A data frame of microRNAs and their target Ensembl IDs, as retrieved from miRBase. Additional columns give the chromosome, the start and end positions of the microRNA binding site, and the strand orientation (plus or minus).
Each row pairs a human microRNA with one of its known targets, together with information about where the microRNA binds. Because some microRNAs have multiple targets, a microRNA may appear in more than one row.
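Because the table has one row per microRNA-target pair, a per-microRNA view can be obtained by counting or splitting rows. The sketch below uses a small made-up data frame; the column names `name` and `target` and all IDs are placeholders, not the actual columns or contents of `hsTargets`, so check `names(hsTargets)` before adapting it.

```r
# Made-up long-format table: one row per microRNA-target pair.
pairs <- data.frame(
  name   = c("mir-A", "mir-A", "mir-B"),
  target = c("tx1", "tx2", "tx1"),
  stringsAsFactors = FALSE
)
table(pairs$name)                          # rows (targets) per microRNA: 2, 1
by_mir <- split(pairs$target, pairs$name)  # list: microRNA -> its targets
by_mir[["mir-A"]]                          # "tx1" "tx2"
```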
http://microrna.sanger.ac.uk/sequences/index.shtml
miRBase: microRNA sequences, targets and gene nomenclature. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. NAR, 2006, 34, Database Issue, D140-D144
The microRNA Registry. Griffiths-Jones S. NAR, 2004, 32, Database Issue, D109-D111
```r
data(hsTargets)
```
Given a set of seed regions and a set of sequences, `matchSeeds` finds all exact matches of the seed regions within the sequences.
```r
matchSeeds(seeds, seqs)
```
Argument | Description |
---|---|
`seeds` | The seeds (short sequences) to match. |
`seqs` | The sequences in which to find matches. |
We assume an exact matching problem in which all sequences are already in the correct orientation. If, for example, you start with seed regions from a microRNA (for `seeds`) and 3' UTR sequences (for `seqs`), you need to reverse complement one of the two sets of sequences. Also make sure that all sequences are of the same type: either all DNA or all RNA.
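The reverse-complement step can be sketched in base R as follows, as an alternative to the Biostrings route used in the example for this function. The helper `rna_revcomp` is hypothetical and assumes upper-case RNA input; `chartr` then converts the result to DNA so that it can be matched against DNA sequences.

```r
# Hypothetical helper: reverse complement of upper-case RNA strings
# (A<->U, C<->G, then reverse the order of the bases).
rna_revcomp <- function(x) {
  out <- vapply(x, function(s) {
    comp <- chartr("ACGU", "UGCA", s)
    paste(rev(strsplit(comp, "")[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
  names(out) <- names(x)  # keep any names from the input
  out
}

seed <- "GAGGUA"                          # bases 2-7 of UGAGGUAGUAGGUUGUAUAGUU
seed_rc <- rna_revcomp(seed)              # "UACCUC"
seed_rc_dna <- chartr("U", "T", seed_rc)  # "TACCTC", for DNA targets
```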
Names from either `seeds` or `seqs` are propagated as far as possible.
A list containing one entry for each element of `seeds` that has at least one exact match in some entry of `seqs`. Each element of this list is a named vector of the elements of `seqs` in which the corresponding seed has an exact match.
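The return value described above can be mimicked in base R with `grepl(..., fixed = TRUE)` for exact matching. The function name `match_seeds_sketch` and the toy sequences are hypothetical; the sketch only records which sequences match, not match positions, and it is not the package's code.

```r
# Hypothetical sketch: for each seed, keep the elements of `seqs` that
# contain it as an exact (fixed) substring, then drop seeds with no match.
match_seeds_sketch <- function(seeds, seqs) {
  hits <- lapply(seeds, function(s) seqs[grepl(s, seqs, fixed = TRUE)])
  hits[lengths(hits) > 0]
}

seeds <- c(s1 = "GAGG", s2 = "CCCC")
utrs  <- c(g1 = "AUGAGGCU", g2 = "UUUU", g3 = "GAGGAGG")
res <- match_seeds_sketch(seeds, utrs)
str(res)  # a list with one element, s1, holding the g1 and g3 sequences
```

Names of both inputs carry through because `lapply` keeps the names of `seeds` and logical subsetting keeps the names of `seqs`.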
R. Gentleman
```r
library(Biostrings)
library(microRNA)
data(hsSeqs)
data(s3utr)
hSeedReg = seedRegions(hsSeqs)  # bases 2-7 of each microRNA
comphSeed = as.character(reverseComplement(RNAStringSet(hSeedReg)))
comph = RNA2DNA(comphSeed)      # convert to DNA to match s3utr
mx = matchSeeds(comph, s3utr)
```
A set of mouse microRNA sequences.
```r
data(mmSeqs)
```
A character vector.
Each sequence represents a different mature mouse microRNA.
http://microrna.sanger.ac.uk/sequences/index.shtml
miRBase: microRNA sequences, targets and gene nomenclature. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. NAR, 2006, 34, Database Issue, D140-D144
The microRNA Registry. Griffiths-Jones S. NAR, 2004, 32, Database Issue, D109-D111
```r
data(mmSeqs)
```
A set of mouse microRNA names and their corresponding known targets, given as Ensembl transcript IDs.
```r
data(mmTargets)
```
A data frame of microRNAs and their target Ensembl IDs, as retrieved from miRBase. Additional columns give the chromosome, the start and end positions of the microRNA binding site, and the strand orientation (plus or minus).
Each row pairs a mouse microRNA with one of its known targets, together with information about where the microRNA binds. Because some microRNAs have multiple targets, a microRNA may appear in more than one row.
http://microrna.sanger.ac.uk/sequences/index.shtml
miRBase: microRNA sequences, targets and gene nomenclature. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. NAR, 2006, 34, Database Issue, D140-D144
The microRNA Registry. Griffiths-Jones S. NAR, 2004, 32, Database Issue, D109-D111
```r
data(mmTargets)
```
RNA uses uracil (U) where DNA uses thymine (T). `RNA2DNA` translates an RNA sequence into a DNA sequence by replacing the corresponding characters.
```r
RNA2DNA(x)
```
Argument | Description |
---|---|
`x` | A character vector of valid RNA sequences. |
The input is not checked for validity, and it is converted to upper case.
A character vector of the same length as `x` in which all characters are upper case and every `U` in `x` has been replaced by `T`.
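The documented behaviour (upper-casing, then replacing `U` with `T`, with no validity check) can be reproduced in base R. This is a sketch, not the package's source; `rna2dna_sketch` is a hypothetical name.

```r
# Upper-case the input, then translate every U to T; names are preserved
# because toupper() and chartr() keep the attributes of their input.
rna2dna_sketch <- function(x) chartr("U", "T", toupper(x))

rna2dna_sketch(c("AUCG", "uuac"))  # "ATCG" "TTAC"
```

Like the documented function, it performs no validation: any character other than `U` is only upper-cased.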
R. Gentleman
```r
library(microRNA)
input = c("AUCG", "uuac")
RNA2DNA(input)
```
A vector of 3' UTR sequences whose names are Entrez Gene IDs; the data were extracted using biomaRt.
```r
data(s3utr)
```
A character vector whose values are the 3' UTRs of a set of genes and whose names are Entrez Gene identifiers.
The data were downloaded using the `getSequence` function in the biomaRt package, and duplicate sequence strings were removed. Some Entrez IDs still occur more than once, but the 3' UTRs reported for them differ.
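Removing duplicate strings does not make the names unique, which is why some Entrez IDs still repeat. A toy illustration in base R; the names `g1` to `g3` and the sequences are made up and are not taken from `s3utr`.

```r
# Toy named vector standing in for s3utr: names play the role of gene IDs.
utr <- c(g1 = "AAGU", g2 = "CCGA", g1 = "GGAU", g3 = "CCGA")
dedup <- utr[!duplicated(utr)]   # drop repeated sequence strings ("CCGA" under g3)
names(dedup)                     # "g1" "g2" "g1"
any(duplicated(names(dedup)))    # TRUE: one ID, two different 3' UTRs
```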
```r
data(s3utr)
```
The seed region of a microRNA is a short run of nucleotides at its 5' end, typically bases 2 through 7, although base 8 is sometimes included as well.
```r
seedRegions(x, start = 2, stop = 7)
```
Argument | Description |
---|---|
`x` | A vector of microRNA sequences. |
`start` | The start position(s); can be a vector. |
`stop` | The stop position(s); can be a vector. |
`seedRegions` uses `substr` to extract these subsequences.

A vector of the same length as `x` containing the extracted substrings.
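Since the extraction is done with `substr`, which recycles `start` and `stop` along `x`, the behaviour can be sketched in one line. `seed_regions_sketch` is a hypothetical name and the sequences are the first two from the `get_selfhyb_subseq` example; this is an illustration, not the package's source.

```r
# substr() is vectorised over x, start and stop, and keeps the names of x.
seed_regions_sketch <- function(x, start = 2, stop = 7) substr(x, start, stop)

mirs <- c(a = "UGAGGUAGUAGGUUGUAUAGUU", b = "UGAGGUAGUAGGUUGUGUGGUU")
seed_regions_sketch(mirs)                  # a = "GAGGUA", b = "GAGGUA"
seed_regions_sketch(mirs, stop = c(7, 8))  # a = "GAGGUA", b = "GAGGUAG"
```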
R. Gentleman
```r
library(microRNA)
data(hsSeqs)
seedRegions(hsSeqs[1:5])
seedRegions(hsSeqs[1:3], start = c(2, 1, 2), stop = c(8, 7, 9))
```