Title: | An R interface to the Rfam database |
---|---|
Description: | rfaRm provides a client interface to the Rfam database of RNA families. Data that can be retrieved include RNA families, secondary structure images, covariance models, sequences within each family, alignments leading to the identification of a family and secondary structures in the dot-bracket format. |
Authors: | Lara Selles Vidal, Rafael Ayala, Guy-Bart Stan, Rodrigo Ledesma-Amaro |
Maintainer: | Lara Selles Vidal <[email protected]>, Rafael Ayala <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-11-19 04:11:25 UTC |
Source: | https://github.com/bioc/rfaRm |
Retrieves the consensus secondary structure and sequence of the specified Rfam family. The notation of the secondary structure can be specified through the "format" argument. If a filename is provided, the consensus secondary structure and sequence will be saved in the specified format.
rfamConsensusSecondaryStructure(rfamFamily, filename=NULL, format="DB")
rfamConsensusSecondaryStructure(rfamFamily, filename=NULL, format="DB")
rfamFamily |
string with an Rfam family accession or ID for which the consensus secondary structure and sequence should be retrieved. |
filename |
string indicating a path to which the retrieved consensus secondary structure and sequence should be written. The notation used for the RNA secondary structure is determined by the "format" argument. In the default behaviour, filename=NULL, and no file will be written. Instead, the consensus secondary structure and sequence will just be returned as a character vector. |
format |
string indicating the notation to be used for the RNA secondary
structure. It can be either "DB" (extended Dot-Bracket notation; default) or
"WUSS" (Washington University Secondary Structure notation).For a description
of the different notations of RNA secondary structure, see |
A character vector where the first element is the consensus sequence of the Rfam family, and the second element is the consensus RNA secondary structure in the specified format.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/rna_structure_notations.html
# Generate a character vector with the consensus secondary structure and # sequence for the Rfam family with ID "FMN", with extended Dot-Bracket # notation. rfamConsensusSecondaryStructure("FMN", format="DB") # Generate a character vector with the consensus secondary structure and # sequence for the Rfam family with accession "RF00174", with extended # Dot-Bracket notation, and save it to a file. rfamConsensusSecondaryStructure("RF00174", filename="RF00174consensusSS.txt", format="DB") # Generate a character vector with the consensus secondary structure and # sequence for the Rfam family with accession "RF00050", with WUSS notation, and # save it to a file. rfamConsensusSecondaryStructure("RF00050", filename="RF00050consensusSS.txt", format="WUSS")
# Generate a character vector with the consensus secondary structure and # sequence for the Rfam family with ID "FMN", with extended Dot-Bracket # notation. rfamConsensusSecondaryStructure("FMN", format="DB") # Generate a character vector with the consensus secondary structure and # sequence for the Rfam family with accession "RF00174", with extended # Dot-Bracket notation, and save it to a file. rfamConsensusSecondaryStructure("RF00174", filename="RF00174consensusSS.txt", format="DB") # Generate a character vector with the consensus secondary structure and # sequence for the Rfam family with accession "RF00050", with WUSS notation, and # save it to a file. rfamConsensusSecondaryStructure("RF00050", filename="RF00050consensusSS.txt", format="WUSS")
Gets the covariance model generated with the Infernal software of the Rfam family. Covariance models can be used in Infernal to search through a list of sequences for high-scoring hits to the models. If a filename is provided, the covariance model will be saved to the specified path.
rfamCovarianceModel(rfamFamily, filename=NULL)
rfamCovarianceModel(rfamFamily, filename=NULL)
rfamFamily |
string with an Rfam family accession or ID for which the covariance model should be retrieved. |
filename |
string indicating a path to which the covariance model should be written. In the default behaviour, filename=NULL, the covariance model will not be saved into any file, and instead will only be returned as a string. |
A string with the covariance model of the Rfam family in the Infernal format. If a filename is provided, the covariance model file will also be written to the specified path.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
Eric P Nawrocki, Sean R Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics, Volume 29, Issue 22, 15 November 2013, Pages 2933–2935, https://doi.org/10.1093/bioinformatics/btt509
https://docs.rfam.org/en/latest/api.html
http://eddylab.org/infernal/Userguide.pdf
# Retrieve the covariance model for the Rfam family with ID "FMN" and save it # to a file rfamCovarianceModel("FMN", filename="FMNcovarianceModel.cm") # Retrieve the covariance model for the Rfam family with accession "RF00174" and # save it to a file rfamCovarianceModel("RF00174", filename="RF00174covarianceModel.cm")
# Retrieve the covariance model for the Rfam family with ID "FMN" and save it # to a file rfamCovarianceModel("FMN", filename="FMNcovarianceModel.cm") # Retrieve the covariance model for the Rfam family with accession "RF00174" and # save it to a file rfamCovarianceModel("RF00174", filename="RF00174covarianceModel.cm")
Gets the Rfam family ID corresponding to the provided Rfam family accession.
rfamFamilyAccessionToID(rfamFamilyAccession)
rfamFamilyAccessionToID(rfamFamilyAccession)
rfamFamilyAccession |
string with the Rfam family accession to be converted to an Rfam family ID. |
A string with the Rfam family ID corresponding to the provided Rfam family accesion. The Rfam family ID usually describes the type of RNA associated to the Rfam family.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Get the Rfam family ID corresponding to the Rfam accession RF00050 rfamFamilyAccessionToID("RF00050")
# Get the Rfam family ID corresponding to the Rfam accession RF00050 rfamFamilyAccessionToID("RF00050")
Gets the Rfam family accession corresponding to the provided Rfam family ID.
rfamFamilyIDToAccession(rfamFamilyID)
rfamFamilyIDToAccession(rfamFamilyID)
rfamFamilyID |
string with the Rfam family ID to be converted to an Rfam family accession. |
A string with the Rfam family accession corresponding to the provided Rfam family ID.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Get the Rfam family accession corresponding to the Rfam family ID "FMN"" rfamFamilyIDToAccession("FMN")
# Get the Rfam family accession corresponding to the Rfam family ID "FMN"" rfamFamilyIDToAccession("FMN")
Gets a summary describing an Rfam family. The summary includes information regarding the number of sequences and species contained in the family, a brief description about the function of the family and the corresponding type of RNA.
rfamFamilySummary(rfamFamily)
rfamFamilySummary(rfamFamily)
rfamFamily |
string with an Rfam family accession or ID for which a descriptive summary should be retrieved. |
A list containing the following elements that describe the Rfam family:
rfamReleaseNumber |
Version of Rfam used in the query |
numberSequencesSeedAlignment |
Number of sequences used in the seed alignment of the Rfam family |
sourceSeedAlignment |
Published reference with the seed alignment of the Rfam family |
numberSpecies |
Number of species containing in their genomes sequences that belong to the Rfam family |
RNAType |
Keywords describing the type of RNA corresponding to the Rfam family |
numberSequencesAll |
Total number of sequences included in the Rfam family |
structureSource |
Description of the source for the secondary structure of the family (published, predicted) and PMID of the corresponding publication or prediction method if applicable |
description |
Descriptive name of the RNA included in the Rfam family |
rfamAccession |
Accession of the Rfam family |
rfamID |
ID of the Rfam family |
comment |
Short paragraph describing the characteristics and biological role of the Rfam family |
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Get a summary for the Rfam family with ID "FMN"" rfamFamilySummary("FMN") # Get a summary for the Rfam family with accession "RF00174"" rfamFamilySummary("RF00174") # Get a brief paragraph describing the Rfam family with accession "RF00174"" rfamFamilySummary("RF00174")$comment
# Get a summary for the Rfam family with ID "FMN"" rfamFamilySummary("FMN") # Get a summary for the Rfam family with accession "RF00174"" rfamFamilySummary("RF00174") # Get a brief paragraph describing the Rfam family with accession "RF00174"" rfamFamilySummary("RF00174")$comment
Retrieves entries of the PDB database with the experimentally solved 3D structure of members of the specified Rfam family, with correspondances between residues of the PDB structure and positions in the covariance model of the Rfam family. If a filename is provided, the list of PDB entries will be saved to the path specified through filename as a tab-delimited file.
rfamPDBMapping(rfamFamily, filename=NULL)
rfamPDBMapping(rfamFamily, filename=NULL)
rfamFamily |
string with an Rfam family accession or ID for which the matching PDB entries should be retrieved. |
filename |
string indicating a path to which the retrieved PDB entries should be written as a tab-delimited file. In the default behaviour, filename=NULL, and no file will be written. Instead, the PDB entries will just be returned as a data frame. |
A data frame with PDB entries with experimentally solved 3D structures of members of the specified Rfam family, as well as correspondances between residues of the PDB structures and positions of the covariance model of the Rfam family. The data frame contains the following columns:
Rfam accession |
Rfam accession of the Rfam family |
PDB ID |
ID of the PDB entry with a 3D structure of a member of the Rfam family |
Chain |
ID of the chain of the PDB entry that contains the coordinates for the 3D structure of a member of the Rfam family |
PDB start residue |
First residue of the 3D structure that can be matched to the sequence of a member of the Rfam family |
PDB end residue |
Last residue of the 3D structure that can be matched to the sequence of a member of the Rfam family |
CM start position |
First position of the covariance model of the Rfam family that can be matched to the 3D structure |
CM end position |
Last position of the covariance model of the Rfam family that can be matched to the 3D structure |
eValue |
Expectation value for the alignment of the sequence of the 3D structure to the Rfam family |
Bit score |
Bit score for the alignment of the sequence of the 3D structure to the Rfam family |
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Generate a data frame with the PDB structures of members of the Rfam family # with ID "FMN" rfamPDBMapping("FMN") # Generate a data frame with the PDB structures of members of the Rfam family # with accession "RF00174", and save the resulting PDB entries as a # tab-delimited text file. rfamPDBMapping("RF00174", filename="RF00174PDBentries.txt")
# Generate a data frame with the PDB structures of members of the Rfam family # with ID "FMN" rfamPDBMapping("FMN") # Generate a data frame with the PDB structures of members of the Rfam family # with accession "RF00174", and save the resulting PDB entries as a # tab-delimited text file. rfamPDBMapping("RF00174", filename="RF00174PDBentries.txt")
Plots a diagram of the specified type of the secondary structure of an Rfam family. If a filename is provided, the diagram will be saved to the specified path.
rfamSecondaryStructurePlot(rfamFamily, filename=NULL, plotType="norm")
rfamSecondaryStructurePlot(rfamFamily, filename=NULL, plotType="norm")
rfamFamily |
string with an Rfam family accession or ID for which the secondary structure should be plotted. |
filename |
string indicating a path to which the plotted diagram should be written. In the default behaviour, filename=NULL, the diagram will not be saved into any file, and instead will only be plotted on R's graphics display. Must be a file type supported by magick, such as PNG, JPEG, GIF, RGB or RGBA. |
plotType |
string indicating the desired type of secondary structure
diagram. Can be one of "cons", "fcbp", "cov", "ent", "maxcm", "norm"
(default), "rchie", "rscape" or "rscape-cyk". For a description of each type
of diagram, see |
A magick image object with the requested diagram. If a filename was specified, the diagram plot will also be saved to the specified path.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Plot a normal secondary structure diagram of the Rfam family with ID "FMN" rfamSecondaryStructurePlot("FMN") # Save a basepair conservation secondary structure diagram of the Rfam family # with accession "RF00174" as a PNG file. rfamSecondaryStructurePlot("RF00174", filename="RF00174bpcons.png", plotType="fcbp")
# Plot a normal secondary structure diagram of the Rfam family with ID "FMN" rfamSecondaryStructurePlot("FMN") # Save a basepair conservation secondary structure diagram of the Rfam family # with accession "RF00174" as a PNG file. rfamSecondaryStructurePlot("RF00174", filename="RF00174bpcons.png", plotType="fcbp")
Gets a string corresponding to an SVG file (XML format) with a representation of the secondary structure of the Rfam family, which can be read by rsvg or magick. If a filename is provided, the SVG file will also be written to the specified path.
rfamSecondaryStructureXMLSVG(rfamFamily, filename=NULL, plotType="norm")
rfamSecondaryStructureXMLSVG(rfamFamily, filename=NULL, plotType="norm")
rfamFamily |
string with an Rfam family accession or ID for which a SVG file with a representation of the secondary structure should be retrieved. |
filename |
string indicating a path to which the SVG graphics should be written. In the default behaviour, filename=NULL, the SVG will not be written to any file, and instead will only be returned as a character string, which can be directly read by R packages such as rsvg and magick. |
plotType |
string indicating the desired type of secondary structure
diagram. Can be one of "cons", "fcbp", "cov", "ent", "maxcm", "norm", "rchie",
"rscape" or "rscape-cyk". For a description of each type of diagram, see |
A string with the contents of a SVG file (XML format) with the representation of a secondary structure diagram of the Rfam family. The string can be directly read by rsvg or magick. If a filename is provided, the SVG file will also be written to the specified path.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Get a SVG representing a normal secondary structure diagram of the Rfam family # with ID "FMN" rfamSecondaryStructureXMLSVG("FMN") # Get a SVG representing a basepair conservation secondary structure diagram # for the Rfam family with accession "RF00174", and save it to a file rfamSecondaryStructureXMLSVG("RF00174", filename="RF00174bpcons.svg", plotType="fcbp")
# Get a SVG representing a normal secondary structure diagram of the Rfam family # with ID "FMN" rfamSecondaryStructureXMLSVG("FMN") # Get a SVG representing a basepair conservation secondary structure diagram # for the Rfam family with accession "RF00174", and save it to a file rfamSecondaryStructureXMLSVG("RF00174", filename="RF00174bpcons.svg", plotType="fcbp")
Retrieves the seed multiple alignment of the specified Rfam family. The seed alignment is used to determine the covariance model defining each Rfam family, and comprises only a subset of all the members of each family. If a filename is provided, the alignment will be saved in the specified format.
rfamSeedAlignment(rfamFamily, filename=NULL, format="stockholm")
rfamSeedAlignment(rfamFamily, filename=NULL, format="stockholm")
rfamFamily |
string with an Rfam family accession or ID for which the seed alignment should be retrieved. |
filename |
string indicating a path to which the retrieved seed alignment should be written. The format of the written file is determined by the "format" argument. In the default behaviour, filename=NULL, and no file will be written. Instead, the seed alignment will just be returned as a Biostrings MultipleAlignment object. |
format |
string indicating the format into which the seed alignment
should be retrieved and written to filename Can be one of "stockholm"
(default), "pfam", "fasta" or "fastau". For a description of each type of
format, see |
A Biostrings MultipleAlignment object with the aligned RNA sequences included in the seed alignment. The names of the sequences contain their GenBank accession numbers and the start and end positions of the region corresponding to the sequence region member of the Rfam family.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Generate a Biostrings MultipleAlignment object with the aligned RNA sequences # used to generate the seed alignment for the Rfam family with ID "FMN" rfamSeedAlignment("FMN") # Generate a Biostrings MultipleAlignment object with the aligned RNA sequences # used to generate the seed alignment for the Rfam family with accession # "RF00174", and save it to a file in Stockholm format rfamSeedAlignment("RF00174", filename="RF00174seedAlignment.stk", format="stockholm") # Generate a Biostrings MultipleAlignment object with the aligned RNA sequences # used to generate the seed alignment for the Rfam family with accession # "RF00174", and save it to a file in FASTA format rfamSeedAlignment("RF00174", filename="RF00174seedAlignment.fasta", format="fasta")
# Generate a Biostrings MultipleAlignment object with the aligned RNA sequences # used to generate the seed alignment for the Rfam family with ID "FMN" rfamSeedAlignment("FMN") # Generate a Biostrings MultipleAlignment object with the aligned RNA sequences # used to generate the seed alignment for the Rfam family with accession # "RF00174", and save it to a file in Stockholm format rfamSeedAlignment("RF00174", filename="RF00174seedAlignment.stk", format="stockholm") # Generate a Biostrings MultipleAlignment object with the aligned RNA sequences # used to generate the seed alignment for the Rfam family with accession # "RF00174", and save it to a file in FASTA format rfamSeedAlignment("RF00174", filename="RF00174seedAlignment.fasta", format="fasta")
Gets the phylogenetic tree of the seed multiple alignment associated to the Rfam family. The tree is retrieved in the NHX format (New Hampshire extended format. The tree will be saved to the path specified through filename. The generated file can be read directly with the treeio R package.
rfamSeedTree(rfamFamily, filename)
rfamSeedTree(rfamFamily, filename)
rfamFamily |
string with an Rfam family accession or ID for which the phylogenetic tree of the seed alignment should be retrieved. |
filename |
string indicating a path to which the phylogenetic tree in the NHX format should be written. |
A string with the phylogenetic tree of the seed alignment of the Rfam family in the NHX format. The tree will also be written to the specified path, producing a file that can be read by the treeio package.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Retrieve the phylogenetic tree of the seed alignment for the Rfam family with # ID "FMN" and save it to a file rfamSeedTree("FMN", filename="FMNseedTree.nhx") # Retrieve the phylogenetic tree of the seed alignment for the Rfam family with # accession "RF00174" and save it to a file rfamSeedTree("RF00174", filename="RF00174seedTree.nhx")
# Retrieve the phylogenetic tree of the seed alignment for the Rfam family with # ID "FMN" and save it to a file rfamSeedTree("FMN", filename="FMNseedTree.nhx") # Retrieve the phylogenetic tree of the seed alignment for the Rfam family with # accession "RF00174" and save it to a file rfamSeedTree("RF00174", filename="RF00174seedTree.nhx")
Plots an image of the phylogenetic tree of the seed multiple alignment associated to the Rfam family. If a filename is provided, the image will be saved to the path specified through filename. The tree can be labeled with species names or sequence accessions.
rfamSeedTreeImage(rfamFamily, filename=NULL, label="species")
rfamSeedTreeImage(rfamFamily, filename=NULL, label="species")
rfamFamily |
string with an Rfam family accession or ID for which the phylogenetic tree of the seed alignment should be plotted. |
filename |
string indicating a path to which the plot of the phylogenetic tree should be written. |
label |
string indicating the type of labels that should be added to the plot of the phylogenetic tree. Should be either "species" for labeling with species names (default) or "acc" for labeling with sequence accessions. |
A magick image object with the requested phylogenetic tree plot in GIF format. If a filename was specified, the phylogenetic tree plot will also be saved to the specified path.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Display a plot of the phylogenetic tree of the seed alignment for the Rfam # family with ID "FMN" labeled with species names rfamSeedTreeImage("FMN", label="species") # Display a plot of the phylogenetic tree of the seed alignment for the Rfam # family with accession "RF00147" labeled with sequence accessions and save it # to a PNG file. rfamSeedTreeImage("RF00147", filename="RF00147seedTreeImage.png", label="acc")
# Display a plot of the phylogenetic tree of the seed alignment for the Rfam # family with ID "FMN" labeled with species names rfamSeedTreeImage("FMN", label="species") # Display a plot of the phylogenetic tree of the seed alignment for the Rfam # family with accession "RF00147" labeled with sequence accessions and save it # to a PNG file. rfamSeedTreeImage("RF00147", filename="RF00147seedTreeImage.png", label="acc")
Retrieves all sequence regions encoding an RNA assigned to be a member of the specified Rfam family. If a filename is provided, the list of sequence regions will be saved to the path specified through filename as a tab-delimited file.
rfamSequenceRegions(rfamFamily, filename=NULL)
rfamSequenceRegions(rfamFamily, filename=NULL)
rfamFamily |
string with an Rfam family accession or ID for which the member sequence regions should be retrieved. |
filename |
string indicating a path to which the retrieved sequence regions should be written as a tab-delimited file. In the default behaviour, filename=NULL, and no file will be written. Instead, the sequence regions will be returned as a data frame. |
A data frame with the sequence regions belonging to the specified Rfam family. The data frame contains the following columns:
Sequence GenBank accession |
GenBank accession of the sequence with the sequence region belonging to the Rfam family |
Bit score |
Bit score for the alignment of the sequence region to the Rfam family |
Region start position |
Position of the sequence specified by the GenBank accession where the sequence region belonging to the Rfam family starts |
Region end position |
Position of the sequence specified by the GenBank accession where the sequence region belonging to the Rfam family ends |
Sequence description |
Description of the sequence where the sequence region belonging to the Rfam family is found |
Species |
Name of the species where the sequence region is present |
NCBI tax ID |
NCBI taxonomy ID of the species where the sequence region is present |
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Generate a data frame with the sequence regions belonging to the Rfam family # with ID "FMN" head(rfamSequenceRegions("FMN")) # Generate a data frame with the sequence regions belonging to the Rfam family # with accession "RF00360", and save the sequence regions as a tab-delimited # text file. head(rfamSequenceRegions("RF00360", filename="RF00360sequenceRegions.txt")) ## Not run: # Some Rfam families have too many associated sequence regions, in which case # the list cannot be retrieved from the server. rfamSequenceRegions("RF00174") ## End(Not run)
# Generate a data frame with the sequence regions belonging to the Rfam family # with ID "FMN" head(rfamSequenceRegions("FMN")) # Generate a data frame with the sequence regions belonging to the Rfam family # with accession "RF00360", and save the sequence regions as a tab-delimited # text file. head(rfamSequenceRegions("RF00360", filename="RF00360sequenceRegions.txt")) ## Not run: # Some Rfam families have too many associated sequence regions, in which case # the list cannot be retrieved from the server. rfamSequenceRegions("RF00174") ## End(Not run)
Performs a search of the Rfam database by a provided RNA sequence, and retrieves high-scoring hits of Rfam families with different regions of the provided sequence.
rfamSequenceSearch(sequence, fragmentsOverlap=1000, clanCompetitionFilter=TRUE, clanOverlapThreshold=0.5)
rfamSequenceSearch(sequence, fragmentsOverlap=1000, clanCompetitionFilter=TRUE, clanOverlapThreshold=0.5)
sequence |
string with an RNA sequence to be searched against the Rfam database. Should contain only standard RNA symbols (i.e., "A", "U", "G" and "C"), and no spaces or newlines. |
fragmentsOverlap |
when a sequence larger than 10000 nucleotides is provided, it is internally split into smaller fragments before using them to search the Rfam database. This argument controls the number of overlapping bases between consecutive fragments. |
clanCompetitionFilter |
logical indicating if results should be reduced through a clan competition filter, which removes overlapping hits if they belong to Rfam families of the same clan and have an overlap above a certain threshold. |
clanOverlapThreshold |
number indicating the minimum overlap between two hits (as a fraction of the smallest hit) to remove the hit with the worst e-value if their families belong to the same Rfam clan. |
A nested list where each element of the top-level list represents a high-scoring hit with the Rfam families. Each of the top-level list elements is a list in itself, containing the following elements that describe the hit:
rfamAccession |
Rfam accession of the Rfam family with which a hit was found |
bitScore |
Bit score for the hit with an Rfam family |
eValue |
Expectation value for the with an Rfam family |
alignmentStartPositionQuerySequence |
Start position in the query sequence of the sequence region that resulted in the hit |
alignmentEndPositionQuerySequence |
End position in the query sequence of the sequence region that resulted in the hit |
alignmentStartPositionHitSequence |
Start position in the Rfam family consensus sequence region with which the hit was found |
alignmentEndPositionHitSequence |
End position in the Rfam family consensus sequence region with which the hit was found |
alignmentQuerySequence |
Sequence region of the query RNA sequence provided to search the Rfam database, aligned with the corresponding region of the consensus sequence of the Rfam family with which the hit was found |
alignmentMatch |
String describing the matches between the aligned regions of the query sequence and the consensus sequence of the Rfam family |
alignmentHitSequence |
Sequence region of the consensus sequence of the Rfam family with which the hit was found, aligned to the corresponding region of the query RNA sequence |
alignmentSecondaryStructure |
Secondary structure of the region of the consensus sequence of the Rfam family with which the hit was found |
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/rna_structure_notations.html
# Search the Rfam database for hits with a specific sequence, and store the # results in a nested list searchHits <- rfamSequenceSearch("GGAUCUUCGGGGCAGGGUGAAAUUCCCGACCGGUGGUAUAGUCCAC GAAAGUAUUUGCUUUGAUUUGGUGAAAUUCCAAAACCGACAGUAGAGUCUGGAUGAGAGAAGAUUC") # Check number of high-scoring hits length(searchHits) # Extract the Rfam family accession and ID for the first hit searchHits[[1]]$rfamAccession searchHits[[1]]$rfamID
# Search the Rfam database for hits with a specific sequence, and store the # results in a nested list searchHits <- rfamSequenceSearch("GGAUCUUCGGGGCAGGGUGAAAUUCCCGACCGGUGGUAUAGUCCAC GAAAGUAUUUGCUUUGAUUUGGUGAAAUUCCAAAACCGACAGUAGAGUCUGGAUGAGAGAAGAUUC") # Check number of high-scoring hits length(searchHits) # Extract the Rfam family accession and ID for the first hit searchHits[[1]]$rfamAccession searchHits[[1]]$rfamID
Searches the Rfam database for entries containing a specified keyword, such as "ribozyme" or "FMN", and returns the accession numbers of the Rfam families containing the keyword in its family ID, summary or description.
rfamTextSearchFamilyAccession(query)
rfamTextSearchFamilyAccession(query)
query |
string with the keyword to be searched in the Rfam database. |
A character vector with the accessions of the Rfam families matching the search.
Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038
https://docs.rfam.org/en/latest/api.html
# Search Rfam families associated to the keyword "FMN" rfamTextSearchFamilyAccession("FMN")
# Search Rfam families associated to the keyword "FMN" rfamTextSearchFamilyAccession("FMN")