Title: | Query the EDIR Database For Specific Gene |
---|---|
Description: | EDIRquery provides a tool to search for genes of interest within the Exome Database of Interspersed Repeats (EDIR). A gene name is a required input, and users can additionally specify repeat sequence lengths, minimum and maximum distance between sequences, and whether to allow a 1-bp mismatch. Outputs include a summary of results by repeat length, as well as a dataframe of query results. The example data provided is a subset of the data for the gene GAA (ENSG00000171298). Querying the full database requires providing the path to the downloaded database files as a parameter. |
Authors: | Laura D.T. Vo Ngoc [aut, cre] |
Maintainer: | Laura D.T. Vo Ngoc <[email protected]> |
License: | GPL-3 |
Version: | 1.7.0 |
Built: | 2024-11-29 05:25:31 UTC |
Source: | https://github.com/bioc/EDIRquery |
A dataset containing two formats of gene names and associated chromosome number.
gene_chr
A data frame with 60571 rows and 3 variables:
chromosome
Ensembl gene ID of gene
HGNC symbol of gene
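As a quick way to inspect this dataset, the sketch below loads it and prints its dimensions and first rows. It assumes that EDIRquery is installed and that `gene_chr` is exposed through `data()`; neither is confirmed by this page, so the call is guarded and simply does nothing when the package is absent.

```r
# Assumption: EDIRquery is installed and gene_chr is loadable via data().
if (requireNamespace("EDIRquery", quietly = TRUE)) {
  data("gene_chr", package = "EDIRquery")
  print(dim(gene_chr))   # documented above as 60571 rows and 3 variables
  print(head(gene_chr))  # chromosome, Ensembl gene ID, HGNC symbol
}
```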
This function searches for a specified gene in the EDIR dataset. A gene name is a required parameter.
gene_lookup( gene, length = NA, mindist = 0, maxdist = 1000, format = "data.frame", summary = FALSE, mismatch = TRUE, path = NA )
gene | The gene name (ENSEMBL ID or HGNC symbol) |
length | Repeat sequence length, must be between 7 and 20. Defaults to NA. If NA, results will include all available lengths in dataset for queried gene. |
mindist | Minimum spacer distance between repeats. Defaults to 0. |
maxdist | Maximum spacer distance between repeats. Defaults to 1000. |
format | Output table format. One of 'data.frame', 'GInteractions'. Defaults to 'data.frame'. |
summary | Logical value indicating whether to store summary. Defaults to FALSE. |
mismatch | Logical value indicating whether to allow 1 mismatch in sequences. Defaults to TRUE. |
path | String containing path to directory holding downloaded dataset files. Defaults to NA. If not provided ( |
Summary of results printed to console includes gene name, gene length (bp), Ensembl transcript ID, queried distance between repeats (default: 0-1000 bp), and an overview of total results for the given repeat length. Console outputs include runtime.
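The argument constraints described above (repeat length between 7 and 20 or NA, a spacer range, and two output formats) can be checked before running a query. The helper below is a minimal sketch in base R. `check_lookup_args` is a hypothetical name and not part of EDIRquery, and the rule `0 <= mindist <= maxdist` is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not part of EDIRquery): validates arguments against
# the constraints stated in the argument descriptions.
check_lookup_args <- function(length = NA, mindist = 0, maxdist = 1000,
                              format = "data.frame") {
  # Documented: length must be between 7 and 20, or NA for all lengths.
  if (!is.na(length) && (length < 7 || length > 20)) {
    stop("`length` must be NA or between 7 and 20")
  }
  # Assumption: the spacer range must be non-negative and ordered.
  if (mindist < 0 || maxdist < mindist) {
    stop("expected 0 <= mindist <= maxdist")
  }
  # Documented: format is one of 'data.frame' or 'GInteractions'.
  if (!format %in% c("data.frame", "GInteractions")) {
    stop("`format` must be 'data.frame' or 'GInteractions'")
  }
  invisible(TRUE)
}

# Passes silently for a valid combination.
check_lookup_args(length = 7, mindist = 10, maxdist = 1000)

# An out-of-range length raises an error, caught here for illustration.
ok <- tryCatch(check_lookup_args(length = 25), error = function(e) FALSE)
```

Running such a check first gives an early, readable error instead of depending on however the package itself reports invalid input.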
A data.frame of the results from the EDIR database. If summary = TRUE, returns a tibble containing the summary ($summary) and query results ($results).
## With given repeat length
gene_lookup("GAA", length = 7, mindist = 10, maxdist = 1000, mismatch = TRUE)

## Without specified repeat length
gene_lookup("GAA", mindist = 0, maxdist = 1000, mismatch = TRUE)

## To access query results, store in variable
output <- gene_lookup("GAA", length = 7, mindist = 10, maxdist = 1000, mismatch = FALSE)
head(output)

## With summary = TRUE
output <- gene_lookup("GAA", length = 10, mindist = 10, maxdist = 1000, summary = TRUE, mismatch = TRUE)
output$summary
head(output$results)
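The `format` argument is documented as also accepting 'GInteractions'. The sketch below requests that format for the bundled GAA example data. It assumes EDIRquery and its dependencies are installed, and that the returned object inherits from a class named `GInteractions`; the call is guarded so it is skipped when the package is unavailable.

```r
# Assumption: EDIRquery is installed; path is left at its default (NA),
# so the bundled GAA example subset is queried.
gi <- NULL
if (requireNamespace("EDIRquery", quietly = TRUE)) {
  gi <- EDIRquery::gene_lookup("GAA", length = 7, mindist = 10, maxdist = 1000,
                               format = "GInteractions", mismatch = FALSE)
  print(class(gi))  # inspect the class of the returned object
}
```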