Title: Query data from SNPedia
Description: SNPediaR provides some tools for downloading and parsing data from the SNPedia web site <http://www.snpedia.com>. The implemented functions allow users to import the wiki text available in SNPedia pages and to extract the most relevant information out of them. If some information in the downloaded pages is not automatically processed by the library functions, users can easily implement their own parsers to access it in an efficient way.
Authors: David Montaner [aut, cre]
Maintainer: David Montaner <[email protected]>
License: GPL-2
Version: 1.33.0
Built: 2024-10-31 05:22:26 UTC
Source: https://github.com/bioc/SNPediaR
SNPedia pages usually have a table on the right-hand side that summarizes the most relevant information in the page. These functions help extract this kind of information for the given tags (rows).
extractTags(x, tags)
extractSnpTags(x, tags)
extractGenotypeTags(x, tags)
x: a wiki page (single character vector).
tags: character vector of tags (row names) to be collected.
extractTags is a general-purpose function aimed to work with any page. extractSnpTags calls extractTags with a set of predefined tags suitable for SNP pages; extractGenotypeTags does the same for genotype pages. These functions take a character vector of length one but return a vector with as many values as tags provided. They are devised to be used with sapply-style functions.
Notice that not all information presented in SNPedia's HTML tables is available in the JSON format retrieved by the R package. Risk information, for instance, needs to be collected from the genotype pages, as it is not available in the JSON version of the SNP pages.
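As a hedged illustration of this note, risk-related fields can be pulled from genotype pages with extractTags; the tag names "magnitude", "repute" and "summary" are assumptions based on SNPedia's genotype page template, not values guaranteed by the package:
res <- getPages(c("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)"))
## tag names below are assumed from SNPedia's genotype template
t(sapply(res, extractTags, tags = c("rsid", "magnitude", "repute", "summary")))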
Returns a character vector with the value of each requested tag, or NA when the tag is not available in the page.
See also: getPages, getCategoryElements
res <- getPages (c ("Rs1234", "Rs53576")) t (sapply (res, extractSnpTags)) extractTags (res[[1]], tags = c("rsid", "Chromosome", "position")) res <- getPages (c ("Rs1234(A;A)", "Rs1234(A;C)","Rs1234(C;C)")) t (sapply (res, extractGenotypeTags)) getPages (c ("Rs1234(A;A)", "Rs1234(A;C)","Rs1234(C;C)"), wikiParseFunction = extractGenotypeTags) getPages (c ("Rs1234(A;A)", "Rs1234(A;C)","Rs1234(C;C)"), wikiParseFunction = extractGenotypeTags, tags = c("rsid", "allele1", "allele2"))
res <- getPages (c ("Rs1234", "Rs53576")) t (sapply (res, extractSnpTags)) extractTags (res[[1]], tags = c("rsid", "Chromosome", "position")) res <- getPages (c ("Rs1234(A;A)", "Rs1234(A;C)","Rs1234(C;C)")) t (sapply (res, extractGenotypeTags)) getPages (c ("Rs1234(A;A)", "Rs1234(A;C)","Rs1234(C;C)"), wikiParseFunction = extractGenotypeTags) getPages (c ("Rs1234(A;A)", "Rs1234(A;C)","Rs1234(C;C)"), wikiParseFunction = extractGenotypeTags, tags = c("rsid", "allele1", "allele2"))
A function to get all page names in SNPedia tagged under the indicated category.
getCategoryElements(category, verbose = FALSE, includeTemplates = FALSE, limit, baseURL, format, query, continue)
category: the category to be used; just one at a time.
verbose: if TRUE, some progress messages are printed.
includeTemplates: if TRUE, page templates are kept in the output.
limit: the maximum number of items to be queried at a time.
baseURL: SNPedia bots URL.
format: download format; currently just JSON is available.
query: the query to be iterated.
continue: to be used in multi-page queries.
A list of all available categories may be found at:
http://www.snpedia.com/index.php/Special:Categories
Most used categories are:
Is_a_medical_condition
Is_a_medicine
Is_a_topic
Is_a_snp
In_dbSNP
Is_a_genotype
Some template pages are included in their corresponding category. By default those are removed. Set includeTemplates to TRUE if you want to keep them.
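A minimal sketch of this behaviour, assuming template pages carry the usual MediaWiki "Template:" prefix in the returned names:
res <- getCategoryElements(category = "Is_a_medicine", includeTemplates = TRUE)
grep("^Template:", res, value = TRUE)  ## template pages kept in the output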
Parameters other than category and verbose are not intended for standard users.
A character vector containing the names of the pages under the required category.
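A short sketch of a typical workflow chaining this return value into getPages (the category and the number of pages are arbitrary choices for illustration):
snps <- getCategoryElements(category = "Is_a_medical_condition")
pages <- getPages(titles = head(snps, 3))  ## download the first few pages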
See also: getPages, extractTags
res <- getCategoryElements(category = "Is_a_medical_condition")
head(res)
## Not run:
res <- getCategoryElements(category = "Is_a_snp")
## End(Not run)
A function to download the (wiki) text content of a list of SNPedia pages.
getPages(titles, verbose = FALSE, limit = 50, wikiParseFunction = identity, baseURL, format, query, ...)
titles: titles of the pages to be downloaded.
verbose: if TRUE, some progress messages are printed.
limit: the maximum number of items to be queried at a time.
wikiParseFunction: function used to parse the wiki code at downloading time. Default is identity.
baseURL: SNPedia bots URL.
format: download format; currently just JSON is available.
query: the query to be iterated.
...: any further parameters to be passed to the wikiParseFunction.
The downloaded JSON is parsed to extract the wiki text returned by the function. If the wikiParseFunction parameter is provided, the pages are parsed internally as each batch of pages is downloaded. Pages do not need to be of the same class, but users should be aware of the type of pages they are querying, especially when using their own wikiParseFunction.
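For instance, a custom parser is applied batch by batch at download time; countLines below is a made-up helper for this sketch, not part of the package:
countLines <- function(x) length(strsplit(x, "\n", fixed = TRUE)[[1]])
getPages(titles = c("Rs1234", "Rs53576"), wikiParseFunction = countLines)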
Parameters baseURL, format and query are not intended for end users.
Returns a list containing the wiki content of the required pages, or the formatted objects returned by the wikiParseFunction applied to each page.
See also: extractTags, getCategoryElements
res <- getPages(titles = "Rs1234")
res
res <- getPages(titles = c("Rs1234", "Rs1234(A;A)", "Rs1234(A;C)"))
res
myfun <- function(x) substring(x, 1, 5)
lapply(res, myfun)
res <- getPages(titles = c("Rs1234", "Rs1234(A;A)", "Rs1234(A;C)"), wikiParseFunction = myfun)
res