| Title: | Collection of simple tools for learning about Bioconductor Packages |
|---|---|
| Description: | Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages. |
| Authors: | Shian Su [aut, ctb], Lori Shepherd [ctb], Marcel Ramos [aut, ctb] (ORCID: <https://orcid.org/0000-0002-3242-0582>), Felix G.M. Ernst [ctb], Jennifer Wokaty [ctb], Charlotte Soneson [ctb], Martin Morgan [ctb], Vince Carey [ctb], Sean Davis [aut, cre] (ORCID: <https://orcid.org/0000-0002-8991-6458>) |
| Maintainer: | Sean Davis <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.31.7 |
| Built: | 2026-05-23 09:34:28 UTC |
| Source: | https://github.com/bioc/BiocPkgTools |
Get the ORCID iD from the cre field of Authors@R in a packageDescription result
.get_cre_orcid(pkgname)
pkgname |
character(1) |
This function uses the gh package to get a list of either issues, pull
requests, or GitHub commits since the specified date for a particular GitHub
repository. The repository must have both the username / organization and the
name, e.g., "Bioconductor/S4Vectors".
activitySince(
  gh_repo,
  activity = c("issues", "pulls", "commits"),
  status = c("closed", "open", "all"),
  Date,
  issue_metadata = c("created_at", "number", "title"),
  token = NULL
)
gh_repo |
|
activity |
|
status |
|
Date |
|
issue_metadata |
|
token |
|
The tibble returned by the commits activity report contains five
columns:
'committer_date'
'commit' - hash
'parents' - hash of parent for merge commits
'author'
'message'
For information on other columns, refer to the GitHub API under repository
issues or pulls (e.g., /repos/:repo/issues).
A tibble with three columns corresponding to issue metadata (i.e.,
"created_at", "number", "title")
activitySince("Bioconductor/S4Vectors", "issues", "closed", "2021-05-01")
activitySince("Bioconductor/S4Vectors", "issues", "open", "2022-05-01")
activitySince("Bioconductor/S4Vectors", "commits", Date = "2022-05-01")
Get download statistics for Bioconductor packages distributed via Anaconda.
anacondaDownloadStats()
Anaconda provide daily download counts for all software packages they distribute. These are summarised into monthly tables of counts and made available from https://github.com/grimbough/anaconda-download-stats This function provides a mechanism to download these monthly counts for Bioconductor packages distributed through Anaconda.
A data.frame of download statistics for
all Bioconductor packages distributed by Anaconda, in tidy format.
Note: Anaconda do not provide counts for unique IP addresses. This column
is listed as NA for all packages to provide continuity with data from
Bioconductor.org obtained by biocDownloadStats(). The counts are
updated monthly, so do not expect to see counts for the current month.
Mike L. Smith
anacondaDownloadStats()
The biocBuildEmail function provides a template for notifying
maintainers of errors in the Bioconductor Build System (BBS). This
convenience function returns the body of the email from a template within the
package and provides a copy in the clipboard.
biocBuildEmail(
  pkg,
  version = c("release", "devel"),
  PS = character(1L),
  dry.run = TRUE,
  to = NULL,
  cc = NULL,
  bcc = NULL,
  emailTemplate = templatePath(),
  core.name = NULL,
  core.email = NULL,
  core.id = NULL,
  textOnly = FALSE,
  resend = FALSE,
  verbose = FALSE,
  credFile = "~/.blastula_creds"
)

sentHistory()
pkg |
|
version |
|
PS |
|
dry.run |
|
to |
|
cc, bcc
|
|
emailTemplate |
|
core.name |
|
core.email |
|
core.id |
|
textOnly |
|
resend |
|
verbose |
|
credFile |
|
The credFile argument is a convenience for avoiding password entry
at every instance an email is sent. If the default file
~/.blastula_creds does not exist, the user will be prompted for
authorization information. Currently it is configured to send emails for the
core team:
blastula::create_smtp_creds_file(
file = "~/.blastula_creds",
user = "[email protected]",
host = "smtp.office365.com",
port = 587,
use_ssl = TRUE
)
A character string of the email
Check the history of emails sent
biocBuildEmail("MultiAssayExperiment", dry.run = TRUE)
The online Bioconductor build reports are great for humans to look at, but they are not easily computable. This function scrapes HTML and text files available from the build report online pages to generate a tidy data frame version of the build report.
biocBuildReport(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows"),
  stage.timings = FALSE
)
version |
|
pkgType |
|
stage.timings |
|
A tbl_df object with columns pkg, version,
author, commit, date, node, stage, and result.
# Set the stage--what version of Bioc am I using?
BiocManager::version()
latest_build <- biocBuildReport()
head(latest_build)
This function parses the Build Report tarball for a Bioconductor
release. By default it will pull all the report.tgz files for each
Bioconductor package type. The Bioconductor Build System (BBS) Build Report
tarball contains build status information for all packages in a
Bioconductor release. This function is mainly used by biocBuildReport().
biocBuildReportDB(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows"),
  stage.timings = FALSE
)
version |
|
pkgType |
|
stage.timings |
|
This function downloads and parses the build status information for Bioconductor packages. The build status information is available for the current release and the previous release. Other versions may be available.
biocBuildStatusDB(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows")
)
version |
|
pkgType |
|
A data.frame with the following columns:
pkg: The name of the package
node: The builder on which the package was built
stage: The stage of the build, e.g., 'install', 'buildsrc', 'checksrc', etc.
result: The status of the build, e.g., 'OK', 'ERROR', 'WARNINGS', etc.
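As a rough sketch of how a table with these four columns might be queried, the snippet below builds a toy data.frame (the package names, nodes, and results are invented for illustration and are not real build results) and extracts the packages with a failing stage. With network access, the toy table would be replaced by the result of biocBuildStatusDB().

```r
# Toy stand-in for the result of biocBuildStatusDB(); the rows below are
# made up for illustration and are not real build results.
status <- data.frame(
    pkg    = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
    node   = c("node1", "node1", "node1", "node2", "node1"),
    stage  = c("install", "buildsrc", "buildsrc", "checksrc", "checksrc"),
    result = c("OK", "OK", "ERROR", "WARNINGS", "OK"),
    stringsAsFactors = FALSE
)

# Rows where a stage ended in ERROR
errors <- status[status$result == "ERROR", ]

# Packages with at least one ERROR on any node or stage
failing_pkgs <- unique(errors$pkg)

# Cross-tabulate results by build stage
result_by_stage <- table(status$stage, status$result)
```

The same subsetting applies unchanged to a real status table, since it relies only on the documented column names.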
Get Bioconductor download statistics
biocDownloadStats(
  pkgType = c("software", "data-experiment", "workflows", "data-annotation")
)
pkgType |
|
Note that Bioconductor package download stats are not version-specific.
A tibble of download statistics for all Bioconductor packages
biocDownloadStats()
Explore Bioconductor packages through an interactive bubble plot. Click on bubbles to bring up additional information about the package. The size of a bubble and its proximity to the center are based on the number of downloads the package has had in the past month.
biocExplore(top = 500L, ...)
top |
maximum number of packages displayed in any biocView |
... |
parameters passed to |
A bubble plot of Bioconductor packages
List all the packages associated with a maintainer. By default, it will
return all packages associated with the [email protected] email.
hasBiocMaint returns a logical vector corresponding to the input character
vector of packages indicating whether any package is maintained by the
Bioconductor core team.
biocMaintained(
  main = "[email protected]",
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "workflows", "data-annotation")
)

hasBiocMaint(
  pkg,
  version = BiocManager::version(),
  main = "maintainer@bioconductor\\.org",
  repo = c("BioCsoft", "BioCexp", "BioCworkflows", "BioCann")
)
main |
|
version |
|
pkgType |
|
pkg |
|
repo |
|
For biocMaintained: a tibble of packages associated with the
maintainer.
For hasBiocMaint: a logical vector indicating whether the
package is maintained by Bioconductor.
biocMaintained()
## maintained by Hervé and not maintainer at bioconductor dot org
hasBiocMaint("BiocGenerics")
The BiocViews-generated VIEWS file is available
for Bioconductor release and devel repositories. It
contains quite a bit more information from the
package DESCRIPTION files than the PACKAGES
file. In particular, it contains biocViews annotations
and URLs for vignettes and developer URLs.
biocPkgList(
  version = BiocManager::version(),
  repo = c("BioCsoft", "BioCexp", "BioCworkflows", "BioCann", "CRAN"),
  addBiocViewParents = TRUE
)
version |
|
repo |
|
addBiocViewParents |
|
Since packages are annotated with the most specific
views, the default functionality here is to add parent terms
for all views for each package. For example, in the BioCsoft
repository, all packages will have at least "Software" added
to their biocViews. If one wants to stick to only the most
specific terms, set addBiocViewParents to FALSE.
An object of class tbl_df with one row per package. The result
always includes a fnd list-column whose elements are character vectors
of funder names extracted from persons with role "fnd" in the
formatted Author field. Elements are NA_character_ for packages
that declare no funder (or whose Author field contains no [fnd]
role tag).
bpkgl <- biocPkgList(repo = "BioCsoft")
bpkgl
unlist(bpkgl[1,'Depends'], use.names = FALSE)
# Get a list of all packages that
# import "GEOquery"
library(dplyr)
bpkgl |>
  filter(Package == 'GEOquery') |>
  pull('importsMe') |>
  unlist()
Grab build report results from BUILD_STATUS_DB for a particular package range
biocPkgRanges(
  start,
  end,
  condition = c("ERROR", "WARNINGS"),
  phase = "buildsrc",
  version = c("devel", "release")
)
start |
|
end |
|
condition |
|
phase |
|
version |
|
Vincent J. Carey
## Not run:
biocPkgRanges(
  start = "a4", end = "CMA", condition = "ERROR", version = "devel"
)
## End(Not run)
Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.
The biocBuildReport() function returns a computable
form of the Bioconductor Build Report.
The biocDownloadStats() function gets Bioconductor
download stats, allowing users to quickly find commonly used
packages. The biocPkgList() is useful for getting
a complete listing of all Bioconductor packages.
Bioconductor packages all have Digital Object Identifiers (DOIs). This package contains basic infrastructure for creating, updating, and de-referencing DOIs.
Maintainer: Sean Davis [email protected] (ORCID)
Authors:
Sean Davis [email protected] (ORCID)
Shian Su [email protected] [contributor]
Marcel Ramos [email protected] (ORCID) [contributor]
Other contributors:
Lori Shepherd [email protected] [contributor]
Felix G.M. Ernst [email protected] [contributor]
Jennifer Wokaty [email protected] [contributor]
Charlotte Soneson [email protected] [contributor]
Martin Morgan [email protected] [contributor]
Vince Carey [email protected] [contributor]
Useful links:
Report bugs at https://github.com/seandavi/BiocPkgTools/issues/new
Managing user data is important to allow use of email functions
such as biocBuildEmail and made easy with BiocFileCache.
setCache(
  directory = tools::R_user_dir("BiocPkgTools", "cache"),
  verbose = TRUE,
  ask = interactive()
)

pkgToolsCache(...)
directory |
The file location where the cache is located. Once set future downloads will go to this folder. |
verbose |
Whether to print descriptive messages |
ask |
logical (default TRUE when interactive session) Confirm the file location of the cache directory |
... |
For |
Get the directory location of the cache. It will prompt the user to create
a cache if not already created. A specific directory can be used via
setCache.
Specify the directory location of the data cache. By default, it will
go to the user's home/.cache/R and "appname" directory as specified by
tools::R_user_dir (with package="BiocPkgTools" and which="cache").
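To see where that default resolves on a given machine, the base R call can be evaluated directly. This is only a sketch of the default described above; the exact path depends on the platform and on any R_USER_CACHE_DIR or XDG_CACHE_HOME settings.

```r
# Where tools::R_user_dir() places the BiocPkgTools cache by default.
# Requires R >= 4.0.0, where tools::R_user_dir() was introduced.
default_cache <- tools::R_user_dir("BiocPkgTools", which = "cache")
default_cache

# R_user_dir() only computes the path; it does not create the directory.
dir.exists(default_cache)
```

Per the usage shown earlier, passing a different path as the directory argument of setCache() would point the cache elsewhere.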
The biocRevDepEmail function collects all the emails of the reverse
dependencies and sends a notification that upstream package(s) have been
deprecated or removed. It uses a template found in inst/resources with the
templatePath() function.
biocRevDepEmail(
  packages,
  which = c("strong", "most", "all"),
  PS = character(1L),
  version = BiocManager::version(),
  dry.run = TRUE,
  cc = NULL,
  emailTemplate = templatePath("revdepnote"),
  core.name = NULL,
  core.email = NULL,
  core.id = NULL,
  textOnly = FALSE,
  verbose = FALSE,
  credFile = "~/.blastula_creds",
  ...
)
packages |
|
which |
a character vector listing the types of
dependencies, a subset of
|
PS |
|
version |
|
dry.run |
|
cc |
|
emailTemplate |
|
core.name |
|
core.email |
|
core.id |
|
textOnly |
|
verbose |
|
credFile |
|
... |
Additional inputs to internal functions (not used). |
biocRevDepEmail(
  "FindMyFriends", version = "3.13", dry.run = TRUE, textOnly = TRUE
)
The function parses and returns the VIEWS file for a specified Bioconductor
version, either "release" or "devel". The VIEWS file contains metadata
about Bioconductor packages, including information about their categories,
topics, and other details.
biocVIEWSdb(
  version = BiocManager::version(),
  pkgType = c("software", "data-experiment", "data-annotation", "workflows")
)
version |
|
pkgType |
|
biocVIEWSdb(pkgType = "software")
Bioconductor is built using an extensive set of
core capabilities and data structures. This leads
to package developers depending on other packages
for interoperability and functionality. This
function extracts package dependency information
from biocPkgList() and returns a tidy
data.frame that can be used for analysis
and to build graph structures of package dependencies.
buildPkgDependencyDataFrame(dependencies = c("strong", "most", "all"), ...)
dependencies |
|
... |
parameters passed along to |
A data.frame (also a tbl_df) of
S3 class "biocDepDF" including columns "Package", "dependency",
and "edgetype".
This function requires network access.
See buildPkgDependencyIgraph(), biocPkgList().
# performs a network call, so must be online.
library(BiocPkgTools)
depdf <- buildPkgDependencyDataFrame()
head(depdf)
library(dplyr)
# filter to include only "Imports" type
# dependencies
imports_only <- depdf |> filter(edgetype=='Imports')
# top ten most imported packages
imports_only |>
  select(dependency) |>
  group_by(dependency) |>
  tally() |>
  arrange(desc(n))
# The Bioconductor packages with the
# largest number of imports
largest_importers <- imports_only |>
  select(Package) |>
  group_by(Package) |>
  tally() |>
  arrange(desc(n))
# not sure what these packages do. Join
# to their descriptions
biocPkgList() |>
  select(Package, Description) |>
  left_join(largest_importers) |>
  arrange(desc(n)) |>
  head()
Package dependencies represent a directed
graph (though Bioconductor dependencies are
not an acyclic graph). This function simply
returns an igraph graph from the package
dependency data frame from a call to
buildPkgDependencyDataFrame() or
any tidy data frame with rows of (Package, dependency)
pairs. Additional columns are added as igraph edge
attributes (see igraph::graph_from_data_frame()).
buildPkgDependencyIgraph(pkgDepDF)
pkgDepDF |
a tidy data frame. See description for details. |
An igraph directed graph. See the igraph package for details of what can be done.
See buildPkgDependencyDataFrame(),
igraph::graph_from_data_frame(),
inducedSubgraphByPkgs(), subgraphByDegree(),
igraph::igraph-es-indexing,
igraph::igraph-vs-indexing
library(igraph)
pkg_dep_df = buildPkgDependencyDataFrame()
# at this point, filter or join to manipulate
# dependency data frame as you see fit.
g = buildPkgDependencyIgraph(pkg_dep_df)
g
# Look at nodes and edges
head(V(g)) # vertices
head(E(g)) # edges
# subset graph by attributes
head(sort(degree(g, mode='in'), decreasing=TRUE))
head(sort(degree(g, mode='out'), decreasing=TRUE))
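The role of the extra columns can be tried offline with igraph::graph_from_data_frame() on a toy table. This is a sketch under stated assumptions: the rows are invented, and it calls igraph directly rather than buildPkgDependencyIgraph(), so it illustrates only the (Package, dependency) plus edge-attribute convention described above.

```r
# Requires the igraph package. The toy rows are invented for illustration.
library(igraph)

toy_deps <- data.frame(
    Package    = c("pkgA", "pkgA", "pkgB"),
    dependency = c("pkgB", "pkgC", "pkgC"),
    edgetype   = c("Imports", "Depends", "Imports"),
    stringsAsFactors = FALSE
)

# The first two columns define the directed edges; any remaining
# columns become edge attributes (here, "edgetype").
g <- graph_from_data_frame(toy_deps, directed = TRUE)

edge_attr_names(g)      # includes "edgetype"
degree(g, mode = "in")  # number of toy packages depending on each node
```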
Build and plot class dependency relationships, either for a single class or for the classes of a package.
buildClassDepGraph(class, includeUnions = FALSE)

buildClassDepData(class, includeUnions = FALSE)

buildClassDepFromPackage(pkg, includeUnions = FALSE)

plotClassDep(class, includeUnions = FALSE)

plotClassDepData(data)

plotClassDepGraph(g)
class |
a single |
includeUnions |
|
pkg |
a single |
data |
a |
g |
an |
library("SummarizedExperiment")
depData <- buildClassDepData("RangedSummarizedExperiment")
depData
g <- buildClassDepGraph("RangedSummarizedExperiment")
plotClassDepGraph(g)
The CRANstatus function allows users to check the status of a package
and send an email report of any failures.
CRANstatus(
  pkg,
  core.name = NULL,
  core.email = NULL,
  core.id = NULL,
  to.mail = "[email protected]",
  dry.run = TRUE,
  emailTemplate = templatePath("cranreport")
)
pkg |
|
core.name |
|
core.email |
|
core.id |
|
to.mail |
The email of the CRAN report recipient |
dry.run |
|
emailTemplate |
|
This function uses the biocDownloadStats data to approximate when a package entered Bioconductor. Note that the download stats go back only to 2009.
firstInBioc(download_stats)
download_stats |
a data.frame from |
dls <- biocDownloadStats()
tail(firstInBioc(dls))
This function makes calls out to the DataCite REST API described here: https://support.datacite.org/docs/api-create-dois. The function creates a new DOI for a Bioconductor package (cannot already exist). The target URL for the DOI is the short Bioconductor package URL.
generateBiocPkgDOI(
  pkg,
  authors,
  pubyear,
  event = c("publish", "register", "hide"),
  testing = TRUE
)
pkg |
|
authors |
|
pubyear |
|
event |
Either "hide", "register", or "publish". Typically, we use "publish" to make the DOI findable. |
testing |
|
The login information for the "real" Bioconductor account should be stored in the environment variables "DATACITE_USERNAME" and "DATACITE_PASSWORD".
The GUI is available here: https://doi.datacite.org/.
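One way to confirm the credentials are visible to the R session before attempting to create a DOI is sketched below. The helper has_datacite_creds() is hypothetical and not part of BiocPkgTools; only the two environment variable names come from the text above.

```r
# has_datacite_creds() is a hypothetical helper, not part of BiocPkgTools.
# It reports whether both DataCite environment variables are non-empty.
has_datacite_creds <- function() {
    vars <- c("DATACITE_USERNAME", "DATACITE_PASSWORD")
    all(nzchar(Sys.getenv(vars)))
}

if (!has_datacite_creds()) {
    message("Set DATACITE_USERNAME and DATACITE_PASSWORD, e.g. in ~/.Renviron")
}
```

Keeping the values in ~/.Renviron rather than in scripts avoids committing the password to version control.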
The DOI as a character(1) vector.
generateBiocPkgDOI('RANDOM_TEST_PACKAGE','Sean Davis',1972, testing = TRUE)
Get data from Bioconductor
get_bioc_data()
A JSON string containing Bioconductor package details
bioc_data <- get_bioc_data()
Get ORCID iDs from the cre fields of Authors@R in packageDescription results
get_cre_orcids(pkgnames)
pkgnames |
|
returns NA if no ORCID provided in Authors@R for package description
get_cre_orcids(c("BiocPkgTools", "utils"))
The actual vignette path is available using biocPkgList().
getBiocVignette(
  vignettePath,
  destfile = tempfile(),
  version = BiocManager::version()
)
vignettePath |
|
destfile |
|
version |
|
character(1) The filename of the downloaded vignette
x = biocPkgList()
tmp = getBiocVignette(x$vignettes[[1]][1])
tmp
## Not run:
library(pdftools)
y = pdf_text(tmp)
y = paste(y,collapse=" ")
library(tm)
v = VCorpus(VectorSource(y))
v <- v |>
  tm_map(stripWhitespace) |>
  tm_map(content_transformer(tolower)) |>
  tm_map(removeWords, stopwords("english")) |>
  tm_map(stemDocument)
dtm = DocumentTermMatrix(v)
inspect(DocumentTermMatrix(v, list(dictionary = as.character(x$Package))))
## End(Not run)
Generate needed information to create DOI from a package directory.
getPackageInfo(dir)
dir |
|
A data.frame
This function determines the number of years a package has been in Bioconductor. Available information includes the first Bioconductor version in which a package appeared and the length of time it has been in Bioconductor so far. If a package has been removed from Bioconductor, the last Bioconductor version and the approximate time in Bioconductor before removal are also available.
getPkgYearsInBioc(pkglist = NULL)
pkglist |
List of packages to retrieve information. If default NULL, returns a tibble of all Bioconductor packages. |
'tibble' with the following columns:
package: name of Bioconductor package
category: bioc, data/experiment, data/annotation, workflow
first_version_available: Bioconductor version (e.g. 1.9, 3.21) the package first became available
first_version_release_date: Equivalent calendar date of given Bioconductor release
approx_years_in: Numeric indicator of years in Bioconductor. If empty, indicates package was removed. See final three columns for more information.
last_version_available: If package was removed from Bioconductor, the last Bioconductor version (e.g. 1.9, 3.21) the package was able to be installed
last_version_release_date: Equivalent calendar date of given Bioconductor release
years_before_rm: If removed, how many years it was in Bioconductor
Lori Shepherd Kern, Robert Shear
## Not run:
## full table all Bioconductor packages
tbl <- getPkgYearsInBioc()
## example of package list. Packages active in Bioconductor
tbl <- getPkgYearsInBioc(c("BiocFileCache", "BiocPkgTools"))
## example of a package that has been removed from Bioconductor
tbl <- getPkgYearsInBioc("ensemblVEP")
## End(Not run)
For packages that live on GitHub, we can mine further details. This function returns the GitHub details for the listed packages.
githubDetails(pkgs, sleep = 0)
pkgs |
a |
sleep |
|
The gh::gh() function is used to
do the fetching. If the number of packages supplied
to this function is large (>40 or so), it is possible
to run into problems with API rate limits. The gh
package uses the environment variable "GITHUB_PAT"
(for personal access token) to authenticate and then
provide higher rate limits. If you run into problems
with rate limits, set sleep to some small positive
number to slow queries. Alternatively, create a Personal
Access Token on GitHub and register it. See the gh
package for details.
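The advice above can be sketched as a small decision rule. The helper choose_sleep() is hypothetical, and the 40-package threshold and one-second delay are illustrative assumptions taken loosely from the text, not values prescribed by the package.

```r
# choose_sleep() is a hypothetical helper, not part of BiocPkgTools.
# It returns no delay when a token is set or the request is small, and a
# one-second delay otherwise (both numbers are illustrative assumptions).
choose_sleep <- function(n_pkgs, token = Sys.getenv("GITHUB_PAT")) {
    if (nzchar(token) || n_pkgs <= 40L) 0 else 1
}

repos <- c("Bioconductor/S4Vectors", "Bioconductor/BiocGenerics")
delay <- choose_sleep(length(repos))

# With network access and BiocPkgTools installed, the call would then be:
# ghd <- githubDetails(repos, sleep = delay)
```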
pkglist = biocPkgList()
# example of "pkgs" format.
head(pkglist$URL)
gh_list = githubURLParts(pkglist$URL)
# keep only rows with a GitHub "username/repo"
gh_list = gh_list[!is.na(gh_list$user_repo),]
head(gh_list$user_repo)
ghd = githubDetails(gh_list$user_repo[1:5])
lapply(ghd, '[[', "stargazers")
Extract GitHub user and repo name from GitHub URL
githubURLParts(urls)
urls |
|
A data.frame with four columns:
url: The original GitHub URL
user_repo: The GitHub "username/repo", combined
user: The GitHub username
repo: The GitHub repo name
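To make the four-column shape concrete, here is a minimal base R sketch that produces a data.frame with the same column names. parse_github_urls() is a hypothetical helper written for illustration; it is not the package's implementation, and its regular expression and handling of non-GitHub URLs are assumptions that may differ from githubURLParts().

```r
# parse_github_urls() is a hypothetical illustration of the documented
# output shape; it is not the code used by githubURLParts().
parse_github_urls <- function(urls) {
    # Capture "user" and "repo" following "github.com/"
    pattern <- "github\\.com/([^/ ,]+)/([^/ ,#?]+)"
    m <- regmatches(urls, regexec(pattern, urls))
    pick <- function(i) {
        vapply(
            m,
            function(x) if (length(x) == 3L) x[i] else NA_character_,
            character(1)
        )
    }
    user <- pick(2L)
    repo <- sub("\\.git$", "", pick(3L))
    data.frame(
        url = urls,
        user_repo = ifelse(is.na(user), NA_character_, paste(user, repo, sep = "/")),
        user = user,
        repo = repo,
        stringsAsFactors = FALSE
    )
}

parsed <- parse_github_urls(c(
    "https://github.com/Bioconductor/S4Vectors",
    "https://bioconductor.org/packages/BiocGenerics"
))
parsed
```

In this sketch, URLs that do not point at GitHub yield NA in the user_repo, user, and repo columns.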
# find GitHub URL details for
# Bioconductor packages
bpkgl = biocPkgList()
urldetails = githubURLParts(bpkgl$URL)
urldetails = urldetails[!is.na(urldetails$url),]
head(urldetails)
Find the subgraph induced by including specific packages. The induced subgraph is the graph that includes the named packages and all edges connecting them. This is useful for a developer, for example, to examine her packages and their intervening dependencies.
inducedSubgraphByPkgs(g, pkgs, pkg_color = "red")
g |
an igraph graph, typically created by
|
pkgs |
|
pkg_color |
|
library(igraph)
g <- buildPkgDependencyIgraph(buildPkgDependencyDataFrame())

## subgraph of only the first 10 packages maintained by Bioconductor
biocmaintained <- head(biocMaintained()[["Package"]], 10L)
g2 <- inducedSubgraphByPkgs(g, pkgs = biocmaintained)
g2
V(g2)
plot(g2)

## subgraph of a package's strong Bioconductor package dependencies
maedeps <- unlist(pkgBiocDeps(
  "MultiAssayExperiment", which = "strong",
  recursive = TRUE, only.bioc = TRUE
), use.names = FALSE)
g3 <- inducedSubgraphByPkgs(g, pkgs = maedeps)
plot(g3)

## same subgraph with networkD3::forceNetwork
library(networkD3)
wt <- cluster_walktrap(g3)
members <- membership(wt)
ndg3 <- igraph_to_networkD3(g3, group = members)
forceNetwork(
  Links = ndg3$links, Nodes = ndg3$nodes,
  Source = 'source', Target = 'target',
  NodeID = 'name', Group = 'group',
  zoom = TRUE, linkDistance = 200, fontSize = 20,
  opacity = 0.9, opacityNoHover = 0.9
)
The latestPkgStats function combines outputs from several functions to
generate a table of relevant statistics for a given package.
latestPkgStats(
  gh_repo,
  Date,
  pkgType = c("software", "data-experiment", "workflows", "data-annotation")
)
gh_repo |
|
Date |
|
pkgType |
|
latestPkgStats("Bioconductor/BiocGenerics", "2021-05-05")
Get a data.frame of employment info from ORCID
orcid_table(orcids)
orcids |
|
a data.frame of employment info using the ORCID API
if (interactive()) {
  orcid_table(
    orcids = c(
      "0000-0002-3242-0582", "0000-0003-4046-0063", "0000-0003-2725-0694"
    )
  )
}
The function uses the pkgType argument to restrict the look up to only the
relevant Bioconductor repository. It works for multiple packages of the same
type.
pkgBiocDeps(
  pkg,
  pkgType = c("software", "data-experiment", "workflows", "data-annotation"),
  which = "strong",
  only.bioc = TRUE,
  recursive = FALSE,
  version = BiocManager::version()
)
pkg |
|
pkgType |
|
which |
a character vector listing the types of
dependencies, a subset of
|
only.bioc |
|
recursive |
a logical indicating whether (reverse) dependencies
of (reverse) dependencies (and so on) should be included, or a
character vector like |
version |
(Optional) |
pkgBiocDeps("MultiAssayExperiment", only.bioc = TRUE)
pkgBiocDeps("MultiAssayExperiment", only.bioc = FALSE)
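The `which` and `recursive` arguments read like those of `tools::package_dependencies()`. The following is a minimal sketch of that base R function on a hand-built package database, assuming (not confirmed by this page) that pkgBiocDeps performs a comparable lookup against a Bioconductor repository. The packages `pkgA`, `pkgB`, and `pkgC` are hypothetical.

```r
# Toy package database in the shape returned by available.packages().
# All package names are hypothetical.
db <- cbind(
  Package   = c("pkgA", "pkgB", "pkgC"),
  Depends   = c("R (>= 4.0), pkgB", NA, NA),
  Imports   = c(NA, "pkgC", NA),
  LinkingTo = c(NA, NA, NA)
)
rownames(db) <- db[, "Package"]

# The "strong" dependency types, spelled out explicitly.
strong <- c("Depends", "Imports", "LinkingTo")

# Direct strong dependencies of pkgA.
direct <- tools::package_dependencies(
  "pkgA", db = db, which = strong, recursive = FALSE
)

# Strong dependencies of pkgA, including dependencies of dependencies.
all_levels <- tools::package_dependencies(
  "pkgA", db = db, which = strong, recursive = TRUE
)

print(direct)
print(all_levels)
```

With `recursive = FALSE` only `pkgB` is reported for `pkgA`; with `recursive = TRUE` the lookup also follows `pkgB` to `pkgC`.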
The function returns a slightly upgraded list with dependency types as
elements and package names in each of those elements, if any. The
types of dependencies can be seen in the which argument documentation.
pkgBiocRevDeps(
    pkg,
    pkgType = c("software", "data-experiment", "workflows", "data-annotation"),
    which = "all",
    only.bioc = TRUE,
    version = BiocManager::version(),
    recursive = FALSE
)

## S3 method for class 'biocrevdeps'
summary(object, ...)
pkg |
|
pkgType |
|
which |
a character vector listing the types of
dependencies, a subset of
|
only.bioc |
|
version |
(Optional) |
recursive |
a logical indicating whether (reverse) dependencies
of (reverse) dependencies (and so on) should be included, or a
character vector like |
object |
an object for which a summary is desired. |
... |
additional arguments affecting the summary produced. |
The summary method of the biocrevdeps class given by
pkgBiocRevDeps provides a tally in each dependency field.
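A per-field tally of this kind can be illustrated with base R. This is only a sketch on a hand-made list with hypothetical package names; the real summary method may format its output differently.

```r
# Hypothetical reverse-dependency list; package names are illustrative only.
rdeps <- list(
  Depends  = c("pkgA", "pkgB"),
  Imports  = c("pkgC", "pkgD", "pkgE"),
  Suggests = character(0)
)

# Count the packages recorded under each dependency field.
tally <- lengths(rdeps)
print(tally)
```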
A biocrevdeps list class object
rdeps <- pkgBiocRevDeps("MultiAssayExperiment", which = "all")
rdeps
summary(rdeps)
Calculate dependency gain achieved by excluding combinations of packages
pkgCombDependencyGain(pkg, depdf, maxNbr = 3L)
pkg |
character, the name of the package for which we want to estimate the dependency gain |
depdf |
a tidy data frame with package dependency information
obtained through the function |
maxNbr |
numeric, the maximal number of direct dependencies to leave out simultaneously |
A data frame with three columns: ExclPackages (the excluded direct dependencies), NbrExcl (the number of excluded direct dependencies), DepGain (the dependency gain from excluding these direct dependencies)
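The idea of a dependency gain for combinations of excluded packages can be sketched on a toy dependency graph in base R. This is an illustration under stated assumptions, not the package's implementation: the graph, the helper names `all_deps` and `dependency_gain`, and the choice to count the excluded packages themselves in the gain are all hypothetical.

```r
# Toy dependency graph: each element lists the direct dependencies of a package.
# All package names are hypothetical.
deps <- list(
  target = c("a", "b", "c"),
  a = c("x", "y"),
  b = "y",
  c = character(0),
  x = character(0),
  y = "z",
  z = character(0)
)

# All packages reachable from a starting set of direct dependencies.
all_deps <- function(start, deps) {
  seen <- character(0)
  queue <- start
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      queue <- c(queue, deps[[current]])
    }
  }
  seen
}

direct <- deps[["target"]]
baseline <- length(all_deps(direct, deps))

# Decrease in the total number of dependencies when `excluded` is left out.
dependency_gain <- function(excluded) {
  baseline - length(all_deps(setdiff(direct, excluded), deps))
}

# Every combination of up to max_nbr direct dependencies.
max_nbr <- 2L
combos <- unlist(
  lapply(seq_len(max_nbr), function(k) combn(direct, k, simplify = FALSE)),
  recursive = FALSE
)

result <- data.frame(
  ExclPackages = vapply(combos, paste, character(1), collapse = ", "),
  NbrExcl = lengths(combos),
  DepGain = vapply(combos, dependency_gain, integer(1))
)
print(result[order(result$DepGain, decreasing = TRUE), ])
```

In this toy graph, excluding `a` and `b` together removes more dependencies than the sum of excluding each alone, because `y` and `z` are only dropped once both packages that need them are gone.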
Charlotte Soneson
depdf <- buildPkgDependencyDataFrame(
    dependencies = c("Depends", "Imports"),
    repo = c("BioCsoft", "CRAN")
)
pcd <- pkgCombDependencyGain('GEOquery', depdf, maxNbr = 3L)
head(pcd[order(pcd$DepGain, decreasing = TRUE), ])
Function adapted from 'itdepends::dep_usage_pkg' at https://github.com/r-lib/itdepends to obtain the functionality imported and used by a given package.
pkgDepImports(pkg)
pkg |
|
Certain imported elements, such as built-in constants, will not be identified as imported functionality by this function.
A tidy data frame with two columns:
pkg: name of the package dependency.
fun: name of the functionality call imported from the
dependency in the column pkg and used within
the analyzed package.
Robert Castelo
pkgDepImports('BiocPkgTools')
Generate a report on the dependency burden of a given package.
pkgDepMetrics(pkg, depdf)
pkg |
|
depdf |
a tidy data frame with package dependency information
obtained through the function |
A tidy data frame with different metrics on the package dependency burden. More concretely, the following columns:
ImportedAndUsed: number of functionality calls imported and used in
the package.
Exported: number of functionality calls exported by the dependency.
Usage: (ImportedAndUsed x 100) / Exported. This value provides an
estimate of what fraction of the functionality of the dependency is
actually used in the given package.
DepOverlap: Similarity between the dependency graph structure of the
given package and the one of the dependency in the corresponding row,
estimated as the Jaccard index
between the two sets of vertices of the corresponding graphs. Its value
ranges between 0 and 1, where 0 indicates that no dependency is shared, while
1 indicates that the given package and the corresponding dependency depend
on an identical subset of packages.
DepGainIfExcluded: The 'dependency gain' (decrease in the total number
of dependencies) that would be obtained if this package was excluded
from the list of direct dependencies.
The reported information is ordered by the Usage column to facilitate the
identification of dependencies of which the analyzed package uses only a small
fraction of the functionality and which could therefore be easier to remove.
To aid in that decision, the column DepOverlap reports the overlap of the
dependency graph of each dependency with that of the analyzed package. Here
a value above, e.g., 0.5, could, albeit not necessarily, imply that removing
that dependency could substantially lighten the dependency burden of the analyzed
package.
An NA value in the ImportedAndUsed column indicates that the function
pkgDepMetrics() could not identify what functionality calls in the analyzed
package are made to the dependency.
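The Usage and DepOverlap definitions above can be worked through with small numbers in base R. The counts and package sets below are made up for illustration, and `jaccard` is a hypothetical helper, not a function exported by the package.

```r
# Usage: share of a dependency's exported functionality that is actually used.
imported_and_used <- 4    # hypothetical count of imported and used calls
exported <- 50            # hypothetical count of calls the dependency exports
usage <- imported_and_used * 100 / exported

# DepOverlap: Jaccard index between two sets of dependency-graph vertices.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical vertex sets of the two dependency graphs.
pkg_deps <- c("methods", "stats", "utils", "igraph")
dep_deps <- c("methods", "stats", "graphics")
overlap <- jaccard(pkg_deps, dep_deps)

print(usage)    # percentage of exported functionality in use
print(overlap)  # 0 means nothing shared, 1 means identical sets
```

Here two of the five distinct packages are shared, so the overlap is 2 / 5, and 4 of 50 exported calls in use gives a Usage of 8.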
Robert Castelo
Charlotte Soneson
depdf <- buildPkgDependencyDataFrame(
    dependencies = c("Depends", "Imports"),
    repo = c("BioCsoft", "CRAN")
)
pkgDepMetrics('BiocPkgTools', depdf)
This function uses available.packages to calculate the download rank
percentile of a given package. It approximates the value shown
on the package's Bioconductor landing page.
pkgDownloadRank(
    pkg,
    pkgType = c("software", "data-experiment", "workflows", "data-annotation"),
    version = BiocManager::version()
)
pkg |
|
pkgType |
|
version |
|
The package's percentile rank, in terms of download statistics, and proportion in the name
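A percentile rank of this kind can be sketched in base R on made-up download counts. The formula below (position in the descending ranking divided by the number of packages) is an assumption for illustration; the exact calculation used by pkgDownloadRank is not stated on this page. Package names and counts are hypothetical.

```r
# Hypothetical download counts per package.
downloads <- c(pkgA = 5000, pkgB = 1200, pkgC = 300, pkgD = 40)

# Package names ordered from most to least downloaded.
ranked <- names(sort(downloads, decreasing = TRUE))

# Percentile rank: position in the ranking as a percentage of all packages.
percentile_rank <- function(pkg) {
  100 * match(pkg, ranked) / length(ranked)
}

top <- percentile_rank("pkgA")
bottom <- percentile_rank("pkgD")
print(c(pkgA = top, pkgD = bottom))
```

Under this convention a smaller value is better: the most downloaded of four packages falls in the top 25%.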
## Percentile rank for BiocGenerics (top 1%)
pkgDownloadRank("BiocGenerics", "software")
Get Bioconductor download statistics for a package
pkgDownloadStats(
    pkg,
    pkgType = c("software", "data-experiment", "workflows", "data-annotation"),
    years = format(Sys.time(), "%Y")
)
pkg |
|
pkgType |
|
years |
numeric(), |
A tibble of download statistics
pkgDownloadStats("GenomicRanges")
This is a quick way to get an HTML report of packages maintained by a specific developer or which depend directly on a specified package. The function is keyed to filter based on either the maintainer name or by using the 'Depends', 'Suggests' and 'Imports' fields in package descriptions.
problemPage(
    authorPattern = "V.*Carey",
    dependsOn,
    ver = "devel",
    includeOK = FALSE
)
authorPattern |
|
dependsOn |
|
ver |
|
includeOK |
|
A DT::datatable call; if the result is assigned to a variable, that variable must be evaluated for the page to appear.
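The two filtering modes described for problemPage (by maintainer pattern, or by direct dependency in the 'Depends', 'Imports' and 'Suggests' fields) can be sketched on a toy table in base R. This is an illustration only: the data, the helper `depends_on`, and the matching rules are hypothetical and may differ from what the function does internally.

```r
# Toy package metadata; names, maintainers and fields are hypothetical.
pkgs <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC"),
  Maintainer = c(
    "Jane Example <jane@example.org>",
    "John Sample <john@example.org>",
    "Jane Q. Example <jq@example.org>"
  ),
  Depends = c("limma (>= 3.0)", "", ""),
  Imports = c("", "stats", "methods"),
  Suggests = c("knitr", "limma", ""),
  stringsAsFactors = FALSE
)

# Mode 1: keep packages whose maintainer matches a regular expression.
by_author <- pkgs$Package[grepl("J.*Example", pkgs$Maintainer)]

# Mode 2: keep packages that list `target` in Depends, Imports or Suggests.
depends_on <- function(pkgs, target) {
  fields <- paste(pkgs$Depends, pkgs$Imports, pkgs$Suggests, sep = ",")
  vapply(strsplit(fields, ","), function(entries) {
    # Drop version constraints such as "(>= 3.0)" before comparing names.
    target %in% trimws(sub("\\(.*\\)", "", entries))
  }, logical(1))
}
by_dependency <- pkgs$Package[depends_on(pkgs, "limma")]

print(by_author)
print(by_dependency)
```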
Vince Carey, Mike L. Smith
if (interactive()) {
    problemPage()
    problemPage(dependsOn = "limma")
}
Summarize binary packages compatible with the Bioconductor or Terra container in use.
repositoryStats(
    version = BiocManager::version(),
    binary_repository = BiocManager::containerRepository(version),
    local = FALSE
)

## S3 method for class 'repositoryStats'
print(x, ...)
version |
|
binary_repository |
|
local |
|
x |
the object returned by |
... |
further arguments passed to or from other methods (not used). |
For local repositories, use the local = TRUE argument. Local
repositories will typically start with the file:// URI. The function
checks the mtime of the output of file.info on the PACKAGES file in
the local repository. Otherwise, by default, it will check the
last-modified header of the PACKAGES file via httr2::resp_header().
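The local-repository check described above amounts to reading a file's modification time. A minimal base R sketch, using a throwaway directory and a made-up PACKAGES file (both hypothetical, created only for this example):

```r
# Create a temporary directory standing in for a local repository.
repo_dir <- tempfile("local_repo_")
dir.create(repo_dir)

# Write a minimal, made-up PACKAGES index into it.
packages_file <- file.path(repo_dir, "PACKAGES")
writeLines(c("Package: examplePkg", "Version: 0.0.1"), packages_file)

# The mtime column of file.info() holds the last modification time.
last_modified <- file.info(packages_file)$mtime
print(last_modified)
```

For a remote repository the analogous information would come from the HTTP last-modified header rather than from the file system.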
a list of class repositoryStats with the following fields:
container: character(1) container label, e.g.,
bioconductor_docker, or NA if not evaluated on a supported container
bioconductor_version: package_version the
Bioconductor version provided by the user.
repository_exists: logical(1) TRUE if a binary repository
exists for the container and bioconductor_version.
bioconductor_binary_repository: character(1) repository
location, if available, or NA if the repository does not exist.
n_software_packages: integer(1) number of software packages
in the Bioconductor source repository.
n_binary_packages: integer(1) number of binary packages
available. When a binary repository exists, this number is likely
to be larger than the number of source software packages, because
it includes the binary version of the source software packages, as
well as the (possibly CRAN) dependencies of the binary packages.
n_binary_software_packages: integer(1) number of binary
packages derived from Bioconductor source packages. This number is
less than or equal to n_software_packages.
missing_binaries: integer(1) the number of Bioconductor
source software packages that are not present in the binary
repository.
out_of_date_binaries: integer(1) the number of Bioconductor
source software packages that are newer than their binary
counterpart. A newer source software package
might occur when the main Bioconductor build system has
updated a package after the most recent run of the binary
build system.
print(repositoryStats): Print a summary of package
availability in binary repositories.
M. Morgan
stats <- repositoryStats() # obtain statistics
stats                      # display a summary
stats$container            # access an element for further computation
While inducedSubgraphByPkgs() returns the subgraph with the minimal
connections between named packages, this function takes a vector of
package names and a degree (1 or more) and returns the
subgraph(s) containing the vertices that lie within that degree of the
named package(s).
subgraphByDegree(g, pkg, degree = 1, ...)
g |
an igraph graph, typically created by
|
pkg |
|
degree |
integer(1) degree, limit search for adjacent vertices to this degree. |
... |
passed on to |
An igraph graph, with only nodes and their edges within degree of the named package
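The notion of "within degree of a package" can be sketched with igraph's neighborhood functions on a toy graph. Whether subgraphByDegree uses `make_ego_graph()` internally is an assumption made for this illustration; the vertex names are hypothetical.

```r
library(igraph)

# Toy directed dependency graph; vertex names are hypothetical.
edges <- data.frame(
  from = c("A", "B", "C", "A"),
  to   = c("B", "C", "D", "E")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Subgraph of all vertices within 1 step of "B", ignoring edge direction.
sub <- make_ego_graph(g, order = 1, nodes = "B", mode = "all")[[1]]

# Vertices kept: "B" itself plus its immediate neighbours.
kept <- sort(V(sub)$name)
print(kept)
```

Raising `order` to 2 would additionally pull in the neighbours of those neighbours.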
g <- buildPkgDependencyIgraph(buildPkgDependencyDataFrame())
g2 <- subgraphByDegree(g, 'GEOquery')
plot(g2)
These templates are used with biocBuildEmail to notify maintainers
about package errors and to send final deprecation warnings.
templatePath(
    type = c("buildemail", "deprecation", "deprecguide", "cranreport", "revdepnote")
)
type |
|
templatePath("buildemail")