HOWTO generate repository HTML

Overview

This document assumes you have a collection of R packages on local disk that you would like to prepare for publishing to the web. The end result we are going for is:

  1. Packages organized per CRAN-style repository standard.
  2. PACKAGES files created for install.packages access.
  3. VIEWS file created for generating biocViews.
  4. A vignette directory created containing the extracted vignette pdf files from each source package in the repository.
  5. An html directory created containing html descriptions of each package with links for downloading available artifacts.
  6. A simple alphabetical listing index.html file.

CRAN-style Layout

Establish a top-level directory for the repository, we will refer to this directory as reposRoot. Place your packages as follows:

src/contrib Contains all source packages (*.tar.gz).

bin/windows/contrib/x.y Contains all win.binary packages (*.zip). Where x.y is the major.minor version number of R.

bin/macosx/contrib/x.y Contains the mac.binary (High Sierra) (*.tgz) packages.

You will need the following parameters:

reposRoot <- "path/to/reposRoot"
## The names are essential
contribPaths <- c(source="src/contrib",
                   win.binary="bin/windows/contrib/4.0",
                   mac.binary="bin/macosx/contrib/4.0")

Extracting vignettes

The extractVignettes function extracts pdf files from inst/doc. The default is to extract to a reposRoot/vignettes.

extractVignettes(reposRoot, contribPaths["source"])

Generating the control files

The genReposControlFiles function will generate the PACKAGES files for each contrib path and also create a VIEWS file with complete info for later use by biocViews.

genReposControlFiles(reposRoot, contribPaths)

Generating the HTML

The writeRepositoryHtml will generate HTML detail files for each package in reposRoot/html. The function will also create an index.html file at the top level.

Two CSS files are included with biocViews that are automatically copied along side the appropriate HTML files during the HTML generation process. These CSS files are:

reposRoot/repository-detail.css
reposRoot/html/package-detail.css

Design and extension notes

The basic idea is that using the VIEWS file and the known repository structure (location of packages and extracted vignettes), we represent the details for each package in the repository in a PackageDetail-class instance.

packageDetail-class objects know how to write themselves to HTML using the htmlValue method. We used the XML package’s xmlOutputDOM function to build up the HTML documents. Each HTML producing class extends Htmlized-class which contains a slot to hold the DOM tree and provides a place to put methods that are not specific to any given HTML outputting class.

In terms of extending this to generate the biocViews, have a look at setDependsOnMeImportsMeSuggeswhich builds up an adjacency matrix representing package dependencies, importations, and suggestions. The matrix is square with rows and columns labeled with the names of the packages. The entries are 0/1 with aij = 1 meaning that package j depends on package i.

Details on HTML generation

I started by breaking the htmlValue method for PackageDetail-class into one helper function for each logical section of the HTML we produce (author, description, details, downloads, and vignettes). That made the long method short enough to be readable.

In order to be able to mix and match the different chunks and be able to more easily create new renderings, it seemed that it would be easiest to be able to render to HTML each chunk with a method. One possibility is a function htmlChunk(object, "descriptions") where the dispatch would be done using a switch statement or similar.

A more flexible approach is to create dummy classes for each output “chunk”. Each dummy class contains (subclasses) PackageDescription and that’s it. We then can take advantage of the behavior of the as method to convert.

## Define classes like this for each logical document chunk
setClass("pdAuthorMaintainerInfo", contains="PackageDetail")
setClass("pdVignetteInfo", contains="PackageDetail")
## Then define a htmlValue method
setMethod("htmlValue", signature(object="pdDescriptionInfo"),
           function(object) {
           node <- xmlNode("p", cleanText(object@Description),
                            attrs=c(class="description"))
           node
         })
## Then you can make use of all this...
## Assume object contains a PackageDetail instance
authorInfo <- as(object, "pdAuthorMaintainerInfo")
dom$addNode(htmlValue(authorInfo))

One advantage of this setup is that we can now define a method to generate complete HTML documents that will work for all the dummy classes. Hence mix and match.

A note on the htmlValue method for PackageDetail

We could parameterize as follows. Not sure this makes things easier to follow, but it does demonstrate how you could start building up documents in a more programatic fashion.

details <- list(heading=list(tag="h3", text="Details"),
                content="pdDetailsInfo")
downloads <- list(heading=list(tag="h3", text="Download Package"),
                 content="pdDownloadInfo")
vignettes <- list(heading=list(tag="h3",
                    text="Vignettes (Documentation)"),
                 content="pdVignetteInfo")
doSection <- function(sec) {
   dom$addTag(sec$heading$tag, sec$heading$text)
   secObj <- as(object, sec$content)
   dom$addNode(htmlValue(secObj))
}
lapply(list(details, downloads, vignettes), doSection)