Package 'AnnotationForge' reference manual

Title:	Tools for building SQLite-based annotation data packages
Description:	Provides code for generating Annotation packages and their databases. Packages produced are intended to be used with AnnotationDbi.
Authors:	Marc Carlson [aut], Hervé Pagès [aut], Madelyn Carlson [ctb] ('Creating probe packages' vignette translation from Sweave to Rmarkdown / HTML), Bioconductor Package Maintainer [cre]
Maintainer:	Bioconductor Package Maintainer <[email protected]>
License:	Artistic-2.0
Version:	1.49.1
Built:	2025-03-20 20:43:23 UTC
Source:	https://github.com/bioc/AnnotationForge

available.db0pkgs

Description

Get the list of intermediate annotation data packages (.db0 data packages) that are currently available on the Bioconductor repositories for your version of R/Bioconductor.

Or get a list of schemas supported by AnnotationDbi.

Usage

  available.db0pkgs()
  available.dbschemas()
  available.chipdbschemas()
available.db0pkgs()
  available.dbschemas()
  available.chipdbschemas()

Details

The SQLForge code uses a series of intermediate database packages that are necessary to build updated custom annotation packages. These packages must be installed or updated if you want to make a custom annotation package for a particular organism. These special intermediate packages contain the latest freeze of the data needed to build custom annotation data packages and are easily identified by the fact that they end with the special ".db0" suffix. This function will list all such packages that are available for a specific version of bioconductor.

The available.dbschemas() and available.chipdbschemas() functions allow you to get a list of the schema names that are available similar to how you can list the available ".db0" packages by using available.db0pkgs(). This list of shemas is useful (for example) when you want to build a new package and need to know the name of the schema you want to use.

Value

A character vector containing the names of the available ".db0" data packages. Or a a character vector listing the names of the available schemas.

Author(s)

Hervé Pagès and Marc Carlson

Examples

  # Get the list of BSgenome data packages currently available:
  available.db0pkgs()

  ## Not run: 
    # Make your choice and install like this:
    library(BiocManager)
    install("human.db0")
  
## End(Not run)

  # Get the list of chip DB schemas:
  available.chipdbschemas()

  # Get the list of ALL DB schemas:
  available.dbschemas()

# Get the list of BSgenome data packages currently available:
  available.db0pkgs()

  ## Not run: 
    # Make your choice and install like this:
    library(BiocManager)
    install("human.db0")
  
## End(Not run)

  # Get the list of chip DB schemas:
  available.chipdbschemas()

  # Get the list of ALL DB schemas:
  available.dbschemas()

Generates the seqnames.db package and database

Description

This function is used to generate the seqnames.db package and it's database from the csv files contained in the template for this package within AnnotationForge. The csv files are converted into database tables, and the DB is packaged into a new seqnames.db package.

Usage

  generateSeqnames.db(version, outdir=".")
generateSeqnames.db(version, outdir=".")

Arguments

`version`	Character. Version number for the final package.
`outdir`	Character. Path to output directory where the package is to be placed. By default the current working directory will be used.

Details

The generateSeqnames.db function allows users to regenerate the seqnames.db package from csv sources contained in the currently installed AnnotationForge package. It is expected that the typical user will not need to use this at all, but in case they do, we have made it available. We expect that the more common use case is someone who wants to make some new chromosome conventions available for the world. It is expected that this person will more typicalyl be charitable and want to share their conventions, so they could share their .csv files with us and we would add them to AnnotationForge, install the updated package and then run this function to make a new package.

The .csv files need to be formatted the same as the ones that are currently in the template in AnnotationForge. Examples of these .csv files can be found in AnnotationForge in the "inst/seqnames-template/inst/extdata/dataFiles/" directory. Each file must be named after it's corresponding genus and species with an underscore separator and a .csv file extension. The 1st line of each file defines columns that are the names of the corresponding naming conventions. And the chromosome names are then listed below this header line such that the equivalent names for the different styles share the same row.

So for example the 1st four rows of Mus_musculus look like this (but with only one newline at the end of each row):

UCSC,NCBI,ensembl

chr1,1,1

chr2,2,2

chr3,3,3

etc.

Once you have your file ready your only need to place it in the same dir in AnnoationDbi (with the other files), install AnnotationForge, and then run this function to generate a new seqnames.db package. Of course, if you have a useful set of conventions or species to contribute, it would be best if you gave your .csv files to the Bioconductor core team so that we can add these files to the official version of AnnotationForge and so that they can occur in the official seqnames.db package.

Value

A new seqnames.db package, complete with all the latest data stored in the dataFiles subdirectory

Author(s)

Marc Carlson

Examples

## Not run: 
  generateSeqnames.db(version="1.0.0")

## End(Not run)
## Not run: 
  generateSeqnames.db(version="1.0.0")

## End(Not run)

Read a 1lq file for an Affymetrix genechip

Description

Read a 1lq file for an Affymetrix genechip

Usage

getProbeData_1lq(arraytype, datafile, pkgname = NULL)
getProbeData_1lq(arraytype, datafile, pkgname = NULL)

Arguments

`arraytype`	Character. Array type (e.g. 'Scerevisiaetiling)
`datafile`	Character. The filename of the input data file. If omitted a default name is constructed from `arraytype` (see this function's source code).
`pkgname`	Character. Package name. If NULL the name is derived from `arraytype`.

Details

This function serves as an interface between the (1) representation of array probe information data in the packages that are generated by makeProbePackage and (2) the vendor- and possibly version-specific way the data are represented in datafile.

Value

A list with three components

`dataEnv`	an environment which contains the data frame with the probe sequences and the other probe data.
`symVal`	a named list of symbol value substitutions which can be used to customize the man pages. See `createPackage`.
`pkgname`	a character with the package name; will be the same as the function parameter `pkgname` if it was specified; otherwise, the name is constructed from the parameter `arraytype`.

Examples

## makeProbePackage(
##    arraytype = "Scerevisiaetiling",
##    maintainer= "Wolfgang Huber <[email protected]>",
##    version   = "1.1.0",
##    datafile  = "S.cerevisiae_tiling.1lq",
##    importfun = "getProbeData_1lq")
## makeProbePackage(
##    arraytype = "Scerevisiaetiling",
##    maintainer= "Wolfgang Huber <[email protected]>",
##    version   = "1.1.0",
##    datafile  = "S.cerevisiae_tiling.1lq",
##    importfun = "getProbeData_1lq")

Read a data file describing the probe sequences on an Affymetrix genechip

Description

Read a data file describing the probe sequences on an Affymetrix genechip

Usage

getProbeDataAffy(arraytype, datafile, pkgname = NULL, comparewithcdf = FALSE)
getProbeDataAffy(arraytype, datafile, pkgname = NULL, comparewithcdf = FALSE)

Arguments

`arraytype`	Character. Array type (e.g. 'HG-U133A')
`datafile`	Character with the filename of the input data file, or a connection (see example). If omitted a default name is constructed from `arraytype` (for details you will need to consult this function's source code).
`pkgname`	Character. Package name. If NULL the name is derived from `arraytype`.
`comparewithcdf`	Logical. If TRUE, run a consistency check against a CDF package of the same name (what used to be Laurent's "extraparanoia".)

Details

datafile is a tabulator-separated file with one row per probe, and column names 'Probe X', 'Probe Y', 'Probe Sequence', and 'Probe.Set.Name'. See the vignette for an example.

Value

A list with three components

`dataEnv`	an environment which contains the data frame with the probe sequences and the other probe data.
`symVal`	a named list of symbol value substitutions which can be used to customize the man pages. See `createPackage`.
`pkgname`	a character with the package name; will be the same as the function parameter `pkgname` if it was specified; otherwise, the name is constructed from the parameter `arraytype`.

Examples

  ## Please refer to the vignette
## Please refer to the vignette

Create an SQLite-based annotation package

Description

Create an SQLite-based annotation package from an SQLite file.

Usage

  makeAnnDbPkg(x, dbfile, dest_dir=".", no.man=FALSE, ...)
  loadAnnDbPkgIndex(file)
makeAnnDbPkg(x, dbfile, dest_dir=".", no.man=FALSE, ...)
  loadAnnDbPkgIndex(file)

Arguments

`x`	A AnnDbPkgSeed object, a list, a string or a regular expression.
`dbfile`	The path to the SQLite containing the annotation data for the package to build.
`dest_dir`	The directory where the package will be created.
`file`	The path to a DCF file containing the list of annotation packages to build.
`no.man`	If `TRUE` then no man page is included in the package.
`...`	Extra args used for extra filtering.

Examples

  ## With a "AnnDbPkgSeed" object:
  seed <- new("AnnDbPkgSeed",
      Package="hgu133a2.db",
      Version="0.0.99",
      PkgTemplate="HUMANCHIP.DB",
      AnnObjPrefix="hgu133a2"
  )
  if (FALSE)
      makeAnnDbPkg(seed, "path/to/hgu133a2.sqlite")

  ## With package names:
  ## (Note that in this case makeAnnDbPkg() will use the package descriptions
  ## found in the master index file ANNDBPKG-INDEX.TXT located in the
  ## AnnotationDbi package.)
  if (FALSE)
      makeAnnDbPkg(c("hgu95av2.db", "hgu133a2.db"))

  ## A character vector of length 1 is treated as a regular expression:
  if (FALSE)
      makeAnnDbPkg("hgu.*")
  ## To make all the packages described in the master index:
  if (FALSE)
      makeAnnDbPkg("")
  ## Extra args can be used to narrow down the roaster of packages to make:
  if (FALSE) {
      makeAnnDbPkg("", PkgTemplate="HUMANCHIP.DB", manufacturer="Affymetrix")
      makeAnnDbPkg(".*[3k]\\.db", species=c("Mouse", "Rat"))
  }

  ## The master index file ANNDBPKG-INDEX.TXT can be loaded with:
  loadAnnDbPkgIndex()
## With a "AnnDbPkgSeed" object:
  seed <- new("AnnDbPkgSeed",
      Package="hgu133a2.db",
      Version="0.0.99",
      PkgTemplate="HUMANCHIP.DB",
      AnnObjPrefix="hgu133a2"
  )
  if (FALSE)
      makeAnnDbPkg(seed, "path/to/hgu133a2.sqlite")

  ## With package names:
  ## (Note that in this case makeAnnDbPkg() will use the package descriptions
  ## found in the master index file ANNDBPKG-INDEX.TXT located in the
  ## AnnotationDbi package.)
  if (FALSE)
      makeAnnDbPkg(c("hgu95av2.db", "hgu133a2.db"))

  ## A character vector of length 1 is treated as a regular expression:
  if (FALSE)
      makeAnnDbPkg("hgu.*")
  ## To make all the packages described in the master index:
  if (FALSE)
      makeAnnDbPkg("")
  ## Extra args can be used to narrow down the roaster of packages to make:
  if (FALSE) {
      makeAnnDbPkg("", PkgTemplate="HUMANCHIP.DB", manufacturer="Affymetrix")
      makeAnnDbPkg(".*[3k]\\.db", species=c("Mouse", "Rat"))
  }

  ## The master index file ANNDBPKG-INDEX.TXT can be loaded with:
  loadAnnDbPkgIndex()

Making a chip package from annotations available from data.frame of probes mapped to gene IDs and an existing org package.

Description

The makeChipPackage function allows the user to make an chip package from a data.frame that has two columns to define a set of probes and the gene IDs that they map to as well as an org package that contains data about those gene IDs (where the gene IDs can be used as a foreign key).

makeChipPackage is intended to be compatible with any org packages that are generated by makeOrgPackage as well as most of the older legacy org packages that were based on more popular model organisms. The one exception is the legacy org package for yeast org.Sc.sgd.db since its internal schema is just too different from everything else.

Packages produced in this way can not support the older bimap objects unless they are pointing to an older legacy org package. All packages should support select().

Usage


  makeChipPackage(prefix,
                  probeFrame, 
                  orgPkgName, 
                  version,
                  maintainer,
                  author,
                  outputDir = ".",
                  tax_id,
                  genus,
                  species,
                  optionalAccessionsFrame=NULL)
makeChipPackage(prefix,
                  probeFrame, 
                  orgPkgName, 
                  version,
                  maintainer,
                  author,
                  outputDir = ".",
                  tax_id,
                  genus,
                  species,
                  optionalAccessionsFrame=NULL)

Arguments

`prefix`	The package name
`probeFrame`	data.frame with two columns. The 1st column are the probes and the second column are genes (gene IDs). The gene IDs must be the main ID type for the org package that is the named in the 3rd argument.
`orgPkgName`	The name of the org package that you want to make the chip package to go with
`version`	What is the version number for this package? format: \'x.y.z\'
`maintainer`	Who is the package maintainer? (must include email to be valid)
`author`	Who is the creator of this package?
`outputDir`	A path where the package source should be assembled.
`tax_id`	The Taxonomy ID that represents your organism. (NCBI has a nice online browser for finding the one you need)
`genus`	Single string indicating the genus.
`species`	Single string indicating the species.
`optionalAccessionsFrame`	If you want to also include accessions for your probes (not used for mapping them) then you can include those here.

Value

The path to the package that just created. This is useful for calling install.packages as the next step.

Author(s)

M. Carlson

Examples


if(interactive()){
## 1st lets list some authentic entrez gene IDs
geneIds <- c("1","10","100","1000","10000","100008586")
probeNames <- paste("probe", 1:length(geneIds), sep="")
probeFrame <- data.frame(probes=probeNames, genes=geneIds)

makeChipPackage(prefix='testChip',
                probeFrame=probeFrame,
                orgPkgName='org.Hs.eg.db',
                version='0.99.1',
                maintainer='Some One <[email protected]>',
                author='Some One <[email protected]>',
                outputDir='.',
                tax_id='59729',
                genus='Homo',
                species='sapiens')

## then you can call install.packages based on the return value
install.packages('./testChip.db', repos=NULL)

}

if(interactive()){
## 1st lets list some authentic entrez gene IDs
geneIds <- c("1","10","100","1000","10000","100008586")
probeNames <- paste("probe", 1:length(geneIds), sep="")
probeFrame <- data.frame(probes=probeNames, genes=geneIds)

makeChipPackage(prefix='testChip',
                probeFrame=probeFrame,
                orgPkgName='org.Hs.eg.db',
                version='0.99.1',
                maintainer='Some One <[email protected]>',
                author='Some One <[email protected]>',
                outputDir='.',
                tax_id='59729',
                genus='Homo',
                species='sapiens')

## then you can call install.packages based on the return value
install.packages('./testChip.db', repos=NULL)

}

Creates a sqlite database, and then makes an annotation package with it

Description

This function 1st creates a SQLite file useful for making a SQLite based annotation package by using the correct popXXXCHIP_DB function. Next, this function produces an annotation package featuring the sqlite database produced. All makeXXXXChip_DB functions REQUIRE that you previously have installed the appropriate XXXX.db0 package. Call the function available.db0pkgs() to see what your options are, and then install the appropriate package with BiocManager::install().

Usage

   makeDBPackage(schema, ...)

#  usage case with required arguments
#  makeDBPackage(schema, affy, prefix, fileName, baseMapType, version)

#  usage case with all arguments
#  makeDBPackage(schema, affy, prefix, fileName, otherSrc, chipMapSrc,
#  chipSrc, baseMapType, outputDir, version, manufacturer, chipName,
#  manufacturerUrl, author, maintainer)
makeDBPackage(schema, ...)

#  usage case with required arguments
#  makeDBPackage(schema, affy, prefix, fileName, baseMapType, version)

#  usage case with all arguments
#  makeDBPackage(schema, affy, prefix, fileName, otherSrc, chipMapSrc,
#  chipSrc, baseMapType, outputDir, version, manufacturer, chipName,
#  manufacturerUrl, author, maintainer)

Arguments

`schema`	String listing the schema that you want to use to make the DB. You can list schemas with available.dbschemas()
`affy`	Boolean to indicate if this is starting from an affy csv file or not. If it is, then that will be parsed to make the sqlite file, if not, then you can feed a tab delimited file with IDs as was done before with AnnBuilder.
`prefix`	prefix is the first part of the eventual desired package name. (ie. "prefix.db")
`fileName`	The path and filename for the file to be parsed. This can either be an affy csv file or it can be a more classic file type.
`otherSrc`	The path and filenames to any other lists of IDs which might add information about how a probe will map.
`chipMapSrc`	The path and filename to the intermediate database containing the mapping data for allowed ID types and how these IDs relate to each other. If not provided, then the appropriate source DB from the most current .db0 package will be used instead.
`chipSrc`	The path and filename to the intermediate database containing the annotation data for the sqlite to build. If not provided, then the appropriate source DB from the most current .db0 package will be used instead.
`baseMapType`	The type of ID that is used for the initial base mapping. If using a classic base mapping file, this should be the ID type present in the fileName. This can be any of the following values: "gb" = for genbank IDs "ug" = unigene IDs "eg" = Entrez Gene IDs "refseq" = refseq IDs "gbNRef" = mixture of genbank and refseq IDs
`outputDir`	Where you would like the output files to be placed.
`version`	What is the version number for the desired package.
`manufacturer`	Who made the chip being described.
`chipName`	What is the name of the chip.
`manufacturerUrl`	URL for manufacturers website.
`author`	List of authors involved in making the package.
`maintainer`	List of package maintainers with email addresses for contact purposes.
`...`	Just used so we can have a wrapper function. Ignore this argument.

Examples

## Not run: 
##Build the hgu95av2.db package
makeDBPackage(
    "HUMANCHIP_DB",
    affy = TRUE,
    prefix = "hgu95av2",
    fileName = "~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/HG_U95Av2_annot.csv.070824",
    otherSrc = c(
        EA="~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/hgu95av2.EA.txt",
        UMICH="~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/hgu95av2_UMICH.txt"),
    baseMapType = "gbNRef",
    version = "1.0.0",
    manufacturer = "Affymetrix",
    chipName = "hgu95av2",
    manufacturerUrl = "http://www.affymetrix.com")

## End(Not run)
## Not run: 
##Build the hgu95av2.db package
makeDBPackage(
    "HUMANCHIP_DB",
    affy = TRUE,
    prefix = "hgu95av2",
    fileName = "~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/HG_U95Av2_annot.csv.070824",
    otherSrc = c(
        EA="~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/hgu95av2.EA.txt",
        UMICH="~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/hgu95av2_UMICH.txt"),
    baseMapType = "gbNRef",
    version = "1.0.0",
    manufacturer = "Affymetrix",
    chipName = "hgu95av2",
    manufacturerUrl = "http://www.affymetrix.com")

## End(Not run)

Builds an individual DB from online files at inparanoid that is in turn meant for generating Inparanoid 8 based objects.

Description

This is really meant to be used by AnnotationHubData for eventual exposure through the AnnotationHub. Users who are interested in Inparanoid 8 should look for the relevant objects to be in AnnotationHub. This function is just a database builder that allows us to put the data up there. So this is not really intended for use by end users.

Usage

  makeInpDb(dir, dataDir)
makeInpDb(dir, dataDir)

Arguments

`dir`	the full path to the online Inparanpoid 8 resource.
`dataDir`	directory where the source inparanoid.sqlite DB is

Value

A SQLite DB is produced but this is not returned from the function. loadDb is expected to be used by AnnotationHub to convert these into objects after the sqlite DB is downloaded from the hub server.

Author(s)

M. Carlson

Examples

## There are paths to resource data for each set of files at Inparanoid.
## This is how you can turn those files inta sqlite DB.
if(interactive()){
 db <- makeInpDb(
        dir="http://inparanoid.sbc.su.se/download/current/Orthologs/A.aegypti/",
        dataDir=tempdir())
}
## There are paths to resource data for each set of files at Inparanoid.
## This is how you can turn those files inta sqlite DB.
if(interactive()){
 db <- makeInpDb(
        dir="http://inparanoid.sbc.su.se/download/current/Orthologs/A.aegypti/",
        dataDir=tempdir())
}

Making an organism package from annotations available from a set of named data.frames.

Description

The makeOrgPackage function allows the user to make an organism package from any collection of data frames that are united by a common gene ID.

Usage


  makeOrgPackage(...,
                 version,
                 maintainer,
                 author,
                 outputDir=getwd(),
                 tax_id,
                 genus=NULL,
                 species=NULL,
                 goTable=NULL,
                 verbose=TRUE)
makeOrgPackage(...,
                 version,
                 maintainer,
                 author,
                 outputDir=getwd(),
                 tax_id,
                 genus=NULL,
                 species=NULL,
                 goTable=NULL,
                 verbose=TRUE)

Arguments

`...`	A set of data.frames containing annotation data. Each of these arguments must be named. Those names will become the names of the tables in the final database. Also, there are no rownames for these data.frames, and the colnames are the names that will be used as extractable fields in the final package. In other words they will be what comes back when you call `columns` and `keytypes` Finally, the 1st column of every data.frame must be labeled GID, and correspond to a gene ID that is universal for the entire set of data.frames. The GID is how the different tables will be joined internally
`version`	What is the version number for this package? format: \'x.y.z\'
`maintainer`	Who is the package maintainer? (must include email to be valid)
`author`	Who is the creator of this package?
`outputDir`	A path where the package source should be assembled.
`tax_id`	The Taxonomy ID that represents your organism. (NCBI has a nice online browser for finding the one you need)
`genus`	Single string indicating the genus.
`species`	Single string indicating the species.
`goTable`	By default, this is NULL, but if one of your '...' data.frames has GO annotations, then this name will be the name of that argument. When you specify this, `makeOrgPackage` will process that data.frame to remove extra GO terms (that are too new for the current GO.db) and also will generate a table for GOALL data (based on ancestor terms for each mapping from GO.db) and for each ontology. This table will also be checked to make sure that it has exactly THREE columns, that must be named GID, GO and EVIDENCE. These must correspond to the gene IDs, GO IDs and evidence codes respectively. GO IDs should be formatted like this to work with other DBs in the project: \'GO:XXXXXXX\'.
`verbose`	When TRUE progress messages are displayed.

Value

The path to the package that just created. This is useful for calling install.packages as the next step.

Author(s)

M. Carlson

Examples


if(interactive()){

## Makes an organism package for Zebra Finch data.frames:
finchFile <- system.file("extdata","finch_info.txt",package="AnnotationForge")
finch <- read.table(finchFile,sep="\t")

## not that this is how it should always be, but that it *could* be this way.
fSym <- finch[,c(2,3,9)]
fSym <- fSym[fSym[,2]!="-",]
fSym <- fSym[fSym[,3]!="-",]
colnames(fSym) <- c("GID","SYMBOL","GENENAME")

fChr <- finch[,c(2,7)]
fChr <- fChr[fChr[,2]!="-",]
colnames(fChr) <- c("GID","CHROMOSOME")

finchGOFile <- system.file("extdata","GO_finch.txt",package="AnnotationForge")
fGO <- read.table(finchGOFile,sep="\t")
fGO <- fGO[fGO[,2]!="",]
fGO <- fGO[fGO[,3]!="",]
colnames(fGO) <- c("GID","GO","EVIDENCE")

makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
               version="0.1",
               maintainer="Some One <[email protected]>",
               author="Some One <[email protected]>",
               outputDir = ".",
               tax_id="59729",
               genus="Taeniopygia",
               species="guttata",
               goTable="go")

## then you can call install.packages based on the return value
install.packages("./org.Tguttata.eg.db", repos=NULL)

}

if(interactive()){

## Makes an organism package for Zebra Finch data.frames:
finchFile <- system.file("extdata","finch_info.txt",package="AnnotationForge")
finch <- read.table(finchFile,sep="\t")

## not that this is how it should always be, but that it *could* be this way.
fSym <- finch[,c(2,3,9)]
fSym <- fSym[fSym[,2]!="-",]
fSym <- fSym[fSym[,3]!="-",]
colnames(fSym) <- c("GID","SYMBOL","GENENAME")

fChr <- finch[,c(2,7)]
fChr <- fChr[fChr[,2]!="-",]
colnames(fChr) <- c("GID","CHROMOSOME")

finchGOFile <- system.file("extdata","GO_finch.txt",package="AnnotationForge")
fGO <- read.table(finchGOFile,sep="\t")
fGO <- fGO[fGO[,2]!="",]
fGO <- fGO[fGO[,3]!="",]
colnames(fGO) <- c("GID","GO","EVIDENCE")

makeOrgPackage(gene_info=fSym, chromosome=fChr, go=fGO,
               version="0.1",
               maintainer="Some One <[email protected]>",
               author="Some One <[email protected]>",
               outputDir = ".",
               tax_id="59729",
               genus="Taeniopygia",
               species="guttata",
               goTable="go")

## then you can call install.packages based on the return value
install.packages("./org.Tguttata.eg.db", repos=NULL)

}

Make an organism package from annotations available from NCBI.

Description

The makeOrgPackageFromNCBI function allows the user to make an organism package from NCBI annotations available from the NCBI.

Usage


  makeOrgPackageFromNCBI(
    version=,
    maintainer,
    author,
    outputDir=getwd(),
    tax_id,
    genus=NULL,
    species=NULL,
    NCBIFilesDir=getwd(),
    databaseOnly=FALSE,
    useDeprecatedStyle=FALSE,
    rebuildCache=TRUE,
    verbose=TRUE,
    ensemblVersion=NULL)
makeOrgPackageFromNCBI(
    version=,
    maintainer,
    author,
    outputDir=getwd(),
    tax_id,
    genus=NULL,
    species=NULL,
    NCBIFilesDir=getwd(),
    databaseOnly=FALSE,
    useDeprecatedStyle=FALSE,
    rebuildCache=TRUE,
    verbose=TRUE,
    ensemblVersion=NULL)

Arguments

`version`	Package version in 'x.y.z' format.
`maintainer`	Package maintainer followed by email
`author`	Creator of package.
`outputDir`	Path where the package source should be assembled.
`tax_id`	The Taxonomy ID that represents the organism.
`genus`	Single string indicating the genus.
`species`	Single string indicating the species.
`NCBIFilesDir`	When a path is given, the files used to create the DB are saved locally.
`databaseOnly`	When TRUE, a DB is created without the package infrastructure. Used for OrgDb packages hosted on AnnotationHub.
`useDeprecatedStyle`	Legacy support for older package style with bimaps.
`rebuildCache`	When TRUE, the files used to create the DB are refreshed (i.e., re-downloaded) if the timestamp is greater than 24 hours old. When FALSE, the temporary NCBI.sqlite DB and final package are re-generated from local files in `outputDir`. Used internally and for testing.
`verbose`	When TRUE, status messages are printed.
`ensemblVersion`	Ensmbl version to use. When NULL, uses the current version.

Details

makeOrgPackageFromNCBI downloads multiple files and assembles a 33 GB database in NCBIFilesDir. The first time the function is run it may take well over an hour; subsequent calls reuse files from the cache and are much faster. The default behavior of makeOrgPackageFromNCBI attempts to refresh the cached files each day (suppress with rebuildCache = FALSE).

The files that are downloaded from NCBI may take longer to download than the default timeout permits. We encourage users to set a options(timeout=xxx) to encourage the files to finish downloading. Adjust the timelimit according to download speed and capacity.

Depending on the organism, the database file could reach up to 49 G. You will need ~62G free for downloading files and creating the largest database as of February 2022.

Some orgDbs are already provided through AnnotationHub. See package AnnotationHub::AnnotationHub

Value

Nothing returned to the R session. Just creates an organism annotation package.

Author(s)

M. Carlson

Examples


## Not run: 
## Makes an organism package for Zebra Finch from NCBI:

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <[email protected]>",
                       maintainer = "Some One <[email protected]>",
                       outputDir = ".",
                       tax_id = "59729",
                       genus = "Taeniopygia",
                       species = "guttata")


## End(Not run)

## Not run: 
## Makes an organism package for Zebra Finch from NCBI:

makeOrgPackageFromNCBI(version = "0.1",
                       author = "Some One <[email protected]>",
                       maintainer = "Some One <[email protected]>",
                       outputDir = ".",
                       tax_id = "59729",
                       genus = "Taeniopygia",
                       species = "guttata")


## End(Not run)

Make a package with probe sequence related data for microarrays

Description

Make a package with probe sequence related data for microarrays

Usage

makeProbePackage(arraytype,
    importfun = "getProbeDataAffy",
    maintainer,
    version,
    species,
    pkgname = NULL,
    outdir  = ".",
    quiet = FALSE,
    check = TRUE, build = TRUE, unlink = TRUE, ...)
makeProbePackage(arraytype,
    importfun = "getProbeDataAffy",
    maintainer,
    version,
    species,
    pkgname = NULL,
    outdir  = ".",
    quiet = FALSE,
    check = TRUE, build = TRUE, unlink = TRUE, ...)

Arguments

`arraytype`	Character. Name of array type (typically a vendor's name like "HG-U133A").
`importfun`	Character. Name of a function that can read the probe sequence data e.g. from a file. See `getProbeDataAffy` for an example.
`maintainer`	Character. Name and email address of the maintainer.
`version`	Character. Version number for the package.
`species`	Character. Species name in the format Genus_species (e.g., Homo_sapiens)
`pkgname`	Character. Name of the package. If missing, a name is created from `arraytype`.
`outdir`	Character. Path where the package is to be written.
`quiet`	Logical. If `TRUE` do not print statements on progress on the console
`check`	Logical. If `TRUE` call `R CMD check` on the package
`build`	Logical. If `TRUE` call `R CMD build` on the package
`unlink`	Logical. If `TRUE` unlink (remove) the `check` directory (only relevant if `check=TRUE`)
`...`	Further arguments that get passed along to `importfun`

Details

See vignette.

Important note for Windows users: Building and checking packages requires some tools outside of R (e.g. a Perl interpreter). While these tools are standard with practically every Unix, they do not come with MS-Windows and need to be installed separately on your computer. See http://www.murdoch-sutherland.com/Rtools. If you just want to build probe packages, you will not need the compilers, and the "Windows help" stuff is optional.

Examples

filename <- system.file("extdata", "HG-U95Av2_probe_tab.gz", 
     package="AnnotationDbi")
outdir   <- tempdir()
me       <- "Wolfgang Huber <[email protected]>"
makeProbePackage("HG-U95Av2",
                 datafile   = gzfile(filename, open="r"),
                 outdir     = outdir,
                 maintainer = me, 
                 version    = "0.0.1",
                 species    = "Homo_sapiens", 
                 check      = FALSE)
dir(outdir)
filename <- system.file("extdata", "HG-U95Av2_probe_tab.gz", 
     package="AnnotationDbi")
outdir   <- tempdir()
me       <- "Wolfgang Huber <[email protected]>"
makeProbePackage("HG-U95Av2",
                 datafile   = gzfile(filename, open="r"),
                 outdir     = outdir,
                 maintainer = me, 
                 version    = "0.0.1",
                 species    = "Homo_sapiens", 
                 check      = FALSE)
dir(outdir)

Populates an SQLite DB with and produces a schema definition

Description

Creates SQLite file useful for making a SQLite based annotation package. Also produces the schema file which details the schema for the database produced.

Usage

   populateDB(schema, ...)

#  usage case with required arguments
#  populateDB(schema, prefix, chipSrc, metaDataSrc)

#  usage case with all possible arguments
#  populateDB(schema, affy, prefix, fileName, chipMapSrc, chipSrc,
#  metaDataSrc, otherSrc, baseMapType, outputDir, printSchema)
populateDB(schema, ...)

#  usage case with required arguments
#  populateDB(schema, prefix, chipSrc, metaDataSrc)

#  usage case with all possible arguments
#  populateDB(schema, affy, prefix, fileName, chipMapSrc, chipSrc,
#  metaDataSrc, otherSrc, baseMapType, outputDir, printSchema)

Arguments

`schema`	String listing the schema that you want to use to make the DB. You can list schemas with available.dbschemas()
`affy`	Boolean to indicate if this is starting from an affy csv file or not. If it is, then that will be parsed to make the sqlite file, if not, then you can feed a tab delimited file with IDs as was done before with AnnBuilder.
`prefix`	prefix is the first part of the eventual desired package name. (ie. "prefix.sqlite")
`fileName`	The path and filename for the mapping file to be parsed. This can either be an affy csv file or it can be a more classic file type. This is only needed when making chip packages.
`chipMapSrc`	The path and filename to the intermediate database containing the mapping data for allowed ID types and how these IDs relate to each other. If not provided, then the appropriate source DB from the most current .db0 package will be used instead.
`chipSrc`	The path and filename to the intermediate database containing the annotation data for the sqlite to build. If not provided, then the appropriate source DB from the most current .db0 package will be used instead.
`metaDataSrc`	Either a named character vector containing pertinent information about the metadata OR the path and filename to the intermediate database containing the metadata information for the package. If this is a custom package, then it must be a named vector with the following fields: metaDataSrc <- c( DBSCHEMA="the DB schema", ORGANISM="the organism", SPECIES="the species", MANUFACTURER="the manufacturer", CHIPNAME="the chipName", MANUFACTURERURL="the manufacturerUrl")
`otherSrc`	The path and filenames to any other lists of IDs which might add information about how a probe will map.
`baseMapType`	The type of ID that is used for the initial base mapping. If using a classic base mapping file, this should be the ID type present in the fileName. This can be any of the following values: "gb" = for genbank IDs "ug" = unigene IDs "eg" = Entrez Gene IDs "refseq" = refseq IDs "gbNRef" = mixture of genbank and refseq IDs
`outputDir`	Where you would like the output files to be placed.
`printSchema`	Boolean to indicate whether or not to produce an output of the schema (default is FALSE).
`...`	Just used so we can have a wrapper function. Ignore this argument.

Examples

## Not run: 
  ##Set up the metadata
  my_metaDataSrc <- c( DBSCHEMA="the DB schema",
                    ORGANISM="the organism",
                    SPECIES="the species",
                    MANUFACTURER="the manufacturer",
                    CHIPNAME="the chipName",
                    MANUFACTURERURL="the manufacturerUrl")  

  ##Builds the org.Hs.eg sqlite:
  populateDB(
    "HUMAN_DB",
    prefix="org.Hs.eg",
    chipSrc = "~/proj/mcarlson/sqliteGen/annosrc/db/chipsrc_human.sqlite",
    metaDataSrc = my_metaDataSrc,
    printSchema=TRUE)


  ##Builds hgu95av2.sqlite:
  populateDB(
      "HUMANCHIP_DB",
      affy=TRUE,
      prefix="hgu95av2",
      fileName="~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/HG_U95Av2.na27.annot.csv",
      metaDataSrc=my_metaDataSrc,
      baseMapType="gbNRef")


  ##Builds the ag.sqlite:
  populateDB("ARABIDOPSISCHIP_DB",
             affy=TRUE,
             prefix="ag",
             metaDataSrc=my_metaDataSrc)


  ##Builds yeast2.sqlite:  
  populateDB(
      "YEASTCHIP_DB",
      affy=TRUE,
      prefix="yeast2",
      fileName="~/proj/mcarlson/sqliteGen/srcFiles/yeast2/Yeast_2.na27.annot.csv",
      metaDataSrc=metaDataSrc)


## End(Not run)
## Not run: 
  ##Set up the metadata
  my_metaDataSrc <- c( DBSCHEMA="the DB schema",
                    ORGANISM="the organism",
                    SPECIES="the species",
                    MANUFACTURER="the manufacturer",
                    CHIPNAME="the chipName",
                    MANUFACTURERURL="the manufacturerUrl")  

  ##Builds the org.Hs.eg sqlite:
  populateDB(
    "HUMAN_DB",
    prefix="org.Hs.eg",
    chipSrc = "~/proj/mcarlson/sqliteGen/annosrc/db/chipsrc_human.sqlite",
    metaDataSrc = my_metaDataSrc,
    printSchema=TRUE)


  ##Builds hgu95av2.sqlite:
  populateDB(
      "HUMANCHIP_DB",
      affy=TRUE,
      prefix="hgu95av2",
      fileName="~/proj/mcarlson/sqliteGen/srcFiles/hgu95av2/HG_U95Av2.na27.annot.csv",
      metaDataSrc=my_metaDataSrc,
      baseMapType="gbNRef")


  ##Builds the ag.sqlite:
  populateDB("ARABIDOPSISCHIP_DB",
             affy=TRUE,
             prefix="ag",
             metaDataSrc=my_metaDataSrc)


  ##Builds yeast2.sqlite:  
  populateDB(
      "YEASTCHIP_DB",
      affy=TRUE,
      prefix="yeast2",
      fileName="~/proj/mcarlson/sqliteGen/srcFiles/yeast2/Yeast_2.na27.annot.csv",
      metaDataSrc=metaDataSrc)


## End(Not run)

Wrap up all the Base Databases into Packages for distribution

Description

Creates extremely simple packages from the base database files for distribution. This is a convenience function for wrapping up these packages in a consistent way each time.

Usage

  wrapBaseDBPackages(dbPath, destDir, version)
wrapBaseDBPackages(dbPath, destDir, version)

Arguments

`dbPath`	dbPath is just the path to the location of the latest intermediate sqlite source files. These files are then used to make base DB packages.
`destDir`	destination path for the newly minted packages.
`version`	version number to stamp onto these newly minted packages.

Examples

## Not run: 
  ##Make all of the intermediate DBs and place the new packages right here.
  wrapBaseDBPackages(dbPath, destDir = ".")

## End(Not run)
## Not run: 
  ##Make all of the intermediate DBs and place the new packages right here.
  wrapBaseDBPackages(dbPath, destDir = ".")

## End(Not run)

Package 'AnnotationForge'

Help Index

available.db0pkgs

Description

Usage

Details

Value

Author(s)

Examples

Generates the seqnames.db package and database

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Read a 1lq file for an Affymetrix genechip

Description

Usage

Arguments

Details

Value

See Also

Examples

Read a data file describing the probe sequences on an Affymetrix genechip

Description

Usage

Arguments

Details

Value

See Also

Examples

Create an SQLite-based annotation package

Description

Usage

Arguments

See Also

Examples

Making a chip package from annotations available from data.frame of probes mapped to gene IDs and an existing org package.

Description

Usage

Arguments

Value

Author(s)

Examples

Creates a sqlite database, and then makes an annotation package with it

Description

Usage

Arguments

Examples

Builds an individual DB from online files at inparanoid that is in turn meant for generating Inparanoid 8 based objects.

Description

Usage

Arguments

Value

Author(s)

Examples

Making an organism package from annotations available from a set of named data.frames.

Description

Usage

Arguments

Value

Author(s)

Examples

Make an organism package from annotations available from NCBI.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Make a package with probe sequence related data for microarrays

Description

Usage

Arguments

Details

Examples

Populates an SQLite DB with and produces a schema definition