Title: | Add resources to ExperimentHub |
---|---|
Description: | Functions to add metadata to ExperimentHub db and resource files to AWS S3 buckets. |
Authors: | Bioconductor Maintainer [cre] |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.33.0 |
Built: | 2024-10-30 07:17:47 UTC |
Source: | https://github.com/bioc/ExperimentHubData |
Add resource metadata to a local ExperimentHub database
addResources(pathToPackage, fileName=character(), insert = FALSE, ...)
pathToPackage |
Full path to data package including package name. |
fileName |
Name of single metadata file located in "inst/extdata". If none is provided the function looks for a file named "metadata.csv". |
insert |
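A logical. When TRUE, addResources attempts to add the metadata to the local database; the default is FALSE. |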
... |
TBD. Currently not used. |
This function is used by the Bioconductor Core team to add new metadata to the production database.
When insert is TRUE, addResources attempts to add the metadata to the local database. (NOTE: A local database can be created with the ExperimentHub docker.) Records in ExperimentHub must have unique file names. If the new metadata have duplicate file names, a warning is thrown and the records are omitted from those added to the database.
This function does not add data to an AWS S3 bucket. ExperimentHub packages do not have 'recipes' that generate data on the fly. Instead, data files are provided by the maintainer in final form and added to the appropriate S3 location in a separate step.
A list of ExperimentHubMetadata objects.
## Not run:
## Generate metadata for inspection
addResources("/home/vobencha/mypackage", insert=FALSE)
## Insert metadata into ExperimentHub database
addResources("/home/vobencha/mypackage", insert=TRUE)
## End(Not run)
The ExperimentHubMetadata object is used to represent records in the server database.
ExperimentHubMetadata(ExperimentHubRoot=NA_character_,
    BiocVersion=BiocManager::version(),
    SourceUrl=NA_character_,
    SourceType=NA_character_,
    SourceVersion=NA_character_,
    SourceLastModifiedDate=as.POSIXct(NA_character_),
    SourceMd5=NA_character_,
    SourceSize=NA_real_,
    DataProvider=NA_character_,
    Title=NA_character_,
    Description=NA_character_,
    Maintainer=NA_character_,
    Species=NA_character_,
    TaxonomyId=NA_integer_,
    Genome=NA_character_,
    Tags=NA_character_,
    RDataClass=NA_character_,
    RDataDateAdded=as.POSIXct(NA_character_),
    RDataPath=NA_character_,
    Coordinate_1_based=TRUE,
    Notes=NA_character_,
    DispatchClass=NA_character_,
    PreparerClass=NA_character_,
    Location_Prefix='https://bioconductorhubs.blob.core.windows.net/experimenthub/')
ExperimentHubRoot |
|
SourceUrl |
|
SourceType |
|
SourceVersion |
|
SourceLastModifiedDate |
|
SourceMd5 |
|
SourceSize |
|
DataProvider |
|
Title |
|
Description |
|
Species |
|
TaxonomyId |
|
Genome |
|
Tags |
‘Tags’ are search terms used to define a subset of resources in a Hub object, e.g., in a call to query. For ExperimentHub resources, ‘Tags’ are automatically generated from the ‘biocViews’ in the DESCRIPTION file of the accompanying software package. ‘Tags’ values supplied by the user are not entered in the database and are not part of the formal metadata. This 'controlled vocabulary' approach was taken to limit the search terms to a well-defined set and may change in the future. |
RDataClass |
|
RDataDateAdded |
|
RDataPath |
|
Maintainer |
|
BiocVersion |
|
Coordinate_1_based |
|
DispatchClass |
A number of dispatch classes are pre-defined in AnnotationHub/R/AnnotationHubResource-class.R with the suffix ‘Resource’. For example, if you have sqlite files, AnnotationHubResource-class.R defines SQLiteFileResource, so the DispatchClass would be SQLiteFile. Contact [email protected] if you are not sure which class to use. The function AnnotationHub::DispatchClassList() will output a matrix of currently implemented DispatchClass values with a brief description of their utility.
|
Location_Prefix |
|
Notes |
|
PreparerClass |
|
In practice, instances of this class are generated by a call to addResources or makeExperimentHubMetadata instead of a direct call to the constructor.
addResources is a function used by the Bioconductor Core team when adding new metadata records to the production database.
makeExperimentHubMetadata
and the low-level helper
An ExperimentHubMetadata object.
showClass("ExperimentHubMetadata")
Make ExperimentHubMetadata objects from the metadata.csv file located in the "inst/extdata/" directory of an ExperimentHub package.
makeExperimentHubMetadata(pathToPackage, fileName=character())
pathToPackage |
Full path to data package including the package name; no trailing slash |
fileName |
Name of single metadata file located in "inst/extdata". If none is provided the function looks for a file named "metadata.csv". |
makeExperimentHubMetadata: Reads the resource metadata in the metadata.csv file into an ExperimentHubMetadata object. The ExperimentHubMetadata is inserted in the ExperimentHub database. Intended for internal use or for package authors checking the validity of package metadata.
Formatting metadata files:
makeExperimentHubMetadata reads .csv files of metadata located in "inst/extdata". Internal functions perform checks for required columns and data types and can be used by package authors to validate their metadata before submitting the package for review.
The rows of the .csv file(s) represent individual Hub resources (i.e., data objects) and the columns are the metadata fields. All fields should be a single character string of length 1.
Required Fields in metadata file:
Title: character(1). Name of the resource. This can be the exact file name (if self-describing) or a more complete description.
Description: character(1). Brief description of the resource, similar to the 'Description' field in a package DESCRIPTION file.
BiocVersion: character(1). The first Bioconductor version the resource was made available for. Unless removed from the hub, the resource will be available for all versions greater than or equal to this field. Generally the current devel version of Bioconductor.
Genome: character(1). Genome. Can be NA.
SourceType: character(1). Format of original data, e.g., FASTA, BAM, BigWig, etc. getValidSourceTypes() lists currently acceptable values. If nothing seems appropriate for your data, reach out to [email protected].
SourceUrl: character(1). Optional location of original data files. Multiple urls should be provided as a comma separated string.
SourceVersion: character(1). Version of original data.
Species: character(1). Species. For help on valid species see getSpeciesList, validSpecies, or suggestSpecies. Can be NA.
TaxonomyId: character(1). Taxonomy ID. There are checks for a valid TaxonomyId given the Species, which produce warnings. See GenomeInfoDb::loadTaxonomyDb() for the full validation table. Can be NA.
Coordinate_1_based: logical. TRUE if data are 1-based. Can be NA.
DataProvider: character(1). Name of company or institution that supplied the original (raw) data.
Maintainer: character(1). Maintainer name and email in the following format: Maintainer Name <username@address>.
RDataClass: character(1). R / Bioconductor class the data are stored in, e.g., GRanges, SummarizedExperiment, ExpressionSet, etc. That is, the class of the object once the file is loaded or read into R.
DispatchClass: character(1). Determines how data are loaded into R. The value for this field should be ‘Rda’ if the data were serialized with save() and ‘Rds’ if serialized with saveRDS. The filename should have the appropriate ‘rda’ or ‘rds’ extension. Other DispatchClass types are available. A number of dispatch classes are pre-defined in AnnotationHub/R/AnnotationHubResource-class.R with the suffix ‘Resource’. For example, if you have sqlite files, AnnotationHubResource-class.R defines SQLiteFileResource, so the DispatchClass would be SQLiteFile. The function AnnotationHub::DispatchClassList() will output a matrix of currently implemented DispatchClass values with a brief description of their utility. If a predefined class does not seem appropriate, or you are not sure which class to use, contact [email protected]. An all-purpose DispatchClass is FilePath, which, instead of trying to load the file into R, only returns the path to the locally downloaded file.
Location_Prefix: character(1). Do not include this field if data are stored in the Bioconductor AWS S3; it will be generated automatically. If data will be accessed from a location other than AWS S3, this field should be the base url.
RDataPath: character(). This field should be the remainder of the path to the resource. The Location_Prefix will be prepended to RDataPath for the full path to the resource. If the resource is stored in Bioconductor's AWS S3 buckets, it should start with the name of the package associated with the metadata and should not start with a leading slash. It should include the resource file name. For strongly associated files, like a bam file and its index file, the two files should be separated with a colon (:). This will link a single hub id with the multiple files.
Tags: character() vector. ‘Tags’ are search terms used to define a subset of resources in a Hub object, e.g., in a call to query. ‘Tags’ are automatically generated from the ‘biocViews’ in the DESCRIPTION and applied to all resources of the metadata file. Optionally, maintainers can add a ‘Tags’ column to the metadata to define tags for each resource individually. Multiple ‘Tags’ are specified as a colon separated string, e.g., tags for two resources would look like this:
Tags=c("tag1:tag2:tag3", "tag1:tag3")
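The colon-separated convention used for ‘Tags’ (and for strongly associated files in RDataPath) can be illustrated with a minimal base R sketch. It only shows the string format; it is not the hub's own parsing code, and the bam/index file names are hypothetical.

```r
# Tags for two resources, as in the example above.
tags <- c("tag1:tag2:tag3", "tag1:tag3")

# Split each colon-separated string into its individual tags.
tag_list <- strsplit(tags, ":", fixed = TRUE)

# Hypothetical RDataPath linking a bam file with its index file.
rdata_path <- "MyPackage/sample.bam:MyPackage/sample.bam.bai"
linked_files <- strsplit(rdata_path, ":", fixed = TRUE)[[1]]

print(tag_list)
print(linked_files)
```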
NOTE: The metadata file can have additional columns beyond the 'Required Fields' listed above. These values are not added to the Hub database but they can be used in package functions to provide an additional level of metadata on the resources.
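As a rough sketch of the layout described above, a one-row metadata.csv could be assembled in base R as follows. Every value is a placeholder chosen for illustration (the package name, URL, and versions are assumptions), Location_Prefix is omitted on the assumption that the data are stored in the Bioconductor bucket, and the column check is a simple stand-in for, not a copy of, the package's internal validation.

```r
# One row per Hub resource; one column per metadata field.
# All values below are hypothetical placeholders.
meta <- data.frame(
  Title = "Example resource",
  Description = "Brief description of the example resource",
  BiocVersion = "3.20",
  Genome = NA_character_,
  SourceType = "FASTA",   # check real values against getValidSourceTypes()
  SourceUrl = "https://example.org/raw/data.fa",
  SourceVersion = "1.0",
  Species = "Homo sapiens",
  TaxonomyId = "9606",
  Coordinate_1_based = TRUE,
  DataProvider = "Example Institute",
  Maintainer = "Maintainer Name <username@address>",
  RDataClass = "GRanges",
  DispatchClass = "Rds",
  RDataPath = "MyPackage/v1/data.rds",
  stringsAsFactors = FALSE
)

# Simple check that the required columns are all present.
required <- c("Title", "Description", "BiocVersion", "Genome", "SourceType",
              "SourceUrl", "SourceVersion", "Species", "TaxonomyId",
              "Coordinate_1_based", "DataProvider", "Maintainer",
              "RDataClass", "DispatchClass", "RDataPath")
missing_fields <- setdiff(required, names(meta))

# Write the file where a package would keep it (a temp dir is used here).
csv_file <- file.path(tempdir(), "metadata.csv")
write.csv(meta, csv_file, row.names = FALSE)
roundtrip <- read.csv(csv_file, stringsAsFactors = FALSE)
```

In a real package the file would be written to "inst/extdata/metadata.csv" and then checked with makeExperimentHubMetadata().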
More on Location_Prefix and RDataPath. These two fields make up the complete file path url for downloading the data file. If using the Bioconductor AWS S3 bucket the Location_Prefix should not be included in the metadata file[s] as this field will be populated automatically. The RDataPath will be the directory structure you uploaded to S3. If you uploaded a directory ‘MyAnnotation/’, and that directory had a subdirectory ‘v1/’ that contained two files ‘counts.rds’ and ‘coldata.rds’, your metadata file will contain two rows and the RDataPaths would be ‘MyAnnotation/v1/counts.rds’ and ‘MyAnnotation/v1/coldata.rds’. If you host your data on a publicly accessible site you must include a base url as the Location_Prefix. If your data file was at ‘ftp://myinstiututeserver/biostats/project2/counts.rds’, your metadata file will have one row and the Location_Prefix would be ‘ftp://myinstiututeserver/’ and the RDataPath would be ‘biostats/project2/counts.rds’.
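The two cases above can be reproduced with a short base R sketch. It uses only the example values from the text and plain string concatenation; how the hub itself joins the fields internally is not shown here.

```r
# Externally hosted file: the full url is Location_Prefix + RDataPath.
location_prefix <- "ftp://myinstiututeserver/"
rdata_path <- "biostats/project2/counts.rds"
full_url <- paste0(location_prefix, rdata_path)

# Bioconductor-hosted files: RDataPath mirrors the uploaded directory
# structure, one metadata row per file; Location_Prefix is left out.
uploaded_files <- c("counts.rds", "coldata.rds")
s3_rdata_paths <- file.path("MyAnnotation", "v1", uploaded_files)

print(full_url)
print(s3_rdata_paths)
```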
A list of ExperimentHubMetadata objects.
## makeExperimentHubMetadata() reads data from inst/extdata/<files>.csv
## into ExperimentHubMetadata objects. These objects are used to insert
## metadata into the production database. This function is used internally
## by addResources() and is not intended to be called directly.
## For an example of how this works we can use the GSE62944 ExperimentHub
## package. Download the source tarball from:
# http://www.bioconductor.org/packages/devel/data/experiment/html/GSE62944.html
## and unpack it. Set 'pathToPackage' to point to the downloaded source.
## Then call the function:
## Not run:
makeExperimentHubMetadata("path/to/mypackage")
## End(Not run)