| Title: | A compilation of metadata from NCBI GEO |
|---|---|
| Description: | The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data of interest can be challenging using current tools. GEOmetadb is an attempt to make access to the metadata associated with samples, platforms, and datasets much more feasible. This is accomplished by parsing all the NCBI GEO metadata into a SQLite database that can be stored and queried locally. GEOmetadb is simply a thin wrapper around the SQLite database along with associated documentation. Finally, the SQLite database is updated regularly as new data is added to GEO and can be downloaded at will for the most up-to-date metadata. GEOmetadb paper: http://bioinformatics.oxfordjournals.org/cgi/content/short/24/23/2798 |
| Authors: | Jack Zhu and Sean Davis |
| Maintainer: | Jack Zhu <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.75.0 |
| Built: | 2026-05-30 08:16:42 UTC |
| Source: | https://github.com/bioc/GEOmetadb |
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data of interest can be challenging using current tools. GEOmetadb is an attempt to make access to the metadata associated with samples, platforms, and datasets much more feasible. This is accomplished by parsing all the NCBI GEO metadata into a SQLite database that can be stored and queried locally. GEOmetadb is simply a thin wrapper around the SQLite database along with associated documentation. Finally, the SQLite database is updated regularly as new data is added to GEO and can be downloaded at will for the most up-to-date metadata.
| Package: | GEOmetadb |
|---|---|
| Type: | Package |
| Version: | 1.1.5 |
| Date: | 2008-09-09 |
| License: | Artistic-2.0 |
Jack Zhu and Sean Davis
Maintainer: Jack Zhu <[email protected]>

    library(GEOmetadb)

    ## Use the demo GEOmetadb database:
    if (!file.exists("GEOmetadb.sqlite")) {
      demo_sqlfile <- getSQLiteFile(destdir = getwd(),
                                    destfile = "GEOmetadb.sqlite.gz",
                                    type = "demo")
    } else {
      demo_sqlfile <- "GEOmetadb.sqlite"
    }

    ## Show the first five rows of the column descriptions
    columnDescriptions(demo_sqlfile)[1:5, ]
    a <- columnDescriptions(demo_sqlfile)[1:5, ]

    ## Map platform GPL96 to its samples
    b <- geoConvert('GPL96', out_type = 'GSM', sqlite_db_name = demo_sqlfile)

    ## Download the full GEOmetadb database:
    ## Not run: geometadbfile <- getSQLiteFile()

Searching the GEOmetadb database requires a bit of knowledge about the structure of the database and column descriptions. This function returns those column descriptions for all columns in all tables in the database.
columnDescriptions(sqlite_db_name = 'GEOmetadb.sqlite')
| sqlite_db_name | The filename of the GEOmetadb SQLite database file. |
|---|---|
A three-column data.frame including TableName, FieldName, and Description.
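Since the return value is an ordinary data.frame, it can be filtered with base R to find the fields of one table, or fields whose description mentions a keyword. The sketch below uses a small hand-made stand-in with the three documented columns; the rows in it are illustrative assumptions, not copied from the real database.

```r
# Toy stand-in for the data.frame returned by columnDescriptions().
# The rows are made up for illustration only.
desc <- data.frame(
  TableName   = c("gse", "gse", "gsm", "gpl"),
  FieldName   = c("title", "gse", "gsm", "organism"),
  Description = c("Series title", "Series accession",
                  "Sample accession", "Platform organism"),
  stringsAsFactors = FALSE
)

# Keep only the fields that belong to one table.
gse_fields <- desc[desc$TableName == "gse", c("FieldName", "Description")]

# Search the descriptions for a keyword, ignoring case.
accession_fields <- desc[grepl("accession", desc$Description,
                               ignore.case = TRUE), ]

print(gse_fields)
print(accession_fields)
```

With a real database, `desc` would instead be the result of `columnDescriptions(demo_sqlfile)`.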
Sean Davis <[email protected]>

    library(GEOmetadb)

    ## Use the demo GEOmetadb database:
    if (!file.exists("GEOmetadb.sqlite")) {
      demo_sqlfile <- getSQLiteFile(destdir = getwd(),
                                    destfile = "GEOmetadb.sqlite.gz",
                                    type = "demo")
    } else {
      demo_sqlfile <- "GEOmetadb.sqlite"
    }

    ## Show the first five rows of the column descriptions
    columnDescriptions(demo_sqlfile)[1:5, ]

    ## Download the full GEOmetadb database:
    ## Not run: geometadbfile <- getSQLiteFile()

A common task is to find all the GEO entities of one type associated with another GEO entity (e.g., find all GEO samples associated with GEO platform 'GPL96'). This function provides a very fast mapping between entity types to facilitate queries of this type.
geoConvert(in_list, out_type = c("gse", "gpl", "gsm", "gds", "smatrix"), sqlite_db_name = "GEOmetadb.sqlite")
| in_list | Character vector of GEO entities to convert from. |
|---|---|
| out_type | Character vector of GEO entity types to which to convert. |
| sqlite_db_name | The filename of the GEOmetadb SQLite database file. |
A list of data.frames.
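Because the result is a list with one data.frame per requested type, base R list tools apply directly. This is a minimal sketch, assuming the list is named by output type and each data.frame holds the source and target accessions in columns called `from_acc` and `to_acc`. Those column names and the accessions shown are assumptions made for illustration; check them with `names()` and `head()` on a real result.

```r
# Hypothetical stand-in for a geoConvert() result; the column names
# and accessions are assumptions for illustration only.
res <- list(
  gse = data.frame(from_acc = "GPL96",
                   to_acc = c("GSE0001", "GSE0002"),
                   stringsAsFactors = FALSE),
  gsm = data.frame(from_acc = "GPL96",
                   to_acc = c("GSM0001", "GSM0002", "GSM0003"),
                   stringsAsFactors = FALSE)
)

# Number of mapped entities per output type.
n_mapped <- vapply(res, nrow, integer(1))

# Unique sample accessions mapped from the input platform.
gsm_ids <- unique(res[["gsm"]]$to_acc)

print(n_mapped)
print(gsm_ids)
```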
Jack Zhu <[email protected]>

    library(GEOmetadb)

    ## Use the demo GEOmetadb database:
    if (!file.exists("GEOmetadb.sqlite")) {
      demo_sqlfile <- getSQLiteFile(destdir = getwd(),
                                    destfile = "GEOmetadb.sqlite.gz",
                                    type = "demo")
    } else {
      demo_sqlfile <- "GEOmetadb.sqlite"
    }

    ## Map platform GPL96 to its series and samples
    ls <- geoConvert('GPL96', out_type = c("GSE", 'GSM'), sqlite_db_name = demo_sqlfile)
    names(ls)
    head(ls[[1]])

    ## Download the full GEOmetadb database:
    ## Not run: geometadbfile <- getSQLiteFile()

Query the gpl table and get GPL information for a given list of Bioconductor microarray annotation packages. Note that GEOmetadb does not currently contain all the mappings, but we are trying to construct a relatively complete list.
getBiocPlatformMap(con, bioc = 'all')
| con | Connection to the GEOmetadb.sqlite database. |
|---|---|
| bioc | Character vector of Bioconductor microarray annotation packages, e.g. c('hgu133plus2','hgu95av2'). 'all' returns all mappings. |
A six-column data.frame including GPL title, GPL accession, bioc_package, manufacturer, organism, data_row_count.
Jack Zhu <[email protected]>, Sean Davis <[email protected]>

    library(GEOmetadb)
    library(RSQLite)  # provides dbConnect(), SQLite() and dbDisconnect()

    ## Use the demo GEOmetadb database:
    if (!file.exists("GEOmetadb.sqlite")) {
      demo_sqlfile <- getSQLiteFile(destdir = getwd(),
                                    destfile = "GEOmetadb.sqlite.gz",
                                    type = "demo")
    } else {
      demo_sqlfile <- "GEOmetadb.sqlite"
    }

    ## Open a connection, query the platform map, then close the connection
    con <- dbConnect(SQLite(), demo_sqlfile)
    getBiocPlatformMap(con)[1:5, ]
    getBiocPlatformMap(con, bioc = c('hgu133a', 'hgu95av2'))
    dbDisconnect(con)

    ## Download the full GEOmetadb database:
    ## Not run: geometadbfile <- getSQLiteFile()

This function is the standard method for downloading and unzipping the most recent GEOmetadb SQLite file from the server. Note: the full GEOmetadb.sqlite.gz can be over 10 GB, while the demo database is 25 MB (use type = "demo").
getSQLiteFile(destdir = getwd(), destfile = "GEOmetadb.sqlite.gz", type = "normal")
| destdir | The destination directory of the downloaded file. |
|---|---|
| destfile | The filename of the downloaded file. This filename should end in ".gz", as the unzipping step assumes that is the case. |
| type | Type of GEOmetadb.sqlite to download. If 'normal', the full database is downloaded; otherwise a demo database (25 MB) is downloaded. |
Prints some diagnostic information to the screen.
Returns the local filename for use later.
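Since the unzipping step expects `destfile` to end in ".gz", the path of the unzipped database can be worked out ahead of time, for example to skip a download when the file is already present. This is a minimal sketch in base R. It assumes the unzipped file sits in `destdir` under the same name without the ".gz" suffix, which matches the examples in this manual but is not stated explicitly. The names `sqlite_path` and `needs_download` are made up for this sketch.

```r
destdir  <- tempdir()
destfile <- "GEOmetadb.sqlite.gz"

# The unzipping step assumes a ".gz" suffix, so check it up front.
stopifnot(grepl("\\.gz$", destfile))

# Assumed location of the unzipped database: same name, suffix removed.
sqlite_path <- file.path(destdir, sub("\\.gz$", "", destfile))

# Only download when the database is not already present.
needs_download <- !file.exists(sqlite_path)
if (needs_download) {
  message("Would call getSQLiteFile(destdir, destfile, type = \"demo\") here")
} else {
  message("Reusing ", sqlite_path)
}
```

The sketch only prints a message instead of downloading, so it can be run without network access.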
Sean Davis <[email protected]>

    library(GEOmetadb)

    ## Download the demo GEOmetadb database:
    if (!file.exists("GEOmetadb.sqlite")) {
      demo_sqlfile <- getSQLiteFile(destdir = getwd(),
                                    destfile = "GEOmetadb.sqlite.gz",
                                    type = "demo")
    } else {
      demo_sqlfile <- "GEOmetadb.sqlite"
    }

    ## Download the full GEOmetadb database:
    ## Not run: geometadbfile <- getSQLiteFile()
