Title: | Access the ArrayExpress Collection at EMBL-EBI Biostudies and build Bioconductor data structures: ExpressionSet, AffyBatch, NChannelSet |
---|---|
Description: | Access the ArrayExpress Collection at EMBL-EBI Biostudies and build Bioconductor data structures: ExpressionSet, AffyBatch, NChannelSet. |
Authors: | Audrey Kauffmann, Ibrahim Emam, Michael Schubert, Jose Marugan |
Maintainer: | Jose Marugan <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.67.0 |
Built: | 2024-11-19 04:44:08 UTC |
Source: | https://github.com/bioc/ArrayExpress |
ae2bioc
converts local MAGE-TAB files into a AffyBatch, an
ExpressionSet or a NChannelSet.
ae2bioc(mageFiles, dataCols = NULL, drop = TRUE)
ae2bioc(mageFiles, dataCols = NULL, drop = TRUE)
mageFiles |
A list as given from
|
dataCols |
by default, the columns are automatically selected according to the scanner type. If the scanner is unknown or if the user wants to use different columns than the default, the argument 'dataCols' can be set. For two colour arrays it must be a list with the fields 'R', 'G', 'Rb' and 'Gb' giving the column names to be used for red and green foreground and background. For one colour arrays, it must be a character string with the column name to be used. These column names must correspond to existing column names of the expression files. |
drop |
if TRUE and only one platform in series, the platform name will be dropped. |
An object of class
AffyBatch
,
ExpressionSet
or
NChannelSet
with the raw expression
values in the 'assayData' of the object, the information
contained in the sdrf file in the 'phenoData', the adf file content in
the 'featureData' and the idf file content in the 'experimentData'.
If several array designs are used in the dataset, the output is a list with an object for each array design.
Ibrahim Emam
Maintainer: Jose Marugan <[email protected]>
# An example can be found in the help of the getAE function.
# An example can be found in the help of the getAE function.
ArrayExpress
produces an
AffyBatch
, an
ExpressionSet
or a NChannelSet
from a raw
dataset from the ArrayExpress collection of the Biostudies database.
ArrayExpress
needs an Internet connection.
ArrayExpress(accession, path = tempdir(), save = FALSE, dataCols = NULL, drop = TRUE)
ArrayExpress(accession, path = tempdir(), save = FALSE, dataCols = NULL, drop = TRUE)
accession |
an ArrayExpress experiment identifier. |
path |
the name of the directory in which the files downloaded on the ArrayExpress repository will be extracted. The default is the current directory. |
save |
if TRUE, the files downloaded from the database will not be deleted from path after executing the function. |
dataCols |
by default, for the raw data, the columns are automatically selected according to the scanner type. If the scanner is unknown or if the user wants to use different columns than the default, the argument 'dataCols' can be set. For two colour arrays it must be a list with the fields 'R', 'G', 'Rb' and 'Gb' giving the column names to be used for red and green foreground and background. For one colour arrays, it must be a character string with the column name to be used. These column names must correspond to existing column names of the expression files. |
drop |
if TRUE and only one platform in series, the platform name will be dropped. |
The output is an object of class
AffyBatch
or
ExpressionSet
or
NChannelSet
with the raw expression
values in the assayData of the object, the information
contained in the .sdrf file in the phenoData, the adf file in the
featureData and the idf file content in the experimentData.
If several array designs are used in the data set, the output is a list with an object for each array design.
Audrey Kauffmann, Ibrahim Emam
Maintainer: Jose Marugan <[email protected]>
queryAE
,
getAE
,
ae2bioc
,
getcolproc
,
procset
ETABM25.affybatch = ArrayExpress("E-TABM-25") print(ETABM25.affybatch) sampleNames(ETABM25.affybatch) colnames(pData(ETABM25.affybatch))
ETABM25.affybatch = ArrayExpress("E-TABM-25") print(ETABM25.affybatch) sampleNames(ETABM25.affybatch) colnames(pData(ETABM25.affybatch))
extract.zip
extracts the files from a .zip archive in a
specific directory.
extract.zip(file, extractpath = dirname(file)[1])
extract.zip(file, extractpath = dirname(file)[1])
file |
A file name. |
extractpath |
A path to define where the files are to be extracted. |
Success is indicated by returning the directory in which the
files have been extracted. If it fails, it returns an empty
character string.
Audrey Kauffmann
Maintainer: Jose Marugan <[email protected]>
getAE
downloads and extracts the MAGE-TAB files from an
ArrayExpress dataset.
getAE(accession, path = getwd(), type = "full", extract = TRUE, sourcedir = path, overwrite = FALSE)
getAE(accession, path = getwd(), type = "full", extract = TRUE, sourcedir = path, overwrite = FALSE)
accession |
is an ArrayExpress experiment identifier. |
path |
is the name of the directory in which the files downloaded on the ArrayExpress repository will be extracted. |
type |
can be 'raw' to download and extract only the raw data, 'processed' to download and extract only the processed data or 'full' to have both raw and processed data. |
extract |
if FALSE, the files are not extracted from the zip archive. |
sourcedir |
when local = TRUE, files will be read from this directory. |
overwrite |
if TRUE, overwrite files if they already exist in path, default FALSE. |
A list with the names of the files that
have been downloaded and extracted.
Ibrahim Emam, Audrey Kauffmann
Maintainer: Jose Marugan <[email protected]>
ArrayExpress
,
ae2bioc
,
getcolproc
,
procset
mexp21 = getAE("E-MEXP-21", type = "full") ## Build a an ExpressionSet from the raw data MEXP21raw = ae2bioc(mageFiles = mexp21) ## Build a an ExpressionSet from the processed data cnames = getcolproc(mexp21) MEXP21proc = procset(mexp21, cnames[2])
mexp21 = getAE("E-MEXP-21", type = "full") ## Build a an ExpressionSet from the raw data MEXP21raw = ae2bioc(mageFiles = mexp21) ## Build a an ExpressionSet from the processed data cnames = getcolproc(mexp21) MEXP21proc = procset(mexp21, cnames[2])
getcolproc
extracts the column names from processed MAGE-TAB and
return them. The output is needed to call the function procset
.
getcolproc(files)
getcolproc(files)
files |
A list as given from
|
Audrey Kauffmann
Maintainer: Jose Marugan <[email protected]>
ArrayExpress
,
queryAE
,
getAE
,
procset
getcolraw
extracts the column names from raw MAGE-TAB and
return them. The output can be use to set the argument 'rawcol'
of the function magetab2bioc
.
getcolraw(rawfiles)
getcolraw(rawfiles)
rawfiles |
rawfiles are the name of the raw MAGE-TAB files to be read. |
Audrey Kauffmann
Maintainer: Jose Marugan <[email protected]>
procset
converts local MAGE-TAB files into an
ExpressionSet
.
procset(files, procol)
procset(files, procol)
files |
is the list with the names of the processed, the sdrf,
the adf and the idf files and the path of the data as given by
|
procol |
the name of the column to be extracted from the
file. Obtained using |
Ibrahim Emam, Audrey Kauffmann
Maintainer: Jose Marugan <[email protected]>
# An example can be found in the help of the getAE function.
# An example can be found in the help of the getAE function.
queryAE
queries the ArrayExpress collection with keywords and
give a dataframe with ArrayExpress identifiers and related
information, as an output.
queryAE(keywords = NULL, species = NULL)
queryAE(keywords = NULL, species = NULL)
keywords |
the keyword(s) of interest. To use several words, they must be separated by a "+" as shown in the examples. |
species |
the specie(s) of interest. |
A dataframe with all the ArrayExpress dataset
identifiers which correspond to the query in the first column.
The following columns contain information about these datasets,
such as the number of files, the release date on the database,
the title, the author and content.
Ibrahim Emam, Audrey Kauffmann
Maintainer: Jose Marugan <[email protected]>
## To retrieve all the identifiers of pneumonia data sets pneumo = queryAE(keywords = "pneumonia") ## To retrieve all the identifiers of pneumonia data sets studied in human pneumoHS = queryAE(keywords = "pneumonia", species = "homo+sapiens")
## To retrieve all the identifiers of pneumonia data sets pneumo = queryAE(keywords = "pneumonia") ## To retrieve all the identifiers of pneumonia data sets studied in human pneumoHS = queryAE(keywords = "pneumonia", species = "homo+sapiens")