Title: | Retrieve and analyze data from the Human Protein Atlas |
---|---|
Description: | Provide functions for retrieving, exploring, and visualizing the Human Protein Atlas data. |
Authors: | Anh Nhat Tran [aut, cre] |
Maintainer: | Anh Nhat Tran <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.25.0 |
Built: | 2024-10-30 07:27:34 UTC |
Source: | https://github.com/bioc/HPAanalyze |
Dataset downloaded with hpaDownload('histology', version = 'latest').
This should be the most up-to-date dataset at the time of generation. Check
the metadata for more information.
hpa_histology_data
A list of 3 tibbles
Normal tissue IHC data
Cancer IHC data
Subcellular location IF data
# load data
data("hpa_histology_data")

# access data frames
normal_tissue_data <- hpa_histology_data$normal_tissue
cancer_data <- hpa_histology_data$pathology
subcell_location_data <- hpa_histology_data$subcellular_location

# see metadata
hpa_histology_data$metadata
Download the latest version of HPA datasets and import them into R. It is recommended to download only the datasets you need, as some of them are very large.
hpaDownload(downloadList = "histology", version = "latest")
downloadList |
A vector or string indicating which datasets to download. Possible values:
You can also use the following shortcuts:
See https://www.proteinatlas.org/about/download for more information. |
version |
A string indicating which version to download. Possible values:
|
This function will return a list of tibbles corresponding to requested datasets.
hpaDownload
hpa_histology_data
Other downloadable datasets functions: hpaExport(), hpaSubset()
histologyData <- hpaDownload(downloadList='histology', version='example')
# tissueTranscriptData <- hpaDownload('RNA transcript tissue')
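As a rough illustration of the advice to download only what you need, the sketch below loads the example histology data and reports the size of each table. It assumes that version = 'example' returns the dataset bundled with the package (as the examples in this manual suggest) and that any non-tabular element, such as metadata, should be skipped. The variable names are illustrative.

```r
library(HPAanalyze)

# Load the example histology datasets (assumed to need no download).
histologyData <- hpaDownload(downloadList = "histology", version = "example")

# Keep only the tabular elements; a non-tabular metadata element is skipped.
tables <- Filter(is.data.frame, histologyData)

# Number of rows and approximate in-memory size of each table.
tableRows <- vapply(tables, nrow, integer(1))
tableSize <- vapply(tables,
                    function(x) format(object.size(x), units = "MB"),
                    character(1))
print(data.frame(rows = tableRows, size = tableSize))
```

Checking sizes this way before requesting larger dataset groups can help decide whether a narrower downloadList is worthwhile.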
Export the list object generated by hpaSubset() into xlsx, csv, or tsv
format. Due to the size of some HPA datasets, as well as the limitations of
the output formats, exporting the full datasets generated by hpaDownload()
is not recommended. This is a convenient wrapper for 'write.' functions.
hpaExport(data, fileName, fileType = "xlsx")
data |
Input the list object generated by |
fileName |
A string indicating the desired output file name. Do not
include a file extension such as |
fileType |
The format in which the data will be exported. Choose one of
these options: |
'xlsx': return one .xlsx file named 'fileName.xlsx', with one individual sheet for each dataset in the input list object.
'csv': return .csv files, one for each dataset in the input list object, named 'fileName_datasetName.csv'.
'tsv': return .tsv files, one for each dataset in the input list object, named 'fileName_datasetName.tsv'.
Other downloadable datasets functions: hpaDownload(), hpaSubset()
downloadedData <- hpaDownload(downloadList='histology', version='example')
geneList <- c('TP53', 'EGFR')
tissueList <- c('breast', 'cerebellum', 'skin 1')
cancerList <- c('breast cancer', 'glioma', 'melanoma')

subsetData <- hpaSubset(data=downloadedData,
                        targetGene=geneList,
                        targetTissue=tissueList,
                        targetCancer=cancerList)

# fileName is given without an extension, as the argument description asks
hpaExport(data=subsetData,
          fileName='TP53_EGFR_in_tissue_cancer',
          fileType='xlsx')
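The csv naming scheme described above ('fileName_datasetName.csv') can be mimicked in base R. The sketch below is not the package's implementation: exportListCsv is a hypothetical helper, and the small data frames only stand in for a real hpaSubset() result.

```r
# Hypothetical helper: write each data frame in a named list to
# '<fileName>_<datasetName>.csv' inside 'dir' and return the file paths.
exportListCsv <- function(data, fileName, dir = tempdir()) {
  paths <- file.path(dir, paste0(fileName, "_", names(data), ".csv"))
  for (i in seq_along(data)) {
    write.csv(data[[i]], paths[i], row.names = FALSE)
  }
  paths
}

# Mock data standing in for a subset of the HPA histology datasets.
mockSubset <- list(
  normal_tissue = data.frame(gene = "TP53", tissue = "breast"),
  pathology     = data.frame(gene = "TP53", cancer = "glioma")
)

csvPaths <- exportListCsv(mockSubset, fileName = "TP53_example")
basename(csvPaths)
```

Writing one file per dataset keeps each table's columns intact, which is presumably why the csv and tsv options produce several files while xlsx uses one sheet per dataset.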
hpaSubset() subsets data by gene name, tissue, cell type, cancer
and/or cell line. The input is the list object generated by
hpaDownload() or the output of another hpaSubset() call. Use
hpaListParam() to see the list of available parameters for a specific
list object. This is a convenient wrapper for 'lapply/filter' and works on
any table which contains 'gene', 'tissue', 'cell_type', 'cancer', and
'cell_line' columns.
hpaListParam() lists the variables available in downloaded data that can be
used as parameters to subset the data via hpaSubset(). This function
works with the data object generated by hpaDownload() or a previous
call of hpaSubset(). This is a convenient wrapper for 'lapply/unique'
and works on any table which contains 'tissue', 'cell_type', 'cancer', and
'cell_line' columns.
hpaSubset(
  data = NULL,
  targetGene = NULL,
  targetTissue = NULL,
  targetCellType = NULL,
  targetCancer = NULL,
  targetCellLine = NULL
)

hpaListParam(data = NULL)
data |
Input the list object generated by |
targetGene |
Vector of strings of HGNC gene symbols. It will be used to subset every dataset in the list object. You can also mix HGNC gene symbols and Ensembl IDs (starting with ENSG); the IDs will be converted to HGNC gene symbols. |
targetTissue |
Vector of strings of normal tissues. Will be used to
subset the |
targetCellType |
Vector of strings of normal cell types. Will be used to
subset the |
targetCancer |
Vector of strings of cancer types. Will be used to subset
the |
targetCellLine |
Vector of strings of cell lines. Will be used to subset
the |
hpaSubset() will return a list of tibbles as the result of
subsetting, depending on the input data.
The output of hpaListParam() is a list of vectors containing
all subset parameters for the downloaded data.
Other downloadable datasets functions: hpaDownload(), hpaExport()
downloadedData <- hpaDownload(downloadList='histology', version='example')
geneList <- c('TP53', 'EGFR')
tissueList <- c('breast', 'cerebellum', 'skin 1')
cancerList <- c('breast cancer', 'glioma', 'melanoma')

subsetData <- hpaSubset(data=downloadedData,
                        targetGene=geneList,
                        targetTissue=tissueList,
                        targetCancer=cancerList)

downloadedData <- hpaDownload(downloadList='histology', version='example')
params <- hpaListParam(data=downloadedData)
params$normal_tissue
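The 'lapply/filter' and 'lapply/unique' idea behind these two functions can be sketched in base R. This is not the package's code: subsetList and listParam are hypothetical helpers, the mock tables only borrow the column names mentioned above, and the shape of the returned lists is illustrative.

```r
# Hypothetical helper: filter every table in a list by gene and tissue,
# skipping a filter when the table lacks the corresponding column.
subsetList <- function(data, targetGene = NULL, targetTissue = NULL) {
  lapply(data, function(tbl) {
    if (!is.null(targetGene) && "gene" %in% names(tbl)) {
      tbl <- tbl[tbl$gene %in% targetGene, , drop = FALSE]
    }
    if (!is.null(targetTissue) && "tissue" %in% names(tbl)) {
      tbl <- tbl[tbl$tissue %in% targetTissue, , drop = FALSE]
    }
    tbl
  })
}

# Hypothetical helper: list the unique values available for subsetting.
listParam <- function(data,
                      columns = c("tissue", "cell_type", "cancer", "cell_line")) {
  lapply(data, function(tbl) {
    present <- intersect(columns, names(tbl))
    lapply(tbl[present], unique)
  })
}

# Mock data standing in for downloaded HPA tables.
mockData <- list(
  normal_tissue = data.frame(gene = c("TP53", "EGFR", "PTEN"),
                             tissue = c("breast", "skin 1", "breast")),
  pathology = data.frame(gene = c("TP53", "PTEN"),
                         cancer = c("glioma", "melanoma"))
)

mockSubset <- subsetList(mockData,
                         targetGene = c("TP53", "EGFR"),
                         targetTissue = "breast")
mockParams <- listParam(mockData)
```

Because each filter is applied only where the column exists, a tissue filter leaves the pathology table untouched while the gene filter applies to both tables.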
This function is a universal visualization function that allows calling other
hpaVis functions via a single function call. By default, this function will
use the dataset bundled with HPAanalyze and provide a grid of all available
plots. The types of plots in the output can be specified via the
visType argument. If only one plot type is specified, this function
will return exactly the same output as the specific hpaVis function used to
create the plot.
hpaVis(
  data = NULL,
  targetGene = NULL,
  targetTissue = NULL,
  targetCellType = NULL,
  targetCancer = NULL,
  visType = c("Tissue", "Patho", "Subcell"),
  color = c("#FCFDBF", "#FE9F6D", "#DE4968", "#8C2981"),
  customTheme = FALSE,
  ...
)
data |
Input the list object generated by |
targetGene |
Vector of strings of HGNC gene symbols. By default it is
set to |
targetTissue |
Vector of strings of normal tissue names. By default it
is set to |
targetCellType |
Vector of strings of normal cell types. By default includes all available cell types in the target tissues. |
targetCancer |
Vector of strings of cancer types. By default it
is set to |
visType |
Vector of strings indicating which plots will be generated.
Currently available values are |
color |
Vector of 4 colors used to depict different expression levels. |
customTheme |
Logical argument. If |
... |
Additional arguments to be passed downstream to other hpaVis
functions being called behind the scenes. These arguments include
|
If multiple visType values are chosen, this function will return multiple graphs in one panel. If only one visType is chosen, this function will return a ggplot2 plot object, which can be further modified if desired. See the help file of each hpaVis function for more information about individual graphs.
Other visualization functions: hpaVisPatho(), hpaVisSubcell(), hpaVisTissue()
hpaVis()
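Since a single visType is described as returning a ggplot2 object, a call can be followed by ordinary ggplot2 layers. The sketch below relies on the documented default of using the bundled dataset when data is NULL; the gene choice, title text, and legend position are illustrative assumptions.

```r
library(HPAanalyze)
library(ggplot2)

# Request a single plot type; per the description above, this should return
# the same ggplot2 object as the corresponding hpaVis function.
pathoPlot <- hpaVis(targetGene = c("TP53", "EGFR"),
                    visType = "Patho")

# Modify the returned plot with ordinary ggplot2 layers.
pathoPlot +
  ggtitle("TP53 and EGFR expression across cancers") +
  theme(legend.position = "bottom")
```

When several visType values are requested, the result is a panel of graphs, so this kind of layering is intended for the single-plot case.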
Visualize the expression of genes of interest in each cancer.
hpaVisPatho(
  data = NULL,
  targetGene = NULL,
  targetCancer = NULL,
  facetBy = "cancer",
  color = c("#FCFDBF", "#FE9F6D", "#DE4968", "#8C2981"),
  customTheme = FALSE
)
data |
Input the list object generated by |
targetGene |
Vector of strings of HGNC gene symbols. By default it is
set to |
targetCancer |
Vector of strings of cancer types. The function will plot all available cancers by default. |
facetBy |
Determine how multiple graphs would be faceted. Either
|
color |
Vector of 4 colors used to depict different expression levels. |
customTheme |
Logical argument. If |
This function will return a ggplot2 plot object, which can be further modified if desirable. The pathology data is visualized as multiple bar graphs, one for each type of cancer. For each bar graph, x axis contains the inquired protein and y axis contains the proportion of patients.
Other visualization functions: hpaVisSubcell(), hpaVisTissue(), hpaVis()
data("hpa_histology_data")
geneList <- c('TP53', 'EGFR', 'CD44', 'PTEN', 'IDH1', 'IDH2', 'CYCS')
cancerList <- c('breast cancer', 'glioma', 'melanoma')

## A typical function call
hpaVisPatho(data=hpa_histology_data,
            targetGene=geneList)
Visualize the confirmed subcellular locations of genes of interest.
hpaVisSubcell(
  data = NULL,
  targetGene = NULL,
  reliability = c("enhanced", "supported", "approved", "uncertain"),
  color = c("#FCFDBF", "#8C2981"),
  customTheme = FALSE
)
data |
Input the list object generated by |
targetGene |
Vector of strings of HGNC gene symbols. By default it is
set to |
reliability |
Vector of strings indicating which reliability scores you want to plot. The
default is everything |
color |
Vector of 2 colors used to depict whether or not the protein is expressed in a location. |
customTheme |
Logical argument. If |
This function will return a ggplot2 plot object, which can be further modified if desired. The subcellular location data is visualized as a tile graph, in which the x axis includes the inquired proteins and the y axis contains the subcellular locations.
Other visualization functions: hpaVisPatho(), hpaVisTissue(), hpaVis()
data("hpa_histology_data")
geneList <- c('TP53', 'EGFR', 'CD44', 'PTEN', 'IDH1', 'IDH2', 'CYCS')

## A typical function call
hpaVisSubcell(data=hpa_histology_data,
              targetGene=geneList)
Visualize the expression of proteins of interest in each target tissue by cell type.
hpaVisTissue(
  data = NULL,
  targetGene = NULL,
  targetTissue = NULL,
  targetCellType = NULL,
  color = c("#FCFDBF", "#FE9F6D", "#DE4968", "#8C2981"),
  customTheme = FALSE
)
data |
Input the list object generated by |
targetGene |
Vector of strings of HGNC gene symbols. By default it is
set to |
targetTissue |
Vector of strings of normal tissues. Default to all. |
targetCellType |
Vector of strings of normal cell types. Default to all. |
color |
Vector of 4 colors used to depict different expression levels. |
customTheme |
Logical argument. If |
This function will return a ggplot2 plot object, which can be further modified if desirable. The tissue data is visualized as a heatmap: x axis contains inquired protein and y axis contains tissue/cells of interest.
Other visualization functions: hpaVisPatho(), hpaVisSubcell(), hpaVis()
data("hpa_histology_data")
geneList <- c('TP53', 'EGFR', 'CD44', 'PTEN', 'IDH1', 'IDH2', 'CYCS')
tissueList <- c('breast', 'cerebellum', 'skin 1')

## A typical function call
hpaVisTissue(data=hpa_histology_data,
             targetGene=geneList,
             targetTissue=tissueList)
This function is the umbrella function for the hpaXml function family. It
takes as input either one Ensembl gene id or an imported XML object
resulting from a hpaXmlGet() function call. By default, it will
extract all information available to HPAanalyze users from the XML file by
calling every hpaXml function and putting all results into a list.
hpaXml(
  inputXml,
  extractType = c("ProtClass", "TissueExprSum", "Antibody", "TissueExpr"),
  ...
)
inputXml |
Input can be either one Ensembl gene id (starting with ENSG) or
an imported XML object resulting from a |
extractType |
A vector of strings indicating which information is desired
for extraction. By default this function will call all |
... |
Additional arguments to be passed downstream to other hpaXml functions being called behind the scenes. See the help files of other hpaXml functions for more information. |
This function returns a list. Each element of the list is information extracted from the XML file specified using other hpaXml functions. See help file for each XML function for more information.
Other xml functions: hpaXmlAntibody(), hpaXmlGet(), hpaXmlProtClass(), hpaXmlTissueExprSum(), hpaXmlTissueExpr()
hpaXml(inputXml='ENSG00000131979', extractType=c('ProtClass', 'TissueExprSum', 'Antibody'))
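Because inputXml accepts an already imported XML object, one download can be reused across several extractions. The sketch below wraps that pattern in a hypothetical helper, extractHpaXml, which is not part of the package. It needs network access when called, so the call is left commented out, in the spirit of the 'Not run' examples in this manual.

```r
library(HPAanalyze)

# Hypothetical helper: download the XML for one Ensembl gene id once,
# then pass the imported object to hpaXml() for the requested extractions.
extractHpaXml <- function(ensemblId,
                          extractType = c("ProtClass", "Antibody")) {
  importedXml <- hpaXmlGet(ensemblId)  # single download
  hpaXml(inputXml = importedXml, extractType = extractType)
}

# Not run (requires an internet connection):
# gch1 <- extractHpaXml("ENSG00000131979")
# names(gch1)
```

Keeping the imported object around also allows calling the individual hpaXml functions later without fetching the file again.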
Extract information about the antibodies used for a specific protein. It is important to note that the data that HPA provides on their website and through xml files are not one-to-one equivalents.
hpaXmlAntibody(importedXml)
importedXml |
Input an xml document object resulting from a
|
This function returns a tibble of 4 columns, containing information about the antibodies used in the project for the inquired protein: id, releaseDate, releaseVersion, and RRID.
Other xml functions: hpaXmlGet(), hpaXmlProtClass(), hpaXmlTissueExprSum(), hpaXmlTissueExpr(), hpaXml()
## Not run:
GCH1xml <- hpaXmlGet('ENSG00000131979')
hpaXmlAntibody(GCH1xml)
## End(Not run)
Download and import the individual xml file for a specified protein. This
function calls xml2::read_xml() under the hood. It is important to
note that the data that HPA provides on their website and through xml files
are not one-to-one equivalents.
hpaXmlGet(targetEnsemblId, version = "latest")
targetEnsemblId |
A string of one Ensembl ID, starting with ENSG. For
example |
version |
A string indicating which version to download. Possible values:
|
This function returns an object of class "xml_document" "xml_node"
containing the content of the imported XML file. (See the
documentation for package xml2 for more information.)
Other xml functions: hpaXmlAntibody(), hpaXmlProtClass(), hpaXmlTissueExprSum(), hpaXmlTissueExpr(), hpaXml()
## Not run:
GCH1xml <- hpaXmlGet('ENSG00000131979')
## End(Not run)
Extract protein class information from an imported xml document resulting from
hpaXmlGet(). It is important to note that the data that HPA provides
on their website and through xml files are not one-to-one equivalents.
hpaXmlProtClass(importedXml)
importedXml |
Input an xml document object resulting from a
|
This function returns a tibble of 4 columns.
Other xml functions: hpaXmlAntibody(), hpaXmlGet(), hpaXmlTissueExprSum(), hpaXmlTissueExpr(), hpaXml()
## Not run:
GCH1xml <- hpaXmlGet('ENSG00000131979')
hpaXmlProtClass(GCH1xml)
## End(Not run)
Extract tissue expression information for each sample, and the urls to download
images, from an imported xml document resulting from hpaXmlGet(). It is
important to note that the data that HPA provides on their website and
through xml files are not one-to-one equivalents. For example, xml files
usually only provide one of the two histology images for each patient.
hpaXmlTissueExpr(importedXml)
importedXml |
Input an xml document object resulting from a
|
This function returns a list of tibbles, each for an antibody. Each tibble contains information about all individual samples and their staining. Due to the variation in amount of information available for these samples, the number of columns differs, but the tibble essentially includes: patientId, age, sex, staining, intensity, quantity, location, imageUrl, snomedCode, and tissueDescription. The last two items may have more than one column each.
Other xml functions: hpaXmlAntibody(), hpaXmlGet(), hpaXmlProtClass(), hpaXmlTissueExprSum(), hpaXml()
## Not run:
GCH1xml <- hpaXmlGet('ENSG00000131979')
hpaXmlTissueExpr(GCH1xml)
## End(Not run)
Extract tissue expression information and the urls to download images from
an imported xml document resulting from hpaXmlGet(). It is important to
note that the data that HPA provides on their website and through xml files
are not one-to-one equivalents.
hpaXmlTissueExprSum(importedXml, downloadImg = FALSE)
importedXml |
Input an xml document object resulting from a
|
downloadImg |
Logical argument. If TRUE, the function will download all images from the extracted urls into the working folder. |
This function returns a list consisting of a summary string, which is a very brief description of the protein, and a tibble of 2 columns: tissue (name of the tissue available) and imageUrl (link to download the respective image).
Other xml functions: hpaXmlAntibody(), hpaXmlGet(), hpaXmlProtClass(), hpaXmlTissueExpr(), hpaXml()
## Not run:
GCH1xml <- hpaXmlGet('ENSG00000131979')
hpaXmlTissueExprSum(GCH1xml)
## End(Not run)