The Human Protein Atlas allows you to download very detailed data for each protein in the form of an XML file, and hpaXmlGet and hpaXml let you retrieve those files automatically from the HPA server and parse them. However, due to a technical limitation, you will not be able to save the resulting "xml_document"/"xml_node" objects. The question is: how do you keep a copy of these files to use when you are not connected to the internet, or for reproducibility?
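The sketch below illustrates the problem using the xml2 package alone. It assumes the limitation is the usual one for xml2 objects, namely that they hold pointers to in-memory data which R's own serialization does not preserve. The small XML string is made up for illustration and is not HPA data.

```r
library(xml2)

# A tiny stand-in document (not real HPA data).
doc <- read_xml("<entry><name>demo</name></entry>")

# Saving the parsed object with saveRDS() and reading it back.
rds_path <- tempfile(fileext = ".rds")
saveRDS(doc, rds_path)
restored <- readRDS(rds_path)

# Using the restored object is expected to fail; capture the outcome
# instead of stopping the script.
restored_result <- tryCatch(xml_name(restored), error = function(e) e)

# Writing the XML text itself to disk and re-parsing it keeps the content.
xml_path <- tempfile(fileext = ".xml")
write_xml(doc, xml_path)
reloaded <- read_xml(xml_path)
xml_text(xml_find_first(reloaded, "//name"))
```

In other words, the thing to keep is the XML file, not the parsed R object.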
If you look at the “Downloadable data” page on the HPA website, you will see how these files are downloaded. Basically, you append [ensembl_id].xml to http://www.proteinatlas.org (giving http://www.proteinatlas.org/[ensembl_id].xml) to download an individual entry, which is what hpaXmlGet does behind the scenes, or you download the whole big set.
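As a minimal sketch of the single-entry route, the code below builds the URL described above and saves the file locally. The helpers hpa_xml_url() and save_hpa_xml() are hypothetical and not part of HPAanalyze; the folder name "hpa_xml" and the Ensembl ID in the commented example are assumptions chosen for illustration.

```r
# Build the download URL for one HPA entry from its Ensembl gene ID.
# Hypothetical helper, not part of HPAanalyze.
hpa_xml_url <- function(ensembl_id, base = "http://www.proteinatlas.org") {
  paste0(base, "/", ensembl_id, ".xml")
}

# Download one entry into a local folder, skipping the download when the
# file is already there. Returns the path of the local file.
save_hpa_xml <- function(ensembl_id, dir = "hpa_xml") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dir, paste0(ensembl_id, ".xml"))
  if (!file.exists(dest)) {
    download.file(hpa_xml_url(ensembl_id), destfile = dest, mode = "wb")
  }
  dest
}

# Example (needs an internet connection, so it is left commented out):
# local_path <- save_hpa_xml("ENSG00000134057")
```

Keeping the downloaded files in a project folder also records exactly which data an analysis was run on.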
From there, you can import the file using xml2::read_xml(). The output should be exactly the same as that of hpaXmlGet.
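A short sketch of the import step follows. The wrapper read_local_hpa_xml() is hypothetical and only adds a clearer error message when the file is missing; the path in the commented example is an assumption.

```r
library(xml2)

# Read a previously downloaded HPA XML file from disk.
# Hypothetical wrapper around xml2::read_xml().
read_local_hpa_xml <- function(path) {
  if (!file.exists(path)) {
    stop("No such file: ", path)
  }
  read_xml(path)
}

# Example, assuming the file was saved earlier under this name:
# entry_xml <- read_local_hpa_xml(file.path("hpa_xml", "ENSG00000134057.xml"))
# class(entry_xml)
```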
hpaXml functions

Since the umbrella function hpaXml takes either the Ensembl ID or the imported xml_document object, you can feed what you just imported to it and get the expected result.
You can, of course, use the other hpaXml functions as well.
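The offline workflow can be sketched as follows, assuming a file for an example Ensembl ID has already been saved at the path shown (both the path and the ID are illustrative). The imported document is passed as the first argument in each call; hpaXmlProtClass and hpaXmlAntibody are used here as examples of the individual functions. The code only runs when both packages are installed and the file exists.

```r
# Path to a previously downloaded entry (assumed location and ID).
xml_path <- file.path("hpa_xml", "ENSG00000134057.xml")

# Only proceed when the required packages are available.
have_pkgs <- requireNamespace("HPAanalyze", quietly = TRUE) &&
  requireNamespace("xml2", quietly = TRUE)

if (have_pkgs && file.exists(xml_path)) {
  entry_xml <- xml2::read_xml(xml_path)

  # Umbrella function: the imported document takes the place of the
  # Ensembl ID.
  entry_all <- HPAanalyze::hpaXml(entry_xml)

  # Individual extractors work on the same imported object.
  prot_class <- HPAanalyze::hpaXmlProtClass(entry_xml)
  antibodies <- HPAanalyze::hpaXmlAntibody(entry_xml)
}
```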
Anh Tran, 2018-2023
Please cite: Tran, A.N., Dussaq, A.M., Kennell, T. et al. HPAanalyze: an R package that facilitates the retrieval and analysis of the Human Protein Atlas data. BMC Bioinformatics 20, 463 (2019) https://doi.org/10.1186/s12859-019-3059-z