The HuBMAP data portal (https://portal.hubmapconsortium.org/) provides an
open, global biomolecular atlas of the human body at the cellular level.
The HuBMAPR package provides an alternative interface for exploring the
data from R.
The HuBMAP Consortium offers several APIs. To achieve its main objectives,
the HuBMAPR package integrates three of them: the Search API, the Entity
API, and the Ontology API. Each API serves a distinct purpose and has its
own query capabilities. Using the httr2 and rjsoncons packages, HuBMAPR
builds, modifies, and executes requests against these APIs and returns the
responses as a tibble or a character vector. The HuBMAPR functions tidy
these outputs further for clarity in the final results. The Search API,
which follows the Elasticsearch API, is used primarily to search for
relevant data. The Entity API is used in the bulk_data_transfer() function
to retrieve Globus URLs, while the Ontology API is used in the organ()
function. The functions in the HuBMAPR package are modeled on the HuBMAP
Data Portal and reflect the information it presents as closely as
possible.
HuBMAP data incorporates three different identifiers: HuBMAP ID,
Universally Unique Identifier (UUID), and Digital Object Identifier (DOI).
The HuBMAPR package uses the UUID, a 32-digit hexadecimal number, and the
more human-readable HuBMAP ID as the two common identifiers in retrieved
results. For precision and compatibility with software implementation and
data storage, the UUID serves as the primary identifier for retrieving
data across functions, and each UUID maps uniquely to its corresponding
HuBMAP ID. Function names follow a systematic nomenclature: the entity
category comes first as a prefix, followed by a concise description of the
specific functionality. Most functions are grouped by entity category,
which simplifies selecting the appropriate function to retrieve the
desired information for a given UUID from a specific entity category. The
structure of these functions is largely consistent across all entity
categories, with some exceptions for collection and publication.
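The two identifier formats can be told apart with a simple pattern check. The sketch below is only illustrative: the helpers is_uuid() and is_hubmap_id() are hypothetical (not part of HuBMAPR), and the patterns are assumptions inferred from the identifiers printed in this vignette, not an official HuBMAP specification.

```r
# Assumed formats, inferred from examples in this vignette:
#   UUID      : 32 lowercase hexadecimal digits, no hyphens
#   HuBMAP ID : "HBM" + 3 digits + "." + 4 capital letters + "." + 3 digits
is_uuid <- function(x) grepl("^[0-9a-f]{32}$", x)
is_hubmap_id <- function(x) grepl("^HBM[0-9]{3}\\.[A-Z]{4}\\.[0-9]{3}$", x)

is_uuid("993bb1d6fa02e2755fd69613bb9d6e08")
is_hubmap_id("HBM223.DJQM.264")
is_uuid("HBM223.DJQM.264")
```

A check like this can catch a HuBMAP ID passed by mistake to a function that expects a UUID before any request is sent.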
HuBMAPR is an R package. It can be installed from Bioconductor with
if (!requireNamespace("BiocManager")) {
    install.packages("BiocManager")
}
BiocManager::install("HuBMAPR")
Install the development version from GitHub:
Load additional packages. The dplyr package is used throughout this
vignette for data wrangling and for extracting specific information.
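The examples that follow call functions from HuBMAPR, dplyr (for instance filter(), select(), and left_join()), and tidyr (unnest_longer() and unnest_wider()), and all three appear as attached packages in the session information at the end. A setup along these lines is therefore assumed; the exact set of library() calls is an assumption rather than something stated in the text.

```r
# Assumed setup for the examples in this vignette
library(HuBMAPR)  # HuBMAP data portal client
library(dplyr)    # filter(), select(), left_join(), glimpse(), ...
library(tidyr)    # unnest_longer(), unnest_wider()
```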
The HuBMAP data portal displays five categories of entity data: Dataset,
Sample, Donor, Publication, and Collection, ordered chronologically by
last modified date and time. Use the corresponding functions to explore
each entity category.
datasets_df <- datasets()
#> Pruning cache
datasets_df
#> # A tibble: 3,487 × 14
#> uuid hubmap_id dataset_type dataset_type_additio…¹ organ analyte_class
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 f58f712bee… HBM658.S… Visium (no … "" Ovar… ""
#> 2 8fd6bb5cd1… HBM476.G… Visium (no … "" Uter… ""
#> 3 7b878dd4ec… HBM742.W… 10X Multiome "" Uter… ""
#> 4 6ee21d9b55… HBM873.P… 10X Multiome "" Ovar… ""
#> 5 1779b3a6d4… HBM578.C… 10X Multiome "" Uter… ""
#> 6 985a9372ee… HBM355.C… Visium (no … "" Ovar… ""
#> 7 72f6415499… HBM839.P… Visium (no … "" Ovar… ""
#> 8 b8def263ba… HBM592.B… 10X Multiome "" Fall… ""
#> 9 2c58019856… HBM564.D… Visium (no … "" Uter… ""
#> 10 8fa7279936… HBM443.L… Visium (no … "" Ovar… ""
#> # ℹ 3,477 more rows
#> # ℹ abbreviated name: ¹dataset_type_additional_information
#> # ℹ 8 more variables: sample_category <chr>, status <chr>,
#> # dataset_processing_category <chr>, pipeline <chr>, registered_by <chr>,
#> # donor_hubmap_id <chr>, group_name <chr>, last_modified_timestamp <chr>
The default tibble produced by each entity function includes only selected
information. To see the names of the selected columns, use the following
commands for each entity category. Specify the as parameter to return the
information as either "character" or "tibble".
# as = "tibble" (default)
datasets_col_tbl <- datasets_default_columns(as = "tibble")
datasets_col_tbl
#> # A tibble: 14 × 1
#> columns
#> <chr>
#> 1 uuid
#> 2 hubmap_id
#> 3 group_name
#> 4 dataset_type_additional_information
#> 5 dataset_type
#> 6 organ
#> 7 analyte_class
#> 8 dataset_processing_category
#> 9 sample_category
#> 10 registered_by
#> 11 status
#> 12 pipeline
#> 13 last_modified_timestamp
#> 14 donor_hubmap_id
# as = "character"
datasets_col_char <- datasets_default_columns(as = "character")
datasets_col_char
#> [1] "uuid" "hubmap_id"
#> [3] "group_name" "dataset_type_additional_information"
#> [5] "dataset_type" "organ"
#> [7] "analyte_class" "dataset_processing_category"
#> [9] "sample_category" "registered_by"
#> [11] "status" "pipeline"
#> [13] "last_modified_timestamp" "donor_hubmap_id"
samples_default_columns(), donors_default_columns(),
collections_default_columns(), and publications_default_columns() work in
the same way.
A brief overview of the selected information for the five entity categories:
tbl <- bind_cols(
    dataset = datasets_default_columns(as = "character"),
    sample = c(samples_default_columns(as = "character"), rep(NA, 7)),
    donor = c(donors_default_columns(as = "character"), rep(NA, 6)),
    collection = c(collections_default_columns(as = "character"),
        rep(NA, 10)),
    publication = c(publications_default_columns(as = "character"),
        rep(NA, 7))
)
tbl
#> # A tibble: 14 × 5
#> dataset sample donor collection publication
#> <chr> <chr> <chr> <chr> <chr>
#> 1 uuid uuid hubm… uuid uuid
#> 2 hubmap_id hubmap_id uuid hubmap_id hubmap_id
#> 3 group_name group_name grou… title title
#> 4 dataset_type_additional_information sample_cate… Sex last_modi… publicatio…
#> 5 dataset_type organ Age <NA> last_modif…
#> 6 organ last_modifi… Body… <NA> publicatio…
#> 7 analyte_class donor_hubma… Race <NA> publicatio…
#> 8 dataset_processing_category <NA> last… <NA> <NA>
#> 9 sample_category <NA> <NA> <NA> <NA>
#> 10 registered_by <NA> <NA> <NA> <NA>
#> 11 status <NA> <NA> <NA> <NA>
#> 12 pipeline <NA> <NA> <NA> <NA>
#> 13 last_modified_timestamp <NA> <NA> <NA> <NA>
#> 14 donor_hubmap_id <NA> <NA> <NA> <NA>
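The padding counts passed to rep(NA, ...) in the call above are hard-coded, so they would need updating if the number of default columns changed. As a sketch of one alternative, the padding can be computed from the vector lengths. The short column-name vectors here are stand-ins for the *_default_columns(as = "character") results, kept small so the example runs without network access.

```r
# Stand-in column-name vectors (subsets of the names shown above)
cols <- list(
    dataset = c("uuid", "hubmap_id", "group_name", "organ"),
    collection = c("uuid", "hubmap_id", "title")
)

# Pad every vector with NA up to the length of the longest one
n_max <- max(lengths(cols))
padded <- lapply(cols, function(x) {
    c(x, rep(NA_character_, n_max - length(x)))
})

overview <- as.data.frame(padded, stringsAsFactors = FALSE)
overview
```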
Use organ() to list the organs available in HuBMAP. This can be helpful
for filtering retrieved data by organ.
organs <- organ()
organs
#> # A tibble: 43 × 2
#> abbreviation name
#> <chr> <chr>
#> 1 BD Blood
#> 2 BL Bladder
#> 3 BM Bone Marrow
#> 4 BR Brain
#> 5 BV Blood Vasculature
#> 6 HT Heart
#> 7 LA Larynx
#> 8 LB Bronchus (Left)
#> 9 LE Eye (Left)
#> 10 LF Fallopian Tube (Left)
#> # ℹ 33 more rows
Standard data wrangling and filtering can be applied to retrieve data based on the information of interest.
# Example from datasets()
datasets_df |>
    filter(organ == 'Small Intestine') |>
    count()
#> # A tibble: 1 × 1
#> n
#> <int>
#> 1 424
Every dataset, sample, donor, collection, and publication has its own HuBMAP ID and UUID. The UUID is the main identifier used by most functions to retrieve specific details.
The donor_hubmap_id column is included in the tibbles returned by
samples() and datasets(), which makes it possible to join them with the
donor tibble.
donors_df <- donors()
donor_sub <- donors_df |>
    filter(Sex == "Female",
        Age <= 76 & Age >= 55,
        Race == "White",
        `Body Mass Index` <= 25,
        last_modified_timestamp >= "2020-01-08" &
            last_modified_timestamp <= "2020-06-30") |>
    head(1)
# Datasets
donor_sub_dataset <- donor_sub |>
    left_join(datasets_df |>
            select(-c(group_name, last_modified_timestamp)) |>
            rename("dataset_uuid" = "uuid",
                "dataset_hubmap_id" = "hubmap_id"),
        by = c("hubmap_id" = "donor_hubmap_id"))
donor_sub_dataset
#> # A tibble: 0 × 19
#> # ℹ 19 variables: uuid <chr>, hubmap_id <chr>, group_name <chr>, Sex <chr>,
#> # Age <dbl>, Body Mass Index <dbl>, Race <chr>,
#> # last_modified_timestamp <chr>, dataset_uuid <chr>, dataset_hubmap_id <chr>,
#> # dataset_type <chr>, dataset_type_additional_information <chr>, organ <chr>,
#> # analyte_class <chr>, sample_category <chr>, status <chr>,
#> # dataset_processing_category <chr>, pipeline <chr>, registered_by <chr>
# Samples
samples_df <- samples()
donor_sub_sample <- donor_sub |>
    left_join(samples_df |>
            select(-c(group_name, last_modified_timestamp)) |>
            rename("sample_uuid" = "uuid",
                "sample_hubmap_id" = "hubmap_id"),
        by = c("hubmap_id" = "donor_hubmap_id"))
donor_sub_sample
#> # A tibble: 0 × 12
#> # ℹ 12 variables: uuid <chr>, hubmap_id <chr>, group_name <chr>, Sex <chr>,
#> # Age <dbl>, Body Mass Index <dbl>, Race <chr>,
#> # last_modified_timestamp <chr>, sample_uuid <chr>, sample_hubmap_id <chr>,
#> # sample_category <chr>, organ <chr>
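Because a left join keeps every row of its left-hand table, a result with zero rows means that the left-hand table (here donor_sub) was itself empty: no donor satisfied the filter at the time the vignette was built. The sketch below illustrates the join logic with base R's merge(..., all.x = TRUE), which behaves like a left join. The toy identifiers and values are hypothetical.

```r
# Toy stand-ins for donors() and samples() output (hypothetical values)
donors_toy <- data.frame(
    hubmap_id = c("HBM000.AAAA.001", "HBM000.AAAA.002"),
    Sex = c("Female", "Male"),
    stringsAsFactors = FALSE
)
samples_toy <- data.frame(
    sample_uuid = c("s1", "s2", "s3"),
    donor_hubmap_id = c("HBM000.AAAA.001", "HBM000.AAAA.001",
        "HBM000.AAAA.003"),
    stringsAsFactors = FALSE
)

# Left join: every donor row is kept, whether or not a sample matches
joined <- merge(donors_toy, samples_toy,
    by.x = "hubmap_id", by.y = "donor_hubmap_id", all.x = TRUE)
joined

# A filter that matches no donor leaves nothing to join against,
# so it is worth checking before joining
no_donor <- donors_toy[donors_toy$Sex == "Unknown", ]
if (nrow(no_donor) == 0) {
    message("No donor matched the filter; relax the conditions.")
}
```

In the toy data the first donor contributes two rows (one per sample) and the second donor, which has no sample, contributes one row with a missing sample_uuid.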
Use *_detail(uuid) to retrieve all available information for any entry of
any entity category, given its UUID. Use select() and the unnest_*()
functions to expand list-columns. Tables with many columns but a single
row are convenient to view with glimpse().
dataset_uuid <- datasets_df |>
    filter(dataset_type == "Auto-fluorescence",
        organ == "Kidney (Right)") |>
    head(1) |>
    pull(uuid)
# Full Information
dataset_detail(dataset_uuid) |> glimpse()
#> Rows: 1
#> Columns: 35
#> $ ancestor_ids <list> <"de1076a42144debd94ba4f5d1f9b6d57",…
#> $ ancestors <list> [["de1076a42144debd94ba4f5d1f9b6d57"…
#> $ contacts <list> [["Biomolecular Multimodal Imaging C…
#> $ contains_human_genetic_sequences <lgl> FALSE
#> $ contributors <list> [["Biomolecular Multimodal Imaging Ce…
#> $ created_by_user_displayname <chr> "HuBMAP Process"
#> $ created_by_user_email <chr> "[email protected]"
#> $ created_timestamp <dbl> 1.711126e+12
#> $ creation_action <chr> "Create Dataset Activity"
#> $ data_access_level <chr> "public"
#> $ dataset_type <chr> "Auto-fluorescence"
#> $ descendant_ids <list> "61530ed23518ee5f6cdaa818e91dde65"
#> $ descendants <list> [["Auto-fluorescence [Image Pyramid]…
#> $ description <chr> "Autofluorescence Microscopy collecte…
#> $ display_subtype <chr> "Auto-fluorescence"
#> $ doi_url <chr> "https://doi.org/10.35079/HBM223.DJQM…
#> $ donor <list> ["Jamie Allen", "jamie.l.allen@vander…
#> $ entity_type <chr> "Dataset"
#> $ files <list> []
#> $ group_name <chr> "Vanderbilt TMC"
#> $ group_uuid <chr> "73bb26e4-ed43-11e8-8f19-0a7c1eab007a"
#> $ hubmap_id <chr> "HBM223.DJQM.264"
#> $ immediate_ancestor_ids <list> "de1076a42144debd94ba4f5d1f9b6d57"
#> $ immediate_descendant_ids <list> "61530ed23518ee5f6cdaa818e91dde65"
#> $ index_version <chr> "3.5.3"
#> $ last_modified_timestamp <dbl> 1.71691e+12
#> $ metadata <list> [[["a9099c6", "https://github.com/hub…
#> $ origin_samples <list> [["Jamie Allen", "jamie.l.allen@vande…
#> $ provider_info <chr> "VAN0042-RK-3 block AF : ./VAN0042-RK…
#> $ published_timestamp <dbl> 1.715267e+12
#> $ registered_doi <chr> "10.35079/HBM223.DJQM.264"
#> $ source_samples <list> [["Jamie Allen", "jamie.l.allen@vand…
#> $ status <chr> "Published"
#> $ title <chr> "Auto-fluorescence data from the kid…
#> $ uuid <chr> "993bb1d6fa02e2755fd69613bb9d6e08"
# Specific Information
dataset_detail(uuid = dataset_uuid) |>
    select(contributors) |>
    unnest_longer(contributors) |>
    unnest_wider(everything())
#> # A tibble: 16 × 11
#> affiliation display_name email first_name is_contact is_operator
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Biomolecular Multimodal… Jamie L. Al… jami… Jamie No Yes
#> 2 Delft Center for System… Lukasz Migas l.g.… Lukasz No Yes
#> 3 Biomolecular Multimodal… Nathan Heat… nath… Nathan No Yes
#> 4 Biomolecular Multimodal… Jeffrey M. … jeff… Jeffrey Yes No
#> 5 Delft Center for System… Leonor Tide… l.e.… Leonoor No Yes
#> 6 Delft Center for System… Raf Van de … Raf.… Raf No No
#> 7 Biomolecular Multimodal… Melissa A. … meli… Melissa No Yes
#> 8 Biomolecular Multimodal… Madeline E.… made… Madeline No Yes
#> 9 Biomolecular Multimodal… Ellie L. Pi… elli… Ellie No Yes
#> 10 Delft Center for System… Felipe Moser f.a.… Felipe No Yes
#> 11 Division of Nephrology … Mark deCaes… mark… Mark No Yes
#> 12 Division of Nephrology … Agnes B. Fo… agne… Agnes No Yes
#> 13 Division of Nephrology … Haichun Yang haic… Haichun No Yes
#> 14 Biomolecular Multimodal… Tina Tsui tina… Tina No Yes
#> 15 Biomolecular Multimodal… Katerina V.… kate… Katerina No Yes
#> 16 Biomolecular Multimodal… Allison B. … alli… Allison No Yes
#> # ℹ 5 more variables: is_principal_investigator <chr>, last_name <chr>,
#> # metadata_schema_id <chr>, middle_name_or_initial <chr>, orcid <chr>
sample_detail(), donor_detail(), collection_detail(), and
publication_detail() work in the same way.
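The *_timestamp columns in the detail output are numeric values on the order of 1.7e+12. Their magnitude suggests milliseconds since the Unix epoch; that interpretation is an assumption, not something stated by the package. Under it, a value can be converted to a date-time as sketched below, where ms_to_datetime() is a hypothetical helper.

```r
# Assumption: timestamps are milliseconds since 1970-01-01 00:00:00 UTC
ms_to_datetime <- function(ms) {
    as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

# created_timestamp as printed (rounded) by glimpse() above
created <- ms_to_datetime(1.711126e+12)
format(created, "%Y-%m-%d")
```

Since glimpse() rounds the printed value, the converted time is only approximate. Use the unrounded column from dataset_detail() for exact results.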
To retrieve the metadata for Dataset, Sample, and Donor entities, use
dataset_metadata(), sample_metadata(), and donor_metadata().
dataset_metadata("993bb1d6fa02e2755fd69613bb9d6e08")
#> # A tibble: 19 × 2
#> Key Value
#> <chr> <chr>
#> 1 acquisition_instrument_model "Axio Scan.Z1"
#> 2 acquisition_instrument_vendor "Zeiss Microscopy"
#> 3 analyte_class "endogenous fluorophores"
#> 4 antibodies_path "extras/antibodies.tsv"
#> 5 contributors_path "extras/contributors.tsv"
#> 6 data_path "."
#> 7 dataset_type "Auto-fluorescence"
#> 8 is_image_preprocessing_required "no"
#> 9 is_targeted "No"
#> 10 metadata_schema_id "c9c6a02b-010e-4217-96dc-f7ef71dd14c4"
#> 11 parent_sample_id "HBM836.RXKQ.893"
#> 12 preparation_protocol_doi "https://dx.doi.org/10.17504/protocols.io.7e…
#> 13 source_storage_duration_unit "hour"
#> 14 source_storage_duration_value "2"
#> 15 tile_configuration "Not applicable"
#> 16 donor.Age "57 years"
#> 17 donor.Body Mass Index "25.3 kg/m2"
#> 18 donor.Race "White "
#> 19 donor.Sex "Male "
sample_metadata("8ecdbdc3e2d04898e2563d666658b6a9")
#> # A tibble: 5 × 2
#> Key Value
#> <chr> <chr>
#> 1 donor.Age "71 years"
#> 2 donor.Apolipoprotein E phenotype "Apolipoprotein E phenotype "
#> 3 donor.Pathology note "Pathology note "
#> 4 donor.Race "White "
#> 5 donor.Sex "Male "
donor_metadata("b2c75c96558c18c9e13ba31629f541b6")
#> # A tibble: 8 × 2
#> Key Value
#> <chr> <chr>
#> 1 Age "41 years"
#> 2 Body mass index "37.1 kg/m^2"
#> 3 Cause of death "Cerebrovascular accident "
#> 4 Death event "Natural causes "
#> 5 Mechanism of injury "Intracranial hemorrhage "
#> 6 Race "White "
#> 7 Sex "Female "
#> 8 Social history "Smoker "
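The metadata functions return every value as a character string, sometimes with trailing whitespace ("White ") or a unit attached ("41 years"). A minimal sketch of cleaning such values in base R follows. The rows are copied from the donor_metadata() output above, and the rule "the number is everything before the first space" is an assumption that fits these examples only.

```r
# Rows copied from the donor_metadata() output above
meta <- data.frame(
    Key = c("Age", "Body mass index", "Race", "Sex"),
    Value = c("41 years", "37.1 kg/m^2", "White ", "Female "),
    stringsAsFactors = FALSE
)

# Strip surrounding whitespace and build a named lookup vector
lookup <- setNames(trimws(meta$Value), meta$Key)

# Keep the part before the first space and convert it to a number
age_years <- as.numeric(sub(" .*$", "", lookup[["Age"]]))
bmi <- as.numeric(sub(" .*$", "", lookup[["Body mass index"]]))

lookup[["Race"]]
age_years
bmi
```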
Some datasets have derived (support) datasets. Use dataset_derived() to
retrieve them. If the given dataset has a support dataset, a tibble with
selected details is returned; otherwise, the function returns NULL.
# no derived/support dataset
dataset_uuid_1 <- "3acdb3ed962b2087fbe325514b098101"
dataset_derived(uuid = dataset_uuid_1)
#> NULL
# has derived/support dataset
dataset_uuid_2 <- "baf976734dd652208d13134bc5c4594b"
dataset_derived(uuid = dataset_uuid_2) |> glimpse()
#> Rows: 1
#> Columns: 6
#> $ uuid <chr> "bbbf5a5b29986dd57910daab00193f35"
#> $ hubmap_id <chr> ""
#> $ data_types <chr> ""
#> $ dataset_type <chr> "Histology [Image Pyramid]"
#> $ status <chr> ""
#> $ last_modified_timestamp <chr> "NA"
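Because the result is NULL rather than an empty tibble when nothing is derived, code that loops over many UUIDs may want to guard against NULL before calling nrow() or binding rows. A minimal sketch follows; count_derived() is a hypothetical helper, not part of HuBMAPR.

```r
# Hypothetical helper: number of derived datasets in a dataset_derived()
# result, treating NULL (no derived dataset) as zero
count_derived <- function(derived) {
    if (is.null(derived)) 0L else nrow(derived)
}

count_derived(NULL)
count_derived(data.frame(uuid = "bbbf5a5b29986dd57910daab00193f35"))
```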
Samples and donors have derived samples and datasets. In the HuBMAPR
package, sample_derived() and donor_derived() return the datasets or
samples derived from one sample, given its UUID, or from one donor, given
its UUID. Specify the entity_type parameter to retrieve derived Dataset or
Sample entries.
sample_uuid <- samples_df |>
    filter(last_modified_timestamp >= "2023-01-01" &
            last_modified_timestamp <= "2023-10-01",
        organ == "Kidney (Left)") |>
    head(1) |>
    pull(uuid)
sample_uuid
#> [1] "c40774aa2f52a2811db15c5ca1949314"
# Derived Datasets
sample_derived(uuid = sample_uuid, entity_type = "Dataset")
#> # A tibble: 12 × 2
#> uuid derived_dataset_count
#> <chr> <int>
#> 1 4fddf6de0f42a7e2648b547affefc234 1
#> 2 b6fd505b8e8e1829a2783570f9f25256 0
#> 3 c3db2027e148e92fecb85e7d6a1fd708 1
#> 4 3a10030d3323e5353cfdc3ada45cad86 0
#> 5 71642e4c4a9cc12f59f3317b4a19adc9 1
#> 6 bd42ab2f422e45ce6b0f3f55171de8aa 0
#> 7 c8ad223f01b45b25e0dcb07c48a42762 1
#> 8 f7b49444b974c98c6300e0bfe5fc3a75 0
#> 9 beb1b65624fe85b527ee2ce80ef208b2 1
#> 10 c25d6febe5b007ad32bc59246c99833d 0
#> 11 744647801573d1d5700ee7523089734c 1
#> 12 4a98c43ab3b20b06c11dfbed5fd9034b 0
# Derived Samples
sample_derived(uuid = sample_uuid, entity_type = "Sample")
#> # A tibble: 3 × 2
#> uuid organ
#> <chr> <chr>
#> 1 ec54b7d4ab4545166a0d121b3dc1ec3f Kidney (Left)
#> 2 ae98f6ca4f1f9950f7e7e1dedc2acc10 Kidney (Left)
#> 3 b099a37195f532e4b384020dc0e94bb5 Kidney (Left)
donor_derived() works in the same way.
For individual entries from the Dataset and Sample entities,
uuid_provenance() retrieves the provenance of the entry as a list of
character strings (UUID, HuBMAP ID, and entity type), ordered from the
most recent ancestor to the furthest ancestor. A Donor UUID has no
ancestor, so an empty list is returned.
# dataset provenance
dataset_uuid <- "3e4c568d9ce8df9d73b8cddcf8d0fec3"
uuid_provenance(dataset_uuid)
#> [[1]]
#> [1] "eba120ab7bbd864a6f6f3ad41e598d25, Sample"
#>
#> [[2]]
#> [1] "468d73d28b9e8c43ffa5fbd56d8e46e3, Sample"
#>
#> [[3]]
#> [1] "1c749716d32310351cb9557c7e2937a0, Sample"
#>
#> [[4]]
#> [1] "c09f875545a64694d70a28091ffbcf8b, Donor"
# sample provenance
sample_uuid <- "35e16f13caab262f446836f63cf4ad42"
uuid_provenance(sample_uuid)
#> [[1]]
#> [1] "0b43d8d0dbbc5e3923a8b963650ab8e3, Sample"
#>
#> [[2]]
#> [1] "eed96170f42554db84d97d1652bb23ef, Sample"
#>
#> [[3]]
#> [1] "1628b6f7eb615862322d6274a6bc9fa0, Donor"
# donor provenance
donor_uuid <- "0abacde2443881351ff6e9930a706c83"
uuid_provenance(donor_uuid)
#> list()
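For further processing, the list of strings can be split into columns. The sketch below assumes each entry is a comma-separated string whose first field is the UUID and whose last field is the entity type, as in the output above. provenance_to_df() is a hypothetical helper, not part of HuBMAPR.

```r
# Hypothetical helper: turn a uuid_provenance() result into a data frame.
# The first field is taken as the UUID and the last as the entity type,
# so an extra middle field (such as a HuBMAP ID) would not break it.
provenance_to_df <- function(prov) {
    entries <- as.character(unlist(prov))
    parts <- strsplit(entries, ", ", fixed = TRUE)
    data.frame(
        step = seq_along(parts),
        uuid = vapply(parts, function(p) p[1], character(1)),
        entity_type = vapply(parts, function(p) p[length(p)], character(1)),
        stringsAsFactors = FALSE
    )
}

# Entries copied from the dataset provenance example above
prov <- list(
    "eba120ab7bbd864a6f6f3ad41e598d25, Sample",
    "468d73d28b9e8c43ffa5fbd56d8e46e3, Sample",
    "1c749716d32310351cb9557c7e2937a0, Sample",
    "c09f875545a64694d70a28091ffbcf8b, Donor"
)
prov_df <- provenance_to_df(prov)
prov_df

# An empty provenance list (the Donor case) gives a zero-row data frame
empty_df <- provenance_to_df(list())
```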
To read the textual description of a Collection or Publication, use
collection_information() or publication_information(), respectively.
collection_information(uuid = collection_uuid)
#> Title
#> Spatiotemporal coordination at the maternal-fetal interface promotes trophoblast invasion and vascular remodeling in the first half of human pregnancy
#> Description
#> Beginning in the first trimester, fetally derived extravillous trophoblasts (EVTs) invade the uterus and remodel its spiral arteries, transforming them into large, dilated blood vessels left with a thin, discontinuous smooth muscle layer and partially lined with EVTs. Several mechanisms have been proposed to explain how EVTs coordinate with the maternal decidua to promote a tissue microenvironment conducive to spiral artery remodeling (SAR). However, it remains a matter of debate which immune and stromal cell types participate in these interactions and how this process evolves with respect to gestational age. Here, we used a multiomic approach that combined the strengths of spatial proteomics and transcriptomics to construct the first spatiotemporal atlas of the human maternal-fetal interface in the first half of pregnancy. We used multiplexed ion beam imaging by time of flight (MIBI-TOF) and a 37-plex antibody panel to analyze ∼500,000 cells and 588 spiral arteries within intact decidua from 66 patients between 6-20 weeks of gestation, integrating this with coregistered transcriptomic profiles. Gestational age substantially influenced the frequency of many maternal immune and stromal cells, with tolerogenic subsets expressing CD206, CD163, TIM-3, Galectin-9, and IDO-1 increasingly enriched and colocalized at later time points. In contrast, SAR progression preferentially correlated with EVT invasion and was transcriptionally defined by 78 gene ontology pathways exhibiting unique monotonic and biphasic trends. Lastly, we developed an integrated model of SAR supporting an intravasation mechanism where invasion is accompanied by upregulation of pro-angiogenic, immunoregulatory EVT programs that promote interactions with vascular endothelium while avoiding activation of immune cells in circulating maternal blood. 
Taken together, these results support a coordinated model of decidualization in which increasing gestational age drives a transition in maternal decidua towards a tolerogenic niche conducive to locally regulated, EVT-dependent SAR.
#> DOI
#> - https://doi.org/10.35079/hbm585.qpdv.454
#> URL
#> - 10.35079/hbm585.qpdv.454
publication_information(uuid = publication_uuid)
#> Title
#> An atlas of healthy and injured cell states and niches in the human kidney
#> Abstract
#> Understanding kidney disease relies on defining the complexity of cell types and states, their associated molecular profiles and interactions within tissue neighbourhoods1. Here we applied multiple single-cell and single-nucleus assays (>400,000 nuclei/cells) and spatial imaging technologies to a broad spectrum of healthy reference kidneys (45 donors) and diseased kidneys (48 patients). This has provided a high-resolution cellular atlas of 51 main cell types that include rare and previously undescribed cell populations. The multi-omic approach provides detailed transcriptomic profiles, epigenomic regulatory factors and spatial localizations spanning the entire kidney. We also define 28 cellular states across nephron segments and interstitium that were altered in kidney injury, encompassing cycling, adaptive or maladaptive repair, transitioning and degenerative states. Molecular signatures permitted the localization of these states within injury neighbourhoods using spatial transcriptomics, while large-scale 3D imaging analysis (around 1.2 million neighbourhoods) provided corresponding linkages to active immune responses. These analyses defined biological pathways that are relevant to injury time-course and niches, including signatures underlying epithelial repair that predicted maladaptive states associated with a decline in kidney function. This integrated multimodal spatial cell atlas of healthy and diseased human kidneys represents a comprehensive benchmark of cellular states, neighbourhoods, outcome-associated signatures and publicly available interactive visualizations.
#> Manuscript
#> - Nature: https://doi.org/10.1038/s41586-023-05769-3
#> Corresponding Authors
#> - Michael T. Eadon 0000-0003-3066-2876
#> - Pierre C. Dagher 0000-0003-3321-5561
#> - Tarek M. El-Achkar 0000-0003-4645-3614
#> - Kun Zhang 0000-0002-7596-5224
#> - Matthias Kretzler 0000-0003-4064-0582
#> - Sanjay Jain 0000-0003-2804-127X
#> Data Types
#> - RNAseq
#> Organs
#> - Kidney (Right)
Additional contact, author, and contributor information can be retrieved
using dataset_contributors() for the Dataset entity, collection_contacts()
and collection_contributors() for the Collection entity, or
publication_authors() for the Publication entity.
# Dataset
dataset_contributors(uuid = dataset_uuid)
#> # A tibble: 2 × 5
#> display_name affiliation email orcid is_principal_investi…¹
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Xingzhao Wen "University of California, Sa… xzwe… 0000… No
#> 2 Sheng Zhong "University of California, Sa… szho… 0000… Yes
#> # ℹ abbreviated name: ¹is_principal_investigator
# Collection
collection_contacts(uuid = collection_uuid)
#> # A tibble: 2 × 3
#> name affiliation orcid_id
#> <chr> <chr> <chr>
#> 1 Shirley Greenbaum Department of Pathology, Stanford University 0000-0002-0680…
#> 2 Michael Angelo Department of Pathology, Stanford University 0000-0003-1531…
collection_contributors(uuid = collection_uuid)
#> # A tibble: 13 × 3
#> name affiliation orcid_id
#> <chr> <chr> <chr>
#> 1 Shirley Greenbaum Department of Pathology, Stanford University 0000-00…
#> 2 Inna Averbukh Department of Pathology, Stanford University 0000-00…
#> 3 Erin Soon Department of Pathology, Stanford University 0000-00…
#> 4 Gabrielle Rizzuto Department of Pathology, UCSF 0000-00…
#> 5 Noah Greenwald Department of Pathology, Stanford University 0000-00…
#> 6 Marc Bosse Department of Pathology, Stanford University 0000-00…
#> 7 Eleni G. Jaswa Department of Obstetrics, Gynecology & Reproducti… 0000-00…
#> 8 Zumana Khair Department of Pathology, Stanford University 0000-00…
#> 9 David Van Valen Division of Biology and Bioengineering, Californi… 0000-00…
#> 10 Leeat Keren Department of Molecular Cell Biology, Weizmann In… 0000-00…
#> 11 Travis Hollmann Department of Pathology, Memorial Sloan Kettering… 0000-00…
#> 12 Matt van de Rjin Department of Pathology, Stanford University 0000-00…
#> 13 Michael Angelo Department of Pathology, Stanford University 0000-00…
# Publication
publication_authors(uuid = publication_uuid)
#> # A tibble: 50 × 3
#> name affiliation orcid_id
#> <chr> <chr> <chr>
#> 1 Blue B. Lake Department of Bioengineering, University of C… 0000-00…
#> 2 Rajasree Menon Department of Computational Medicine and Bioi… 0000-00…
#> 3 Seth Winfree Department of Pathology and Microbiology, Uni… 0000-00…
#> 4 Qiwen Hu Department of Biomedical Informatics, Harvard… 0000-00…
#> 5 Ricardo Melo Ferreira Department of Medicine, Indiana University Sc… 0000-00…
#> 6 Kian Kalhor Department of Bioengineering, University of C… 0000-00…
#> 7 Daria Barwinska Department of Medicine, Indiana University Sc… 0000-00…
#> 8 Edgar A. Otto Department of Internal Medicine, Division of … 0000-00…
#> 9 Michael Ferkowicz Department of Medicine, Indiana University Sc… 0000-00…
#> 10 Dinh Diep Department of Bioengineering, University of C… 0000-00…
#> # ℹ 40 more rows
Each dataset has corresponding data files. Most datasets' files are available on HuBMAP Globus through a corresponding URL. Some datasets' files are not available via Globus but can be accessed via dbGaP (database of Genotypes and Phenotypes) and/or SRA (Sequence Read Archive). The files of the remaining datasets are not available on any authorized platform.
Each dataset available on Globus has its own set of data-related files to preview and download, including but not limited to images, metadata files, downstream analysis reports, and raw data products.
Use bulk_data_transfer() to find out whether data files are openly
accessible or restricted. If the files are accessible, the function
directs you to the Globus page in your web browser; otherwise, it prints
an error message with additional instructions, either providing dbGaP/SRA
URLs or stating that the files are unavailable.
uuid_globus <- "d1dcab2df80590d8cd8770948abaf976"
bulk_data_transfer(uuid_globus)
uuid_dbGAP_SRA <- "d926c41ac08f3c2ba5e61eec83e90b0c"
bulk_data_transfer(uuid_dbGAP_SRA)
uuid_not_avail <- "0eb5e457b4855ce28531bc97147196b6"
bulk_data_transfer(uuid_not_avail)
You can download data files from the Globus webpage by selecting the desired document and clicking the download button. You can also preview and download data files using the rglobus package. Follow the instructions here.
R session information
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] HuBMAPR_1.1.4 ggplot2_3.5.1 tidyr_1.3.1 dplyr_1.1.4 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] rappdirs_0.3.3 sass_0.4.9 utf8_1.2.4 generics_0.1.3 stringi_1.8.4
#> [6] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1 grid_4.4.1 fastmap_1.2.0
#> [11] jsonlite_1.8.9 whisker_0.4.1 BiocManager_1.30.25 purrr_1.0.2 fansi_1.0.6
#> [16] scales_1.3.0 httr2_1.0.5 jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4
#> [21] munsell_0.5.1 withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.4.1
#> [26] colorspace_2.1-1 curl_5.2.3 buildtools_1.0.0 vctrs_0.6.5 rjsoncons_1.3.1
#> [31] R6_2.5.1 lifecycle_1.0.4 stringr_1.5.1 pkgconfig_2.0.3 pillar_1.9.0
#> [36] bslib_0.8.0 gtable_0.3.6 glue_1.8.0 xfun_0.49 tibble_3.2.1
#> [41] tidyselect_1.2.1 highr_0.11 sys_3.4.3 knitr_1.48 farver_2.1.2
#> [46] htmltools_0.5.8.1 rmarkdown_2.28 maketools_1.3.1 labeling_0.4.3 compiler_4.4.1