---
title: "cellNexus"
author: "Mangiola et al."
package: cellNexus
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
vignette: >
  %\VignetteIndexEntry{cellNexus}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
[Lifecycle: maturing](https://lifecycle.r-lib.org/articles/stages.html#maturing)
# Introduction
`cellNexus` extends the functionality of `CuratedAtlasQueryR` by providing a unified interface for querying and accessing the harmonised, curated, and reannotated CELLxGENE human cell atlas. It enables reproducible, programmatic exploration of large-scale single-cell datasets, supporting data retrieval at the cell, sample, and dataset levels with flexible filtering based on tissue, cell type, experimental condition, and other metadata. Retrieved data are returned in formats ready for downstream analysis.
The package integrates over 44 million human cells processed through a standardised pipeline, including consistent quality control, normalisation, and unified abundance representations (e.g., single-cell, counts-per-million, normalised expression, and pseudobulk). This harmonisation facilitates efficient cross-dataset comparison and integration.
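To make two of the abundance representations named above concrete, here is a minimal base-R sketch on a made-up count matrix. It is not the package's pipeline: the matrix values, cell names, and grouping labels are illustrative assumptions, and the exact processing cellNexus applies is not reproduced here.

``` r
# Toy count matrix: 3 genes (rows) x 4 cells (columns); values are made up.
counts <- matrix(
  c(10, 0, 5,
     2, 8, 0,
     0, 4, 6,
     3, 3, 4),
  nrow = 3,
  dimnames = list(
    c("gene_a", "gene_b", "gene_c"),
    c("cell_1", "cell_2", "cell_3", "cell_4")
  )
)

# Counts per million: scale each cell (column) so that its counts sum to 1e6.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Pseudobulk: sum raw counts over all cells that share a group label
# (hypothetical sample/cell-type labels).
group <- c("sample_1__t", "sample_1__t", "sample_2__t", "sample_2__t")
pseudobulk <- t(rowsum(t(counts), group))

colSums(cpm)
pseudobulk
```

The pseudobulk matrix has one column per group rather than per cell, which is why it is much smaller than the single-cell layers.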
Data are hosted on the ARDC Nectar Research Cloud, and most functions access them via web requests; therefore, an active network connection is required for typical use.
While both cellNexus and CuratedAtlasQueryR rely on precomputed expression layers, cellNexus adopts a more standardised and transparent processing workflow. This includes explicit removal of empty droplets and dead cells, followed by harmonised quality control, normalisation, and multi-layer data generation, ensuring alignment with evolving CELLxGENE releases.
# Query interface
## Installation
``` r
devtools::install_github("MangiolaLaboratory/cellNexus")
```
## Load the package
``` r
library(cellNexus)
```
## Load additional packages
``` r
suppressPackageStartupMessages({
  library(ggplot2)
})
```
## Load and explore the metadata
### Load the metadata
By default, `get_metadata()` loads the harmonised annotations. To retrieve the original Census annotations as well, pipe the result into `join_census_table()`.
``` r
metadata <- get_metadata() |>
  join_census_table()
#> ℹ Downloading 1 file, totalling 0.43 GB
#> ℹ Downloading https://object-store.rc.nectar.org.au/v1/AUTH_06d6e008e3e642da99d806ba3ea629c5/cellNexus-metadata/census_cell_metadata.2.3.0.parquet to /vast/scratch/users/shen.m/r_cache/R/cellNexus/census_cell_metadata.2.3.0.parquet
#> ℹ Downloading 1 file, totalling 0.9 GB
#> ℹ Downloading https://object-store.rc.nectar.org.au/v1/AUTH_06d6e008e3e642da99d806ba3ea629c5/cellNexus-metadata/cellnexus_metadata.2.3.0.parquet to /vast/scratch/users/shen.m/r_cache/R/cellNexus/cellnexus_metadata.2.3.0.parquet
metadata
#> # Source: SQL [?? x 76]
#> # Database: DuckDB 1.4.3 [unknown@Linux 5.14.0-570.112.1.el9_6.x86_64:R 4.5.3/:memory:]
#> cell_id observation_joinid dataset_id sample_id sample_ experiment___ run_from_cell_id sample_heuristic age_days tissue_groups
#>
#> 1 3670 -08E8se6Ii 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 2 519 &7%VnKpYm6 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 3 3674 dz@)@k# HGR0000124___bl… 28835 blood
#> 4 3430 9v{tS;L+wQ 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 5 3471 Z*;~@?yoi( 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 6 3474 Qa9?c3UqN^ 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 7 3477 7|N`hFx0Qr 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 8 3483 `!a(s{Z6^s 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 9 3748 (g+_FXEA&4 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> 10 3500 _oid2*7KJQ 30cd5311-6c09-46c9-94f1-71fe4b9… 420ce6ff… 420ce6… "" HGR0000124___bl… 28835 blood
#> # ℹ more rows
#> # ℹ 66 more variables: nFeature_expressed_in_sample , nCount_RNA , empty_droplet , cell_type_unified_ensemble , is_immune ,
#> # subsets_Mito_percent , subsets_Ribo_percent , high_mitochondrion , high_ribosome , scDblFinder.class ,
#> # sample_chunk , cell_chunk , sample_pseudobulk_chunk , file_id_cellNexus_single_cell , file_id_cellNexus_pseudobulk ,
#> # count_upper_bound , nfeature_expressed_thresh , inverse_transform , alive , cell_annotation_blueprint_singler ,
#> # cell_annotation_monaco_singler , cell_annotation_azimuth_l2 , ethnicity_flagging_score , low_confidence_ethnicity ,
#> # .aggregated_cells , imputed_ethnicity , atlas_id , cell_type , cell_type_ontology_term_id , assay , …
```
Metadata is saved to the directory returned by `get_default_cache_dir()` unless a custom path is provided via the `cache_directory` argument. The `metadata` variable can then be reused for all subsequent queries.
### Explore tissues and cell types
``` r
metadata |>
  dplyr::distinct(tissue, cell_type_unified_ensemble)
#> # Source: SQL [?? x 2]
#> # Database: DuckDB 1.4.3 [unknown@Linux 5.14.0-570.112.1.el9_6.x86_64:R 4.5.3/:memory:]
#> tissue cell_type_unified_ensemble
#>
#> 1 cerebellum pericyte
#> 2 cerebellum endothelial
#> 3 cerebellum other
#> 4 cerebellum dc
#> 5 cerebellum immune
#> 6 spleen b
#> 7 spleen nk
#> 8 spleen b memory
#> 9 spleen cdc
#> 10 spleen t cd4
#> # ℹ more rows
```
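To illustrate what the distinct tissue and cell type pairs represent, the sketch below mimics the query in base R on a small hand-made data frame. The rows are illustrative stand-ins, not real records; on the lazy `metadata` table the same questions would be asked with `dplyr` verbs instead.

``` r
# Toy stand-in for the metadata table: one row per cell (made-up rows).
toy_metadata <- data.frame(
  tissue = c("cerebellum", "cerebellum", "cerebellum",
             "spleen", "spleen", "spleen"),
  cell_type_unified_ensemble = c("pericyte", "endothelial", "pericyte",
                                 "b", "nk", "b")
)

# Distinct tissue / cell type combinations (duplicate rows collapse).
pairs <- unique(toy_metadata)

# Number of distinct cell types observed in each tissue.
cell_types_per_tissue <- vapply(
  split(pairs$cell_type_unified_ensemble, pairs$tissue),
  length,
  integer(1)
)

pairs
cell_types_per_tissue
```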
## Quality control
The cellNexus metadata carries standardised quality-control annotations that are used to filter out empty droplets, dead or damaged cells, doublets, and samples with low gene counts.
``` r
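# Keep only cells that pass the standardised quality-control filters
metadata <- metadata |>
  keep_quality_cells()
# Apply the low-gene-count threshold on feature_count
metadata <- metadata |>
  dplyr::filter(feature_count >= 5000)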
```
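As a rough picture of what such a filter does, the following base-R sketch applies cell-level flags to a toy data frame. The column names `empty_droplet`, `alive`, and `scDblFinder.class` appear in the metadata shown earlier, but the way they are combined here, and the `"singlet"`/`"doublet"` labels, are assumptions for illustration; this is not the implementation of `keep_quality_cells()`.

``` r
# Toy cells with quality-control flags (values are made up).
toy_cells <- data.frame(
  cell_id = c("c1", "c2", "c3", "c4", "c5"),
  empty_droplet = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  alive = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  scDblFinder.class = c("singlet", "singlet", "singlet", "doublet", "singlet")
)

# Assumed rule: keep cells that are not empty droplets, are alive,
# and are classified as singlets.
is_quality <- !toy_cells$empty_droplet &
  toy_cells$alive &
  toy_cells$scDblFinder.class == "singlet"

quality_cells <- toy_cells[is_quality, ]
quality_cells$cell_id
```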
## Download single-cell RNA sequencing counts
### Query raw counts
``` r
single_cell_counts <-
  metadata |>
  dplyr::filter(
    self_reported_ethnicity == "African American" &
      assay == "10x 3' v3" &
      tissue == "breast" &
      cell_type == "T cell"
  ) |>
  get_single_cell_experiment()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 10 files, totalling 0.03 GB
#> ℹ Downloading 10 files in parallel...
#> ℹ Reading files.
#> ℹ Compiling Experiment.
single_cell_counts
#> # A SingleCellExperiment-tibble abstraction: 2,924 × 77
#> # Features=33145 | Cells=2924 | Assays=counts
#> .cell observation_joinid dataset_id sample_id sample_ experiment___ run_from_cell_id sample_heuristic age_days tissue_groups nFeature_expressed_i…¹
#>
#> 1 1_1 I8a42<8st4 842c6f5d-4… 184fa234… 184fa2… "" c2aa4d8d-e9df-4… 14600 breast 3395
#> 2 76_1 bTlx!HK=oS 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 1671
#> 3 77_1 E4g5+)v;AV 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 2340
#> 4 78_1 +q?29B%2nH 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 1714
#> 5 79_1 zuJ#MBMWy; 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 1506
#> 6 72_1 8wGs7JgUjj 842c6f5d-4… 6b194412… 6b1944… "" b3ff1aad-40fd-4… 14600 breast 2548
#> 7 75_1 F9G7A+GgjA 842c6f5d-4… db5a69ed… db5a69… "" 49beb83c-66a1-4… 14600 breast 1291
#> 8 73_1 z_=CTOs4{z 842c6f5d-4… 4b5e66fa… 4b5e66… "" 04983012-bb56-4… 14600 breast 2866
#> 9 74_1 fNzorxA`Mf 842c6f5d-4… 4b5e66fa… 4b5e66… "" 04983012-bb56-4… 14600 breast 1942
#> 10 80_1 zz-!e5_XAo 842c6f5d-4… 1de3f3ba… 1de3f3… "" 7fabaf1c-52fd-4… 14600 breast 1749
#> # ℹ 2,914 more rows
#> # ℹ abbreviated name: ¹nFeature_expressed_in_sample
#> # ℹ 66 more variables: nCount_RNA , empty_droplet , cell_type_unified_ensemble , is_immune , subsets_Mito_percent ,
#> # subsets_Ribo_percent , high_mitochondrion , high_ribosome , scDblFinder.class , sample_chunk , cell_chunk ,
#> # sample_pseudobulk_chunk , file_id_cellNexus_single_cell , file_id_cellNexus_pseudobulk , count_upper_bound ,
#> # nfeature_expressed_thresh , inverse_transform , alive , cell_annotation_blueprint_singler ,
#> # cell_annotation_monaco_singler , cell_annotation_azimuth_l2 , ethnicity_flagging_score , low_confidence_ethnicity , …
```
### Query counts scaled per million
``` r
single_cell_cpm <-
  metadata |>
  dplyr::filter(
    self_reported_ethnicity == "African American" &
      assay == "10x 3' v3" &
      tissue == "breast" &
      cell_type == "T cell"
  ) |>
  get_single_cell_experiment(assays = "cpm")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 10 files, totalling 0.03 GB
#> ℹ Downloading 10 files in parallel...
#> ℹ Reading files.
#> ℹ Compiling Experiment.
single_cell_cpm
#> # A SingleCellExperiment-tibble abstraction: 2,924 × 77
#> # Features=33145 | Cells=2924 | Assays=cpm
#> .cell observation_joinid dataset_id sample_id sample_ experiment___ run_from_cell_id sample_heuristic age_days tissue_groups nFeature_expressed_i…¹
#>
#> 1 72_1 8wGs7JgUjj 842c6f5d-4… 6b194412… 6b1944… "" b3ff1aad-40fd-4… 14600 breast 2548
#> 2 75_1 F9G7A+GgjA 842c6f5d-4… db5a69ed… db5a69… "" 49beb83c-66a1-4… 14600 breast 1291
#> 3 73_1 z_=CTOs4{z 842c6f5d-4… 4b5e66fa… 4b5e66… "" 04983012-bb56-4… 14600 breast 2866
#> 4 74_1 fNzorxA`Mf 842c6f5d-4… 4b5e66fa… 4b5e66… "" 04983012-bb56-4… 14600 breast 1942
#> 5 80_1 zz-!e5_XAo 842c6f5d-4… 1de3f3ba… 1de3f3… "" 7fabaf1c-52fd-4… 14600 breast 1749
#> 6 81_1 -mb&DWckf( 842c6f5d-4… 1de3f3ba… 1de3f3… "" 7fabaf1c-52fd-4… 14600 breast 1993
#> 7 1_1 I8a42<8st4 842c6f5d-4… 184fa234… 184fa2… "" c2aa4d8d-e9df-4… 14600 breast 3395
#> 8 76_1 bTlx!HK=oS 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 1671
#> 9 77_1 E4g5+)v;AV 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 2340
#> 10 78_1 +q?29B%2nH 842c6f5d-4… 52ab9222… 52ab92… "" 7ce86149-8906-4… 14600 breast 1714
#> # ℹ 2,914 more rows
#> # ℹ abbreviated name: ¹nFeature_expressed_in_sample
#> # ℹ 66 more variables: nCount_RNA , empty_droplet , cell_type_unified_ensemble , is_immune , subsets_Mito_percent ,
#> # subsets_Ribo_percent , high_mitochondrion , high_ribosome , scDblFinder.class , sample_chunk , cell_chunk ,
#> # sample_pseudobulk_chunk , file_id_cellNexus_single_cell , file_id_cellNexus_pseudobulk , count_upper_bound ,
#> # nfeature_expressed_thresh , inverse_transform , alive , cell_annotation_blueprint_singler ,
#> # cell_annotation_monaco_singler , cell_annotation_azimuth_l2 , ethnicity_flagging_score , low_confidence_ethnicity , …
```
### Query SCT normalised counts
``` r
single_cell_sct <-
  metadata |>
  dplyr::filter(
    self_reported_ethnicity == "African American" &
      assay == "10x 3' v3" &
      tissue == "breast" &
      cell_type == "T cell"
  ) |>
  get_single_cell_experiment(assays = "sct")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 10 files, totalling 0.03 GB
#> ℹ Downloading 10 files in parallel...
#> ℹ Reading files.
#> ! The number of cells in the SingleCellExperiment will be less than the number of cells you have selected from the metadata. Are cell IDs duplicated? Or, do cell IDs correspond to the counts file?
#> ℹ Compiling Experiment.
single_cell_sct
#> # A SingleCellExperiment-tibble abstraction: 1,244 × 77
#> # Features=33145 | Cells=1244 | Assays=sct
#> .cell observation_joinid dataset_id sample_id sample_ experiment___ run_from_cell_id sample_heuristic age_days tissue_groups nFeature_expressed_i…¹
#>
#> 1 72_1 8wGs7JgUjj 842c6f5d-4… 6b194412… 6b1944… "" b3ff1aad-40fd-4… 14600 breast 2548
#> 2 75_1 F9G7A+GgjA 842c6f5d-4… db5a69ed… db5a69… "" 49beb83c-66a1-4… 14600 breast 1291
#> 3 1_1 I8a42<8st4 842c6f5d-4… 184fa234… 184fa2… "" c2aa4d8d-e9df-4… 14600 breast 3395
#> 4 73_1 z_=CTOs4{z 842c6f5d-4… 4b5e66fa… 4b5e66… "" 04983012-bb56-4… 14600 breast 2866
#> 5 74_1 fNzorxA`Mf 842c6f5d-4… 4b5e66fa… 4b5e66… "" 04983012-bb56-4… 14600 breast 1942
#> 6 80_1 zz-!e5_XAo 842c6f5d-4… 1de3f3ba… 1de3f3… "" 7fabaf1c-52fd-4… 14600 breast 1749
#> 7 81_1 -mb&DWckf( 842c6f5d-4… 1de3f3ba… 1de3f3… "" 7fabaf1c-52fd-4… 14600 breast 1993
#> 8 22_2 2lQ`<&l3-A 842c6f5d-4… 30967738… 309677… "" 7d4045ff-3f48-4… 14600 breast 2058
#> 9 23_2 sqV-|vcI4R 842c6f5d-4… d8ecdd92… d8ecdd… "" bc909bba-be16-4… 14600 breast 2375
#> 10 5_2 +p4uNj_7$S 842c6f5d-4… a91e6814… a91e68… "" 700a819c-03f9-4… 14600 breast 1870
#> # ℹ 1,234 more rows
#> # ℹ abbreviated name: ¹nFeature_expressed_in_sample
#> # ℹ 66 more variables: nCount_RNA , empty_droplet , cell_type_unified_ensemble , is_immune , subsets_Mito_percent ,
#> # subsets_Ribo_percent , high_mitochondrion , high_ribosome , scDblFinder.class , sample_chunk , cell_chunk ,
#> # sample_pseudobulk_chunk , file_id_cellNexus_single_cell , file_id_cellNexus_pseudobulk , count_upper_bound ,
#> # nfeature_expressed_thresh , inverse_transform , alive , cell_annotation_blueprint_singler ,
#> # cell_annotation_monaco_singler , cell_annotation_azimuth_l2 , ethnicity_flagging_score , low_confidence_ethnicity , …
```
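The warning in the output above reports that fewer cells were returned than were selected: 1,244 cells for the `sct` assay versus 2,924 for `counts` and `cpm`. A minimal base-R sketch of how one might quantify and list the missing cells is shown below. The ID vectors are hypothetical placeholders; where the identifiers come from in a real query is left as an assumption.

``` r
# Hypothetical cell identifiers: those selected from the metadata and
# those present in the returned object.
selected_ids <- c("cell_a", "cell_b", "cell_c", "cell_d", "cell_e")
returned_ids <- c("cell_a", "cell_b", "cell_c")

# Cells that were selected but not returned.
missing_ids <- setdiff(selected_ids, returned_ids)

# Fraction of selected cells that were returned.
retained_fraction <- length(returned_ids) / length(selected_ids)

# The same ratio using the cell numbers reported in the outputs above.
sct_retained <- round(1244 / 2924, 3)

missing_ids
retained_fraction
sct_retained
```

With the numbers from this vignette, roughly 43% of the selected cells have an `sct` layer available.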
### Query pseudobulk
``` r
pseudobulk_counts <-
  metadata |>
  dplyr::filter(
    assay == "10x 5' v1" &
      tissue == "lung" &
      cell_type == "classical monocyte"
  ) |>
  get_pseudobulk()
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Downloading 6 files, totalling 0.98 GB
#> ℹ Downloading 6 files in parallel...
#> ℹ Reading files.
#> ! cellNexus says: Not all genes completely overlap across the provided objects. Counts are generated by genes intersection.
#> ℹ Compiling Experiment.
pseudobulk_counts
#> # A SingleCellExperiment-tibble abstraction: 139 × 60
#> # Features=15888 | Cells=139 | Assays=counts
#> .cell dataset_id sample_id sample_ experiment___ run_from_cell_id sample_heuristic age_days tissue_groups cell_type_unified_en…¹ sample_chunk
#>
#> 1 2e8c9911c9b… 0ba16f4b-… 2e8c9911… 2e8c99… "" HDBR15279,HDBR1… NA respiratory … cd14 mono 1
#> 2 f71af64a552… 1e6a6ef9-… f71af64a… f71af6… "" Leader_Merad_20… 27010 respiratory … monocytic 1
#> 3 f71af64a552… 1e6a6ef9-… f71af64a… f71af6… "" Leader_Merad_20… 27010 respiratory … cd14 mono 1
#> 4 f71af64a552… 1e6a6ef9-… f71af64a… f71af6… "" Leader_Merad_20… 27010 respiratory … cd8 tem 1
#> 5 0d874636bc7… 1e6a6ef9-… 0d874636… 0d8746… "" Leader_Merad_20… 29930 respiratory … cd14 mono 1
#> 6 0d874636bc7… 1e6a6ef9-… 0d874636… 0d8746… "" Leader_Merad_20… 29930 respiratory … monocytic 1
#> 7 0d874636bc7… 1e6a6ef9-… 0d874636… 0d8746… "" Leader_Merad_20… 29930 respiratory … cd16 mono 1
#> 8 0d874636bc7… 1e6a6ef9-… 0d874636… 0d8746… "" Leader_Merad_20… 29930 respiratory … macrophage 1
#> 9 0d874636bc7… 1e6a6ef9-… 0d874636… 0d8746… "" Leader_Merad_20… 29930 respiratory … other 1
#> 10 11721339cb1… 1e6a6ef9-… 11721339… 117213… "" Leader_Merad_20… 26645 respiratory … monocytic 1
#> # ℹ 129 more rows
#> # ℹ abbreviated name: ¹cell_type_unified_ensemble
#> # ℹ 49 more variables: cell_chunk , sample_pseudobulk_chunk , file_id_cellNexus_pseudobulk , count_upper_bound ,
#> # nfeature_expressed_thresh , inverse_transform , ethnicity_flagging_score , low_confidence_ethnicity , .aggregated_cells ,
#> # imputed_ethnicity , atlas_id , assay , assay_ontology_term_id , development_stage ,
#> # development_stage_ontology_term_id , disease , disease_ontology_term_id , donor_id , is_primary_data , organism ,
#> # organism_ontology_term_id , self_reported_ethnicity , self_reported_ethnicity_ontology_term_id , sex , …
```
## Download cell communication metadata
Cell communication metadata was generated per sample from post-QC cells using the `CellChat v2` method. The inference uses our harmonised cell type annotation (`cell_type_unified_ensemble`) and captures communication at both the ligand–receptor pair level and the signalling pathway level. Two summary columns are provided for each pair of cell groups:
- interaction_count: The number of inferred interactions between each pair of cell groups.
- interaction_weight: The aggregated communication strength between each pair of cell groups.
For definitions of additional annotations, please refer to the CellChat v2 documentation: https://github.com/jinworks/CellChat.
For demonstration purposes, the example below reads cell communication metadata from a demo file. Users do not normally need to specify the `cloud_metadata` argument.
``` r
get_cell_communication_strength(cloud_metadata = get_metadata_url("cellNexus_lr_signaling_pathway_strength_DEMO.parquet"))
#> # Source: SQL [?? x 16]
#> # Database: DuckDB 1.4.3 [unknown@Linux 5.14.0-570.112.1.el9_6.x86_64:R 4.5.3/:memory:]
#> source target ligand receptor lr_prob lr_pval interaction_name interaction_name_2 pathway_name annotation evidence pathway_prob pathway_pval
#>
#> 1 b b TGFB1 TGFbR1_R2 0.000116 1 TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TG… TGFb Secreted … KEGG: h… 0.000420 1
#> 2 b memory b TGFB1 TGFbR1_R2 0.000865 1 TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TG… TGFb Secreted … KEGG: h… 0.00185 1
#> 3 b naive b TGFB1 TGFbR1_R2 0.000696 0.99 TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TG… TGFb Secreted … KEGG: h… 0.00146 0.994
#> 4 cd14 mono b TGFB1 TGFbR1_R2 0.00240 0.81 TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TG… TGFb Secreted … KEGG: h… 0.00472 0.924
#> 5 cd4 naive b TGFB1 TGFbR1_R2 0.000957 1 TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TG… TGFb Secreted … KEGG: h… 0.00201 0.998
#> 6 cd4 tem b TGFB1 TGFbR1_R2 0.00242 0.76 TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TG… TGFb Secreted … KEGG: h… 0.00467 0.797
#> # ℹ 3 more variables: sample_id , interaction_count , interaction_weight
```
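To show how pair-level summaries such as `interaction_count` and `interaction_weight` relate to ligand–receptor rows, here is a minimal base-R sketch on a toy table that reuses the column names `source`, `target`, `lr_prob`, and `lr_pval` from the output above. It assumes that the count is the number of significant ligand–receptor pairs and the weight is their summed probability; the 0.05 cutoff, the toy values, and the placeholder ligand name are assumptions, and the precomputed columns in cellNexus may be derived differently.

``` r
# Toy ligand-receptor table (made-up probabilities and p-values).
toy_lr <- data.frame(
  source = c("cd14 mono", "cd14 mono", "cd4 tem", "cd4 tem"),
  target = c("b", "b", "b", "b"),
  ligand = c("TGFB1", "hypothetical_ligand", "TGFB1", "hypothetical_ligand"),
  lr_prob = c(0.004, 0.002, 0.003, 0.001),
  lr_pval = c(0.01, 0.03, 0.20, 0.02)
)

# Keep interactions below an assumed significance cutoff.
significant <- toy_lr[toy_lr$lr_pval < 0.05, ]

# Group the remaining probabilities by source -> target pair.
pair <- paste(significant$source, significant$target, sep = " -> ")
by_pair <- split(significant$lr_prob, pair)

# Count and total strength of the interactions for each pair.
interaction_count <- vapply(by_pair, length, integer(1))
interaction_weight <- vapply(by_pair, sum, numeric(1))

interaction_count
interaction_weight
```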
### Extract only a subset of genes
This is helpful if only a few genes are of interest (e.g., ENSG00000134644 (PUM1)), as they can be compared across samples. cellNexus uses ENSEMBL gene IDs.
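Because features are specified as gene IDs rather than gene symbols, it can help to check the format of a feature vector before querying. The base-R sketch below assumes human gene IDs of the form `ENSG` followed by 11 digits; the `features` vector is illustrative and the check is not part of the package.

``` r
# Illustrative feature vector: one Ensembl gene ID and one gene symbol.
features <- c("ENSG00000134644", "PUM1")

# TRUE where the entry looks like a human Ensembl gene ID.
is_ensembl_gene_id <- grepl("^ENSG[0-9]{11}$", features)

# Entries that could be passed on as features, and those needing conversion.
valid_features <- features[is_ensembl_gene_id]
needs_conversion <- features[!is_ensembl_gene_id]

valid_features
needs_conversion
```

The query itself then selects a single gene: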
``` r
single_cell_cpm <-
  metadata |>
  dplyr::filter(
    self_reported_ethnicity == "African American" &
      assay == "10x 3' v3" &
      tissue == "breast" &
      cell_type == "T cell"
  ) |>
  get_single_cell_experiment(assays = "cpm", features = "ENSG00000134644")
#> ℹ Realising metadata.
#> ℹ Synchronising files
#> ℹ Reading files.
#> ℹ Compiling Experiment.
single_cell_cpm
#> # A SingleCellExperiment-tibble abstraction: 2,924 × 77
#> # Features=1 | Cells=2924 | Assays=cpm
#> .cell observation_joinid dataset_id sample_id sample_ experiment___ run_from_cell_id sample_heuristic age_days tissue_groups nFeature_expressed_i…¹
#>