BiocFHIR – infrastructure for parsing and analyzing FHIR data

Introduction

FHIR stands for Fast Healthcare Interoperability Resources.

The Wikipedia article is a useful overview. The official website is fhir.org.

This R package addresses the basic task of parsing FHIR R4 documents in JSON format. The overall information model of FHIR documents is complex, so the package makes a number of decisions about which fields to extract and annotate, favoring those presumed to have high value. Please submit a GitHub issue if an important field is not being propagated.

Install this package using

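if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")  # one-time setup
BiocManager::install("BiocFHIR")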

The basic structure of FHIR R4 JSON

We use jsonlite::fromJSON to import a randomly selected FHIR document from a collection simulated by the MITRE Corporation. See the associated site for details.

We’ll drill down through the hierarchy of elements collected in a FHIR document with some base R commands, after importing the JSON.

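library(BiocFHIR); library(jsonlite)  # supply process_fhir_bundle and fromJSON
testf = dir(system.file("json", package="BiocFHIR"), full.names=TRUE)  # example JSON shipped with the package
tt = fromJSON(testf)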
names(tt)
## [1] "resourceType" "type"         "entry"
tt[1:2]
## $resourceType
## [1] "Bundle"
## 
## $type
## [1] "transaction"
tte = tt$entry
class(tte)
## [1] "data.frame"
dim(tte)
## [1] 301   3
head(names(tte))
## [1] "fullUrl"  "resource" "request"
tter = tte$resource
dim(tter)
## [1] 301  72
head(names(tter))
## [1] "resourceType" "id"           "text"         "extension"    "identifier"  
## [6] "name"
table(tter$resourceType)
## 
##   AllergyIntolerance             CarePlan             CareTeam 
##                    8                    3                    3 
##                Claim            Condition     DiagnosticReport 
##                   46                   15                    3 
##            Encounter ExplanationOfBenefit         Immunization 
##                   37                   37                   10 
##    MedicationRequest          Observation         Organization 
##                    9                  114                    3 
##              Patient         Practitioner            Procedure 
##                    1                    3                    9

We obtain information useful for data analysis by filtering the data frame tter. The data frame is sparse: many fields are empty in many records. Code in this package attempts to produce useful tables from this sparse information.
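The idea of filtering a sparse resource table can be sketched with base R alone. The example below is only an illustration: the toy data frame, its values, and the helper keep_type are invented here and are not part of BiocFHIR. The real tter also contains nested data.frame and list columns, so this simple NA test would need adapting before use on it.

```r
# Toy stand-in for a sparse resource table (all values invented for illustration).
toy <- data.frame(
  resourceType  = c("Patient", "Condition", "Condition", "Observation"),
  id            = c("p1", "c1", "c2", "o1"),
  birthDate     = c("1970-01-01", NA, NA, NA),
  onsetDateTime = c(NA, "2001-05-02", "2010-11-30", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows of one resource type,
# then drop columns that are entirely NA for those rows.
keep_type <- function(df, type) {
  sub <- df[df$resourceType == type, , drop = FALSE]
  sub[, colSums(!is.na(sub)) > 0, drop = FALSE]
}

cond <- keep_type(toy, "Condition")
names(cond)
dim(cond)
```

Dropping the all-NA columns after subsetting is what turns a wide, mostly empty table into a compact one for a single resource type.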

As a prologue to table extraction, we do some basic decomposition of tter using process_fhir_bundle.

bu1 = process_fhir_bundle(testf) # just give file path
bu1
## BiocFHIR FHIR.bundle instance.
##   resource types are:
##    AllergyIntolerance CarePlan ... Patient Procedure

bu1 is essentially a list of data.frames, one per major FHIR concept, but these data.frames contain considerable nesting of further data.frames and lists. “Flattening” such structures is not fully automatic.
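To see why flattening is only partly automatic, consider a small sketch using jsonlite::flatten on a hand-written JSON fragment. The fragment is invented for illustration and is much simpler than a real FHIR Condition resource; it is not how BiocFHIR itself processes bundles.

```r
library(jsonlite)

# Invented fragment: `code` is a nested object, `code.coding` is an array.
js <- '[
  {"id": "c1", "code": {"coding": [{"system": "s", "code": "1"}], "text": "Hypertension"}},
  {"id": "c2", "code": {"coding": [{"system": "s", "code": "2"}], "text": "Prediabetes"}}
]'

nested <- fromJSON(js)
names(nested)                      # `code` is itself a data.frame column

# jsonlite::flatten expands nested data.frame columns into prefixed columns ...
flat <- jsonlite::flatten(nested)
names(flat)

# ... but array-valued fields remain list columns and need separate handling.
is.list(flat$code.coding)
```

Nested objects are expanded into columns such as code.text, while array-valued fields stay as list columns, which is where resource-specific code becomes necessary.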

Example: a table of Conditions recorded for the patient

We use process_Condition to extract information.

cond1 = process_Condition(bu1$Condition)
DT::datatable(cond1)  # interactive HTML table; requires the DT package

A family of documents

We have collected 50 documents from the Synthea resource, obtained by random draws from the 1180 records provided. A temporary folder holding them can be produced as follows:

tset = make_test_json_set()
tset[1]
## [1] "/tmp/RtmpZlfiuD/jsontest/Angel97_Swift555_c072e6ad-b03f-4eee-abe0-2dbc93bbadfe.json"

We import ten documents into a list.

myl = lapply(tset[1:10], process_fhir_bundle)
myl[1:2]
## [[1]]
## BiocFHIR FHIR.bundle instance.
##   resource types are:
##    AllergyIntolerance CarePlan ... Patient Procedure
## 
## [[2]]
## BiocFHIR FHIR.bundle instance.
##   resource types are:
##    CarePlan Claim ... Patient Procedure
sapply(myl,length)
##  [1] 10  9  7  9  9  9  9  9  9 10

The last command shows that documents can differ in the number of components (resource types) present.
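One way to see which components vary is to tabulate component names across the list. The sketch below uses invented toy lists rather than real bundles, and it assumes that each bundle is a named list whose names are the resource types, as the use of bu1$Condition suggests.

```r
# Toy stand-ins for processed bundles (contents invented for illustration).
toy_bundles <- list(
  list(Patient = data.frame(id = "p1"),
       Condition = data.frame(id = "c1"),
       Claim = data.frame(id = "k1")),
  list(Patient = data.frame(id = "p2"),
       Claim = data.frame(id = "k2"))
)

# Number of components per bundle, as with sapply(myl, length).
n_components <- sapply(toy_bundles, length)
n_components

# How many bundles contain each resource type.
type_counts <- table(unlist(lapply(toy_bundles, names)))
type_counts
```

Under the stated assumption, replacing toy_bundles with myl would show which resource types are missing from some of the ten documents.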

Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rjsoncons_1.3.1  jsonlite_1.8.9   DT_0.33          BiocFHIR_1.9.0  
## [5] BiocStyle_2.33.1
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9          utf8_1.2.4          generics_0.1.3     
##  [4] tidyr_1.3.1         digest_0.6.37       magrittr_2.0.3     
##  [7] evaluate_1.0.1      fastmap_1.2.0       graph_1.83.0       
## [10] promises_1.3.0      BiocManager_1.30.25 purrr_1.0.2        
## [13] fansi_1.0.6         crosstalk_1.2.1     jquerylib_0.1.4    
## [16] cli_3.6.3           shiny_1.9.1         rlang_1.1.4        
## [19] visNetwork_2.1.2    cachem_1.1.0        yaml_2.3.10        
## [22] BiocBaseUtils_1.7.3 tools_4.4.1         dplyr_1.1.4        
## [25] httpuv_1.6.15       BiocGenerics_0.51.3 buildtools_1.0.0   
## [28] vctrs_0.6.5         R6_2.5.1            mime_0.12          
## [31] stats4_4.4.1        lifecycle_1.0.4     htmlwidgets_1.6.4  
## [34] pkgconfig_2.0.3     pillar_1.9.0        bslib_0.8.0        
## [37] later_1.3.2         glue_1.8.0          Rcpp_1.0.13        
## [40] xfun_0.48           tibble_3.2.1        tidyselect_1.2.1   
## [43] sys_3.4.3           knitr_1.48          xtable_1.8-4       
## [46] htmltools_0.5.8.1   rmarkdown_2.28      maketools_1.3.1    
## [49] compiler_4.4.1