New Implementation of the HIBAG Algorithm with the Latest Intel Intrinsics

Benchmarks on building the training models

Benchmarks were run on compute nodes with the Cascade Lake microarchitecture (Intel Xeon Gold 6248 CPU @ 2.50GHz). The HIBAG package was compiled with GCC v8.3.0 under R v4.0.2. In HIBAG (>= v1.26.1), users can call hlaSetKernelTarget() to select the target intrinsics, or hlaSetKernelTarget("max") to maximize the algorithm's efficiency.

GCC (>= v6.0) is recommended for compiling the HIBAG package. In the benchmarks, HIBAG v1.24 uses kernel v1.4, and the newer package uses kernel v1.5.

# continue without interrupting
IgnoreError <- function(cmd) tryCatch(cmd, error=function(e) { message("Not support"); invisible() })
IgnoreError(hlaSetKernelTarget("sse4"))
## [1] "64-bit, SSE4.2, POPCNT"                                               
## [2] "13.3.0, GNUG_v13.3.0"                                                 
## [3] "Algorithm SIMD: SSE2 SSE4.2 AVX AVX2 AVX512F AVX512BW AVX512VPOPCNTDQ"
IgnoreError(hlaSetKernelTarget("avx"))
## [1] "64-bit, AVX"                                                          
## [2] "13.3.0, GNUG_v13.3.0"                                                 
## [3] "Algorithm SIMD: SSE2 SSE4.2 AVX AVX2 AVX512F AVX512BW AVX512VPOPCNTDQ"
IgnoreError(hlaSetKernelTarget("avx2"))
## [1] "64-bit, AVX2"                                                         
## [2] "13.3.0, GNUG_v13.3.0"                                                 
## [3] "Algorithm SIMD: SSE2 SSE4.2 AVX AVX2 AVX512F AVX512BW AVX512VPOPCNTDQ"
IgnoreError(hlaSetKernelTarget("avx512f"))
## Not support
IgnoreError(hlaSetKernelTarget("avx512bw"))
## Not support

Because the CPU may dynamically reduce core frequencies to keep the power usage of AVX512 instructions within bounds, hlaSetKernelTarget("auto.avx2") automatically selects AVX2 even if the CPU supports the AVX512F and AVX512BW intrinsics. Please check how your CPU throttles when running AVX512 intrinsics.
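A minimal sketch of the two selection modes described above, assuming HIBAG (>= v1.26.1) is installed. Which kernel each call actually picks depends on the CPU and on the compiler flags the package was built with, so the printed targets will differ from machine to machine.

```r
library(HIBAG)

# Select the widest SIMD kernel supported by this CPU and this build
# (possibly AVX512BW, depending on the hardware and the compiler)
hlaSetKernelTarget("max")

# Prefer the AVX2 kernel even when AVX512F/AVX512BW are available,
# which may help if AVX512 frequency throttling slows this machine down
hlaSetKernelTarget("auto.avx2")
```

Running the same training job once under each setting is a simple way to see which choice is faster on a given node.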

1) Speedup factor using small training sets

2) Speedup factor using medium training sets

3) Speedup factor using large training sets
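The speedup factors above are ratios of training times. The sketch below is a rough way to measure such a ratio on your own machine. It compares two kernel targets of the same HIBAG build, so it is not the script used for the benchmarks, which compared HIBAG v1.24 against the newer kernel. It assumes the example datasets HLA_Type_Table and HapMap_CEU_Geno shipped with HIBAG, and that the CPU supports AVX2. The helper train_seconds() and the small nclassifier = 10 are illustrative choices, not part of the package.

```r
library(HIBAG)
data(HLA_Type_Table, package = "HIBAG")
data(HapMap_CEU_Geno, package = "HIBAG")

# Training data: HLA-A types plus SNPs within 500 kb of the locus
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep = "")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep = "")],
    locus = hla.id, assembly = "hg19")
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, 500*1000, assembly = "hg19")
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel = match(snpid, HapMap_CEU_Geno$snp.id))

# Hypothetical helper: elapsed seconds to train a small ensemble with a
# given kernel target; the seed is fixed to keep the two runs comparable
train_seconds <- function(target, hla, geno, nclassifier = 10) {
    hlaSetKernelTarget(target)
    set.seed(1000)
    st <- system.time(hlaAttrBagging(hla, geno, nclassifier = nclassifier,
        verbose = FALSE))
    st[["elapsed"]]
}

t.sse4 <- train_seconds("sse4", hla, train.geno)
t.avx2 <- train_seconds("avx2", hla, train.geno)
speedup <- t.sse4 / t.avx2   # values above 1 mean the AVX2 kernel was faster
speedup
```

With only ten classifiers the timings are short and noisy; repeating each run a few times, or increasing nclassifier, gives a more stable ratio.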

Multithreading

Multithreading can be enabled by specifying the number of threads via the nthread argument of hlaAttrBagging(), or via the cl argument of hlaParallelAttrBagging(cl=nthread, ...).
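A short sketch of both entry points, assuming the HIBAG example datasets HLA_Type_Table and HapMap_CEU_Geno and four threads. The thread count and nclassifier = 20 are illustrative values chosen to keep the run short; production models typically use more classifiers.

```r
library(HIBAG)
data(HLA_Type_Table, package = "HIBAG")
data(HapMap_CEU_Geno, package = "HIBAG")

# Training data: HLA-A types plus SNPs within 500 kb of the locus
hla.id <- "A"
hla <- hlaAllele(HLA_Type_Table$sample.id,
    H1 = HLA_Type_Table[, paste(hla.id, ".1", sep = "")],
    H2 = HLA_Type_Table[, paste(hla.id, ".2", sep = "")],
    locus = hla.id, assembly = "hg19")
snpid <- hlaFlankingSNP(HapMap_CEU_Geno$snp.id, HapMap_CEU_Geno$snp.position,
    hla.id, 500*1000, assembly = "hg19")
train.geno <- hlaGenoSubset(HapMap_CEU_Geno,
    snp.sel = match(snpid, HapMap_CEU_Geno$snp.id))

nthread <- 4

# Option 1: multithreading inside a single hlaAttrBagging() call
set.seed(1000)
model.a <- hlaAttrBagging(hla, train.geno, nclassifier = 20,
    nthread = nthread, verbose = FALSE)

# Option 2: the parallel wrapper, passing the number of threads as cl
set.seed(1000)
model.b <- hlaParallelAttrBagging(cl = nthread, hla, train.geno,
    nclassifier = 20)
```

Setting nthread above the number of physical cores is unlikely to help, so it is worth matching it to the cores available on the node.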

Here is the multithreading performance, along with a comparison between AVX2 and AVX512BW:

Session Info

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.5.1 HIBAG_1.43.1 
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5        cli_3.6.3          knitr_1.49         rlang_1.1.4        xfun_0.49         
##  [6] jsonlite_1.8.9     labeling_0.4.3     RcppParallel_5.1.9 glue_1.8.0         colorspace_2.1-1  
## [11] buildtools_1.0.0   htmltools_0.5.8.1  maketools_1.3.1    sys_3.4.3          sass_0.4.9        
## [16] scales_1.3.0       rmarkdown_2.29     grid_4.4.2         tibble_3.2.1       evaluate_1.0.1    
## [21] munsell_0.5.1      jquerylib_0.1.4    fastmap_1.2.0      yaml_2.3.10        lifecycle_1.0.4   
## [26] compiler_4.4.2     RColorBrewer_1.1-3 pkgconfig_2.0.3    farver_2.1.2       digest_0.6.37     
## [31] R6_2.5.1           pillar_1.10.0      magrittr_2.0.3     bslib_0.8.0        withr_3.0.2       
## [36] tools_4.4.2        gtable_0.3.6       cachem_1.1.0

References