Vignette of the a4Preproc package

Introduction

This document describes the functionality available in the a4Preproc package.

This package contains utility functions to pre-process data for the Automated Affymetrix Array Analysis suite of packages.

Get feature annotation for an ExpressionSet

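The feature annotation for a specific dataset, as required by the pipeline, is extracted with the addGeneInfo function. The function returns the ExpressionSet with the Entrez ID, Ensembl ID, gene symbol and gene name of each feature stored in its feature data.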

library(a4Preproc)
library(ALL)
data(ALL)
a4ALL <- addGeneInfo(eset = ALL)
print(head(fData(a4ALL)))
##           ENTREZID       ENSEMBLID  SYMBOL
## 1000_at       5595 ENSG00000102882   MAPK3
## 1001_at       7075 ENSG00000066056    TIE1
## 1002_f_at     1557 ENSG00000165841 CYP2C19
## 1003_s_at      643 ENSG00000160683   CXCR5
## 1004_at        643 ENSG00000160683   CXCR5
## 1005_at       1843 ENSG00000120129   DUSP1
##                                                                  GENENAME
## 1000_at                                mitogen-activated protein kinase 3
## 1001_at   tyrosine kinase with immunoglobulin like and EGF like domains 1
## 1002_f_at                  cytochrome P450 family 2 subfamily C member 19
## 1003_s_at                                C-X-C motif chemokine receptor 5
## 1004_at                                  C-X-C motif chemokine receptor 5
## 1005_at                                    dual specificity phosphatase 1
print(head(featureData(a4ALL)))
## An object of class 'AnnotatedDataFrame'
##   featureNames: 1000_at 1001_at ... 1005_at (6 total)
##   varLabels: ENTREZID ENSEMBLID SYMBOL GENENAME
##   varMetadata: labelDescription
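
Once the annotation is attached, the feature data can be queried like any data frame. The following is a minimal sketch rather than part of the package: to stay runnable without the Bioconductor packages, it rebuilds by hand the six rows printed above as a stand-in for fData(a4ALL), and the helper probesForSymbol is a hypothetical name introduced only for illustration. It looks up the probe sets annotated with a given gene symbol.

```r
# Stand-in for fData(a4ALL): the six rows printed above, rebuilt by hand
# (assumption: identifiers are stored as character columns).
fd <- data.frame(
  ENTREZID = c("5595", "7075", "1557", "643", "643", "1843"),
  SYMBOL = c("MAPK3", "TIE1", "CYP2C19", "CXCR5", "CXCR5", "DUSP1"),
  row.names = c(
    "1000_at", "1001_at", "1002_f_at",
    "1003_s_at", "1004_at", "1005_at"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the probe sets annotated with a gene symbol.
# which() drops features without annotation (NA symbols).
probesForSymbol <- function(featureTable, symbol) {
  rownames(featureTable)[which(featureTable$SYMBOL == symbol)]
}

cxcr5Probes <- probesForSymbol(fd, "CXCR5")
print(cxcr5Probes)
## [1] "1003_s_at" "1004_at"
```

With the real object, the same helper could be applied to fData(a4ALL) directly. Several probe sets can map to the same gene, as the two CXCR5 probe sets show, so a lookup by symbol may return more than one feature.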

Appendix

Session information

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] hgu95av2.db_3.13.0   org.Hs.eg.db_3.19.1  AnnotationDbi_1.67.0
##  [4] IRanges_2.39.2       S4Vectors_0.43.2     ALL_1.47.0          
##  [7] Biobase_2.65.1       BiocGenerics_0.51.0  a4Preproc_1.53.0    
## [10] rmarkdown_2.28      
## 
## loaded via a namespace (and not attached):
##  [1] bit_4.0.5               jsonlite_1.8.8          compiler_4.4.1         
##  [4] crayon_1.5.3            blob_1.2.4              Biostrings_2.73.1      
##  [7] jquerylib_0.1.4         png_0.1-8               yaml_2.3.10            
## [10] fastmap_1.2.0           R6_2.5.1                XVector_0.45.0         
## [13] GenomeInfoDb_1.41.1     knitr_1.48              maketools_1.3.0        
## [16] GenomeInfoDbData_1.2.12 DBI_1.2.3               bslib_0.8.0            
## [19] rlang_1.1.4             KEGGREST_1.45.1         cachem_1.1.0           
## [22] xfun_0.47               sass_0.4.9              sys_3.4.2              
## [25] bit64_4.0.5             RSQLite_2.3.7           memoise_2.0.1          
## [28] cli_3.6.3               zlibbioc_1.51.1         digest_0.6.37          
## [31] lifecycle_1.0.4         vctrs_0.6.5             evaluate_0.24.0        
## [34] buildtools_1.0.0        httr_1.4.7              pkgconfig_2.0.3        
## [37] UCSC.utils_1.1.0        tools_4.4.1             htmltools_0.5.8.1