Core Utils for Mass Spectrometry Data

Introduction

The MsCoreUtils package provides low-level functions for mass spectrometry data and is independent of any high-level data structures [@rainer_modular_2022]. These functions include mass spectra processing functions (noise estimation, smoothing, binning), quantitative aggregation functions (median polish, robust summarisation, …), missing data imputation, data normalisation (quantiles, vsn, …), as well as miscellaneous helper functions that are used across the high-level data structures within the R for Mass Spectrometry packages.

For a full list of functions, see

library("MsCoreUtils")
ls(pos = "package:MsCoreUtils")
##  [1] "%between%"                  "aggregate_by_matrix"       
##  [3] "aggregate_by_vector"        "asInteger"                 
##  [5] "between"                    "bin"                       
##  [7] "breaks_ppm"                 "closest"                   
##  [9] "coefMA"                     "coefSG"                    
## [11] "coefWMA"                    "colCounts"                 
## [13] "colMeansMat"                "colSumsMat"                
## [15] "common"                     "common_path"               
## [17] "entropy"                    "estimateBaseline"          
## [19] "estimateBaselineConvexHull" "estimateBaselineMedian"    
## [21] "estimateBaselineSnip"       "estimateBaselineTopHat"    
## [23] "force_sorted"               "formatRt"                  
## [25] "getImputeMargin"            "gnps"                      
## [27] "group"                      "i2index"                   
## [29] "imputeMethods"              "impute_MinDet"             
## [31] "impute_MinProb"             "impute_QRILC"              
## [33] "impute_RF"                  "impute_bpca"               
## [35] "impute_fun"                 "impute_knn"                
## [37] "impute_matrix"              "impute_min"                
## [39] "impute_mixed"               "impute_mle"                
## [41] "impute_neighbour_average"   "impute_with"               
## [43] "impute_zero"                "isPeaksMatrix"             
## [45] "join"                       "join_gnps"                 
## [47] "localMaxima"                "maxi"                      
## [49] "medianPolish"               "navdist"                   
## [51] "ndotproduct"                "nentropy"                  
## [53] "neuclidean"                 "noise"                     
## [55] "normalizeMethods"           "normalize_matrix"          
## [57] "nspectraangle"              "ppm"                       
## [59] "rbindFill"                  "refineCentroids"           
## [61] "rla"                        "robustSummary"             
## [63] "rowRla"                     "rt2character"              
## [65] "rt2numeric"                 "smooth"                    
## [67] "sumi"                       "validPeaksMatrix"          
## [69] "valleys"                    "vapply1c"                  
## [71] "vapply1d"                   "vapply1l"                  
## [73] "which.first"                "which.last"

or the reference page on the package webpage.
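As a brief illustration of the imputation and normalisation helpers named in the listing above, the sketch below runs impute_matrix() and normalize_matrix() on a plain numeric matrix. The values and dimnames are made up, and the methods "zero" and "center.median" are chosen only for illustration; imputeMethods() and normalizeMethods() list what is available in the installed version.

```r
library("MsCoreUtils")

## Toy quantitative matrix (made-up values): 3 features x 2 samples,
## with one missing value.
m <- matrix(c(10, 20, NA, 40, 50, 60), nrow = 3,
            dimnames = list(paste0("f", 1:3), c("s1", "s2")))

## Methods available in the installed version.
imputeMethods()
normalizeMethods()

## Replace missing values by zero.
m_imp <- impute_matrix(m, method = "zero")

## Subtract each column's median.
m_norm <- normalize_matrix(m_imp, method = "center.median")
m_norm
```

Both functions take and return an ordinary matrix, so they can be chained without any intermediate container.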

Examples

The functions defined in this package utilise basic classes with the aim of being reused in packages that provide a more formal, high-level interface.
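For instance, several helpers operate directly on plain numeric vectors. The sketch below uses ppm(), between() and closest() on made-up m/z values; the range and tolerance settings are arbitrary choices for illustration.

```r
library("MsCoreUtils")

## Made-up m/z values, sorted increasingly.
mz <- c(100.001, 200.002, 300.003)

## Absolute tolerance that corresponds to 10 ppm of each value.
tol <- ppm(mz, ppm = 10)

## Which values fall within the range from 150 to 350?
in_range <- between(mz, c(150, 350))

## Index of the closest element of 'mz' for each query value, allowing
## a relative deviation of 20 ppm; NA means no match within tolerance.
idx <- closest(c(100.0015, 250), mz, tolerance = 0, ppm = 20)
```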

As an example, let’s take the robustSummary() function, which calculates a robust summary of the columns of a matrix:

x <- matrix(rnorm(30), nrow = 3)
colnames(x) <- letters[1:10]
rownames(x) <- LETTERS[1:3]
x
##            a          b           c          d         e          f          g
## A -0.2390398  0.7441602  0.66671086 -0.3417190 1.0376089  0.7015737 -1.0101218
## B  1.2431537  1.5122926 -0.09666792 -0.3917515 0.5201331 -0.2060331 -1.4995261
## C -1.0423441 -1.2559203  0.60933228 -0.1617715 1.6093377 -0.8806621 -0.4912914
##            h          i          j
## A  0.6027517  0.6041237  1.3676467
## B -0.6075998 -1.5778086 -2.0651038
## C -0.2237094  0.3872690 -0.9650001
robustSummary(x)
## Warning in rlm.default(X, expression, ...): 'rlm' failed to converge in 20
## steps
##           a           b           c           d           e           f 
## -0.48150267  0.33461016  0.39312508 -0.29841399  1.05569325 -0.11642669 
##           g           h           i           j 
## -1.00031313 -0.07618582 -0.03656771 -0.92819752
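Because x is drawn at random, the values above differ between runs. The sketch below uses a small made-up matrix of log-intensities instead, with one deliberate outlier, to compare robustSummary() with colMeans() and medianPolish(), and then applies it per protein through aggregate_by_vector(). The peptide names and the peptide-to-protein mapping prot are hypothetical.

```r
library("MsCoreUtils")

## Made-up log-intensities: 6 peptides (rows) x 3 samples (columns).
pep <- matrix(c(10.1, 11.0, 12.2,
                10.3, 11.2, 12.1,
                 9.9, 16.0, 11.9,   # 16.0 is a deliberate outlier
                 8.0,  8.4,  8.1,
                 8.2,  8.5,  8.3,
                 7.9,  8.6,  8.0),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("pep", 1:6), paste0("s", 1:3)))

## Hypothetical mapping of peptides to proteins.
prot <- rep(c("P1", "P2"), each = 3)

## Three column summaries of the peptides of the first protein.
p1 <- pep[prot == "P1", ]
rbind(mean = colMeans(p1),
      medianPolish = medianPolish(p1),
      robustSummary = robustSummary(p1))

## Robust summary computed separately for each protein.
agg <- aggregate_by_vector(pep, prot, robustSummary)
agg
```

The result of aggregate_by_vector() has one row per protein and one column per sample. With so few peptides per protein, rlm may emit convergence warnings like the one shown above.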

This function is typically used to summarise peptide quantitation values into protein intensities [1]. This functionality is available in

Contributions

If you would like to contribute any low-level functionality, please open a GitHub issue to discuss it. Please note that any contributions should follow the style guide and will require an appropriate unit test.

If you wish to reuse any functions in this package, please just go ahead. If you would like advice or help, please open a GitHub issue.

Session information

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MsCoreUtils_1.19.0 BiocStyle_2.35.0  
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.3           knitr_1.49          rlang_1.1.4        
##  [4] xfun_0.50           generics_0.1.3      jsonlite_1.8.9     
##  [7] clue_0.3-66         S4Vectors_0.45.2    buildtools_1.0.0   
## [10] htmltools_0.5.8.1   maketools_1.3.1     sys_3.4.3          
## [13] sass_0.4.9          stats4_4.4.2        rmarkdown_2.29     
## [16] evaluate_1.0.3      jquerylib_0.1.4     MASS_7.3-64        
## [19] fastmap_1.2.0       yaml_2.3.10         lifecycle_1.0.4    
## [22] BiocManager_1.30.25 cluster_2.1.8       compiler_4.4.2     
## [25] digest_0.6.37       R6_2.5.1            bslib_0.8.0        
## [28] tools_4.4.2         BiocGenerics_0.53.3 cachem_1.1.0

References


  1. See Sticker et al. Robust summarization and inference in proteome-wide label-free quantification. https://doi.org/10.1101/668863.