MS2 chromatograms based alignment of targeted mass-spectrometry runs

In this document we are presenting a workflow of retention-time alignment across multiple Targeted-MS (e.g. DIA, SWATH-MS, PRM, SRM) runs using DIAlignR. This tool requires MS2 chromatograms and provides a hybrid approach of global and local alignment to establish correspondence between peaks.

Install DIAlignR

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DIAlignR")
library(DIAlignR)

Prepare input files for alignment

Mass-spectrometry files mostly contains spectra. Targeted proteomics workflow identifyies analytes from their chromatographic elution profile. DIAlignR extends the same concept for retention-time (RT) alignment and, therefore, relies on MS2 chromatograms. DIAlignR expects raw chromatogram file (.chrom.sqMass) and FDR-scored features (.osw) file.
Example files are available with this package and can be located with this command:

dataPath <- system.file("extdata", package = "DIAlignR")
    (Optional) To obtain files for alignment, following three steps are needed (Paper):
  • Step 1 Convert spectra files from vendor-specific format to standard mzMl format using ProteoWizard-MSConvert.
  • Step 2 Extract features and raw extracted-ion chromatograms(XICs) for library analytes. A detailed tutorial using OpenSWATH is available for this steps. In short, following bash commands to be used:
OpenSwathWorkflow -in Filename.mzML.gz -tr library.pqp -tr_irt
iRTassays.TraML -out_osw Filename.osw -out_chrom Filename.chrom.mzML
OpenSwathMzMLFileCacher -in Filename.chrom.mzML -out Filename.chrom.sqMass -lossy_compression false
    Output files Filename.osw and Filename.chrom.sqMass are required for next steps.

Note: If you prefer to use chrom.mzML instead of chrom.sqMass, some chromatograms are stored in compressed form and currently inaccesible by mzR. In such cases mzR would throw an error indicating Invalid cvParam accession "1002746". To avoid this issue, uncompress chromatograms using OpenMS.

FileConverter -in Filename.chrom.mzML -in_type 'mzML' -out Filename.chrom.mzML
  • Step 3: Score features and calculate their q-values. A machine-learning based workflow is available with PyProphet. For multiplt-runs experiment-wide FDR is recommended. This step is suggested only for small studies (n<30). For large n, look into the large-scale analysis section. An example of running pyprophet on OpenSWATH results is given below:
pyprophet merge --template=library.pqp --out=merged.osw *.osw
pyprophet score --in=merged.osw --classifier=XGBoost --level=ms1ms2
pyprophet peptide --in=merged.osw --context=experiment-wide
  Congrats! Now we have raw chromatogram files and associated scored features in merged.osw files. Move all .chrom.sqMass files in xics directory and merged.osw file in osw directory. The parent folder is given as dataPath to DIAlignR functions.

Performing alignment on DIA runs

There are three modes for multirun alignment: star, MST and Progressive. The functions align proteomics or metabolomics DIA runs. They expect two directories “osw” and “xics” at dataPath, and output an intensity table where rows specify each analyte and columns specify runs.

runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
          "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
params <- paramsDIAlignR()
params[["context"]] <- "experiment-wide"

Star alignmnet

# For specific runs provide their names.
alignTargetedRuns(dataPath = dataPath, outFile = "test", runs = runs, oswMerged = TRUE, params = params)
# For all the analytes in all runs, keep them as NULL.
alignTargetedRuns(dataPath = dataPath, outFile = "test", runs = NULL, oswMerged = TRUE, params = params)

MST alignmnet

For MST alignment, a precomputed guide-tree can be supplied.

tree <- "run2 run2\nrun1 run0"
mstAlignRuns(dataPath = dataPath, outFile = "test", mstNet = tree, oswMerged = TRUE, params = params)
# Compute tree on-the-fly
mstAlignRuns(dataPath = dataPath, outFile = "test", oswMerged = TRUE, params = params)

Progressive alignmnet

Similar to previous approach, a precomputed guide-tree can be supplied.

text1 <- "(run1:0.08857142857,(run0:0.06857142857,run2:0.06857142857)masterB:0.02)master1;"
progAlignRuns(dataPath = dataPath, outFile = "test", newickTree = text1, oswMerged = TRUE, params = params)
# Compute tree on-the-fly
progAlignRuns(dataPath = dataPath, outFile = "test", oswMerged = TRUE, params = params)

For large-scale analysis

In a large-scale study, the pyprophet merge would create a huge file that can’t be fit in the memory. Hence, scaling-up of pyprophet based on subsampling is recommended. Do not run the last two commands pyprophet backpropagate and pyprophet export, as these commands copy scores from model_global.osw to each run, increasing the size unnecessarily.

Instead, use oswMerged = FALSE and scoreFile=PATH/TO/model_global.osw.

Requantification

Visualizing multiple chromatograms

Investigating alignment of analytes

For getting alignment object which has aligned indices of XICs getAlignObjs function can be used. Like previous function, it expects two directories “osw” and “xics” at dataPath. It performs alignment for exactly two runs. In case of refRun is not provided, m-score from osw files is used to select reference run.

runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
          "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
AlignObjLight <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, objType   = "light", params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" 
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
# First element contains names of runs, spectra files, chromatogram files and feature files.
AlignObjLight[[1]][, c("runName", "spectraFile")]
#>                                                 runName
#> run1  hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt
#> run2 hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt
#>                                                              spectraFile
#> run1  data/raw/hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.mzML.gz
#> run2 data/raw/hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.mzML.gz
obj <- AlignObjLight[[2]][["4618"]][[1]][["AlignObj"]]
slotNames(obj)
#> [1] "indexA_aligned" "indexB_aligned" "score"
names(as.list(obj))
#> [1] "indexA_aligned" "indexB_aligned" "score"
AlignObjMedium <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, objType  = "medium", params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" 
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
obj <- AlignObjMedium[[2]][["4618"]][[1]][["AlignObj"]]
slotNames(obj)
#> [1] "s"              "path"           "indexA_aligned" "indexB_aligned"
#> [5] "score"

Alignment object has slots * indexA_aligned aligned indices of reference chromatogram. * indexB_aligned aligned indices of experiment chromatogram * score cumulative score of the alignment till an index. * s similarity score matrix. * path path of the alignment through similarity score matrix.

Visualizing the aligned chromatograms

We can visualize aligned chromatograms using plotAlignedAnalytes. The top figure is experiment unaligned-XICs, middle one is reference XICs, last figure is experiment run aligned to reference.

runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
 "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
AlignObj <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" 
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
plotAlignedAnalytes(AlignObj, annotatePeak = TRUE)
#> Warning: Removed 30 rows containing missing values or values outside the scale range
#> (`geom_line()`).

Visualizing the alignment path

We can also visualize the alignment path using plotAlignemntPath function.

library(lattice)
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
 "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
AlignObjOutput <- getAlignObjs(analytes = 4618L, runs = runs, params = params, dataPath = dataPath, objType = "medium")
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" 
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
plotAlignmentPath(AlignObjOutput)

Citation

Gupta S, Ahadi S, Zhou W, Röst H. “DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics.” Mol Cell Proteomics. 2019 Apr;18(4):806-817. doi: https://doi.org/10.1074/mcp.TIR118.001132

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] lattice_0.22-6   DIAlignR_2.15.0  BiocStyle_2.35.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] fastmatch_1.1-4     gtable_0.3.6        xfun_0.49          
#>  [4] bslib_0.8.0         ggplot2_3.5.1       latticeExtra_0.6-30
#>  [7] quadprog_1.5-8      vctrs_0.6.5         tools_4.4.2        
#> [10] generics_0.1.3      parallel_4.4.2      tibble_3.2.1       
#> [13] fansi_1.0.6         RSQLite_2.3.8       blob_1.2.4         
#> [16] pkgconfig_2.0.3     Matrix_1.7-1        data.table_1.16.2  
#> [19] RColorBrewer_1.1-3  lifecycle_1.0.4     deldir_2.0-4       
#> [22] farver_2.1.2        compiler_4.4.2      munsell_0.5.1      
#> [25] codetools_0.2-20    htmltools_0.5.8.1   sys_3.4.3          
#> [28] buildtools_1.0.0    sass_0.4.9          yaml_2.3.10        
#> [31] pracma_2.4.4        pillar_1.9.0        jquerylib_0.1.4    
#> [34] tidyr_1.3.1         MASS_7.3-61         cachem_1.1.0       
#> [37] nlme_3.1-166        phangorn_2.12.1     tidyselect_1.2.1   
#> [40] digest_0.6.37       dplyr_1.1.4         purrr_1.0.2        
#> [43] maketools_1.3.1     labeling_0.4.3      fastmap_1.2.0      
#> [46] grid_4.4.2          colorspace_2.1-1    cli_3.6.3          
#> [49] magrittr_2.0.3      utf8_1.2.4          ape_5.8            
#> [52] withr_3.0.2         scales_1.3.0        bit64_4.5.2        
#> [55] rmarkdown_2.29      jpeg_0.1-10         interp_1.1-6       
#> [58] signal_1.8-1        igraph_2.1.1        bit_4.5.0          
#> [61] reticulate_1.40.0   gridExtra_2.3       zoo_1.8-12         
#> [64] png_0.1-8           RMSNumpress_1.0.1   memoise_2.0.1      
#> [67] evaluate_1.0.1      knitr_1.49          rlang_1.1.4        
#> [70] Rcpp_1.0.13-1       glue_1.8.0          DBI_1.2.3          
#> [73] BiocManager_1.30.25 jsonlite_1.8.9      R6_2.5.1