In this document we are presenting a workflow of retention-time alignment across multiple Targeted-MS (e.g. DIA, SWATH-MS, PRM, SRM) runs using DIAlignR. This tool requires MS2 chromatograms and provides a hybrid approach of global and local alignment to establish correspondence between peaks.
Mass-spectrometry files mostly contains spectra. Targeted proteomics
workflow identifyies analytes from their chromatographic elution
profile. DIAlignR extends the same concept for retention-time (RT)
alignment and, therefore, relies on MS2 chromatograms. DIAlignR expects
raw chromatogram file (.chrom.sqMass) and FDR-scored features (.osw)
file.
Example files are available with this package and can be located with
this command:
bash
commands to be used:OpenSwathWorkflow -in Filename.mzML.gz -tr library.pqp -tr_irt
iRTassays.TraML -out_osw Filename.osw -out_chrom Filename.chrom.mzML
OpenSwathMzMLFileCacher -in Filename.chrom.mzML -out Filename.chrom.sqMass -lossy_compression false
Note: If you prefer to use chrom.mzML instead of chrom.sqMass, some
chromatograms are stored in compressed form and currently inaccesible by
mzR
. In such cases mzR
would throw an error
indicating Invalid cvParam accession "1002746"
. To avoid
this issue, uncompress chromatograms using OpenMS.
pyprophet merge --template=library.pqp --out=merged.osw *.osw
pyprophet score --in=merged.osw --classifier=XGBoost --level=ms1ms2
pyprophet peptide --in=merged.osw --context=experiment-wide
xics
directory and merged.osw file
in osw
directory. The parent folder is given as
dataPath
to DIAlignR functions.There are three modes for multirun alignment: star, MST and
Progressive. The functions align proteomics or metabolomics DIA runs.
They expect two directories “osw” and “xics” at dataPath
,
and output an intensity table where rows specify each analyte and
columns specify runs.
# For specific runs provide their names.
alignTargetedRuns(dataPath = dataPath, outFile = "test", runs = runs, oswMerged = TRUE, params = params)
# For all the analytes in all runs, keep them as NULL.
alignTargetedRuns(dataPath = dataPath, outFile = "test", runs = NULL, oswMerged = TRUE, params = params)
For MST alignment, a precomputed guide-tree can be supplied.
Similar to previous approach, a precomputed guide-tree can be supplied.
text1 <- "(run1:0.08857142857,(run0:0.06857142857,run2:0.06857142857)masterB:0.02)master1;"
progAlignRuns(dataPath = dataPath, outFile = "test", newickTree = text1, oswMerged = TRUE, params = params)
# Compute tree on-the-fly
progAlignRuns(dataPath = dataPath, outFile = "test", oswMerged = TRUE, params = params)
In a large-scale study, the pyprophet merge
would create
a huge file that can’t be fit in the memory. Hence, scaling-up
of pyprophet based on subsampling is recommended. Do not run the last
two commands pyprophet backpropagate
and
pyprophet export
, as these commands copy scores from
model_global.osw
to each run, increasing the size
unnecessarily.
Instead, use oswMerged = FALSE
and
scoreFile=PATH/TO/model_global.osw
.
For getting alignment object which has aligned indices of XICs
getAlignObjs
function can be used. Like previous function,
it expects two directories “osw” and “xics” at dataPath
. It
performs alignment for exactly two runs. In case of refRun
is not provided, m-score from osw files is used to select reference
run.
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
"hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
AlignObjLight <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, objType = "light", params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
# First element contains names of runs, spectra files, chromatogram files and feature files.
AlignObjLight[[1]][, c("runName", "spectraFile")]
#> runName
#> run1 hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt
#> run2 hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt
#> spectraFile
#> run1 data/raw/hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.mzML.gz
#> run2 data/raw/hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.mzML.gz
obj <- AlignObjLight[[2]][["4618"]][[1]][["AlignObj"]]
slotNames(obj)
#> [1] "indexA_aligned" "indexB_aligned" "score"
names(as.list(obj))
#> [1] "indexA_aligned" "indexB_aligned" "score"
AlignObjMedium <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, objType = "medium", params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
obj <- AlignObjMedium[[2]][["4618"]][[1]][["AlignObj"]]
slotNames(obj)
#> [1] "s" "path" "indexA_aligned" "indexB_aligned"
#> [5] "score"
Alignment object has slots * indexA_aligned aligned indices of reference chromatogram. * indexB_aligned aligned indices of experiment chromatogram * score cumulative score of the alignment till an index. * s similarity score matrix. * path path of the alignment through similarity score matrix.
We can visualize aligned chromatograms using
plotAlignedAnalytes
. The top figure is experiment
unaligned-XICs, middle one is reference XICs, last figure is experiment
run aligned to reference.
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
"hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
AlignObj <- getAlignObjs(analytes = 4618L, runs = runs, dataPath = dataPath, params = params)
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
plotAlignedAnalytes(AlignObj, annotatePeak = TRUE)
#> Warning: Removed 30 rows containing missing values or values outside the scale range
#> (`geom_line()`).
We can also visualize the alignment path using
plotAlignemntPath
function.
library(lattice)
runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt",
"hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt")
AlignObjOutput <- getAlignObjs(analytes = 4618L, runs = runs, params = params, dataPath = dataPath, objType = "medium")
#> [1] "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"
#> [2] "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"
#> [1] "Finding reference run using SCORE_PEPTIDE table"
plotAlignmentPath(AlignObjOutput)
Gupta S, Ahadi S, Zhou W, Röst H. “DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics.” Mol Cell Proteomics. 2019 Apr;18(4):806-817. doi: https://doi.org/10.1074/mcp.TIR118.001132
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] lattice_0.22-6 DIAlignR_2.15.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] fastmatch_1.1-4 gtable_0.3.6 xfun_0.49
#> [4] bslib_0.8.0 ggplot2_3.5.1 latticeExtra_0.6-30
#> [7] quadprog_1.5-8 vctrs_0.6.5 tools_4.4.2
#> [10] generics_0.1.3 parallel_4.4.2 tibble_3.2.1
#> [13] fansi_1.0.6 RSQLite_2.3.8 blob_1.2.4
#> [16] pkgconfig_2.0.3 Matrix_1.7-1 data.table_1.16.2
#> [19] RColorBrewer_1.1-3 lifecycle_1.0.4 deldir_2.0-4
#> [22] farver_2.1.2 compiler_4.4.2 munsell_0.5.1
#> [25] codetools_0.2-20 htmltools_0.5.8.1 sys_3.4.3
#> [28] buildtools_1.0.0 sass_0.4.9 yaml_2.3.10
#> [31] pracma_2.4.4 pillar_1.9.0 jquerylib_0.1.4
#> [34] tidyr_1.3.1 MASS_7.3-61 cachem_1.1.0
#> [37] nlme_3.1-166 phangorn_2.12.1 tidyselect_1.2.1
#> [40] digest_0.6.37 dplyr_1.1.4 purrr_1.0.2
#> [43] maketools_1.3.1 labeling_0.4.3 fastmap_1.2.0
#> [46] grid_4.4.2 colorspace_2.1-1 cli_3.6.3
#> [49] magrittr_2.0.3 utf8_1.2.4 ape_5.8
#> [52] withr_3.0.2 scales_1.3.0 bit64_4.5.2
#> [55] rmarkdown_2.29 jpeg_0.1-10 interp_1.1-6
#> [58] signal_1.8-1 igraph_2.1.1 bit_4.5.0
#> [61] reticulate_1.40.0 gridExtra_2.3 zoo_1.8-12
#> [64] png_0.1-8 RMSNumpress_1.0.1 memoise_2.0.1
#> [67] evaluate_1.0.1 knitr_1.49 rlang_1.1.4
#> [70] Rcpp_1.0.13-1 glue_1.8.0 DBI_1.2.3
#> [73] BiocManager_1.30.25 jsonlite_1.8.9 R6_2.5.1