synapter is free and open-source software. If you use it, please support the project by citing it in publications:
Nicholas James Bond, Pavel Vyacheslavovich Shliaha, Kathryn S. Lilley, and Laurent Gatto. Improving qualitative and quantitative performance for MSE-based label free proteomics. J. Proteome Res., 2013, 12 (6), pp 2340–2353
For bugs, typos, suggestions or other questions, please file an issue in our tracking system (https://github.com/lgatto/synapter/issues), providing as much information as possible, a reproducible example and the output of sessionInfo().
If you don’t have a GitHub account or wish to reach a broader audience for general questions about proteomics analysis using R, you may want to use the Bioconductor support site: https://support.bioconductor.org/.
Here we describe the new functionality implemented in synapter 2.0. This vignette covers the new 3D grid search, fragment matching, intensity modeling and the correction of detector saturation.
The synapter2 workflow is similar to the old one in synapter1. First it is necessary to use PLGS to create the csv (and xml) files. For this step we refer the reader to the default synapter vignette, available online and with vignette("synapter", package = "synapter").
In contrast to the original workflow, the final_fragment.csv file for the identification run and a Spectrum.xml file for the quantification run are needed if fragment matching is to be applied.
Subsequently the original workflow is enhanced by the new 3D grid search and the intensity modeling. Afterwards fragment matching can be applied. MSnbase (Gatto and Lilley 2012) is used for further analysis. The new synapter adds synapter/PLGS consensus filtering and detector saturation correction for MSnSets.
Synapter object
To demonstrate a typical step-by-step workflow we use example data that are available on http://proteome.sysbiol.cam.ac.uk/lgatto/synapter/data/. There is also a synobj2 object in synapterdata which contains the same data.
The Synapter constructor uses a named list of input files. Please note that we add identfragments (final_fragment.csv) and quantspectra (Spectrum.xml) because we want to apply the fragment matching later.
## Please find the raw data at:
## http://proteome.sysbiol.cam.ac.uk/lgatto/synapter/data/
library("synapter")
inlist <- list(
identpeptide = "fermentor_03_sample_01_HDMSE_01_IA_final_peptide.csv.gz",
identfragments = "fermentor_03_sample_01_HDMSE_01_IA_final_fragment.csv.gz",
quantpeptide = "fermentor_02_sample_01_HDMSE_01_IA_final_peptide.csv.gz",
quantpep3d = "fermentor_02_sample_01_HDMSE_01_Pep3DAMRT.csv.gz",
quantspectra = "fermentor_02_sample_01_HDMSE_01_Pep3D_Spectrum.xml.gz",
fasta = "S.cerevisiae_Uniprot_reference_canonical_18_03_14.fasta")
synobj2 <- Synapter(inlist, master=FALSE)
## Object of class "Synapter"
## Class version 2.0.0
## Package version 1.99.0
## Data files:
## + Identification pep file: fermentor_03_sample_01_HDMSE_01_IA_final_peptide.csv.gz
## + Identification Fragment file: fermentor_02_sample_01_HDMSE_01_Pep3D_Spectrum.xml.gz
## + Quantitation pep file: fermentor_02_sample_01_HDMSE_01_IA_final_peptide.csv.gz
## + Quantitation Pep3DAMRT file: fermentor_02_sample_01_HDMSE_01_Pep3DAMRT.csv.gz
## + Quantitation Spectrum file: fermentor_02_sample_01_HDMSE_01_Pep3D_Spectrum.xml.gz
## + Fasta file: S.cerevisiae_Uniprot_reference_canonical_18_03_14.fasta
## Log:
## [1] "Instance created on Tue Apr 4 09:01:34 2017"
## [2] "Read identification peptide data [11471,16]"
## [ 10 lines ]
## [13] "Read identification fragment data [7078]"
## [14] "Read quantitation spectra [36539]"
The first steps in each synapter analysis are filtering by peptide sequence, peptide length, ppm error and false positive rate.
Here we use the default values for each method, but the accompanying plotting methods should be used to find the best thresholds:
filterUniqueDbPeptides(synobj2,
missedCleavages=0,
IisL=TRUE)
filterPeptideLength(synobj2, l=7)
plotFdr(synobj2)
filterQuantPepScore(synobj2, method="BH",
fdr=0.05)
filterIdentPepScore(synobj2, method="BH",
fdr=0.05)
par(mfcol=c(1, 2))
plotPpmError(synobj2, what="Ident")
plotPpmError(synobj2, what="Quant")
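The score filters above are called with method="BH" and fdr=0.05. As a rough illustration of what a Benjamini-Hochberg cut-off does, the following sketch uses base R's p.adjust on made-up p-values. The numbers and variable names are assumptions for illustration only, and this is not meant to reproduce synapter's internal code.

```r
## Hypothetical p-values for six peptide identifications (made-up numbers).
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

## Benjamini-Hochberg adjustment from base R's stats package.
adjusted <- p.adjust(pvals, method = "BH")

## Keep identifications whose adjusted p-value is at most the FDR threshold.
fdr <- 0.05
keep <- adjusted <= fdr
pvals[keep]
```

Note that the third and fourth p-values are below 0.05 before the adjustment but are rejected afterwards, which is why the threshold should be chosen with the help of the plots rather than by eye.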
Next we merge the identified peptides from the identification run and the quantification run and build a LOWESS-based retention time model to remove systematic shifts in the retention times. Here we use the default values but, as stated above, the plotting methods should be used to find sensible thresholds.
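The idea of a LOWESS-based retention time model can be sketched with base R alone. The example below simulates retention times with an artificial systematic shift, fits a LOWESS curve to the differences and uses it to correct the shift. The simulated data, the smoother span f = 0.2 and all variable names are assumptions for illustration; the sketch does not use synapter's own modelling functions.

```r
set.seed(1)
## Simulated retention times (minutes) of peptides identified in both runs.
rtIdent <- sort(runif(200, min = 10, max = 100))
## Hypothetical systematic, retention time dependent shift plus random noise.
rtQuant <- rtIdent + 0.5 * sin(rtIdent / 15) + rnorm(200, sd = 0.05)

## Fit a LOWESS curve to the retention time differences.
fit <- lowess(rtIdent, rtQuant - rtIdent, f = 0.2)

## Interpolate the fitted shift at each identification retention time
## and apply it as a correction.
shift <- approx(fit$x, fit$y, xout = rtIdent, rule = 2)$y
rtPredicted <- rtIdent + shift

## Spread of the retention time deviation before and after the correction.
c(before = sd(rtQuant - rtIdent), after = sd(rtQuant - rtPredicted))
```

After the correction the remaining deviation should be dominated by the random noise, which is what makes a narrow retention time tolerance in the later matching step possible.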
To find EMRTs (exact m/z-retention time pairs) we run a grid search to find the retention time tolerance and m/z tolerance that result in the most correct one-to-one matches in the merged (already identified) data. If both the identification and the quantitation run are HDMSE data, we can use the new 3D grid search, which looks for the best matching in the retention time, m/z and ion mobility (drift time) domains to increase the accuracy. If one or both datasets are MSE data, it falls back to the traditional 2D grid search.
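A minimal sketch of the 2D variant of such a grid search is given below, using simulated features in base R. For every combination of retention time and ppm tolerance it counts how many identification features match exactly one quantitation feature that is also the true partner. The simulated data, the tolerance values and the helper countCorrectUnique are hypothetical and only illustrate the principle; a 3D search would add a third tolerance for the drift time.

```r
set.seed(42)
n <- 100
## Simulated quantitation features (hypothetical values).
quant <- data.frame(rt = runif(n, 10, 100), mz = runif(n, 400, 1200))
## Identification features: the same peptides with small rt and m/z errors,
## so that row i of 'ident' truly corresponds to row i of 'quant'.
ident <- data.frame(rt = quant$rt + rnorm(n, sd = 0.2),
                    mz = quant$mz * (1 + rnorm(n, sd = 5) / 1e6))

## Count identification features that match exactly one quantitation feature,
## and where that single match is the true one.
countCorrectUnique <- function(rtTol, ppmTol) {
  hits <- vapply(seq_len(n), function(i) {
    inRt <- abs(quant$rt - ident$rt[i]) <= rtTol
    inMz <- abs(quant$mz - ident$mz[i]) / ident$mz[i] * 1e6 <= ppmTol
    m <- which(inRt & inMz)
    length(m) == 1L && m == i
  }, logical(1))
  sum(hits)
}

## Evaluate every combination of the two tolerances.
grid <- expand.grid(rtTol = c(0.1, 0.5, 1, 5), ppmTol = c(2, 10, 20))
grid$correct <- mapply(countCorrectUnique, grid$rtTol, grid$ppmTol)

## The parameter combination with the most correct unique matches.
best <- grid[which.max(grid$correct), ]
best
```

Tolerances that are too narrow miss true partners, while tolerances that are too wide produce ambiguous matches, so the count of correct unique matches peaks somewhere in between.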
For the details of the fragment matching procedure we refer to the fragment matching vignette that is available online and with vignette("fragmentmatching", package = "synapter"). Briefly, we compare the fragments of the identification run with the spectra from the quantification run and remove entries that have very few or no peaks/fragments in common.
First we remove low-intensity fragments and peaks.
filterFragments(synobj2,
what="fragments.ident",
minIntensity=70)
filterFragments(synobj2,
what="spectra.quant",
minIntensity=70)
Next we look for common peaks via fragmentMatching:
## Warning in object$fragmentMatching(verbose = verbose): FragmentMatching ppm
## tolerance undefined. Setting to default value.
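To make the notion of common peaks more concrete, the sketch below counts how many fragment m/z values have a partner peak within a relative tolerance. The m/z values, the tolerance of 25 ppm and the helper countCommon are assumptions chosen for illustration and are not taken from the package.

```r
## Hypothetical fragment m/z values from an identification run ...
identFragments <- c(175.119, 304.161, 417.245, 530.329, 659.372)
## ... and hypothetical peak m/z values from a quantitation spectrum.
quantPeaks <- c(175.120, 250.000, 417.246, 530.400, 659.371, 800.500)

## Count identification fragments that have at least one quantitation peak
## within the given relative tolerance (in ppm).
countCommon <- function(x, y, ppm) {
  isCommon <- vapply(x, function(mz) any(abs(y - mz) / mz * 1e6 <= ppm),
                     logical(1))
  sum(isCommon)
}

countCommon(identFragments, quantPeaks, ppm = 25)
```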
We get tables for unique and non-unique matches:
ncommon | tp | fp | tn | fn | all | fdr |
---|---|---|---|---|---|---|
0 | 3097 | 79 | 0 | 0 | 4159 | 0.0248741 |
1 | 2556 | 22 | 57 | 541 | 2864 | 0.0085337 |
2 | 1682 | 12 | 67 | 1415 | 1864 | 0.0070838 |
3 | 1193 | 4 | 75 | 1904 | 1339 | 0.0033417 |
4 | 862 | 1 | 78 | 2235 | 987 | 0.0011587 |
5 | 651 | 1 | 78 | 2446 | 754 | 0.0015337 |
6 | 488 | 1 | 78 | 2609 | 578 | 0.0020450 |
7 | 372 | 1 | 78 | 2725 | 453 | 0.0026810 |
8 | 294 | 0 | 79 | 2803 | 367 | 0.0000000 |
9 | 240 | 0 | 79 | 2857 | 307 | 0.0000000 |
10 | 186 | 0 | 79 | 2911 | 245 | 0.0000000 |
11 | 149 | 0 | 79 | 2948 | 197 | 0.0000000 |
12 | 124 | 0 | 79 | 2973 | 165 | 0.0000000 |
13 | 95 | 0 | 79 | 3002 | 126 | 0.0000000 |
14 | 82 | 0 | 79 | 3015 | 105 | 0.0000000 |
15 | 59 | 0 | 79 | 3038 | 76 | 0.0000000 |
16 | 48 | 0 | 79 | 3049 | 61 | 0.0000000 |
17 | 41 | 0 | 79 | 3056 | 53 | 0.0000000 |
18 | 27 | 0 | 79 | 3070 | 36 | 0.0000000 |
19 | 21 | 0 | 79 | 3076 | 26 | 0.0000000 |
20 | 14 | 0 | 79 | 3083 | 16 | 0.0000000 |
21 | 12 | 0 | 79 | 3085 | 14 | 0.0000000 |
22 | 8 | 0 | 79 | 3089 | 10 | 0.0000000 |
23 | 6 | 0 | 79 | 3091 | 8 | 0.0000000 |
24 | 5 | 0 | 79 | 3092 | 6 | 0.0000000 |
25 | 4 | 0 | 79 | 3093 | 5 | 0.0000000 |
26 | 3 | 0 | 79 | 3094 | 3 | 0.0000000 |
27 | 3 | 0 | 79 | 3094 | 3 | 0.0000000 |
28 | 1 | 0 | 79 | 3096 | 1 | 0.0000000 |

deltacommon | tp | fp | tn | fn | all | fdr |
---|---|---|---|---|---|---|
0 | 163 | 33 | 0 | 0 | 4159 | 0.1683673 |
1 | 163 | 2 | 31 | 0 | 374 | 0.0121212 |
2 | 132 | 1 | 32 | 31 | 304 | 0.0075188 |
3 | 119 | 1 | 32 | 44 | 276 | 0.0083333 |
4 | 110 | 0 | 33 | 53 | 256 | 0.0000000 |
5 | 95 | 0 | 33 | 68 | 226 | 0.0000000 |
6 | 85 | 0 | 33 | 78 | 202 | 0.0000000 |
7 | 75 | 0 | 33 | 88 | 178 | 0.0000000 |
8 | 67 | 0 | 33 | 96 | 161 | 0.0000000 |
9 | 60 | 0 | 33 | 103 | 145 | 0.0000000 |
10 | 53 | 0 | 33 | 110 | 127 | 0.0000000 |
11 | 43 | 0 | 33 | 120 | 106 | 0.0000000 |
12 | 37 | 0 | 33 | 126 | 92 | 0.0000000 |
13 | 27 | 0 | 33 | 136 | 71 | 0.0000000 |
14 | 20 | 0 | 33 | 143 | 56 | 0.0000000 |
15 | 12 | 0 | 33 | 151 | 37 | 0.0000000 |
16 | 10 | 0 | 33 | 153 | 26 | 0.0000000 |
17 | 8 | 0 | 33 | 155 | 21 | 0.0000000 |
18 | 7 | 0 | 33 | 156 | 18 | 0.0000000 |
19 | 4 | 0 | 33 | 159 | 9 | 0.0000000 |
20 | 2 | 0 | 33 | 161 | 5 | 0.0000000 |
21 | 2 | 0 | 33 | 161 | 5 | 0.0000000 |
22 | 2 | 0 | 33 | 161 | 5 | 0.0000000 |
23 | 1 | 0 | 33 | 162 | 3 | 0.0000000 |
24 | 1 | 0 | 33 | 162 | 3 | 0.0000000 |
25 | 1 | 0 | 33 | 162 | 3 | 0.0000000 |
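The fdr column in both tables is consistent with fp / (tp + fp). The following sketch recomputes it for the first four rows of the unique-match table; the data frame name uniqueMatches is only chosen for this example.

```r
## Counts copied from the first rows of the unique-match table above.
uniqueMatches <- data.frame(ncommon = 0:3,
                            tp = c(3097, 2556, 1682, 1193),
                            fp = c(79, 22, 12, 4))

## False discovery rate: share of false positives among all positives.
uniqueMatches$fdr <- with(uniqueMatches, fp / (tp + fp))
round(uniqueMatches$fdr, 7)
```

Requiring more common peaks lowers the FDR but also discards many true positives, which is the trade-off to keep in mind when choosing the filter thresholds.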
Subsequently we can filter by the minimal accepted number of common peaks:
filterUniqueMatches(synobj2, minNumber=1)
filterNonUniqueMatches(synobj2, minDelta=2)
filterNonUniqueIdentMatches(synobj2)
Finally we rescue EMRTs that were filtered out but were identified by PLGS.
In a similar manner to correcting for the retention time drift, we correct systematic errors of the intensity via a LOWESS model. The function modelIntensity has to be applied after findEMRTs. The model is built on the merged peptides, as is done for the retention time model. But in contrast to the retention time model, the prediction is necessary for the matched quantitation data.
synergise2
The whole workflow described in the step-by-step workflow is wrapped in the synergise2 function. As a side effect it generates an HTML report. An example can be found on https://github.com/lgatto/synapter.
For the next steps we need to convert the Synapter object into an MSnSet.
Subsequently we look for synapter/PLGS agreement (this is more useful for a combined MSnSet; see the basic synapter vignette, available online or with vignette("synapter", package = "synapter")). synapterPlgsAgreement adds an agreement column for each sample and counts the agreement/disagreement in additional columns:
msn <- synapterPlgsAgreement(msn)
knitr::kable(head(fData(msn)[, grepl("[Aa]gree",
fvarLabels(msn)),
drop=FALSE]))
peptide | synapterPlgsAgreement.fermentor_02_sample_01_HDMSE_01 | nAgree | nDisagree | synapterPlgsAgreementRatio |
---|---|---|---|---|
NVNDVIAPAFVK | agree | 1 | 0 | 1 |
TAGIQIVADDLTVTNPK | agree | 1 | 0 | 1 |
WLTGPQLADLYHSLMK | agree | 1 | 0 | 1 |
AAQDSFAAGWGVMVSHR | agree | 1 | 0 | 1 |
TFAEALR | agree | 1 | 0 | 1 |
LGANAILGVSLAASR | no_synapter_transfer | 0 | 0 | NaN |
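The rows above are consistent with a ratio of nAgree / (nAgree + nDisagree), which also explains the NaN for a peptide without any synapter transfer. The sketch below reproduces the last column under that assumption; the formula is inferred from the table and is not confirmed against the package source.

```r
## Counts copied from the table above: five peptides with one agreeing sample
## and one peptide without any synapter transfer.
nAgree <- c(1, 1, 1, 1, 1, 0)
nDisagree <- c(0, 0, 0, 0, 0, 0)

## Assumed definition of the ratio; 0 / 0 yields NaN in R.
ratio <- nAgree / (nAgree + nDisagree)
ratio
```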
As described in (Shliaha et al. 2013), Synapt G2 devices suffer from detector saturation. This can be partly corrected by requantify. For this, a saturationThreshold has to be given, above which intensity saturation potentially happens. There are several methods available.
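To illustrate the role of the threshold, the sketch below drops all isotopes of a single peptide whose intensity exceeds the threshold and sums the remaining ones. This is a simplified picture of a sum-type requantification; the intensities, the threshold value and the variable names are made up, and the actual methods in requantify handle runs and charge states in more detail (see ?requantify).

```r
## Hypothetical isotope intensities of one peptide in one run.
isotopes <- c(iso0 = 250000, iso1 = 180000, iso2 = 60000, iso3 = 15000)
## Hypothetical threshold above which saturation is assumed to happen.
saturationThreshold <- 1e5

## Drop isotopes whose intensity exceeds the threshold and sum the rest.
unsaturated <- isotopes[isotopes <= saturationThreshold]
requantified <- sum(unsaturated)
requantified
```

Because the most intense isotopes are removed, the resulting intensity of an abundant peptide is lower than its original sum, so intensities are no longer directly comparable to uncorrected ones.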
If an MSnSet object was requantified using the "sum" requantification method, TOP3 normalisation is not valid anymore because the most abundant proteins are penalised by removing high intensity isotopes (for details see ?requantify and ?rescaleForTop3). This can be overcome by calling rescaleForTop3.
Since synapter 2.0, makeMaster supports fragment files as well. It is possible to create a fragment library that can be used for fragment matching. Because of the large data size, this cannot be covered in this vignette. An introduction to creating a master can be found in the basic synapter vignette, available online or with vignette("synapter", package = "synapter"). Please find details about creating a fragment library in ?makeMaster.
All software and respective versions used to produce this document are listed below.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] qvalue_2.39.0 BiocStyle_2.35.0 synapterdata_1.44.0
## [4] synapter_2.31.0 MSnbase_2.33.2 ProtGenerics_1.39.1
## [7] S4Vectors_0.45.2 mzR_2.41.1 Rcpp_1.0.13-1
## [10] Biobase_2.67.0 BiocGenerics_0.53.3 generics_0.1.3
## [13] rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] rlang_1.1.4 magrittr_2.0.3
## [3] clue_0.3-66 cleaver_1.45.0
## [5] matrixStats_1.4.1 compiler_4.4.2
## [7] vctrs_0.6.5 reshape2_1.4.4
## [9] stringr_1.5.1 pkgconfig_2.0.3
## [11] crayon_1.5.3 fastmap_1.2.0
## [13] XVector_0.47.0 tzdb_0.4.0
## [15] UCSC.utils_1.3.0 preprocessCore_1.69.0
## [17] bit_4.5.0.1 purrr_1.0.2
## [19] xfun_0.49 MultiAssayExperiment_1.33.1
## [21] zlibbioc_1.52.0 cachem_1.1.0
## [23] GenomeInfoDb_1.43.2 jsonlite_1.8.9
## [25] DelayedArray_0.33.3 BiocParallel_1.41.0
## [27] parallel_4.4.2 cluster_2.1.8
## [29] R6_2.5.1 RColorBrewer_1.1-3
## [31] bslib_0.8.0 stringi_1.8.4
## [33] limma_3.63.2 GenomicRanges_1.59.1
## [35] jquerylib_0.1.4 SummarizedExperiment_1.37.0
## [37] iterators_1.0.14 knitr_1.49
## [39] readr_2.1.5 IRanges_2.41.2
## [41] splines_4.4.2 Matrix_1.7-1
## [43] igraph_2.1.2 tidyselect_1.2.1
## [45] abind_1.4-8 yaml_2.3.10
## [47] doParallel_1.0.17 codetools_0.2-20
## [49] affy_1.85.0 lattice_0.22-6
## [51] tibble_3.2.1 plyr_1.8.9
## [53] evaluate_1.0.1 survival_3.8-3
## [55] Biostrings_2.75.3 pillar_1.10.0
## [57] affyio_1.77.1 BiocManager_1.30.25
## [59] MatrixGenerics_1.19.0 foreach_1.5.2
## [61] MALDIquant_1.22.3 ncdf4_1.23
## [63] vroom_1.6.5 hms_1.1.3
## [65] ggplot2_3.5.1 munsell_0.5.1
## [67] scales_1.3.0 glue_1.8.0
## [69] lazyeval_0.2.2 maketools_1.3.1
## [71] tools_4.4.2 mzID_1.45.0
## [73] sys_3.4.3 QFeatures_1.17.0
## [75] vsn_3.75.0 buildtools_1.0.0
## [77] XML_3.99-0.17 grid_4.4.2
## [79] impute_1.81.0 tidyr_1.3.1
## [81] MsCoreUtils_1.19.0 colorspace_2.1-1
## [83] GenomeInfoDbData_1.2.13 PSMatch_1.11.0
## [85] cli_3.6.3 S4Arrays_1.7.1
## [87] dplyr_1.1.4 AnnotationFilter_1.31.0
## [89] pcaMethods_1.99.0 gtable_0.3.6
## [91] sass_0.4.9 digest_0.6.37
## [93] SparseArray_1.7.2 multtest_2.63.0
## [95] htmltools_0.5.8.1 lifecycle_1.0.4
## [97] httr_1.4.7 statmod_1.5.0
## [99] bit64_4.5.2 MASS_7.3-61