specL automatic report

Requirements

In the first step, a standard shotgun proteomics experiment generates the peptide identification result, which then has to be processed with the BiblioSpec software (Frewen and MacCoss 2007).

The ion library is generated with specL. The workflow is described in Panse et al. (2015).

The following R package has to be installed on the machine that renders this report.

library(specL)
## Loading required package: DBI
## Loading required package: protViz
## Loading required package: RSQLite
## Loading required package: seqinr
## 
## Attaching package: 'specL'
## The following objects are masked from 'package:protViz':
## 
##     plot.psm, plot.psmSet, summary.psmSet
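
Before rendering, it can help to check which of the packages attached above are actually available. The following is a minimal sketch in base R; the helper name `missing_packages` is made up for this illustration and is not part of specL.

```r
# Packages attached in this report (taken from the library() output above).
required <- c("specL", "protViz", "seqinr", "RSQLite", "DBI")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded here.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # specL is distributed via Bioconductor; one common way to install it is
  # BiocManager::install("specL")
}
```

The check only reports missing packages; installation is left as a commented hint so that the snippet has no side effects.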

This file can be rendered by using the following code snippet.

library(rmarkdown)
library(BiocStyle)
report_file <- tempfile(fileext = '.Rmd')
file.copy(system.file("doc", "report.Rmd",
                      package = "specL"),
          report_file)
rmarkdown::render(report_file,
                  output_format = 'html_document',
                  output_file = '/tmp/report_specL.html')

Input

Parameter

If no INPUT is defined, the report uses the specL package’s data and the following default parameters.

if(!exists("INPUT")){
  INPUT <- list(FASTA_FILE 
      = system.file("extdata", "SP201602-specL.fasta.gz",
                    package = "specL"),
    BLIB_FILTERED_FILE 
      = system.file("extdata", "peptideStd.sqlite",
                    package = "specL"),
    BLIB_REDUNDANT_FILE 
      = system.file("extdata", "peptideStd_redundant.sqlite",
                    package = "specL"),
    MIN_IONS = 5,
    MAX_IONS = 6,
    MZ_ERROR = 0.05,
    MASCOTSCORECUTOFF = 17,
    FRAGMENTIONMZRANGE = c(300, 1250),
    FRAGMENTIONRANGE = c(5, 200),
    NORMRTPEPTIDES = specL::iRTpeptides,
    OUTPUT_LIBRARY_FILE = tempfile(fileext ='.csv'),
    RDATA_LIBRARY_FILE = tempfile(fileext ='.RData'),
    ANNOTATE = TRUE
    )
} 
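
Because the defaults are only assigned when no `INPUT` object exists, a user-supplied `INPUT` has to contain every field the report reads. One way to build such a list is to start from the defaults and override selected entries. The sketch below shows the idea on the numeric settings only; the override values and the name `INPUT_example` are assumptions chosen for illustration.

```r
# Defaults copied from the chunk above (numeric settings only, for brevity).
defaults <- list(
  MIN_IONS = 5,
  MAX_IONS = 6,
  MZ_ERROR = 0.05,
  MASCOTSCORECUTOFF = 17,
  FRAGMENTIONMZRANGE = c(300, 1250),
  FRAGMENTIONRANGE = c(5, 200)
)

# Hypothetical overrides, chosen only to illustrate the mechanism.
overrides <- list(MAX_IONS = 10, MZ_ERROR = 0.01)

# modifyList() replaces the named entries and keeps all others unchanged.
INPUT_example <- modifyList(defaults, overrides)
str(INPUT_example)
```

By default `rmarkdown::render` evaluates the document in the calling environment (`envir = parent.frame()`), so a complete `INPUT` list defined before the `render` call should be picked up by the `exists("INPUT")` check.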

The library generation workflow was performed using the following parameters:

Used INPUT parameters

| parameter | value |
|---|---|
| FASTA_FILE | /tmp/RtmpEEXKmy/Rinst1ba92523e197/specL/extdata/SP201602-specL.fasta.gz |
| BLIB_FILTERED_FILE | /tmp/RtmpEEXKmy/Rinst1ba92523e197/specL/extdata/peptideStd.sqlite |
| BLIB_REDUNDANT_FILE | /tmp/RtmpEEXKmy/Rinst1ba92523e197/specL/extdata/peptideStd_redundant.sqlite |
| MIN_IONS | 5 |
| MAX_IONS | 6 |
| MZ_ERROR | 0.05 |
| MASCOTSCORECUTOFF | 17 |
| FRAGMENTIONMZRANGE | 300, 1250 |
| FRAGMENTIONRANGE | 5, 200 |
| OUTPUT_LIBRARY_FILE | /tmp/RtmpZaA4oF/file1c5f39c3c66e.csv |
| RDATA_LIBRARY_FILE | /tmp/RtmpZaA4oF/file1c5f97ee656.RData |

Define the fragment ions of interest

The following R helper function is used for composing the in-silico fragment ions using protViz.

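fragmentIonFunction_specL <- function (b, y) {
  # monoisotopic atomic masses in Da (Oxygen and Nitrogen are not used below)
  Hydrogen <- 1.007825
  Oxygen <- 15.994915
  Nitrogen <- 14.003074
  # singly charged b and y ions are passed through unchanged
  b1_ <- (b )
  y1_ <- (y )
  # doubly charged ions: add one hydrogen mass and divide by two
  b2_ <- (b + Hydrogen) / 2
  y2_ <- (y + Hydrogen) / 2 
  # one column per ion series
  return( cbind(b1_, y1_, b2_, y2_) )
}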
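
To see what the helper returns, the sketch below applies a copy of it to two made-up singly charged m/z values per ion series. The input numbers are illustrative only and do not correspond to real fragment masses.

```r
# Copy of the helper above, reduced to the parts that affect the result.
fragmentIonFunction_specL <- function(b, y) {
  Hydrogen <- 1.007825
  b1_ <- b
  y1_ <- y
  b2_ <- (b + Hydrogen) / 2
  y2_ <- (y + Hydrogen) / 2
  cbind(b1_, y1_, b2_, y2_)
}

# Made-up singly charged b and y ion m/z values (two fragments each).
b <- c(100, 200)
y <- c(150, 250)

ions <- fragmentIonFunction_specL(b, y)
ions
# The result is a matrix with one row per fragment and the columns
# b1_, y1_, b2_, y2_; for example b2_[1] = (100 + 1.007825) / 2 = 50.5039125.
```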

Read the sqlite files

BLIB_FILTERED <- read.bibliospec(INPUT$BLIB_FILTERED_FILE) 
## fetched 137 rows.
## assigning 28 modifications ...
summary(BLIB_FILTERED)
## Summary of a "psmSet" object.
## Number of precursor:
##  137
## Number of precursors in Filename(s)
##  _methods\20140910_01_fetuin_400amol_1.raw   21
##  _methods\20140910_07_fetuin_400amol_2.raw   116
## Number of annotated precursor:
##  0
BLIB_REDUNDANT <- read.bibliospec(INPUT$BLIB_REDUNDANT_FILE) 
## fetched 184 rows.
## assigning 37 modifications ...
summary(BLIB_REDUNDANT)
## Summary of a "psmSet" object.
## Number of precursor:
##  184
## Number of precursors in Filename(s)
##  _methods\20140910_01_fetuin_400amol_1.raw   32
##  _methods\20140910_07_fetuin_400amol_2.raw   152
## Number of annotated precursor:
##  0

Protein (re)-annotation

After the peptide spectrum matches (PSMs) have been processed with BiblioSpec, the protein information is lost and has to be restored from the FASTA file. The read.fasta function is provided by the CRAN package seqinr.

if(INPUT$ANNOTATE){
  FASTA <- read.fasta(INPUT$FASTA_FILE, 
                    seqtype = "AA", 
                    as.string = TRUE)

  BLIB_FILTERED <- annotate.protein_id(BLIB_FILTERED, 
                                       fasta = FASTA)
}
## start protein annotation ...
## time taken:  0.000795412063598633 minutes
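
Conceptually, re-annotation assigns to each peptide the proteins whose sequence contains it. The following base R sketch illustrates that idea with exact substring matching on toy data. It is not the implementation of `annotate.protein_id`; the protein identifiers, sequences, and the helper name `annotate_peptides` are hypothetical.

```r
# Toy protein database: names are hypothetical accession strings.
proteins <- c(
  "sp|P00001|TOY1" = "MKTAYIAKQRGAGSSEPVTGLDAKLLE",
  "sp|P00002|TOY2" = "MSTPVISGGPYEYRAAK"
)

# Peptides to annotate; the last one occurs in no protein.
peptides <- c("GAGSSEPVTGLDAK", "TPVISGGPYEYR", "WWWWWW")

# For each peptide, return the names of all proteins containing it.
annotate_peptides <- function(peptides, proteins) {
  hits <- lapply(peptides, function(p) {
    names(proteins)[grepl(p, proteins, fixed = TRUE)]
  })
  names(hits) <- peptides
  hits
}

annotation <- annotate_peptides(peptides, proteins)
annotation
```

A peptide without a match gets an empty character vector, which makes unannotated entries easy to detect with `lengths(annotation) == 0`.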

Peptides used for RT normalization

The following peptides are used for retention time (RT) normalization. The last column indicates (TRUE or FALSE) whether a peptide is included in the data. The rows are ordered by RT value.

Peptides used for RT normalization.

| row | peptide | rt | included |
|---|---|---|---|
| 1 | LGGNEQVTR | -24.92000 | FALSE |
| 21 | LGGNETQVR | -24.92000 | FALSE |
| 2 | GAGSSEPVTGLDAK | 0.00000 | TRUE |
| 22 | AGGSSEPVTGLADK | 0.00000 | FALSE |
| 3 | AAVYHHFISDGVR | 10.48963 | FALSE |
| 4 | VEATFGVDESNAK | 12.39000 | TRUE |
| 23 | VEATFGVDESANK | 12.39000 | FALSE |
| 5 | YILAGVENSK | 19.79000 | FALSE |
| 24 | YILAGVESNK | 19.79000 | FALSE |
| 6 | HIQNIDIQHLAGK | 23.93091 | FALSE |
| 7 | TPVISGGPYEYR | 28.71000 | TRUE |
| 25 | TPVISGGPYYER | 28.71000 | FALSE |
| 8 | TPVITGAPYEYR | 33.38000 | TRUE |
| 26 | TPVITGAPYYER | 33.38000 | FALSE |
| 9 | DGLDAASYYAPVR | 42.26000 | TRUE |
| 27 | GDLDAASYYAPVR | 42.26000 | FALSE |
| 10 | TEVSSNHVLIYLDK | 43.54062 | FALSE |
| 11 | ADVTPADFSEWSK | 54.62000 | TRUE |
| 28 | DAVTPADFSEWSK | 54.62000 | FALSE |
| 12 | LVAYYTLIGASGQR | 64.15480 | FALSE |
| 13 | GTFIIDPGGVIR | 70.52000 | TRUE |
| 29 | TGFIIDPGGVIR | 70.52000 | FALSE |
| 14 | TEHPFTVEEFVLPK | 74.50968 | FALSE |
| 15 | TTNIQGINLLFSSR | 84.36927 | FALSE |
| 16 | GTFIIDPAAVIR | 87.23000 | FALSE |
| 30 | GTFIIDPAAIVR | 87.23000 | FALSE |
| 17 | LFLQFGAQGSPFLK | 100.00000 | TRUE |
| 31 | FLLQFGAQGSPLFK | 100.00000 | FALSE |
| 18 | NQGNTWLTAFVLK | 104.06935 | FALSE |
| 19 | DSPVLIDFFEDTER | 112.63426 | FALSE |
| 20 | ITPNLAEFAFSLYR | 122.24622 | FALSE |
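
The `included` column and the purpose of the reference peptides can be illustrated with a small base R sketch. It flags which reference peptides occur among the identifications and then fits a simple linear model that maps observed RT onto the reference scale. This is a simplified illustration of linear RT normalization, not specL's internal code, and the observed retention times are made up.

```r
# Reference RT values copied from the table above (subset).
irt <- data.frame(
  peptide = c("LGGNEQVTR", "GAGSSEPVTGLDAK", "VEATFGVDESNAK",
              "TPVISGGPYEYR", "TPVITGAPYEYR"),
  rt = c(-24.92, 0, 12.39, 28.71, 33.38),
  stringsAsFactors = FALSE
)

# Hypothetical identifications with made-up observed RTs (minutes).
observed <- data.frame(
  peptide = c("GAGSSEPVTGLDAK", "VEATFGVDESNAK",
              "TPVISGGPYEYR", "TPVITGAPYEYR"),
  rt_observed = c(10.0, 16.1, 24.2, 26.4),
  stringsAsFactors = FALSE
)

# Flag reference peptides that were identified, then order by reference RT.
irt$included <- irt$peptide %in% observed$peptide
irt <- irt[order(irt$rt), ]

# Fit reference RT as a linear function of observed RT on the shared peptides.
anchors <- merge(irt[irt$included, ], observed, by = "peptide")
fit <- lm(rt ~ rt_observed, data = anchors)

# Map an arbitrary observed RT onto the reference scale.
normalized <- predict(fit, newdata = data.frame(rt_observed = 20))
normalized
```

Only peptides flagged as included contribute to the fit, which is why a data set needs enough identified reference peptides for the normalization to be meaningful.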

Generate the ion library

specLibrary <- specL::genSwathIonLib(
  data = BLIB_FILTERED,
  data.fit = BLIB_REDUNDANT,
  max.mZ.Da.error = INPUT$MZ_ERROR,
  topN = INPUT$MAX_IONS,
  fragmentIonMzRange = INPUT$FRAGMENTIONMZRANGE,
  fragmentIonRange = INPUT$FRAGMENTIONRANGE,
  fragmentIonFUN = fragmentIonFunction_specL,
  mascotIonScoreCutOFF = INPUT$MASCOTSCORECUTOFF,
  iRT = INPUT$NORMRTPEPTIDES
  )
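
The roles of `fragmentIonMzRange`, `topN`, and the lower bound of `fragmentIonRange` can be sketched as a filter on a single spectrum: keep fragments inside the m/z window, take the most intense ones, and discard the precursor if too few remain. The function below is a simplified assumption about that logic, not the code of `genSwathIonLib`; the helper name, its parameters, and the toy spectrum are hypothetical.

```r
# Hypothetical helper mimicking the filter settings passed above.
select_top_ions <- function(mz, intensity,
                            mz_range = c(300, 1250),
                            top_n = 6,
                            min_ions = 5) {
  # keep only fragments inside the m/z window
  keep <- mz >= mz_range[1] & mz <= mz_range[2]
  mz <- mz[keep]
  intensity <- intensity[keep]
  # indices of the top_n most intense remaining fragments
  idx <- head(order(intensity, decreasing = TRUE), top_n)
  # too few fragments: this precursor would not enter the library
  if (length(idx) < min_ions) return(NULL)
  data.frame(
    q3 = mz[idx],
    relativeFragmentIntensity = round(100 * intensity[idx] / max(intensity[idx]))
  )
}

# Toy spectrum: the first and last peaks lie outside the m/z window.
mz <- c(250.10, 503.29, 705.35, 868.41, 925.44, 996.48, 1143.54, 1300.20)
intensity <- c(500, 11, 35, 14, 100, 56, 56, 80)

transitions <- select_top_ions(mz, intensity)
transitions
```

Intensities are rescaled so that the strongest selected fragment equals 100, which matches the scale of the `relativeFragmentIntensity` field shown later in this report.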

Library Generation Summary

The total number of PSMs with a Mascot e-value < 0.05 in the search is 184. The number of unique precursors is 137, and the generated ion library contains 131 entries. This means that 95.62 % of the unique precursors fulfilled the filtering criteria.
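
The percentage follows directly from the two counts in this paragraph, as the short calculation below shows. The variable names are chosen for this illustration.

```r
n_unique_precursors <- 137  # unique precursors in the filtered input
n_library <- 131            # entries in the generated ion library

pct <- round(100 * n_library / n_unique_precursors, 2)
pct
# 95.62
```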

summary(specLibrary)
## Summary of a "specLSet" object.
## 
## Parameter:
## 
## Number of precursor (q1 and peptideModSeq) = 131
## Number of unique precursor
## (q1.in-silico and peptideModSeq) = 122
## Number of iRT peptide(s) = 8
## Which std peptides (iRTs) where found in which raw files:
##   _methods\20140910_01_fetuin_400amol_1.raw GAGSSEPVTGLDAK 
##       _methods\20140910_01_fetuin_400amol_1.raw TPVITGAPYEYR 
##       _methods\20140910_01_fetuin_400amol_1.raw VEATFGVDESNAK 
##       _methods\20140910_07_fetuin_400amol_2.raw ADVTPADFSEWSK 
##       _methods\20140910_07_fetuin_400amol_2.raw DGLDAASYYAPVR 
##       _methods\20140910_07_fetuin_400amol_2.raw GTFIIDPGGVIR 
##       _methods\20140910_07_fetuin_400amol_2.raw LFLQFGAQGSPFLK 
##       _methods\20140910_07_fetuin_400amol_2.raw TPVISGGPYEYR 
## 
## Number of transitions frequency:
##  5   16
##  6   115
## 
## Number of annotated precursor = 1855
## Number of file(s)
##  2
## 
## Number of precursors in Filename(s)
##  _methods\20140910_01_fetuin_400amol_1.raw   19
##  _methods\20140910_07_fetuin_400amol_2.raw   112
## 
## Misc:
## 
## Memory usage =    742776 bytes

The following two code snippets display and plot the first element of the ion library:

#  slotNames(specLibrary@ionlibrary[[1]])
specLibrary@ionlibrary[[1]]
## An "specL" object.
## 
## 
## content:
## group_id = ADQPQC[+57.0]LSLAWSTDGQTLFAGYSDNTIR.3 
## peptide_sequence = ADQPQCLSLAWSTDGQTLFAGYSDNTIR 
## proteinInformation = sp|O18640|GBLP_DROME 
## q1 = 1039.151 
## q1.in_silico = 3172.464 
## q3 = 925.436 1143.542 996.4756 705.3505 868.4149 503.2933 
## q3.in_silico = 925.4374 1143.543 996.4745 705.3526 868.4159 503.2936 
## prec_z = 3 
## frg_type = y y y y y y 
## frg_nr = 8 10 9 6 7 4 
## frg_z = 1 1 1 1 1 1 
## relativeFragmentIntensity = 100 56 56 35 14 11 
## irt = 95.97 
## peptideModSeq = ADQPQC[+57.0]LSLAWSTDGQTLFAGYSDNTIR 
## mZ.error = 0.001407 0.001031 0.001095 0.002066 0.001004 0.000286 
## \ctrachse_20140910_Nuclei_diff_extraction_methods\20140910_07_fetuin_400amol_2.raw
## score = 15.83609 
## 
## size:
## Memory usage: 4224 bytes
plot(specLibrary@ionlibrary[[1]])
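
The `mZ.error` field can be cross-checked against the printed `q3` and `q3.in_silico` values. The sketch below recomputes the absolute differences from the rounded numbers shown in the output above, so small deviations from the printed `mZ.error` values are expected. The tolerance is the `MZ_ERROR` default of this report.

```r
# Measured and in-silico fragment m/z values, copied from the printed object.
q3 <- c(925.436, 1143.542, 996.4756, 705.3505, 868.4149, 503.2933)
q3_in_silico <- c(925.4374, 1143.543, 996.4745, 705.3526, 868.4159, 503.2936)

# Absolute mass error per transition (Da).
mz_error <- abs(q3 - q3_in_silico)
round(mz_error, 4)

# All transitions should lie within the tolerance used for matching.
mz_tolerance <- 0.05  # INPUT$MZ_ERROR
all(mz_error <= mz_tolerance)
```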

plot(specLibrary)

## [1] 16.83185 13.13262 18.54058 18.36923 15.30478 15.30478

## [1]  7.032372  6.490769 14.787681 14.544429 15.207398 15.207398

The plot(specLibrary) call above plots an overview of the whole ion library. Please note that the iRT peptides used for the normalization of RT do not have to be included in the resulting ion library.

Output

write.spectronaut(specLibrary, file =  INPUT$OUTPUT_LIBRARY_FILE)
## writting specL object (including header) to file '/tmp/RtmpZaA4oF/file1c5f39c3c66e.csv' ...
save(specLibrary, file = INPUT$RDATA_LIBRARY_FILE)

The calls above write the ion library to a file for Spectronaut and save the result object to an RData file.
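
An object stored with `save` is restored under its original name by `load`. The round trip below demonstrates this with a small stand-in list, since it does not require a specL object; the name `toy_library` and its contents are hypothetical.

```r
# Stand-in for the result object (hypothetical content).
toy_library <- list(group_id = "PEPTIDEK.2", q3 = c(503.29, 705.35))

rdata_file <- tempfile(fileext = ".RData")
save(toy_library, file = rdata_file)

# Remove the object, then restore it from the file.
rm(toy_library)
restored_names <- load(rdata_file)  # load() returns the names of restored objects

restored_names
toy_library$q3
```

Since `load` silently overwrites existing objects of the same name, loading into a dedicated environment (`load(rdata_file, envir = new.env())`) can be the safer choice in longer sessions.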

Remarks

For questions and suggestions for improvement, please contact the authors of specL. This R Markdown report file was written by WEW and is maintained by CP (Panse et al. 2015).

Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.48       specL_1.41.0     seqinr_4.2-36    RSQLite_2.3.7   
## [5] protViz_0.7.9    DBI_1.2.3        BiocStyle_2.35.0
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5         cli_3.6.3           rlang_1.1.4        
##  [4] xfun_0.48           highr_0.11          jsonlite_1.8.9     
##  [7] bit_4.5.0           buildtools_1.0.0    htmltools_0.5.8.1  
## [10] maketools_1.3.1     sys_3.4.3           sass_0.4.9         
## [13] rmarkdown_2.28      evaluate_1.0.1      jquerylib_0.1.4    
## [16] MASS_7.3-61         fastmap_1.2.0       yaml_2.3.10        
## [19] lifecycle_1.0.4     memoise_2.0.1       BiocManager_1.30.25
## [22] compiler_4.4.1      codetools_0.2-20    blob_1.2.4         
## [25] Rcpp_1.0.13         digest_0.6.37       R6_2.5.1           
## [28] parallel_4.4.1      bslib_0.8.0         tools_4.4.1        
## [31] bit64_4.5.2         ade4_1.7-22         cachem_1.1.0

References

Frewen, B., and M. J. MacCoss. 2007. “Using BiblioSpec for creating and searching tandem MS peptide libraries.” Curr Protoc Bioinformatics Chapter 13 (December): Unit 13.7.
Panse, Christian, Christian Trachsel, Jonas Grossmann, and Ralph Schlapbach. 2015. “specL — an R/Bioconductor Package to Prepare Peptide Spectrum Matches for Use in Targeted Proteomics.” Bioinformatics 31 (13): 2228. https://doi.org/10.1093/bioinformatics/btv105.
Panse, Christian, Christian Trachsel, and Can Türker. 2022. “Bridging Data Management Platforms and Visualization Tools to Enable Ad-Hoc and Smart Analytics in Life Sciences.” Journal of Integrative Bioinformatics. https://doi.org/10.1515/jib-2022-0031.