synapter is free and open-source software. If you use it, please support the project by citing it in publications:
Nicholas James Bond, Pavel Vyacheslavovich Shliaha, Kathryn S. Lilley, and Laurent Gatto. Improving qualitative and quantitative performance for MSE-based label free proteomics. J. Proteome Res., 2013, 12 (6), pp 2340–2353
For bugs, typos, suggestions or other questions, please file an issue
in our tracking system (https://github.com/lgatto/synapter/issues) providing as
much information as possible, a reproducible example and the output of
sessionInfo()
.
If you don’t have a GitHub account or wish to reach a broader audience for general questions about proteomics analysis using R, you may want to use the Bioconductor support site: https://support.bioconductor.org/.
The main functionality of synapter is to combine proteomics data acquired under different mass spectrometry settings or with different samples to (i) optimise the respective qualities of the two data sets or (ii) increase the number of identifications, thereby decreasing missing values. Besides synapter offers other functionality inaccessible in the default pipeline, like peptide FDR estimation and filtering on peptide match type and peptide uniqueness.
The example that motivated the development of this package was to combine data obtained on a Synapt G2 instrument:
The former is data is called identification peptides and the latter quantitation peptides, irrespective of the acquisition mode (HDMSE or MSE). This HDMSE/MSE design is used in this document to illustrate the synapter package.
However, although HDMSE mode possesses superior identification and MSE mode superior quantitation capabilities and transferring identifications from HDMSE to MSE is a priori the most efficient setup, identifications can be transferred between any runs, independently of the acquisition mode. This allows to reduce the number of missing values, one of the primary limitation of label-free proteomics. Thus users will benefit from synapter’s functionality even if they run their instruments in a single mode (HDMSE or MSE only).
However, as will be shown in section Data analysis, transferring identifications from multiple runs to each other increases analysis time and peptide FDR within the analysis. synapter allows to minimise these effects to acceptable degree by choosing runs to transfer identifications from and merging them in the master HDMSE file.
This data processing methodology is described in section HDMSE/MSE data analysis and the analysis pipeline is described in section Different pipelines.
To maximise the benefit of combining better identification and quantitation data, it is also possible to combine several, previously merged identification data files into one master set. This functionality is described in section Using master peptide files.
Finally, section Analysis of complex experiments illustrates a complete pipeline including synapter and MSnbase (Gatto and Lilley 2012) packages to perform protein label-free quantitation: how to combine multiple synapter results to represent the complete experimental design under study and further explore the data, normalise it and perform robust statistical data analysis inside the R environment.
The rationale underlying synapter’s functionality are described in (Shliaha et al. 2013) and (Bond et al. 2013). The first reference describes the benefits of ion mobility separation on identification and the effects on quantitation, that led to the development of synapter, which in described and demonstrated in (Bond et al. 2013).
synapter is written for R~(R Core Team 2012), an open source, cross platform, freely available statistical computing environment and programming language1. Functionality available in the R environment can be extended though the usage of packages. Thousands of developers have contributed packages that are distributed via the Comprehensive R Archive Network (CRAN) or through specific initiatives like the Bioconductor2 project (Gentleman et al. 2004), focusing on the analysis and comprehension of high-throughput biological data.
synapter
is such an R package dedicated to the analysis of label-free proteomics
data. To obtain detailed information about any function in the package,
it is possible to access it’s documentation by preceding it’s name with
a question mark at the command line prompt. For example, to obtain
information about the synapter
package, one would type ?synapter
.
synapter is available through the Bioconductor project. Details about the package and the installation procedure can be found on its page3. Briefly, installation of the package and all its dependencies should be done using the dedicated Bioconductor infrastructure as shown below:
After installation, synapter will have to be explicitly loaded with
so that all the package’s functionality is available to the user.
Preparation of the data for synapter
requires the .raw
data first to be processed with Waters’
ProteinLynx Global Serve (PLGS) software. The PLGS result is
then exported as csv
spreadsheet files in user specified
folders. These csv
files can then be used as input for
synapter.
We also highly recommend users to acquaint themselves with the PLGS search algorithm for data independent acquisitions (Li et al. 2009).
First the user has to specify the output folders for files to be used in synapter analysis as demonstrated in the figures. After the folders are specified ignore the message that appears requiring restarting PLGS.
At the first stage PLGS performs noise reduction and centroiding, based on user specified preferences called processing parameters. These preferences determine thresholds in intensity for discriminating between noise peaks and peptide and fragment ion peaks in high and low energy functions of an acquisition. The optimal value of these parameters is sample dependant and different between MSE and HDMSE modes. For synapter to function properly all acquisitions in the analysis have to be processed with the same thresholds, optimal for the mode identifications are transferred from (typically HDMSE mode). The user is expected to identify optimal parameters himself for every new sample type by repeatedly analysing a representative acquisition with different thresholds.
After the ions peaks have been determined and centroided, the ions
representing charge states and isotopes of a peptide are collapsed into
a single entity called EMRT (exact mass retention time pair). The EMRTs
in low energy function represent unidentified peptides and are assigned
peptides sequences during database search. The total list of EMRTs can
be found in the pep3DAMRT.csv
file and it is one of the
synapter
input files for the runs used for quantitation (typically MSE mode)
Prior to the database search, randomised entries are added to the database to allow PLGS to compute protein false positive rate. The randomised entries can either be added automatically or manually, using the Randomise Databank function in the Databank admin tool. To properly prepare the files for synapter, the user has to add randomised entries manually via Databank admin tool, since only then randomised entries identified in the database search will be displayed in the csv output. The following figures demonstrate how to create a randomised databank manually using one randomised entry per regular entry.
The user is also expected to use a minimum of 1 fragment per peptide,
3 fragments per protein and 1 peptide per protein identification
thresholds and 100% False Positive Rate4 for protein
identification during database search for all of the acquisitions in the
analysis as demonstrated in figure. This allows
to maximise the number of identified peptides from the randomised part
of the database, needed to estimate peptide identifications statistics.
The total list of identified peptides is given in
final_peptide.csv
files. A single
final_peptide.csv
file has to be supplied to synapter
for every run in the analysis (for both identification and quantitation
runs).
More details and screenshots are available in a separate document available at https://lgatto.github.com/synapter/.
The analysis of pairs of HDMSE and MSE data files is based on the following rationale – combine strengths or each approach by matching high quality HDMSE identifications to quantified MSE EMRTs applying the following algorithm:
Two different pipeline are available to the user:
The synergise
function is a high level wrapper that
implements a suggested analysis to combine two files (see next paragraph
for details). A set of parameters can be passed, although sensible
defaults are provided. While the analysis is executed, a html report is
created, including all result files in text spreadsheet
(csv
format) and binary R output. This level allows easy
scripting for automated batch analysis. Using data from the synapterdata
package, the following code chunk illustrates the synergise
usage. An example report can be found online at https://lgatto.github.com/synapter/.
## [1] "HDMSe_101111_25fmol_UPS1_in_Ecoli_04_IA_final_peptide.csv.gz"
## [1] "MSe_101111_25fmol_UPS1_in_Ecoli_03_IA_final_peptide.csv.gz"
## [1] "MSe_101111_25fmol_UPS1_in_Ecoli_03_Pep3DAMRT.csv.gz"
## [1] "EcoliK12_enolase_UPSsimga_NB.fasta"
## the synergise input is a (named) list of filenames
input <- list(identpeptide = hdmsefile,
quantpeptide = msefile,
quantpep3d = msepep3dfile,
fasta = fas)
## a report and result files will be stored
## in the 'output' directory
output <- tempdir()
output
## [1] "/tmp/RtmprsDukZ"
See ?synergise
for details.
The user can have detailed control on each step of the analysis by
executing each low-level function manually. This pipeline, including
generation of data containers (class instances) and all available
operations are available in ?Synapter
. This strategy allows
the maximum flexibility to develop new unexplored approaches.
While analysing one MSE file against one single
HDMSE file
increased the total number of reliably identified and quantified
features compared to each single MSE analysis, a better
procedure can be applied when replicates are available. Consider the
following design with two pairs of files: HDMS1E, MS1E, HDMS2E and MS2E. The
classical approach would lead to combining for example, HDMS1E and MS1E and
HDMS2E
and MS2E. However,
HDMS1E –
MS2E and
HDMS2E –
MS1E
would also be suitable, possibly leading to new identified and
quantified features. Instead of repeating all possible combinations,
which could hardly be applied for more replicates, we allow to merge
HDMS1E
and HDMS2E into a new
master HDMS12E and then
using it to transfer identification to both MSE runs. In addition to
leading to a simpler set of analyses, this approach also allows to
control the false positive rate during the HDMSE merging (see section Choosing HDMSE files). Such
master HDMSE files can be readily
created with the makeMaster
function, as described in
section Generating a master
file.
We will use data from the synapterdata to illustrate how to create master files.
In a more complex design, a greater number of HDMSE files might need to be
combined. When combining files, one also accumulates false peptides
assignments. The extent to which combining files increases new reliable
identification at the cost of accumulating false assignments can be
estimated with the estimateMasterFdr
function.
To illustrate how FDR is estimated for master HDMSE files, let’s consider two extreme cases.
In the first one, the two files (each with 1000 peptides filtered at an FDR of 0.01) to be combined are nearly identical, sharing 900 peptides. The combined data will have 900(shared) + 2 × 100(unique) peptides and each file, taken separately is estimated to have 1000 × 0.01 = 10 false positive identifications. We thus estimate the upper FDR bound after merging the two files to be $\frac{20}{1100} = 0.0182$.
In the second hypothetical case, the two files (again each with 1000 peptides filtered at a FDR of 0.01) to be combined are very different and share only 100 peptides. The combined data will have 100(shared) + 2 × 900(unique) peptides and, as above, each file is estimated to have 10 false discoveries. In this case, we obtain an upper FDR bound of $\frac{20}{1900} = 0.0105$.
In general, the final false discovery for two files will be $$FDR_{master} = \frac{nfd_{1} + nfd_{2}}{union(peptides~HDMS$^E_{1}, peptides~HDMS$E_{2})}$$
where nfdi is the number of false discoveries in HDMSE file i. Note that we do not make any assumptions about repeated identification in multiple files here.
estimateMasterFdr
generalised this for any number of
HDMSE files and
indicates the best combination at a fixed user-specified
masterFdr
level. Mandatory input is a list of HDMSE file names and a fasta
database file name to filter non-unique proteotypic peptides.
The result of estimateMasterFdr
stores the number of
unique proteotypic peptides and FDR for all possible 57 combinations of
6 files. A summary can be printed on the console or plotted with
plot(cmb)
(see figure 1).
## using the full set of 6 HDMSe files and a
## fasta database from the synapterdata package
inputfiles <- getHDMSeFinalPeptide()
fasta <- getFasta()
cmb <- estimateMasterFdr(inputfiles, fasta, masterFdr = 0.02)
cmb
## 6 files - 57 combinations
## Best combination: 4 5
## - 5730 proteotypic peptides
## - 6642 unique peptides
## - 0.017 FDR
The best combination can be extracted with the bestComb
function.
## [1] 4 5
See ?estimateMasterFdr
and references therein for more
details about the function and the returned object.
Now that we have identified which files should be used to create the
master file, we can directly pass the relevant identification files to
the makeMaster
function to generate the master
file. The function has one mandatory input parameter,
pepfiles
, a list oh identification file names to be merged.
The output is an object of class MasterPeptides
that stores
the relevant peptides from the original input files. The result can be
saved to disk using saveRDS
for further analysis, as
described in section HDMSE/MSE data analysis.
## Object of class "MasterPeptides"
## 1st Master [ 1 2 ] has 6642 peptides
## 2nd Master [ 2 1 ] has 6642 peptides
## [1] HDMSe_111111_50fmol_UPS1_in_Ecoli_04_IA_final_peptide.csv.gz
## [2] HDMSe_111111_50fmol_UPS1_in_Ecoli_02_IA_final_peptide.csv.gz
##
## No fragment library.
More details can be found in the function documentation accessible
with ?makeMaster
.
Two functions are needed to choose a set of identification run files and create the master identification run. One function enables to perform a complete identification transfer. The following table summarises all there is to know to utilise synapter’s functionality.
Function | Description |
---|---|
synergise |
Runs the complete identification transfer |
estimateMasterFdr |
Chooses which files to be used to create the master IR |
makeMaster |
Creates the master IR |
The functionality described in this section relies on the MSnbase package (Gatto and Lilley 2012), which is installed by default with synapter. Please refer to the MSnbase Bioconductor web page. the associated vignettes and the respective manual pages for more details.
The synapterdata
already provides preprocessed PLGS data. Six
Synapter
instances are available: 3 replicates (labelled
a
, b
and c
) of the Universal
Proteomics Standard (UPS1) 48 protein mix at 25 fmol and 3 replicates at
50 fmol, in a constant E. coli background. The 6 files can be
loaded in your working space with
We will start by describing the analysis of ups25a
in
details, and then show how to analyse all the runs using more compact
code. The first step of our analysis is to convert the synapter
object output (an Synapter
instance), into a MSnbase-compatible
object, called an MSnSet
, that we will name
ms25a
. We can obtain a description of the
MSnSet
object by typing its name.
## [1] "Synapter"
## [1] "MSnSet"
## MSnSet (storageMode: lockedEnvironment)
## assayData: 5642 features, 1 samples
## element names: exprs
## protocolData: none
## phenoData: none
## featureData
## featureNames: AALESTLAAITESLK IAAANVPAFVSGK ... NDSALGLFNGDIGIALDR
## (5642 total)
## fvarLabels: peptide.seq.MSe_101111_25fmol_UPS1_in_Ecoli_01
## protein.Accession.MSe_101111_25fmol_UPS1_in_Ecoli_01 ...
## qval.1.MSe_101111_25fmol_UPS1_in_Ecoli_01 (21 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: No annotation
## - - - Processing information - - -
## Coerced from a 'Synapter' object.
## MSnbase version: 2.33.2
It contains quantitation information about 5642 peptides for 1
sample. In the code chunk below, we update the default sample name
Synapter1
with a more meaningful one.
## [1] "MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [1] "ups25a"
Quantitative data and meta-data, which has been acquired by synapter,
can be extracted with the exprs
and fData
methods.
## ups25a
## AFLNDK NA
## IEAQLNDVIADLDAVR 1671
## AVFNGLINVAQHAIK 1499
## LEEVK 664
## VALQGNMDPSMLYAPPAR 2969
## NDSALGLFNGDIGIALDR NA
## protein.Accession.MSe_101111_25fmol_UPS1_in_Ecoli_01
## AFLNDK MNME_ECODH
## IEAQLNDVIADLDAVR MNME_ECODH
## AVFNGLINVAQHAIK B1XFY9_ECODH
## LEEVK B1X7F0_ECODH
## VALQGNMDPSMLYAPPAR DCUP_ECODH
## NDSALGLFNGDIGIALDR B1XDM5_ECODH
## precursor.retT.MSe_101111_25fmol_UPS1_in_Ecoli_01
## AFLNDK 33.50725
## IEAQLNDVIADLDAVR 84.15609
## AVFNGLINVAQHAIK 76.22252
## LEEVK 41.40945
## VALQGNMDPSMLYAPPAR 67.60099
## NDSALGLFNGDIGIALDR 97.65882
## [1] "peptide.seq.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [2] "protein.Accession.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [3] "protein.Description.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [4] "protein.falsePositiveRate.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [5] "peptide.matchType.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [6] "peptide.mhp.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [7] "peptide.score.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [8] "precursor.mhp.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [9] "precursor.retT.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [10] "precursor.inten.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [11] "precursor.Mobility.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [12] "spectrumID.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [13] "Intensity.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [14] "ion_ID.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [15] "ion_area.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [16] "ion_counts.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [17] "pval.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [18] "Bonferroni.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [19] "BH.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [20] "qval.MSe_101111_25fmol_UPS1_in_Ecoli_01"
## [21] "qval.1.MSe_101111_25fmol_UPS1_in_Ecoli_01"
We will describe a how to process the data using a Top 3
approach, where the 3 most intense peptides of each protein are used to
compute the protein intensity, using the topN
and
combineFeatures
methods. The former allows to extract the
top most intense peptides (default n
is 3) and remove all
other peptides from the MSnSet
object. The latter than
aggregates the n
most intense peptides per protein using a
user-defined function (sum
, below). Finally, we also scale
protein intensity values depending on the actual number of peptides that
have summed. This number of quantified peptides can be calculated (after
topN
, but before combineFeatures
) with
nQuants
.
ms25a <- topN(ms25a,
groupBy = fData(ms25a)$protein.Accession,
n = 3)
nPeps <- nQuants(ms25a,
groupBy = fData(ms25a)$protein.Accession)
ms25a <- combineFeatures(ms25a,
fData(ms25a)$protein.Accession,
"sum", na.rm = TRUE,
verbose = FALSE)
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
## ups25a
## 6PGL_ECODH 71555
## ABDH_ECODH 47542
## ACCA_ECODH 38249
## ACCD_ECODH 25615
## ACP_ECODH 16388
## APT_ECODH 0
## ups25a
## 6PGL_ECODH 3
## ABDH_ECODH 3
## ACCA_ECODH 3
## ACCD_ECODH 3
## ACP_ECODH 1
## APT_ECODH 0
## scale intensities
exprs(ms25a) <- exprs(ms25a) * (3/nPeps)
## NaN result from the division by 0, when
## no peptide was found for that protein
head(exprs(ms25a))
## ups25a
## 6PGL_ECODH 71555
## ABDH_ECODH 47542
## ACCA_ECODH 38249
## ACCD_ECODH 25615
## ACP_ECODH 49164
## APT_ECODH NaN
The code chunk below repeats the above processing for the other 5 UPS1/E. coli runs.
nms <- c(paste0("ups", 25, c("b", "c")),
paste0("ups", 50, c("a", "b", "c")))
tmp <- sapply(nms, function(.ups) {
cat("Processing", .ups, "... ")
## get the object from workspace and convert to MSnSet
x <- get(.ups, envir = .GlobalEnv)
x <- as(x, "MSnSet")
sampleNames(x) <- .ups
## extract top 3 peptides
x <- topN(x, groupBy = fData(x)$protein.Accession, n = 3)
## calculate the number of peptides that are available
nPeps <- nQuants(x, fData(x)$protein.Accession)
## sum top3 peptides into protein quantitation
x <- combineFeatures(x, fData(x)$protein.Accession,
"sum", na.rm = TRUE, verbose = FALSE)
## adjust protein intensity based on actual number of top peptides
exprs(x) <- exprs(x) * (3/nPeps)
## adjust feature variable names for combine
x <- updateFvarLabels(x, .ups)
## save the new MSnExp instance in the workspace
varnm <- sub("ups", "ms", .ups)
assign(varnm, x, envir = .GlobalEnv)
cat("done\n")
})
## Processing ups25b ...
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
## done
## Processing ups25c ...
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
## done
## Processing ups50a ...
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
## done
## Processing ups50b ...
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
## done
## Processing ups50c ...
## Your data contains missing values. Please read the relevant section in
## the combineFeatures manual page for details on the effects of missing
## values on data aggregation.
## done
We now have 6 MSnSet
instances, containing protein
quantitation for the 6 UPS/E. coli runs.
We now want to filter data out based on missing quantitation data,
retaining proteins that have been quantified in at a least two out of
three replicates. Filtering based on missing data can be done using the
filterNA
method and a maximum missing data content as
defined by pNA
. Multiple MSnSet
instances can
be combined with the combine
method, which is described in
details in the MSnbase-demo
vignette5. The 6 objects have
appropriate distinct sample names and common feature (protein) names,
which will be used to properly combine the quantitation data.
## [1] 729 3
## [1] 709 3
Once combined and filtered, the 25 fmol group retains 709 entries with at least 2 out of 3 quantitation values, out of the 729 total number of proteins.
## [1] 729 3
## [1] 709 3
Similarly, the 50 fmol group retains 709 entries with at least 2 out of 3 quantitation values, out of the 729 initial proteins.
We now combine the two subgroups into one MSnSet
object
that contains all 6 samples and filter for proteins that are observed in
both groups, i.e retaining proteins with a maximum of 2/6 missing
values. We also compute a summary table with the number of protein that
have 4, 5, or 6 quantitation values across the 6 samples.
## ups25a ups25b ups25c ups50a ups50b ups50c
## 6PGL_ECODH 71555 62114 60655 59920 56185 53874
## ABDH_ECODH 47542 37805 36746 45570 43163 39506
## ACCA_ECODH 38249 31543 29570 30697 29656 27851
## ACCD_ECODH 25615 22247 20295 22206 19698 19819
## ACP_ECODH 49164 738365 706538 734425 712076 655842
## AROB_ECODH 5442 4050 3684 4095 4500 3879
##
## 4 5 6
## 6 25 674
We obtain a final data set containing 705 proteins. Finally, we normalise protein intensities in each sample to correct for experimental loading biases and pipetting errors. To do so, we compute 6 sample medians using all constant E. coli background proteins and divide each protein by its respective sample mean.
ecoli <- -grep("ups$", featureNames(msUps))
meds <- apply(exprs(msUps)[ecoli, ], 2,
median, na.rm=TRUE)
exprs(msUps) <- t(apply(exprs(msUps), 1,
"/", meds))
This same procedure could be applied with a set of constant spikes to estimate absolute protein quantities.
The UPS1 spiked-in protein mix is composed of 48 proteins, 47 of which have been observed and quantified in our final data object. In this section, we will illustrate how to analyse the 705 proteins to extract those that show differences between the two groups and show that these candidates represent the UPS1 spikes.
The R environment and many of the available packages allow extremely
powerful statistical analysis. In this document, we will apply a
standard t-test on log2
transformed data for convenience, to calculate p-value for individual
proteins (pv
variable below). For best performance with
small number of samples and more complex designs, we recommend the
Bioconductor limma
package (Smyth 2005). We then perform
multiple comparison adjustment using the qvalue
from the
qvalue
package, that implements the method from (Storey
and Tibshirani 2003) (qv
variable below). The multtest
package provides several other p-value adjustment methods. We will also
compute log2
fold-changes and illustrate the results on a volcano plot (figure 2). Figure 3 illustrates
the UPS1 proteins and samples on a classical heatmap.
## use log2 data for t-test
exprs(msUps) <- log2(exprs(msUps))
## apply a t-test and extract the p-value
pv <- apply(exprs(msUps), 1 ,
function(x)t.test(x[1:3], x[4:6])$p.value)
## calculate q-values
library("qvalue")
qv <- qvalue(pv)$qvalues
## calculate log2 fold-changes
lfc <- apply(exprs(msUps), 1 ,
function(x) mean(x[1:3], na.rm=TRUE)-mean(x[4:6], na.rm=TRUE))
## create a summary table
res <- data.frame(cbind(exprs(msUps), pv, qv, lfc))
## reorder based on q-values
res <- res[order(res$qv), ]
knitr::kable(head(round(res, 3)))
ups25a | ups25b | ups25c | ups50a | ups50b | ups50c | pv | qv | lfc | |
---|---|---|---|---|---|---|---|---|---|
P01112ups | -0.053 | -0.015 | -0.001 | 1.072 | 1.118 | 1.080 | 0 | 0.000 | -1.113 |
P00918ups | -0.247 | -0.201 | -0.214 | 0.616 | 0.667 | 0.674 | 0 | 0.001 | -0.873 |
P01008ups | 0.112 | 0.106 | 0.178 | 1.075 | 1.132 | 1.090 | 0 | 0.001 | -0.967 |
Q06830ups | 0.174 | 0.118 | 0.156 | 1.073 | 1.095 | 1.100 | 0 | 0.002 | -0.940 |
P10145ups | -0.323 | -0.332 | -0.274 | 0.689 | 0.764 | 0.788 | 0 | 0.003 | -1.057 |
P02788ups | 0.564 | 0.647 | 0.601 | 1.494 | 1.532 | 1.532 | 0 | 0.003 | -0.915 |
In the above example, quantitation values and statistics data are
saved in a summary dataframe (res
), that can be exported to
a comma-separated spreadsheet with
plot(res$lfc, -log10(res$qv),
col = ifelse(grepl("ups$", rownames(res)),
"#4582B3AA",
"#A1A1A180"),
pch = 19,
xlab = expression(log[2]~fold-change),
ylab = expression(-log[10]~q-value))
grid()
abline(v = -1, lty = "dotted")
abline(h = -log10(0.1), lty = "dotted")
legend("topright", c("UPS", "E. coli"),
col = c("#4582B3AA", "#A1A1A1AA"),
pch = 19, bty = "n")
ups25a | ups25b | ups25c | ups50a | ups50b | ups50c | pv | qv | lfc | |
---|---|---|---|---|---|---|---|---|---|
P01112ups | -0.053 | -0.015 | -0.001 | 1.072 | 1.118 | 1.080 | 0.000 | 0.000 | -1.113 |
P00918ups | -0.247 | -0.201 | -0.214 | 0.616 | 0.667 | 0.674 | 0.000 | 0.001 | -0.873 |
P01008ups | 0.112 | 0.106 | 0.178 | 1.075 | 1.132 | 1.090 | 0.000 | 0.001 | -0.967 |
Q06830ups | 0.174 | 0.118 | 0.156 | 1.073 | 1.095 | 1.100 | 0.000 | 0.002 | -0.940 |
P10145ups | -0.323 | -0.332 | -0.274 | 0.689 | 0.764 | 0.788 | 0.000 | 0.003 | -1.057 |
P02788ups | 0.564 | 0.647 | 0.601 | 1.494 | 1.532 | 1.532 | 0.000 | 0.003 | -0.915 |
P02753ups | -1.901 | -1.822 | -1.906 | -1.006 | -0.919 | -0.876 | 0.000 | 0.005 | -0.943 |
P01375ups | 0.812 | 0.930 | 0.963 | 1.825 | 1.756 | 1.692 | 0.000 | 0.008 | -0.856 |
P69905ups | -1.440 | -1.563 | -1.513 | -0.637 | -0.582 | -0.579 | 0.000 | 0.008 | -0.906 |
P00167ups | 0.869 | 0.890 | 0.975 | 1.923 | 1.963 | 1.938 | 0.000 | 0.011 | -1.030 |
P12081ups | -0.094 | 0.097 | -0.022 | 0.936 | 1.038 | 1.047 | 0.000 | 0.011 | -1.013 |
P00709ups | -0.187 | -0.308 | -0.324 | 0.424 | 0.505 | 0.508 | 0.000 | 0.012 | -0.752 |
O00762ups | 0.432 | 0.267 | 0.264 | 1.086 | 1.212 | 1.214 | 0.000 | 0.012 | -0.850 |
P05413ups | -0.173 | -0.395 | -0.284 | 0.576 | 0.680 | 0.689 | 0.001 | 0.024 | -0.932 |
P00441ups | -0.227 | -0.414 | -0.391 | 0.289 | 0.413 | 0.406 | 0.001 | 0.027 | -0.714 |
P04040ups | -0.065 | 0.142 | 0.211 | 1.094 | 1.245 | 1.237 | 0.001 | 0.027 | -1.096 |
P02787ups | 0.003 | 0.148 | 0.066 | 1.504 | 1.533 | 1.238 | 0.001 | 0.033 | -1.353 |
P10636-8ups | 0.727 | 0.534 | 0.558 | 1.493 | 1.502 | 1.576 | 0.001 | 0.033 | -0.917 |
P06396ups | 0.228 | 0.299 | 0.037 | 1.179 | 1.268 | 1.298 | 0.002 | 0.036 | -1.061 |
P16083ups | -0.179 | -0.407 | -0.282 | 1.169 | 1.149 | 1.168 | 0.002 | 0.040 | -1.452 |
P02768ups | 0.554 | 0.340 | 0.396 | 1.355 | 1.426 | 1.408 | 0.002 | 0.041 | -0.966 |
P01127ups | 0.330 | 0.145 | 0.216 | 1.183 | 1.183 | 1.213 | 0.002 | 0.045 | -0.963 |
P08758ups | 0.273 | 0.094 | 0.081 | 1.136 | 1.183 | 1.158 | 0.003 | 0.048 | -1.010 |
P00915ups | 0.055 | -0.182 | 0.063 | 1.100 | 1.137 | 1.157 | 0.004 | 0.058 | -1.153 |
P15559ups | 0.118 | -0.088 | -0.079 | 0.876 | 0.934 | 0.883 | 0.003 | 0.058 | -0.914 |
P55957ups | -1.083 | -1.461 | -1.329 | -0.348 | -0.393 | -0.181 | 0.004 | 0.058 | -0.984 |
P62988ups | 0.513 | 0.270 | 0.367 | 1.289 | 1.293 | 1.236 | 0.004 | 0.064 | -0.889 |
P01031ups | -0.407 | -0.648 | -0.639 | 0.628 | 0.636 | 0.633 | 0.004 | 0.064 | -1.197 |
P61626ups | -0.096 | -0.363 | -0.320 | 0.619 | 0.681 | 0.669 | 0.006 | 0.086 | -0.917 |
P51965ups | -0.893 | -1.182 | -1.301 | 0.017 | -0.039 | -0.009 | 0.011 | 0.143 | -1.115 |
P01344ups | -0.044 | -0.397 | -0.060 | 0.570 | 0.721 | 0.642 | 0.011 | 0.149 | -0.811 |
P01579ups | -0.948 | -0.716 | -0.660 | -0.257 | -0.254 | -0.165 | 0.016 | 0.204 | -0.549 |
P41159ups | 0.285 | -0.206 | -0.245 | 0.776 | 0.856 | 1.030 | 0.018 | 0.215 | -0.943 |
P62937ups | -1.375 | -0.691 | -1.117 | 0.312 | 0.380 | 0.262 | 0.018 | 0.215 | -1.379 |
P68871ups | -0.206 | -0.440 | -0.589 | 0.374 | 0.359 | 0.358 | 0.020 | 0.218 | -0.775 |
P08263ups | -1.113 | -0.642 | -1.517 | 0.188 | 0.254 | 0.279 | 0.033 | 0.302 | -1.331 |
P99999ups | -1.202 | -1.992 | -1.953 | -0.899 | -0.192 | -0.838 | 0.036 | 0.316 | -1.073 |
P10599ups | -0.875 | -1.133 | 0.166 | 1.387 | 0.897 | 0.758 | 0.037 | 0.317 | -1.628 |
P02144ups | -0.963 | -0.121 | -0.075 | 0.992 | 1.059 | 1.000 | 0.039 | 0.324 | -1.403 |
P01133ups | -0.796 | 0.011 | -0.382 | 0.529 | 0.564 | 0.499 | 0.058 | 0.434 | -0.920 |
P02741ups | -0.265 | -1.165 | -1.088 | 0.309 | 0.323 | 0.302 | 0.057 | 0.434 | -1.151 |
P61769ups | -0.656 | -0.151 | -0.160 | 0.725 | 0.375 | -0.029 | 0.073 | 0.465 | -0.679 |
P63165ups | -0.984 | 0.080 | 0.220 | 1.074 | 1.124 | 1.112 | 0.073 | 0.465 | -1.332 |
O76070ups | -0.457 | 0.253 | 0.328 | 0.881 | 0.802 | 0.935 | 0.077 | 0.479 | -0.831 |
P06732ups | 1.505 | 0.541 | 0.556 | 1.318 | 1.358 | 1.382 | 0.267 | 0.498 | -0.485 |
P63279ups | -1.864 | -3.085 | -1.577 | -0.643 | -0.680 | -0.687 | 0.083 | 0.498 | -1.505 |
P09211ups | 1.728 | -1.862 | -1.840 | -0.762 | -0.328 | -0.654 | 0.955 | 0.586 | -0.077 |
A total 29 proteins show a statistically different pattern between the two groups, at a false discovery rate of 10%. Table upstab summarises the results for all UPS1 proteins.
All software and respective versions used to produce this document are listed below.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] qvalue_2.39.0 BiocStyle_2.35.0 synapterdata_1.44.0
## [4] synapter_2.31.0 MSnbase_2.33.2 ProtGenerics_1.39.0
## [7] S4Vectors_0.45.2 mzR_2.41.0 Rcpp_1.0.13-1
## [10] Biobase_2.67.0 BiocGenerics_0.53.3 generics_0.1.3
## [13] rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] rlang_1.1.4 magrittr_2.0.3
## [3] clue_0.3-66 cleaver_1.45.0
## [5] matrixStats_1.4.1 compiler_4.4.2
## [7] vctrs_0.6.5 reshape2_1.4.4
## [9] stringr_1.5.1 pkgconfig_2.0.3
## [11] crayon_1.5.3 fastmap_1.2.0
## [13] XVector_0.47.0 utf8_1.2.4
## [15] tzdb_0.4.0 UCSC.utils_1.3.0
## [17] preprocessCore_1.69.0 bit_4.5.0
## [19] purrr_1.0.2 xfun_0.49
## [21] MultiAssayExperiment_1.33.0 zlibbioc_1.52.0
## [23] cachem_1.1.0 GenomeInfoDb_1.43.1
## [25] jsonlite_1.8.9 DelayedArray_0.33.2
## [27] BiocParallel_1.41.0 parallel_4.4.2
## [29] cluster_2.1.6 R6_2.5.1
## [31] RColorBrewer_1.1-3 bslib_0.8.0
## [33] stringi_1.8.4 limma_3.63.2
## [35] GenomicRanges_1.59.1 jquerylib_0.1.4
## [37] SummarizedExperiment_1.37.0 iterators_1.0.14
## [39] knitr_1.49 readr_2.1.5
## [41] IRanges_2.41.1 splines_4.4.2
## [43] Matrix_1.7-1 igraph_2.1.1
## [45] tidyselect_1.2.1 abind_1.4-8
## [47] yaml_2.3.10 doParallel_1.0.17
## [49] codetools_0.2-20 affy_1.85.0
## [51] lattice_0.22-6 tibble_3.2.1
## [53] plyr_1.8.9 evaluate_1.0.1
## [55] survival_3.7-0 Biostrings_2.75.1
## [57] pillar_1.9.0 affyio_1.77.0
## [59] BiocManager_1.30.25 MatrixGenerics_1.19.0
## [61] foreach_1.5.2 MALDIquant_1.22.3
## [63] ncdf4_1.23 vroom_1.6.5
## [65] hms_1.1.3 ggplot2_3.5.1
## [67] munsell_0.5.1 scales_1.3.0
## [69] glue_1.8.0 lazyeval_0.2.2
## [71] maketools_1.3.1 tools_4.4.2
## [73] mzID_1.45.0 sys_3.4.3
## [75] QFeatures_1.17.0 vsn_3.75.0
## [77] buildtools_1.0.0 XML_3.99-0.17
## [79] grid_4.4.2 impute_1.81.0
## [81] tidyr_1.3.1 MsCoreUtils_1.19.0
## [83] colorspace_2.1-1 GenomeInfoDbData_1.2.13
## [85] PSMatch_1.11.0 cli_3.6.3
## [87] fansi_1.0.6 S4Arrays_1.7.1
## [89] dplyr_1.1.4 AnnotationFilter_1.31.0
## [91] pcaMethods_1.99.0 gtable_0.3.6
## [93] sass_0.4.9 digest_0.6.37
## [95] SparseArray_1.7.2 multtest_2.63.0
## [97] htmltools_0.5.8.1 lifecycle_1.0.4
## [99] httr_1.4.7 statmod_1.5.0
## [101] bit64_4.5.2 MASS_7.3-61