This is an R Markdown Notebook describing the analysis of a LiP-MS experiment using MSstatsLiP. When you execute code within the notebook, the results appear beneath the code.
Here, we use LiP-MS data of human alpha-Synuclein in the monomeric (M) and fibrillar (F) forms, spiked into a S. cerevisiae lysate at 5 pmol/ug lysate (M1 and F1) and 20 pmol/ug lysate (M2 and F2). The data set is composed of four biological replicates per condition.
Load the data from the Spectronaut export. LiP data is loaded as raw_lip, and trypsin-only control data (TrP data) is loaded as raw_prot. The function choose.files() enables browsing for the input file.
Caution: Make sure the separator delim is set correctly. For comma-separated values (csv), the separator is set to delim=",".
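library(readr)  # read_delim()
library(dplyr)  # %>%, mutate(), across()

# choose.files() is only available on Windows; file.choose() works on other platforms.
raw_lip <- read_delim(file = choose.files(caption = "Choose LiP dataset"),
                      delim = ",", escape_double = FALSE, trim_ws = TRUE)
raw_prot <- read_delim(file = choose.files(caption = "Choose TrP dataset"),
                       delim = ",", escape_double = FALSE, trim_ws = TRUE)

# Replace the versioned accession P37840.1 with P37840 in every column.
# across() replaces the deprecated mutate_all(funs(...)) idiom.
raw_lip <- raw_lip %>% mutate(across(everything(), ~ ifelse(. == "P37840.1", "P37840", .)))
raw_prot <- raw_prot %>% mutate(across(everything(), ~ ifelse(. == "P37840.1", "P37840", .)))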
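Before importing, it can help to confirm which separator the export actually uses. The following is a minimal sketch in base R. The helper name guess_delim, the candidate separators, and the toy file are assumptions made for illustration; none of them are part of readr or MSstatsLiP.

```r
# Hypothetical helper: guess the field separator from the header line of a file.
guess_delim <- function(path, candidates = c(",", "\t", ";")) {
  header <- readLines(path, n = 1)
  header_chars <- strsplit(header, "", fixed = TRUE)[[1]]
  # Count how often each candidate separator occurs in the header line.
  counts <- vapply(candidates, function(d) sum(header_chars == d), integer(1))
  candidates[which.max(counts)]
}

# Usage with a small temporary file standing in for an export (made-up columns).
tmp <- tempfile(fileext = ".csv")
writeLines(c("colA,colB,colC", "M1,run1,P37840"), tmp)
delim_guess <- guess_delim(tmp)
print(delim_guess)
```

The guessed value can then be passed as the delim argument of read_delim(). A header with few columns or with separators inside quoted fields can mislead such a simple count, so the result is worth checking by eye.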
Load the fasta file that was used in the Spectronaut search.
Convert the data to MSstatsLiP format. Load first the LiP data set raw_lip, then the FASTA file fasta_file used for searches. If the experiment contains TrP data, raw_prot is loaded last.
To remove information on iRT peptides, the default setting is removeiRT = TRUE. By default, peptides containing modifications are filtered out, but this can be changed using the argument removeModifications. Peptides with multiple protein annotations are also filtered out by default. However, for data sets containing protein isoforms, this argument can be set to removeNonUniqueProteins = FALSE.
The default settings use PeakArea as the measure of intensity, filter features based on the q-value with a cut-off of 0.01, and import all conditions. You can adjust the settings accordingly. For information on each option, refer to the vignette of the function.
## INFO [2024-10-30 09:13:38] ** Raw data from Spectronaut imported successfully.
## INFO [2024-10-30 09:13:38] ** Raw data from Spectronaut cleaned successfully.
## INFO [2024-10-30 09:13:38] ** Using annotation extracted from quantification data.
## INFO [2024-10-30 09:13:38] ** Run labels were standardized to remove symbols such as '.' or '%'.
## INFO [2024-10-30 09:13:38] ** The following options are used:
## - Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
## - Shared peptides will be removed.
## - Proteins with single feature will not be removed.
## - Features with less than 3 measurements across runs will be removed.
## WARN [2024-10-30 09:13:38] ** PGQvalue not found in input columns.
## INFO [2024-10-30 09:13:38] ** Intensities with values not smaller than 0.01 in EGQvalue are replaced with 0
## INFO [2024-10-30 09:13:38] ** Features with all missing measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Shared peptides are removed.
## INFO [2024-10-30 09:13:38] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
## INFO [2024-10-30 09:13:38] ** Features with one or two measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Run annotation merged with quantification data.
## INFO [2024-10-30 09:13:38] ** Features with one or two measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Fractionation handled.
## INFO [2024-10-30 09:13:38] ** Updated quantification data to make balanced design. Missing values are marked by NA
## INFO [2024-10-30 09:13:38] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
## INFO [2024-10-30 09:13:38] ** Raw data from Spectronaut imported successfully.
## INFO [2024-10-30 09:13:38] ** Raw data from Spectronaut cleaned successfully.
## INFO [2024-10-30 09:13:38] ** Using annotation extracted from quantification data.
## INFO [2024-10-30 09:13:38] ** Run labels were standardized to remove symbols such as '.' or '%'.
## INFO [2024-10-30 09:13:38] ** The following options are used:
## - Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
## - Shared peptides will be removed.
## - Proteins with single feature will not be removed.
## - Features with less than 3 measurements across runs will be removed.
## WARN [2024-10-30 09:13:38] ** PGQvalue not found in input columns.
## INFO [2024-10-30 09:13:38] ** Intensities with values not smaller than 0.01 in EGQvalue are replaced with 0
## INFO [2024-10-30 09:13:38] ** Features with all missing measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Shared peptides are removed.
## INFO [2024-10-30 09:13:38] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
## INFO [2024-10-30 09:13:38] ** Features with one or two measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Run annotation merged with quantification data.
## INFO [2024-10-30 09:13:38] ** Features with one or two measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Fractionation handled.
## INFO [2024-10-30 09:13:38] ** Updated quantification data to make balanced design. Missing values are marked by NA
## INFO [2024-10-30 09:13:38] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
Proteolytic resistance is calculated as the ratio of the intensity of fully tryptic (FT) peptides in the LiP condition to the TrP condition. Half-tryptic (HT) peptides are excluded from this analysis. The function “calculateTrypticity” is used to annotate FT and HT peptides in the LiP dataset. Next, FT peptides that were not identified in the LiP dataset are filtered out of the TrP dataset. The msstats_data list then contains only FT peptides measured in both the LiP and TrP datasets.
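To make the FT/HT distinction concrete, here is a minimal base R sketch of how a peptide could be classified against its protein sequence. It is not the implementation of calculateTrypticity. The helper name is_fully_tryptic and the toy sequence are assumptions, and the sketch ignores the proline rule, initiator methionine removal, and peptides that occur more than once in the protein.

```r
# Hypothetical helper: TRUE if both peptide termini are consistent with trypsin cleavage.
is_fully_tryptic <- function(peptide, protein) {
  start <- regexpr(peptide, protein, fixed = TRUE)[1]
  if (start == -1) return(NA)  # peptide not found in the protein sequence
  end <- start + nchar(peptide) - 1
  # N-terminal side: peptide starts the protein, or is preceded by K or R.
  n_ok <- start == 1 || substr(protein, start - 1, start - 1) %in% c("K", "R")
  # C-terminal side: peptide ends with K or R, or ends the protein.
  c_ok <- substr(protein, end, end) %in% c("K", "R") || end == nchar(protein)
  n_ok && c_ok
}

protein <- "MKTAYIAKQRQISFVKSHFSR"            # made-up sequence
ft <- is_fully_tryptic("TAYIAK", protein)     # preceded by K and ends with K
ht <- is_fully_tryptic("AYIAK", protein)      # preceded by T, so only half-tryptic
print(c(fully_tryptic = ft, half_tryptic_example = ht))
```

Peptides for which the helper returns FALSE would be treated as half-tryptic and left out of the proteolytic resistance calculation.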
Ensure that the Condition nomenclature is identical in both data sets. If the output is TRUE for all conditions, continue to step 2.
## [1] TRUE TRUE TRUE TRUE
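A check of this kind can be sketched in base R as follows. The data frames lip and trp are toy stand-ins for msstats_data[["LiP"]] and msstats_data[["TrP"]]; their contents are made up for illustration.

```r
# Toy stand-ins for the LiP and TrP tables (assumed, not the real data).
lip <- data.frame(Condition = c("F1", "M1", "M2", "F2"))
trp <- data.frame(Condition = c("F2", "M2", "M1", "F1"))

# One TRUE per LiP condition that also exists in the TrP data.
same_conditions <- unique(lip$Condition) %in% unique(trp$Condition)
print(same_conditions)

# Names that occur in only one of the two data sets (empty if the nomenclature agrees).
only_lip <- setdiff(unique(lip$Condition), unique(trp$Condition))
only_trp <- setdiff(unique(trp$Condition), unique(lip$Condition))
```

Listing only_lip and only_trp shows directly which names need to be corrected, which the vector of TRUE/FALSE values alone does not.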
To correct the condition nomenclature, display the condition for both data sets.
paste("LiP Condition nomenclature:", unique(msstats_data[["LiP"]]$Condition), ",",
"TrP Condition nomenclature:",unique(msstats_data[["TrP"]]$Condition))
## [1] "LiP Condition nomenclature: F1 , TrP Condition nomenclature: F2"
## [2] "LiP Condition nomenclature: M1 , TrP Condition nomenclature: M2"
## [3] "LiP Condition nomenclature: M2 , TrP Condition nomenclature: M1"
## [4] "LiP Condition nomenclature: F2 , TrP Condition nomenclature: F1"
If necessary, un-comment the following lines to correct the condition nomenclature in either of the data sets, e.g. to change the nomenclature of the TrP samples from Cond1 to cond1.
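As an illustration of such a correction, the sketch below renames one condition label in base R. The data frame trp is a toy stand-in for msstats_data[["TrP"]], and the labels Cond1, cond1 and cond2 are placeholders.

```r
# Toy stand-in for the TrP table (assumed contents).
trp <- data.frame(Condition = c("Cond1", "Cond1", "cond2"), stringsAsFactors = FALSE)

# Work on character values so the replacement also behaves if the column was a factor.
trp$Condition <- as.character(trp$Condition)
trp$Condition[trp$Condition == "Cond1"] <- "cond1"

print(unique(trp$Condition))
```

After renaming, rerunning the Condition check confirms that both data sets use the same labels.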
Ensure that the BioReplicate nomenclature is correctly annotated (see also the MSstats user manual). The BioReplicate needs a unique nomenclature, while the technical replicates can have duplicate numbering. If the replicate nomenclature is correct, proceed to section 2.3.
paste("LiP BioReplicate nomenclature:", unique(msstats_data[["LiP"]]$BioReplicate), ",",
"TrP BioReplicate nomenclature:",unique(msstats_data[["TrP"]]$BioReplicate))
## [1] "LiP BioReplicate nomenclature: 1 , TrP BioReplicate nomenclature: 1"
## [2] "LiP BioReplicate nomenclature: 2 , TrP BioReplicate nomenclature: 2"
## [3] "LiP BioReplicate nomenclature: 3 , TrP BioReplicate nomenclature: 3"
## [4] "LiP BioReplicate nomenclature: 4 , TrP BioReplicate nomenclature: 4"
Adjust the BioReplicate column to the correct nomenclature for a case-control experiment.
msstats_data[["LiP"]] = msstats_data[["LiP"]] %>%
mutate(BioReplicate = paste0(Condition,".",BioReplicate))
msstats_data[["TrP"]] = msstats_data[["TrP"]] %>%
mutate(BioReplicate = paste0(Condition,".",BioReplicate))
Inspect the corrected BioReplicate column.
paste("LiP BioReplicate nomenclature:", unique(msstats_data[["LiP"]]$BioReplicate), ",",
"TrP BioReplicate nomenclature:",unique(msstats_data[["TrP"]]$BioReplicate))
## [1] "LiP BioReplicate nomenclature: F1.1 , TrP BioReplicate nomenclature: F2.1"
## [2] "LiP BioReplicate nomenclature: M1.1 , TrP BioReplicate nomenclature: M2.1"
## [3] "LiP BioReplicate nomenclature: M2.1 , TrP BioReplicate nomenclature: M1.1"
## [4] "LiP BioReplicate nomenclature: F2.1 , TrP BioReplicate nomenclature: F1.1"
## [5] "LiP BioReplicate nomenclature: M1.2 , TrP BioReplicate nomenclature: M2.2"
## [6] "LiP BioReplicate nomenclature: M2.2 , TrP BioReplicate nomenclature: F1.2"
## [7] "LiP BioReplicate nomenclature: F1.2 , TrP BioReplicate nomenclature: M1.2"
## [8] "LiP BioReplicate nomenclature: F2.2 , TrP BioReplicate nomenclature: F2.2"
## [9] "LiP BioReplicate nomenclature: M2.3 , TrP BioReplicate nomenclature: M1.3"
## [10] "LiP BioReplicate nomenclature: F1.3 , TrP BioReplicate nomenclature: M2.3"
## [11] "LiP BioReplicate nomenclature: M1.3 , TrP BioReplicate nomenclature: F1.3"
## [12] "LiP BioReplicate nomenclature: F2.3 , TrP BioReplicate nomenclature: F2.3"
## [13] "LiP BioReplicate nomenclature: F2.4 , TrP BioReplicate nomenclature: F1.4"
## [14] "LiP BioReplicate nomenclature: M2.4 , TrP BioReplicate nomenclature: M1.4"
## [15] "LiP BioReplicate nomenclature: M1.4 , TrP BioReplicate nomenclature: M2.4"
## [16] "LiP BioReplicate nomenclature: F1.4 , TrP BioReplicate nomenclature: F2.4"
Summarize the data. The default settings use a log2-transformation and normalize the data using the "equalizeMedians" method. The default summary method is "TMP" and imputation is set to "FALSE". For detailed information on all settings, please refer to the function vignette.
This function will take some time and memory. If memory is limited, it is advisable to remove the raw files using the rm() function and to clear the memory cache using the gc() function.
## INFO [2024-10-30 09:13:38] ** Features with one or two measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Fractionation handled.
## INFO [2024-10-30 09:13:38] ** Updated quantification data to make balanced design. Missing values are marked by NA
## INFO [2024-10-30 09:13:38] ** Use all features that the dataset originally has.
## INFO [2024-10-30 09:13:38]
## # proteins: 9
## # peptides per protein: 1-1
## # features per peptide: 1-2
## INFO [2024-10-30 09:13:38] Some proteins have only one feature:
## P14164_DLAIGAHGGK,
## P16622_AFSENITK,
## P17891_ALQLINQDDADIIGGR,
## P38805_LFQSILPQNPDIEGR,
## Q02908_IYPTLVIR ...
## INFO [2024-10-30 09:13:38]
## F1 F2 M1 M2
## # runs 4 4 4 4
## # bioreplicates 4 4 4 4
## # tech. replicates 1 1 1 1
## INFO [2024-10-30 09:13:38] Some features are completely missing in at least one condition:
## DLAIGAHGGK_2_y8_1,
## PLTAETYK_2_y7_1,
## ALQLINQDDADIIGGR_2_y11_1,
## LFQSILPQNPDIEGR_2_y9_1,
## NA ...
## INFO [2024-10-30 09:13:38] The following runs have more than 75% missing values: 2,
## 4,
## 6,
## 10,
## 12,
## 16
## INFO [2024-10-30 09:13:38] == Start the summarization per subplot...
## |======================================================================| 100%
## INFO [2024-10-30 09:13:38] == Summarization is done.
## INFO [2024-10-30 09:13:38] ** Features with one or two measurements across runs are removed.
## INFO [2024-10-30 09:13:38] ** Fractionation handled.
## INFO [2024-10-30 09:13:38] ** Updated quantification data to make balanced design. Missing values are marked by NA
## INFO [2024-10-30 09:13:38] ** Log2 intensities under cutoff = 15.979 were considered as censored missing values.
## INFO [2024-10-30 09:13:38] ** Log2 intensities = NA were considered as censored missing values.
## INFO [2024-10-30 09:13:38] ** Use all features that the dataset originally has.
## INFO [2024-10-30 09:13:38]
## # proteins: 4
## # peptides per protein: 1-3
## # features per peptide: 1-3
## INFO [2024-10-30 09:13:38] Some proteins have only one feature:
## P53235 ...
## INFO [2024-10-30 09:13:38]
## F1 F2 M1 M2
## # runs 4 4 4 4
## # bioreplicates 4 4 4 4
## # tech. replicates 1 1 1 1
## INFO [2024-10-30 09:13:38] Some features are completely missing in at least one condition:
## AFSENITK_2_y5_1,
## AFSENITK_2_y6_1,
## ELQDEAIK_2_y6_1,
## SEVVDQWK_2_y5_1,
## IYPTLVIR_2_y6_1 ...
## INFO [2024-10-30 09:13:38] == Start the summarization per subplot...
## |======================================================================| 100%
## INFO [2024-10-30 09:13:39] == Summarization is done.
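The idea behind a log2-transformation followed by median equalization can be illustrated with a few lines of base R. This is a simplified sketch on simulated intensities, not the implementation used by the summarization function; the column names and run labels are assumptions.

```r
set.seed(1)
# Simulated intensities for three runs with different overall signal levels.
toy <- data.frame(
  Run = rep(c("run1", "run2", "run3"), each = 5),
  Intensity = c(2^rnorm(5, mean = 20), 2^rnorm(5, mean = 21), 2^rnorm(5, mean = 19))
)

# Step 1: log2-transform the intensities.
toy$Abundance <- log2(toy$Intensity)

# Step 2: shift every run so that all run medians coincide with a common reference.
run_median <- tapply(toy$Abundance, toy$Run, median, na.rm = TRUE)
global_median <- median(run_median)
toy$Normalized <- toy$Abundance - unname(run_median[toy$Run]) + global_median

normalized_medians <- tapply(toy$Normalized, toy$Run, median, na.rm = TRUE)
print(round(normalized_medians, 3))
```

After the shift, all three runs share the same median, so systematic differences in loading or instrument response between runs no longer dominate the comparison.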
Inspect MSstatsLiP_Summarized.
## [1] "FeatureLevelData" "ProteinLevelData" "SummaryMethod"
## [4] "ModelQC" "PredictBySurvival"
## Key: <FULL_PEPTIDE>
## FULL_PEPTIDE PEPTIDE TRANSITION FEATURE LABEL GROUP
## <char> <fctr> <fctr> <fctr> <fctr> <fctr>
## 1: P14164_DLAIGAHGGK DLAIGAHGGK_2 y8_1 DLAIGAHGGK_2_y8_1 L F1
## 2: P14164_DLAIGAHGGK DLAIGAHGGK_2 y8_1 DLAIGAHGGK_2_y8_1 L F1
## 3: P14164_DLAIGAHGGK DLAIGAHGGK_2 y8_1 DLAIGAHGGK_2_y8_1 L F1
## 4: P14164_DLAIGAHGGK DLAIGAHGGK_2 y8_1 DLAIGAHGGK_2_y8_1 L F1
## 5: P14164_DLAIGAHGGK DLAIGAHGGK_2 y8_1 DLAIGAHGGK_2_y8_1 L F2
## 6: P14164_DLAIGAHGGK DLAIGAHGGK_2 y8_1 DLAIGAHGGK_2_y8_1 L F2
## RUN SUBJECT FRACTION originalRUN censored INTENSITY ABUNDANCE
## <fctr> <fctr> <char> <fctr> <lgcl> <num> <num>
## 1: 1 F1.1 1 LM2480 FALSE 138063.2 17.05399
## 2: 2 F1.2 1 LM2494 FALSE NA NA
## 3: 3 F1.3 1 LM2497 FALSE NA NA
## 4: 4 F1.4 1 LM2511 FALSE NA NA
## 5: 5 F2.1 1 LM2483 FALSE 164339.6 17.25664
## 6: 6 F2.2 1 LM2495 FALSE NA NA
## newABUNDANCE PROTEIN
## <num> <char>
## 1: 17.05399 P14164
## 2: NA P14164
## 3: NA P14164
## 4: NA P14164
## 5: 17.25664 P14164
## 6: NA P14164
## Key: <FULL_PEPTIDE>
## FULL_PEPTIDE RUN LogIntensities originalRUN GROUP SUBJECT
## <char> <fctr> <num> <fctr> <fctr> <char>
## 1: P14164_DLAIGAHGGK 1 17.05399 LM2480 F1 F1.1
## 2: P14164_DLAIGAHGGK 5 17.25664 LM2483 F2 F2.1
## 3: P14164_DLAIGAHGGK 13 16.64724 LM2482 M2 M2.1
## 4: P16622_AFSENITK 1 16.69163 LM2480 F1 F1.1
## 5: P16622_AFSENITK 5 17.27895 LM2483 F2 F2.1
## 6: P16622_AFSENITK 7 16.93570 LM2499 F2 F2.3
## TotalGroupMeasurements NumMeasuredFeature MissingPercentage more50missing
## <int> <int> <num> <lgcl>
## 1: 4 1 0 FALSE
## 2: 4 1 0 FALSE
## 3: 4 1 0 FALSE
## 4: 4 1 0 FALSE
## 5: 4 1 0 FALSE
## 6: 4 1 0 FALSE
## NumImputedFeature Protein
## <num> <char>
## 1: 0 P14164
## 2: 0 P14164
## 3: 0 P14164
## 4: 0 P16622
## 5: 0 P16622
## 6: 0 P16622
## PROTEIN PEPTIDE TRANSITION FEATURE LABEL GROUP RUN SUBJECT
## 1 P16622 AFSENITK_2 y5_1 AFSENITK_2_y5_1 L F1 1 F1.1
## 2 P16622 AFSENITK_2 y5_1 AFSENITK_2_y5_1 L F1 2 F1.2
## 3 P16622 AFSENITK_2 y5_1 AFSENITK_2_y5_1 L F1 3 F1.3
## 4 P16622 AFSENITK_2 y5_1 AFSENITK_2_y5_1 L F1 4 F1.4
## 5 P16622 AFSENITK_2 y5_1 AFSENITK_2_y5_1 L F2 5 F2.1
## 6 P16622 AFSENITK_2 y5_1 AFSENITK_2_y5_1 L F2 6 F2.2
## FRACTION originalRUN censored INTENSITY ABUNDANCE newABUNDANCE predicted
## 1 1 LM2487 TRUE NA NA NA NA
## 2 1 LM2489 FALSE 104817.1 17.08109 17.08109 NA
## 3 1 LM2502 FALSE 405851.8 17.97301 17.97301 NA
## 4 1 LM2504 TRUE NA NA NA NA
## 5 1 LM2484 TRUE NA NA 17.21114 17.21114
## 6 1 LM2491 TRUE NA NA 17.25962 17.25962
## RUN Protein LogIntensities originalRUN GROUP SUBJECT TotalGroupMeasurements
## 1 2 P16622 16.61602 LM2489 F1 F1.2 8
## 2 3 P16622 17.34884 LM2502 F1 F1.3 8
## 3 5 P16622 17.24025 LM2484 F2 F2.1 8
## 4 6 P16622 17.30643 LM2491 F2 F2.2 8
## 5 7 P16622 16.90965 LM2503 F2 F2.3 8
## 6 8 P16622 17.11269 LM2507 F2 F2.4 8
## NumMeasuredFeature MissingPercentage more50missing NumImputedFeature
## 1 1 0.5 TRUE 1
## 2 1 0.5 TRUE 1
## 3 1 0.5 TRUE 1
## 4 1 0.5 TRUE 1
## 5 1 0.5 TRUE 1
## 6 1 0.5 TRUE 1
Save and/or load summarized data.
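One common way to do this in base R is saveRDS() and readRDS(). The sketch below uses a small toy list in place of the real MSstatsLiP_Summarized object and writes to a temporary directory; the file name is an assumption.

```r
# Toy stand-in for the summarized object (the real object is a list of tables).
summarized <- list(FeatureLevelData = data.frame(x = 1:3),
                   ProteinLevelData = data.frame(y = c(0.1, 0.2)))

# Save to disk; replace tempdir() with a project folder in practice.
rds_path <- file.path(tempdir(), "MSstatsLiP_Summarized.rds")
saveRDS(summarized, rds_path)

# Load it again in a later session.
restored <- readRDS(rds_path)
```

Storing the summarized data avoids repeating the time-consuming summarization step when only the downstream modeling changes.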
Run the modeling to obtain significantly altered peptides and proteins. The function groupComparisonLiP outputs a list with three separate models:
1. LiP.Model, which contains the differential analysis on peptide level in the LiP sample without correction for protein abundance alterations.
2. Adjusted.LiP.Model, which contains the differential analysis on peptide level in the LiP sample with correction for protein abundance alterations.
3. TrP.Model, which contains the differential analysis on protein level.
The default setting of the function is a pairwise comparison of all existing groups. Alternatively, a contrast matrix can be provided to specify the comparisons of interest. See the vignette for details.
## INFO [2024-10-30 09:13:39] == Start to test and get inference in whole plot ...
## |======================================================================| 100%
## INFO [2024-10-30 09:13:39] == Comparisons for all proteins are done.
## INFO [2024-10-30 09:13:39] == Start to test and get inference in whole plot ...
## |======================================================================| 100%
## INFO [2024-10-30 09:13:39] == Comparisons for all proteins are done.
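When only some comparisons are of interest, a contrast matrix can be built by hand. The sketch below follows the usual MSstats-style layout as an assumption: one row per comparison, one column per condition, with +1 and -1 marking the two groups being compared. The two comparisons chosen here are examples only, and the exact argument used to pass the matrix should be taken from the function documentation.

```r
# Conditions of this experiment, in alphabetical order.
groups <- c("F1", "F2", "M1", "M2")

# Two example comparisons: fibril versus monomer at each spike-in level.
contrast_matrix <- rbind(
  "F1 vs M1" = c(1, 0, -1, 0),
  "F2 vs M2" = c(0, 1, 0, -1)
)
colnames(contrast_matrix) <- groups

print(contrast_matrix)
```

Each row sums to zero, which is what makes it a comparison between groups rather than an estimate of a single group mean.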
Inspect MSstatsLiP_model.
## Key: <FULL_PEPTIDE>
## FULL_PEPTIDE Label log2FC SE Tvalue DF pvalue adj.pvalue
## <char> <char> <num> <num> <num> <int> <num> <num>
## 1: P14164_DLAIGAHGGK F1 vs F2 -0.2026580 NaN NaN 0 NaN NaN
## 2: P14164_DLAIGAHGGK F1 vs M2 0.4067448 NaN NaN 0 NaN NaN
## 3: P14164_DLAIGAHGGK F1 vs M1 Inf NA NA NA NA 0
## 4: P14164_DLAIGAHGGK F2 vs M2 0.6094028 NaN NaN 0 NaN NaN
## 5: P14164_DLAIGAHGGK F2 vs M1 Inf NA NA NA NA 0
## 6: P14164_DLAIGAHGGK M2 vs M1 Inf NA NA NA NA 0
## issue MissingPercentage ImputationPercentage ProteinName
## <char> <num> <num> <char>
## 1: <NA> 0.7500000 0 P14164
## 2: <NA> 0.9166667 0 P14164
## 3: oneConditionMissing 0.7500000 0 P14164
## 4: <NA> 0.9166667 0 P14164
## 5: oneConditionMissing 0.7500000 0 P14164
## 6: oneConditionMissing 0.9166667 0 P14164
## PeptideSequence
## <char>
## 1: DLAIGAHGGK
## 2: DLAIGAHGGK
## 3: DLAIGAHGGK
## 4: DLAIGAHGGK
## 5: DLAIGAHGGK
## 6: DLAIGAHGGK
## Protein Label log2FC SE Tvalue DF pvalue adj.pvalue
## <fctr> <char> <num> <num> <num> <int> <num> <num>
## 1: P16622 F1 vs F2 -0.1598201 0.3004019 -0.532021 7 0.61117223 0.6854285
## 2: P16622 F1 vs M2 -0.5959528 0.3004019 -1.983852 7 0.08768248 0.2161724
## 3: P16622 F1 vs M1 Inf NA NA NA NA 0.0000000
## 4: P16622 F2 vs M2 -0.4361327 0.2452771 -1.778122 7 0.11862006 0.2372401
## 5: P16622 F2 vs M1 Inf NA NA NA NA 0.0000000
## 6: P16622 M2 vs M1 Inf NA NA NA NA 0.0000000
## issue MissingPercentage ImputationPercentage
## <char> <num> <num>
## 1: <NA> 0.625 0.375
## 2: <NA> 0.875 0.125
## 3: oneConditionMissing 0.500 0.250
## 4: <NA> 0.750 0.250
## 5: oneConditionMissing 0.375 0.375
## 6: oneConditionMissing 0.625 0.125
Save and/or load model data.
Proteolytic resistance ratios are calculated as the ratio of the intensity of fully tryptic peptides in the LiP condition to the TrP condition. In general, a low protease resistance value indicates a high extent of cleavage, while a high protease resistance value indicates a low extent of cleavage.
Accessibility = calculateProteolyticResistance(MSstatsLiP_Summarized,
fasta_file,
differential_analysis = TRUE)
## INFO [2024-10-30 09:13:39] == Start to test and get inference in whole plot ...
## |======================================================================| 100%
## INFO [2024-10-30 09:13:39] == Comparisons for all proteins are done.
## Key: <FULL_PEPTIDE, Protein, GROUP, RUN>
## FULL_PEPTIDE Protein GROUP RUN PeptideSequence
## <char> <char> <fctr> <fctr> <char>
## 1: P14164_DLAIGAHGGK P14164 F1 1 DLAIGAHGGK
## 2: P14164_DLAIGAHGGK P14164 F2 5 DLAIGAHGGK
## 3: P14164_DLAIGAHGGK P14164 M2 13 DLAIGAHGGK
## 4: P16622_AFSENITK P16622 F1 1 AFSENITK
## 5: P16622_AFSENITK P16622 F2 5 AFSENITK
## 6: P16622_AFSENITK P16622 F2 7 AFSENITK
## 7: P16622_AFSENITK P16622 F2 8 AFSENITK
## 8: P16622_AFSENITK P16622 M1 9 AFSENITK
## 9: P16622_AFSENITK P16622 M1 11 AFSENITK
## 10: P16622_AFSENITK P16622 M2 13 AFSENITK
## 11: P16622_AFSENITK P16622 M2 14 AFSENITK
## 12: P16622_AFSENITK P16622 M2 15 AFSENITK
## 13: P16622_AFSENITK P16622 M2 16 AFSENITK
## 14: P16622_PLTAETYK P16622 F1 1 PLTAETYK
## 15: P16622_PLTAETYK P16622 F1 2 PLTAETYK
## 16: P16622_PLTAETYK P16622 F1 3 PLTAETYK
## 17: P16622_PLTAETYK P16622 F2 5 PLTAETYK
## 18: P16622_PLTAETYK P16622 F2 7 PLTAETYK
## 19: P16622_PLTAETYK P16622 F2 8 PLTAETYK
## 20: P16622_PLTAETYK P16622 F2 6 PLTAETYK
## 21: P16622_PLTAETYK P16622 M1 9 PLTAETYK
## 22: P16622_PLTAETYK P16622 M1 11 PLTAETYK
## 23: P16622_PLTAETYK P16622 M1 10 PLTAETYK
## 24: P16622_PLTAETYK P16622 M1 12 PLTAETYK
## 25: P16622_PLTAETYK P16622 M2 13 PLTAETYK
## 26: P16622_PLTAETYK P16622 M2 14 PLTAETYK
## 27: P16622_PLTAETYK P16622 M2 15 PLTAETYK
## 28: P16622_PLTAETYK P16622 M2 16 PLTAETYK
## 29: P17891_ALQLINQDDADIIGGR P17891 F1 1 ALQLINQDDADIIGGR
## 30: P17891_ALQLINQDDADIIGGR P17891 M1 9 ALQLINQDDADIIGGR
## 31: P17891_ALQLINQDDADIIGGR P17891 M1 11 ALQLINQDDADIIGGR
## 32: P17891_ALQLINQDDADIIGGR P17891 M2 14 ALQLINQDDADIIGGR
## 33: P17891_ALQLINQDDADIIGGR P17891 M2 15 ALQLINQDDADIIGGR
## 34: P17891_ELQDEAIK P17891 F1 1 ELQDEAIK
## 35: P17891_ELQDEAIK P17891 F2 5 ELQDEAIK
## 36: P17891_ELQDEAIK P17891 M1 9 ELQDEAIK
## 37: P17891_ELQDEAIK P17891 M1 11 ELQDEAIK
## 38: P17891_ELQDEAIK P17891 M2 13 ELQDEAIK
## 39: P17891_ELQDEAIK P17891 M2 14 ELQDEAIK
## 40: P17891_SEVVDQWK P17891 F1 1 SEVVDQWK
## 41: P17891_SEVVDQWK P17891 F2 5 SEVVDQWK
## 42: P17891_SEVVDQWK P17891 M1 9 SEVVDQWK
## 43: P17891_SEVVDQWK P17891 M1 10 SEVVDQWK
## 44: P17891_SEVVDQWK P17891 M2 13 SEVVDQWK
## 45: P38805_LFQSILPQNPDIEGR P38805 F1 1 LFQSILPQNPDIEGR
## 46: P38805_LFQSILPQNPDIEGR P38805 F1 3 LFQSILPQNPDIEGR
## 47: P38805_LFQSILPQNPDIEGR P38805 F2 5 LFQSILPQNPDIEGR
## 48: Q02908_IYPTLVIR Q02908 F1 1 IYPTLVIR
## 49: Q02908_IYPTLVIR Q02908 F1 3 IYPTLVIR
## 50: Q02908_IYPTLVIR Q02908 F2 5 IYPTLVIR
## 51: Q02908_IYPTLVIR Q02908 M1 9 IYPTLVIR
## 52: Q02908_IYPTLVIR Q02908 M2 13 IYPTLVIR
## 53: Q02908_IYPTLVIR Q02908 M2 15 IYPTLVIR
## 54: Q02908_VQPDQVELIR Q02908 F1 1 VQPDQVELIR
## 55: Q02908_VQPDQVELIR Q02908 F1 2 VQPDQVELIR
## 56: Q02908_VQPDQVELIR Q02908 F1 3 VQPDQVELIR
## 57: Q02908_VQPDQVELIR Q02908 F2 5 VQPDQVELIR
## 58: Q02908_VQPDQVELIR Q02908 F2 7 VQPDQVELIR
## 59: Q02908_VQPDQVELIR Q02908 F2 8 VQPDQVELIR
## 60: Q02908_VQPDQVELIR Q02908 M1 9 VQPDQVELIR
## 61: Q02908_VQPDQVELIR Q02908 M1 11 VQPDQVELIR
## 62: Q02908_VQPDQVELIR Q02908 M2 13 VQPDQVELIR
## 63: Q02908_VQPDQVELIR Q02908 M2 14 VQPDQVELIR
## 64: Q02908_VQPDQVELIR Q02908 M2 15 VQPDQVELIR
## FULL_PEPTIDE Protein GROUP RUN PeptideSequence
## Accessibility_ratio originalRUN SUBJECT TotalGroupMeasurements
## <num> <fctr> <char> <int>
## 1: NA LM2480 F1.1 4
## 2: NA LM2483 F2.1 4
## 3: NA LM2482 M2.1 4
## 4: NA LM2480 F1.1 4
## 5: 1.0000000 LM2483 F2.1 4
## 6: 1.0000000 LM2499 F2.3 4
## 7: 0.8682778 LM2508 F2.4 4
## 8: NA LM2481 M1.1 4
## 9: NA LM2498 M1.3 4
## 10: 1.0000000 LM2482 M2.1 4
## 11: 0.5359730 LM2493 M2.2 4
## 12: 0.6155192 LM2496 M2.3 4
## 13: 0.4883203 LM2509 M2.4 4
## 14: NA LM2480 F1.1 8
## 15: 0.7540955 LM2494 F1.2 8
## 16: 0.4709501 LM2497 F1.3 8
## 17: 0.8378363 LM2483 F2.1 8
## 18: 0.8339002 LM2499 F2.3 8
## 19: 0.6555789 LM2508 F2.4 8
## 20: 0.4973797 LM2495 F2.2 8
## 21: NA LM2481 M1.1 8
## 22: NA LM2498 M1.3 8
## 23: NA LM2492 M1.2 8
## 24: NA LM2510 M1.4 8
## 25: 0.9704908 LM2482 M2.1 8
## 26: 0.4694516 LM2493 M2.2 8
## 27: 0.5674287 LM2496 M2.3 8
## 28: 0.4205355 LM2509 M2.4 8
## 29: 1.0000000 LM2480 F1.1 4
## 30: 1.0000000 LM2481 M1.1 4
## 31: 1.0000000 LM2498 M1.3 4
## 32: 1.0000000 LM2493 M2.2 4
## 33: 1.0000000 LM2496 M2.3 4
## 34: 1.0000000 LM2480 F1.1 8
## 35: 0.8109229 LM2483 F2.1 8
## 36: 1.0000000 LM2481 M1.1 8
## 37: 1.0000000 LM2498 M1.3 8
## 38: 0.8691530 LM2482 M2.1 8
## 39: 1.0000000 LM2493 M2.2 8
## 40: 1.0000000 LM2480 F1.1 8
## 41: 0.8429855 LM2483 F2.1 8
## 42: 1.0000000 LM2481 M1.1 8
## 43: 0.7495376 LM2492 M1.2 8
## 44: 0.9519563 LM2482 M2.1 8
## 45: NA LM2480 F1.1 4
## 46: NA LM2497 F1.3 4
## 47: NA LM2483 F2.1 4
## 48: 1.0000000 LM2480 F1.1 4
## 49: NA LM2497 F1.3 4
## 50: 0.8885471 LM2483 F2.1 4
## 51: 0.8779434 LM2481 M1.1 4
## 52: 0.8178262 LM2482 M2.1 4
## 53: 1.0000000 LM2496 M2.3 4
## 54: 1.0000000 LM2480 F1.1 4
## 55: 1.0000000 LM2494 F1.2 4
## 56: NA LM2497 F1.3 4
## 57: 1.0000000 LM2483 F2.1 4
## 58: NA LM2499 F2.3 4
## 59: 0.9414730 LM2508 F2.4 4
## 60: 1.0000000 LM2481 M1.1 4
## 61: NA LM2498 M1.3 4
## 62: 1.0000000 LM2482 M2.1 4
## 63: 1.0000000 LM2493 M2.2 4
## 64: 1.0000000 LM2496 M2.3 4
## Accessibility_ratio originalRUN SUBJECT TotalGroupMeasurements
## NumMeasuredFeature MissingPercentage more50missing NumImputedFeature
## <int> <num> <lgcl> <num>
## 1: 1 0.0 FALSE 0
## 2: 1 0.0 FALSE 0
## 3: 1 0.0 FALSE 0
## 4: 1 0.0 FALSE 0
## 5: 1 0.0 FALSE 0
## 6: 1 0.0 FALSE 0
## 7: 1 0.0 FALSE 0
## 8: 1 0.0 FALSE 0
## 9: 1 0.0 FALSE 0
## 10: 1 0.0 FALSE 0
## 11: 1 0.0 FALSE 0
## 12: 1 0.0 FALSE 0
## 13: 1 0.0 FALSE 0
## 14: 2 0.0 FALSE 0
## 15: 1 0.5 TRUE 0
## 16: 1 0.5 TRUE 0
## 17: 1 0.5 TRUE 0
## 18: 1 0.5 TRUE 0
## 19: 1 0.5 TRUE 0
## 20: 1 0.5 TRUE 0
## 21: 2 0.0 FALSE 0
## 22: 2 0.0 FALSE 0
## 23: 1 0.5 TRUE 0
## 24: 1 0.5 TRUE 0
## 25: 1 0.5 TRUE 0
## 26: 1 0.5 TRUE 0
## 27: 1 0.5 TRUE 0
## 28: 1 0.5 TRUE 0
## 29: 1 0.0 FALSE 0
## 30: 1 0.0 FALSE 0
## 31: 1 0.0 FALSE 0
## 32: 1 0.0 FALSE 0
## 33: 1 0.0 FALSE 0
## 34: 2 0.0 FALSE 0
## 35: 2 0.0 FALSE 0
## 36: 2 0.0 FALSE 0
## 37: 1 0.5 TRUE 0
## 38: 2 0.0 FALSE 0
## 39: 1 0.5 TRUE 0
## 40: 2 0.0 FALSE 0
## 41: 2 0.0 FALSE 0
## 42: 2 0.0 FALSE 0
## 43: 1 0.5 TRUE 0
## 44: 2 0.0 FALSE 0
## 45: 1 0.0 FALSE 0
## 46: 1 0.0 FALSE 0
## 47: 1 0.0 FALSE 0
## 48: 1 0.0 FALSE 0
## 49: 1 0.0 FALSE 0
## 50: 1 0.0 FALSE 0
## 51: 1 0.0 FALSE 0
## 52: 1 0.0 FALSE 0
## 53: 1 0.0 FALSE 0
## 54: 1 0.0 FALSE 0
## 55: 1 0.0 FALSE 0
## 56: 1 0.0 FALSE 0
## 57: 1 0.0 FALSE 0
## 58: 1 0.0 FALSE 0
## 59: 1 0.0 FALSE 0
## 60: 1 0.0 FALSE 0
## 61: 1 0.0 FALSE 0
## 62: 1 0.0 FALSE 0
## 63: 1 0.0 FALSE 0
## 64: 1 0.0 FALSE 0
## NumMeasuredFeature MissingPercentage more50missing NumImputedFeature
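The ratio itself is simple arithmetic on the summarized values. Because the summarized intensities are on the log2 scale, a ratio of intensities corresponds to a difference of log2 values. The sketch below shows only this arithmetic on made-up numbers; the column names are assumptions, and any additional normalization or bounding applied by calculateProteolyticResistance is not reproduced here.

```r
# Made-up log2 intensities of three fully tryptic peptides in matched LiP and TrP runs.
toy <- data.frame(
  FULL_PEPTIDE = c("pepA", "pepB", "pepC"),
  log2_LiP = c(17.0, 16.5, 18.2),
  log2_TrP = c(17.5, 17.5, 18.2)
)

# Ratio of intensities = 2^(difference of log2 intensities).
toy$ratio <- 2^(toy$log2_LiP - toy$log2_TrP)
print(toy)
```

In this toy example pepB retains half of its TrP intensity after limited proteolysis (ratio 0.5), while pepC is unchanged (ratio 1).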
This section describes how to compare proteolytic resistance patterns between conditions, as reported in Cappelletti et al., 2021, Figure 3. As described in the “Protease digestion accessibility analysis” paragraph of Cappelletti et al., proteolytic resistance is calculated as the ratio of the intensity of fully tryptic peptides in the LiP condition to the TrP condition, and it can be compared across conditions using the linear mixed-effects model-based differential analysis implemented in the MSstatsLiP package. First, infinite values are filtered out of the result of the groupComparisonLiP function. Next, the log2 fold changes and standard errors of the LiP (log2FC, s2) and TrP (log2FC_ref, s2_ref) models are combined, and a Student’s t-test is applied to compare proteolytic resistance between conditions. Finally, p-values are adjusted for multiple comparisons (the default is the Benjamini & Hochberg method). In general, a low proteolytic resistance value indicates a high extent of cleavage, while a high proteolytic resistance value indicates a low extent of cleavage.
## Protein Label log2FC SE Tvalue DF pvalue
## <char> <char> <num> <num> <num> <int> <num>
## 1: P16622 F1 vs F2 -Inf NA NA NA NA
## 2: P16622 F1 vs M2 -Inf NA NA NA NA
## 3: P16622 F1 vs M1 NA NA NA NA NA
## 4: P16622 F2 vs M2 2.961395e-01 0.14247902 2.078478e+00 5 0.09224015
## 5: P16622 F2 vs M1 Inf NA NA NA NA
## 6: P16622 M2 vs M1 Inf NA NA NA NA
## 7: P16622 F1 vs F2 -9.365098e-02 0.18144349 -5.161441e-01 7 0.62165284
## 8: P16622 F1 vs M2 5.546153e-03 0.18144349 3.056683e-02 7 0.97646825
## 9: P16622 F1 vs M1 Inf NA NA NA NA
## 10: P16622 F2 vs M2 9.919714e-02 0.14814799 6.695814e-01 7 0.52458808
## 11: P16622 F2 vs M1 Inf NA NA NA NA
## 12: P16622 M2 vs M1 Inf NA NA NA NA
## 13: P17891 F1 vs F2 Inf NA NA NA NA
## 14: P17891 F1 vs M2 -4.965068e-16 0.00000000 -Inf 2 0.00000000
## 15: P17891 F1 vs M1 -4.965068e-16 0.00000000 -Inf 2 0.00000000
## 16: P17891 F2 vs M2 -Inf NA NA NA NA
## 17: P17891 F2 vs M1 -Inf NA NA NA NA
## 18: P17891 M2 vs M1 -9.860761e-32 0.00000000 -Inf 2 0.00000000
## 19: P17891 F1 vs F2 1.890771e-01 0.09252279 2.043573e+00 2 0.17770087
## 20: P17891 F1 vs M2 6.542349e-02 0.08012709 8.164966e-01 2 0.50000000
## 21: P17891 F1 vs M1 6.461001e-16 0.08012709 8.063441e-15 2 1.00000000
## 22: P17891 F2 vs M2 -1.236536e-01 0.08012709 -1.543219e+00 2 0.26274985
## 23: P17891 F2 vs M1 -1.890771e-01 0.08012709 -2.359715e+00 2 0.14224810
## 24: P17891 M2 vs M1 -6.542349e-02 0.06542349 -1.000000e+00 2 0.42264973
## 25: P17891 F1 vs F2 1.570145e-01 0.25046243 6.268983e-01 1 0.64351635
## 26: P17891 F1 vs M2 4.804366e-02 0.25046243 1.918198e-01 1 0.87934924
## 27: P17891 F1 vs M1 1.252312e-01 0.21690682 5.773503e-01 1 0.66666667
## 28: P17891 F2 vs M2 -1.089708e-01 0.25046243 -4.350785e-01 1 0.73874644
## 29: P17891 F2 vs M1 -3.178325e-02 0.21690682 -1.465295e-01 1 0.90737557
## 30: P17891 M2 vs M1 7.718755e-02 0.21690682 3.558558e-01 1 0.78235114
## 31: Q02908 F1 vs F2 1.114529e-01 0.18217377 6.117947e-01 1 0.65046587
## 32: Q02908 F1 vs M2 9.108688e-02 0.15776711 5.773503e-01 1 0.66666667
## 33: Q02908 F1 vs M1 1.220566e-01 0.18217377 6.700010e-01 1 0.62419863
## 34: Q02908 F2 vs M2 -2.036605e-02 0.15776711 -1.290893e-01 1 0.91827115
## 35: Q02908 F2 vs M1 1.060366e-02 0.18217377 5.820630e-02 1 0.96298648
## 36: Q02908 M2 vs M1 3.096972e-02 0.15776711 1.963002e-01 1 0.87660046
## 37: Q02908 F1 vs F2 2.926351e-02 0.02069243 1.414214e+00 4 0.23019964
## 38: Q02908 F1 vs M2 -2.280353e-16 0.01888951 -1.207206e-14 4 1.00000000
## 39: Q02908 F1 vs M1 -2.318359e-16 0.02534294 -9.147948e-15 4 1.00000000
## 40: Q02908 F2 vs M2 -2.926351e-02 0.01888951 -1.549193e+00 4 0.19626118
## 41: Q02908 F2 vs M1 -2.926351e-02 0.02534294 -1.154701e+00 4 0.31250000
## 42: Q02908 M2 vs M1 -3.800589e-18 0.02389355 -1.590633e-16 4 1.00000000
## Protein Label log2FC SE Tvalue DF pvalue
## adj.pvalue issue MissingPercentage ImputationPercentage
## <num> <char> <num> <num>
## 1: 0.0000000 oneConditionMissing 0.5714286 0
## 2: 0.0000000 oneConditionMissing 1.0000000 0
## 3: NA completeMissing 0.4285714 0
## 4: 0.5254997 <NA> 0.6250000 0
## 5: 0.0000000 oneConditionMissing 0.1250000 0
## 6: 0.0000000 oneConditionMissing 0.5000000 0
## 7: 0.6504659 <NA> 0.6250000 0
## 8: 1.0000000 <NA> 0.9000000 0
## 9: 0.0000000 oneConditionMissing 0.6250000 0
## 10: 0.7868821 <NA> 0.8000000 0
## 11: 0.0000000 oneConditionMissing 0.5000000 0
## 12: 0.0000000 oneConditionMissing 0.8000000 0
## 13: 0.0000000 oneConditionMissing 0.9166667 0
## 14: 0.0000000 <NA> 0.6250000 0
## 15: 0.0000000 <NA> 0.6250000 0
## 16: 0.0000000 oneConditionMissing 0.8333333 0
## 17: 0.0000000 oneConditionMissing 0.8333333 0
## 18: 0.0000000 <NA> 0.5000000 0
## 19: 0.5754991 <NA> 0.7500000 0
## 20: 1.0000000 <NA> 0.6875000 0
## 21: 1.0000000 <NA> 0.6875000 0
## 22: 0.5254997 <NA> 0.6875000 0
## 23: 0.5689924 <NA> 0.6875000 0
## 24: 1.0000000 <NA> 0.6250000 0
## 25: 0.6504659 <NA> 0.7500000 0
## 26: 1.0000000 <NA> 0.6875000 0
## 27: 1.0000000 <NA> 0.7500000 0
## 28: 0.8864957 <NA> 0.6875000 0
## 29: 0.9629865 <NA> 0.7500000 0
## 30: 1.0000000 <NA> 0.6875000 0
## 31: 0.6504659 <NA> 0.7500000 0
## 32: 1.0000000 <NA> 0.7500000 0
## 33: 1.0000000 <NA> 0.6250000 0
## 34: 0.9182711 <NA> 0.7500000 0
## 35: 0.9629865 <NA> 0.6250000 0
## 36: 1.0000000 <NA> 0.6250000 0
## 37: 0.5754991 <NA> 0.5000000 0
## 38: 1.0000000 <NA> 0.6250000 0
## 39: 1.0000000 <NA> 0.3750000 0
## 40: 0.5254997 <NA> 0.6250000 0
## 41: 0.6250000 <NA> 0.3750000 0
## 42: 1.0000000 <NA> 0.5000000 0
## adj.pvalue issue MissingPercentage ImputationPercentage
## PeptideSequence FULL_PEPTIDE
## <char> <char>
## 1: AFSENITK P16622_AFSENITK
## 2: AFSENITK P16622_AFSENITK
## 3: AFSENITK P16622_AFSENITK
## 4: AFSENITK P16622_AFSENITK
## 5: AFSENITK P16622_AFSENITK
## 6: AFSENITK P16622_AFSENITK
## 7: PLTAETYK P16622_PLTAETYK
## 8: PLTAETYK P16622_PLTAETYK
## 9: PLTAETYK P16622_PLTAETYK
## 10: PLTAETYK P16622_PLTAETYK
## 11: PLTAETYK P16622_PLTAETYK
## 12: PLTAETYK P16622_PLTAETYK
## 13: ALQLINQDDADIIGGR P17891_ALQLINQDDADIIGGR
## 14: ALQLINQDDADIIGGR P17891_ALQLINQDDADIIGGR
## 15: ALQLINQDDADIIGGR P17891_ALQLINQDDADIIGGR
## 16: ALQLINQDDADIIGGR P17891_ALQLINQDDADIIGGR
## 17: ALQLINQDDADIIGGR P17891_ALQLINQDDADIIGGR
## 18: ALQLINQDDADIIGGR P17891_ALQLINQDDADIIGGR
## 19: ELQDEAIK P17891_ELQDEAIK
## 20: ELQDEAIK P17891_ELQDEAIK
## 21: ELQDEAIK P17891_ELQDEAIK
## 22: ELQDEAIK P17891_ELQDEAIK
## 23: ELQDEAIK P17891_ELQDEAIK
## 24: ELQDEAIK P17891_ELQDEAIK
## 25: SEVVDQWK P17891_SEVVDQWK
## 26: SEVVDQWK P17891_SEVVDQWK
## 27: SEVVDQWK P17891_SEVVDQWK
## 28: SEVVDQWK P17891_SEVVDQWK
## 29: SEVVDQWK P17891_SEVVDQWK
## 30: SEVVDQWK P17891_SEVVDQWK
## 31: IYPTLVIR Q02908_IYPTLVIR
## 32: IYPTLVIR Q02908_IYPTLVIR
## 33: IYPTLVIR Q02908_IYPTLVIR
## 34: IYPTLVIR Q02908_IYPTLVIR
## 35: IYPTLVIR Q02908_IYPTLVIR
## 36: IYPTLVIR Q02908_IYPTLVIR
## 37: VQPDQVELIR Q02908_VQPDQVELIR
## 38: VQPDQVELIR Q02908_VQPDQVELIR
## 39: VQPDQVELIR Q02908_VQPDQVELIR
## 40: VQPDQVELIR Q02908_VQPDQVELIR
## 41: VQPDQVELIR Q02908_VQPDQVELIR
## 42: VQPDQVELIR Q02908_VQPDQVELIR
## PeptideSequence FULL_PEPTIDE
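The combination step described above can be sketched in base R. This is an illustration under stated assumptions, not the package code: the toy tables and their values are made up, the combined estimate is taken as the difference of the two log2 fold changes with variance s2 + s2_ref, and the degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption of this sketch.

```r
# Made-up results of the LiP and TrP models for three comparisons.
lip <- data.frame(Label = c("A vs B", "A vs C", "B vs C"),
                  log2FC = c(1.0, Inf, -0.4),
                  SE = c(0.3, NA, 0.2),
                  DF = c(6, NA, 6))
trp <- data.frame(Label = c("A vs B", "A vs C", "B vs C"),
                  log2FC_ref = c(0.2, 0.1, -0.1),
                  SE_ref = c(0.4, 0.3, 0.2),
                  DF_ref = c(6, 6, 6))

merged <- merge(lip, trp, by = "Label")

# Step 1: drop comparisons with infinite fold changes.
merged <- merged[is.finite(merged$log2FC) & is.finite(merged$log2FC_ref), ]

# Step 2: combine estimates and variances (s2 = SE^2).
s2 <- merged$SE^2
s2_ref <- merged$SE_ref^2
merged$adj_log2FC <- merged$log2FC - merged$log2FC_ref
merged$adj_SE <- sqrt(s2 + s2_ref)
merged$adj_DF <- (s2 + s2_ref)^2 / (s2^2 / merged$DF + s2_ref^2 / merged$DF_ref)

# Step 3: two-sided t-test and Benjamini-Hochberg adjustment.
merged$Tvalue <- merged$adj_log2FC / merged$adj_SE
merged$pvalue <- 2 * pt(abs(merged$Tvalue), df = merged$adj_DF, lower.tail = FALSE)
merged$adj.pvalue <- p.adjust(merged$pvalue, method = "BH")

print(merged[, c("Label", "adj_log2FC", "adj_SE", "adj_DF", "pvalue", "adj.pvalue")])
```

Comparisons with an infinite log2FC arise when one condition is completely missing, so they carry no standard error and cannot enter the t-test; this is why they are removed first.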
Save and/or load model data
Save the output of the modeling in a .csv file.
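Writing a result table to a .csv file can be done with base R. The sketch below uses a small made-up table in place of the model output and a temporary directory; the file name is an assumption.

```r
# Toy stand-in for one table of the model output (values are made up).
model <- data.frame(Protein = c("P1", "P2"),
                    log2FC = c(0.5, -1.2),
                    adj.pvalue = c(0.04, 0.6))

# Write without row names so the file has exactly one column per variable.
csv_path <- file.path(tempdir(), "model.csv")
write.csv(model, csv_path, row.names = FALSE)

# Read the file back to confirm that it round-trips.
reloaded <- read.csv(csv_path)
print(reloaded)
```

Since the modeling output is a list of several tables, each table would be written to its own file in the same way.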
Proteolytic resistance barcodes can be used to visualize FT peptides along the sequence of alpha-Synuclein. Significant peptides showing high protease resistance are colored in red, significant peptides showing a decreased protease resistance are colored in blue, and non-significant peptides (no change in protease resistance between conditions) are colored in grey. Black regions represent regions with no identified matching peptide. The position of the NAC domain is indicated by a rectangle.
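The color assignment can be expressed as a small rule. The sketch below is a hypothetical helper, not the plotting code of the package: the function name classify_peptide, the significance threshold of 0.05, and the convention that a positive log2FC means higher protease resistance are all assumptions.

```r
# Hypothetical helper: map a comparison result to a barcode color.
classify_peptide <- function(log2FC, adj_pvalue, alpha = 0.05) {
  ifelse(is.na(adj_pvalue) | adj_pvalue >= alpha,
         "grey",                               # not significant
         ifelse(log2FC > 0, "red", "blue"))    # significant: higher or lower resistance
}

# Made-up fold changes and adjusted p-values for four peptides.
colors <- classify_peptide(log2FC = c(0.8, -0.6, 0.1, NA),
                           adj_pvalue = c(0.01, 0.03, 0.40, NA))
print(colors)
```

Sequence positions that are not covered by any peptide receive no value at all and would be drawn in black.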