Title: | LiP Significance Analysis in shotgun mass spectrometry-based proteomic experiments |
---|---|
Description: | Tools for LiP peptide and protein significance analysis. Provides functions for summarization, estimation of LiP peptide abundance, and detection of changes across conditions. Utilizes functionality across the MSstats family of packages. |
Authors: | Devon Kohler [aut, cre], Tsung-Heng Tsai [aut], Ting Huang [aut], Mateusz Staniak [aut], Meena Choi [aut], Valentina Cappelletti [aut], Liliana Malinovska [aut], Olga Vitek [aut] |
Maintainer: | Devon Kohler <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.13.0 |
Built: | 2024-11-29 08:28:46 UTC |
Source: | https://github.com/bioc/MSstatsLiP |
annotSite
annotates modified sites as their residues and locations.
annotSite(aaIndex, residue, lenIndex = NULL)
aaIndex |
An integer vector. Location of the sites. |
residue |
A string vector. Amino acid residue. |
lenIndex |
An integer. Default is NULL. |
A string.
annotSite(10, "K")
annotSite(10, "K", 3L)
Calculates proteolytic resistance for the provided data. Requires input from the dataSummarizationLiP function. Can optionally run a differential analysis using proteolytic resistance. For this function to work, conditions and run numbers must match between the LiP and TrP data.
calculateProteolyticResistance( LiP_data, fasta_file, differential_analysis = FALSE, contrast.matrix = "pairwise" )
LiP_data |
name of variable containing LiP data. Must be output of dataSummarizationLiP function. |
fasta_file |
name of variable containing FASTA data. If FASTA file has not been processed please run the tidyFasta() function on it before inputting into this function. Protein names in file must match those in LiP_data. |
differential_analysis |
logical indicating whether to run differential analysis. Default is FALSE. Conditions and run numbers must match between the LiP and TrP data. |
contrast.matrix |
either a string of "pairwise" or a matrix including what comparisons to make in the differential analysis. Only required if differential_analysis=TRUE. Default is "pairwise". |
a data.frame including either the summarized Proteolytic data or differential analysis depending on parameter selection.
fasta <- tidyFasta(system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP"))
#calculateProteolyticResistance(MSstatsLiP_data, fasta)
Takes as input LiP data and a FASTA file. These can be the outputs of MSstatsLiP functions.
calculateTrypticity(LiP_data, fasta_file)
LiP_data |
name of variable containing LiP data. Must contain at least two columns named 'PeptideSequence' and 'ProteinName'. The values in these columns must match what is in the corresponding FASTA file. |
fasta_file |
name of variable containing FASTA data. If FASTA file has not been processed please run the tidyFasta() function on it before inputting into this function. |
a data.frame including protein, peptide, and trypticity metrics.
fasta <- tidyFasta(system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP"))
calculateTrypticity(MSstatsLiP_data$LiP, fasta)
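The idea behind a trypticity check can be sketched in base R. This is a simplified illustration, not the package's implementation: the helper classify_trypticity() and the toy sequences are hypothetical, and the rule assumed here is plain "cleavage after K or R" with no special handling of proline or missed cleavages.

```r
# Hypothetical helper, not part of MSstatsLiP: classify one peptide against
# its parent protein sequence using a simple "cleave after K or R" rule.
classify_trypticity <- function(protein_seq, peptide) {
  start <- regexpr(peptide, protein_seq, fixed = TRUE)[1]
  if (start == -1) stop("peptide not found in protein sequence")
  end <- start + nchar(peptide) - 1

  # N-terminal side is tryptic if the peptide starts the protein or the
  # residue just before it is K or R.
  prev_res <- if (start > 1) substr(protein_seq, start - 1, start - 1) else ""
  n_tryptic <- start == 1 || prev_res %in% c("K", "R")

  # C-terminal side is tryptic if the peptide ends the protein or its own
  # last residue is K or R.
  last_res <- substr(peptide, nchar(peptide), nchar(peptide))
  c_tryptic <- end == nchar(protein_seq) || last_res %in% c("K", "R")

  list(start_pos = start,
       fully_tryptic = n_tryptic && c_tryptic,
       semi_tryptic = xor(n_tryptic, c_tryptic))
}

protein <- "MKTAYIAKQRQISFVK"                    # toy sequence
full <- classify_trypticity(protein, "TAYIAK")   # preceded by K, ends in K
semi <- classify_trypticity(protein, "AYIAK")    # preceded by T, ends in K
```

A peptide with exactly one tryptic end is labeled semi-tryptic in this sketch; the package reports separate N-side and C-side flags in its own output.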
Plot run correlation for provided LiP and TrP experiment.
correlationPlotLiP( data, method = "pearson", value_columns = "INTENSITY", x.axis.size = 10, y.axis.size = 10, legend.size = 10, width = 10, height = 10, address = "" )
data |
output of MSstatsLiP converter function. Must include at least ProteinName, Run, and Intensity columns |
method |
one of "pearson", "kendall", "spearman". Default is pearson. |
value_columns |
one of "INTENSITY" or "ABUNDANCE". INTENSITY is the raw data, whereas ABUNDANCE is the log transformed INTENSITY column. INTENSITY is default. |
x.axis.size |
size of axes labels, e.g. name of the comparisons in heatmap, and in comparison plot. Default is 10. |
y.axis.size |
size of axes labels, e.g. name of targeted proteins in heatmap. Default is 10. |
legend.size |
size of legend for color at the bottom of volcano plot. Default is 10. |
width |
width of the saved file. Default is 10. |
height |
height of the saved file. Default is 10. |
address |
the name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "VolcanoPlot.pdf" or "Heatmap.pdf". The address argument specifies where to store the file and can also modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but is shown in the plotting window. |
plot or pdf
## Use output of dataSummarizationLiP function
correlationPlotLiP(MSstatsLiP_Summarized, address = FALSE)
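As a rough illustration of what a run correlation involves, the sketch below builds a feature-by-run matrix from toy long-format data and correlates the runs with base R. The toy data and the Feature column are assumptions made for the example; how the package itself reshapes the data is not described here.

```r
# Toy long-format data: 20 features measured in 3 runs (simulated values).
set.seed(1)
toy <- expand.grid(Feature = paste0("pep", 1:20),
                   Run = paste0("run", 1:3),
                   stringsAsFactors = FALSE)
toy$INTENSITY <- 2^rnorm(nrow(toy), mean = 20, sd = 2)
toy$ABUNDANCE <- log2(toy$INTENSITY)   # log-transformed intensity

# Reshape to a feature x run matrix, then correlate the runs pairwise.
wide <- tapply(toy$ABUNDANCE, list(toy$Feature, toy$Run), mean)
run_cor <- cor(wide, method = "pearson", use = "pairwise.complete.obs")
```

Swapping method for "kendall" or "spearman" mirrors the choices offered by the method argument.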
To illustrate the quantitative data and quality control of MS runs, dataProcessPlotsLiP takes the quantitative data from MSstatsLiP converter functions as input and generates two types of figures in pdf files as output: (1) profile plot (specify "ProfilePlot" in option type), to identify the potential sources of variation for each protein; (2) quality control plot (specify "QCPlot" in option type), to evaluate the systematic bias between MS runs.
dataProcessPlotsLiP( data, type = "PROFILEPLOT", ylimUp = FALSE, ylimDown = FALSE, x.axis.size = 10, y.axis.size = 10, text.size = 4, text.angle = 90, legend.size = 7, dot.size.profile = 2, ncol.guide = 5, width = 10, height = 12, lip.title = "All Peptides", protein.title = "All Proteins", which.Peptide = "all", which.Protein = NULL, originalPlot = TRUE, summaryPlot = TRUE, address = "" )
data |
name of the list with LiP and (optionally) Protein data, which can be the output of the MSstatsLiP. |
type |
choice of visualization. "ProfilePlot" represents profile plot of log intensities across MS runs. "QCPlot" represents box plots of log intensities across channels and MS runs. |
ylimUp |
upper limit for y-axis in the log scale. FALSE (default) for Profile Plot and QC Plot uses the upper limit as the rounded-off maximum of log2(intensities) after normalization + 3. |
ylimDown |
lower limit for y-axis in the log scale. FALSE (default) for Profile Plot and QC Plot uses 0. |
x.axis.size |
size of x-axis labeling for "Run" and "channel" in Profile Plot and QC Plot. |
y.axis.size |
size of y-axis labels. Default is 10. |
text.size |
size of labels representing each condition at the top of Profile plot and QC plot. Default is 4. |
text.angle |
angle of labels representing each condition at the top of Profile plot and QC plot. Default is 90. |
legend.size |
size of legend above Profile plot. Default is 7. |
dot.size.profile |
size of dots in Profile plot. Default is 2. |
ncol.guide |
number of columns for legends at the top of plot. Default is 5. |
width |
width of the saved pdf file. Default is 10. |
height |
height of the saved pdf file. Default is 12. |
lip.title |
title of all LiP QC plot |
protein.title |
title of all Protein QC plot |
which.Peptide |
LiP peptide list to draw plots. List can be names of LiP peptides or order numbers of LiPs. Default is "all", which generates all plots for each protein. For QC plot, "allonly" will generate one QC plot with all proteins. |
which.Protein |
String of proteins to plot if the user would like to plot all Peptides associated with a given Protein. Default is NULL. Please do not include "all" or "allonly" here. |
originalPlot |
TRUE(default) draws original profile plots, without normalization. |
summaryPlot |
TRUE(default) draws profile plots with protein summarization for each channel and MS run. |
address |
the name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "ProfilePlot.pdf" or "QCplot.pdf". The address argument specifies where to store the file and can also modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but is shown in the plotting window. |
plot or pdf
# Use the output of the MSstatsLiP_Summarized function
# Profile Plot
dataProcessPlotsLiP(MSstatsLiP_Summarized, type = "ProfilePlot")
# QCPlot Plot
dataProcessPlotsLiP(MSstatsLiP_Summarized, type = "QCPlot")
Utilizes functionality from MSstats and MSstatsPTM to clean, summarize, and normalize LiP peptide and TrP global protein data. Imputes missing values and performs protein and LiP peptide level summarization from peptide level quantification. Applies global median normalization on peptide level data and normalizes between runs. Returns a list of two summarized datasets.
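The median-equalizing idea can be sketched in a few lines of base R. This is a minimal illustration on toy data, assuming the normalization shifts each run's log2 intensities so that all run medians coincide; the data frame and column names are made up for the example and the package's exact procedure may differ.

```r
# Toy data: three runs with different overall intensity levels.
toy <- data.frame(
  Run = rep(c("run1", "run2", "run3"), each = 4),
  Intensity = c(100, 200, 400, 800,
                150, 300, 600, 1200,
                50, 100, 200, 400),
  stringsAsFactors = FALSE
)
toy$log2Int <- log2(toy$Intensity)

# Median per run and the common target (median of the run medians).
run_medians <- tapply(toy$log2Int, toy$Run, median)
target <- median(run_medians)

# Shift every run by a constant so its median lands on the target.
toy$normalized <- toy$log2Int - as.numeric(run_medians[toy$Run] - target)
normalized_medians <- tapply(toy$normalized, toy$Run, median)
```

After the shift, every run has the same median while within-run differences between features are unchanged.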
dataSummarizationLiP( data, logTrans = 2, normalization = "equalizeMedians", normalization.LiP = "equalizeMedians", nameStandards = NULL, nameStandards.LiP = NULL, featureSubset = "all", featureSubset.LiP = "all", remove_uninformative_feature_outlier = FALSE, remove_uninformative_feature_outlier.LiP = FALSE, min_feature_count = 2, min_feature_count.LiP = 1, n_top_feature = 3, n_top_feature.LiP = 3, summaryMethod = "TMP", equalFeatureVar = TRUE, censoredInt = "NA", MBimpute = TRUE, MBimpute.LiP = FALSE, remove50missing = FALSE, fix_missing = NULL, maxQuantileforCensored = 0.999, use_log_file = FALSE, append = FALSE, verbose = TRUE, log_file_path = NULL, base = "MSstatsLiP_log_" )
data |
name of the list with LiP and TrP data.tables, which can be the output of the MSstatsPTM converter functions |
logTrans |
logarithm transformation with base 2(default) or 10 |
normalization |
normalization for the protein level dataset, to remove systematic bias between MS runs. Three normalizations are supported. 'equalizeMedians' (default) performs constant normalization (equalizing the medians) based on reference signals. 'quantile' performs quantile normalization based on reference signals. 'globalStandards' performs normalization with global standard proteins. FALSE means no normalization is performed. |
normalization.LiP |
normalization for LiP level dataset. Default is 'equalizeMedians'. Can be adjusted to any of the options described above. |
nameStandards |
vector of global standard peptide names for protein dataset. only for normalization with global standard peptides. |
nameStandards.LiP |
Same as above for LiP dataset. |
featureSubset |
For protein dataset only. "all" (default) uses all features in the data set. "top3" uses the 3 features with the highest average log2(intensity) across runs. "topN" uses the N features with the highest average log2(intensity) across runs; it requires the n_top_feature option. "highQuality" flags uninformative features and outliers. |
featureSubset.LiP |
For LiP dataset only. Options same as above. |
remove_uninformative_feature_outlier |
For protein dataset only. It only works after users used featureSubset="highQuality" in dataProcess. TRUE removes 1) features flagged in the column feature_quality="Uninformative", which are features of bad quality, and 2) outliers flagged in the column is_outlier=TRUE, for run-level summarization. FALSE (default) uses all features and intensities for run-level summarization. |
remove_uninformative_feature_outlier.LiP |
For LiP dataset only. Options same as above. |
min_feature_count |
optional. Only required if featureSubset = "highQuality". Defines a minimum number of informative features a protein needs to be considered in the feature selection algorithm. |
min_feature_count.LiP |
For LiP dataset only. Options the same as above. |
n_top_feature |
For protein dataset only. The number of top features for featureSubset='topN'. Default is 3, which means to use top 3 features. |
n_top_feature.LiP |
For LiP dataset only. Options same as above. |
summaryMethod |
"TMP"(default) means Tukey's median polish, which is robust estimation method. "linear" uses linear mixed model. |
equalFeatureVar |
only for summaryMethod="linear". Logical variable for whether the model should assume equal variance among intensities from different features. TRUE (default) assumes equal variance among intensities from features. FALSE does not assume equal variance and accounts for heterogeneous variation from different features. |
censoredInt |
Whether missing values are censored or missing at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' uses zero intensities as censored intensities; in this case, NA intensities are missing at random. The output from Skyline should use '0'. NULL assumes that all NA intensities are randomly missing. |
MBimpute |
For protein dataset only. Only for summaryMethod="TMP" and censoredInt='NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) by an accelerated failure model. FALSE uses the values assigned by cutoffCensored. |
MBimpute.LiP |
For LiP dataset only. Options same as above. Default is FALSE. |
remove50missing |
only for summaryMethod="TMP". TRUE removes the runs which have more than 50% missing values. FALSE is default. |
fix_missing |
Default is NULL. Optional, same as the 'fix_missing' parameter in the MSstatsConvert::MSstatsBalancedDesign function. |
maxQuantileforCensored |
Maximum quantile for deciding censored missing values. default is 0.999 |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
base |
start of the file name. |
list of summarized LiP and TrP results. These results contain the reformatted input to the summarization function, as well as run-level summarization results.
# Use output of converter
head(MSstatsLiP_data[["LiP"]])
head(MSstatsLiP_data[["TrP"]])
# Run summarization
MSstatsLiP_model <- dataSummarizationLiP(MSstatsLiP_data)
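The "TMP" option refers to Tukey's median polish. As a hedged illustration of that technique only, base R's medpolish() can be applied to a small feature-by-run matrix of log2 intensities; the toy matrix below is invented, and the way the package prepares its input (imputation, censoring) is not reproduced.

```r
# Toy feature x run matrix of log2 intensities for one protein or peptide,
# with one missing value.
log2_int <- matrix(
  c(20.1, 20.4, 19.8,
    22.0, 22.5, 21.7,
    18.9, 19.3, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("feature", 1:3), paste0("run", 1:3))
)

# Median polish splits the matrix into overall + feature (row) + run (column)
# effects plus residuals, using medians so outliers have little influence.
fit <- medpolish(log2_int, na.rm = TRUE, trace.iter = FALSE)

# One summarized value per run: overall level plus that run's effect.
run_summary <- fit$overall + fit$col
```

The resulting vector has one entry per run, which is the shape of a run-level summary.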
Takes as input both raw LiP and TrP outputs from DIA-NN.
DIANNtoMSstatsLiPFormat( lip_data, trp_data = NULL, annotation = NULL, global_qvalue_cutoff = 0.01, qvalue_cutoff = 0.01, pg_qvalue_cutoff = 0.01, useUniquePeptide = TRUE, removeFewMeasurements = TRUE, removeOxidationMpeptides = FALSE, removeProtein_with1Feature = FALSE, use_log_file = FALSE, append = FALSE, verbose = TRUE, log_file_path = NULL )
lip_data |
name of LiP DIA-NN output, which is long-format. |
trp_data |
name of TrP DIA-NN output, which is long-format. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If annotation is already complete in the input, use annotation=NULL (default). It will use the annotation information from input. |
global_qvalue_cutoff |
The global qvalue cutoff. Default is 0.01. |
qvalue_cutoff |
Cutoff for DetectionQValue. Default is 0.01. |
pg_qvalue_cutoff |
local qvalue cutoff for protein groups. Run should be the same as filename. Default is 0.01. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume a unique peptide is used for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides |
TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
a list of two data.frames in MSstatsLiP format
## Output will be in format
head(MSstatsLiP_data[["LiP"]])
head(MSstatsLiP_data[["TrP"]])
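The filtering options described above (q-value cutoffs and useUniquePeptide) can be pictured with a small base R sketch. The column name qvalue, the toy rows, and the strict "<" comparison are assumptions for illustration; they are not taken from DIA-NN output or from the converter's code.

```r
# Toy identification table (hypothetical column names and values).
toy <- data.frame(
  ProteinName = c("P1", "P1", "P2", "P2", "P3"),
  PeptideSequence = c("AAK", "CCR", "AAK", "DDK", "EEK"),
  qvalue = c(0.001, 0.02, 0.005, 0.003, 0.009),
  stringsAsFactors = FALSE
)

# Step 1: keep identifications below the q-value cutoff.
qvalue_cutoff <- 0.01
passed <- toy[toy$qvalue < qvalue_cutoff, ]

# Step 2: keep only peptides that map to exactly one protein.
proteins_per_peptide <- tapply(passed$ProteinName, passed$PeptideSequence,
                               function(p) length(unique(p)))
unique_peptides <- names(proteins_per_peptide)[proteins_per_peptide == 1]
filtered <- passed[passed$PeptideSequence %in% unique_peptides, ]
```

In this toy table the peptide shared between two proteins is dropped along with the identification that fails the cutoff.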
Takes summarized LiP peptide and TrP protein data from dataSummarizationLiP. If global protein data is unavailable, LiP data alone can be passed into the function. Including protein data allows for adjusting the LiP fold change by the change in global protein abundance.
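One common way to adjust a peptide-level fold change for a protein-level change is to subtract the two log fold changes and combine their standard errors. The sketch below assumes that approach, with a Satterthwaite-style approximation for the degrees of freedom; it is an illustration with made-up numbers, and this documentation does not confirm that the package computes the adjustment exactly this way.

```r
# Made-up model estimates for one LiP peptide and its parent protein.
lip <- list(log2FC = 1.8, SE = 0.30, DF = 8)
trp <- list(log2FC = 0.6, SE = 0.20, DF = 10)

# Adjusted fold change: peptide change minus protein change (log2 scale).
adj_log2FC <- lip$log2FC - trp$log2FC

# Standard errors combine in quadrature when the two estimates are
# treated as independent.
adj_SE <- sqrt(lip$SE^2 + trp$SE^2)

# Satterthwaite-style approximate degrees of freedom.
adj_DF <- (lip$SE^2 + trp$SE^2)^2 /
  (lip$SE^4 / lip$DF + trp$SE^4 / trp$DF)

# Two-sided t-test on the adjusted estimate.
adj_t <- adj_log2FC / adj_SE
adj_p <- 2 * pt(abs(adj_t), df = adj_DF, lower.tail = FALSE)
```

The adjusted standard error is always at least as large as either input, so an adjusted change needs to be larger to reach the same significance.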
groupComparisonLiP( data, contrast.matrix = "pairwise", fasta.path = NULL, log_base = 2, use_log_file = FALSE, append = FALSE, verbose = TRUE, log_file_path = NULL, base = "MSstatsLiP_log_" )
data |
list of summarized datasets. Can be output of MSstatsLiP summarization function |
contrast.matrix |
comparison between conditions of interests. Default models full pairwise comparison between all conditions |
fasta.path |
a file path to a fasta file that includes the proteins listed in the data. Default is NULL. Include this parameter to determine trypticity of peptides in LiP models. |
log_base |
base of the logarithm used in dataProcess. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
base |
start of the file name. |
list of modeling results. Includes LiP, PROTEIN, and ADJUSTED LiP data.tables with their corresponding model results.
## Use output of dataSummarizationLiP function
fasta <- system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP")
# Test for pairwise comparison
MSstatsLiP_model <- groupComparisonLiP(MSstatsLiP_Summarized, contrast.matrix = "pairwise", fasta.path = fasta)
# Returns list of three models
names(MSstatsLiP_model)
head(MSstatsLiP_model$LiP.Model)
head(MSstatsLiP_model$TrP.Model)
head(MSstatsLiP_model$Adjusted.LiP.Model)
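When "pairwise" is not wanted, contrast.matrix can be a custom matrix. The sketch below follows the layout used across the MSstats family as I understand it: one row per comparison, one column per condition, and coefficients that sum to zero in each row. The condition names are hypothetical and would need to match the conditions present in the summarized data.

```r
# Hypothetical condition names; replace with the conditions in your data.
conditions <- c("Control", "Disease", "Treated")

# Each row is one comparison; +1 marks the numerator condition and
# -1 the denominator condition.
contrast_matrix <- rbind(
  "Disease vs Control" = c(-1, 1, 0),
  "Treated vs Control" = c(-1, 0, 1)
)
colnames(contrast_matrix) <- conditions
```

The matrix would then be supplied as contrast.matrix = contrast_matrix in place of "pairwise".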
To analyze the results of modeling changes in abundance of LiP peptides and overall protein, groupComparisonPlotsLiP takes as input the results of the groupComparisonLiP function. It assesses the results of three models: unadjusted LiP, adjusted LiP, and overall protein. To assess the results of the models, the following visualizations can be created: (1) VolcanoPlot (specify "VolcanoPlot" in option type), to plot peptides or proteins and their significance for each model; (2) Heatmap (specify "Heatmap" in option type), to evaluate the fold change between conditions and peptides/proteins.
groupComparisonPlotsLiP( data = data, type = type, sig = 0.05, FCcutoff = 1, logBase.pvalue = 10, ylimUp = FALSE, ylimDown = FALSE, xlimUp = FALSE, x.axis.size = 10, y.axis.size = 10, dot.size = 3, text.size = 4, text.angle = 0, legend.size = 13, ProteinName = TRUE, colorkey = TRUE, numProtein = 100, width = 10, height = 10, which.Comparison = "all", which.Peptide = "all", which.Protein = NULL, address = "" )
data |
name of the list with models, which can be the output of the MSstatsLiP |
type |
choice of visualization, one of VolcanoPlot or Heatmap |
sig |
FDR cutoff for the adjusted p-values in heatmap and volcano plot. level of significance for comparison plot. 100(1-sig)% confidence interval will be drawn. sig=0.05 is default. |
FCcutoff |
for volcano plot or heatmap, whether to apply a fold change cutoff. FALSE means no fold change cutoff is applied for significance analysis. FCcutoff = specific value means that specific fold change cutoff is applied. Default is 1. |
logBase.pvalue |
for volcano plot or heatmap, (-) logarithm transformation of adjusted p-value with base 2 or 10(default). |
ylimUp |
for all three plots, upper limit for y-axis. FALSE (default) for volcano plot/heatmap use maximum of -log2 (adjusted p-value) or -log10 (adjusted p-value). FALSE (default) for comparison plot uses maximum of log-fold change + CI. |
ylimDown |
for all three plots, lower limit for y-axis. FALSE (default) for volcano plot/heatmap use minimum of -log2 (adjusted p-value) or -log10 (adjusted p-value). FALSE (default) for comparison plot uses minimum of log-fold change - CI. |
xlimUp |
for Volcano plot, the limit for x-axis. FALSE (default) for use maximum for absolute value of log-fold change or 3 as default if maximum for absolute value of log-fold change is less than 3. |
x.axis.size |
size of axes labels, e.g. name of the comparisons in heatmap, and in comparison plot. Default is 10. |
y.axis.size |
size of axes labels, e.g. name of targeted proteins in heatmap. Default is 10. |
dot.size |
size of dots in volcano plot and comparison plot. Default is 3. |
text.size |
size of ProteinName label in the graph for Volcano Plot. Default is 4. |
text.angle |
angle of x-axis labels represented each comparison at the bottom of graph in comparison plot. Default is 0. |
legend.size |
size of legend for color at the bottom of volcano plot. Default is 13. |
ProteinName |
for volcano plot only, whether display protein/peptide names or not. TRUE (default) means protein names, which are significant, are displayed next to the points. FALSE means no protein names are displayed. |
colorkey |
TRUE(default) shows colorkey. |
numProtein |
The number of proteins which will be presented in each heatmap. Default is 100. |
width |
width of the saved file. Default is 10. |
height |
height of the saved file. Default is 10. |
which.Comparison |
list of comparisons to draw plots. List can be labels of comparisons or order numbers of comparisons from levels(data$Label) , such as levels(testResultMultiComparisons$ComparisonResult$Label). Default is "all", which generates all plots for each protein. |
which.Peptide |
Peptide list to draw comparison plots. List can be names of Peptides or order numbers of Peptides from levels. Default is "all", which generates all comparison plots for each protein. |
which.Protein |
Protein list to draw comparison plots. Will draw all peptide plots for listed Proteins. List must be names of Proteins. Default is NULL. |
address |
the name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "VolcanoPlot.pdf" or "Heatmap.pdf". The address argument specifies where to store the file and can also modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but is shown in the plotting window. |
plot or pdf
## Use output of the groupComparisonLiP function
# Volcano Plot
groupComparisonPlotsLiP(MSstatsLiP_model, type = "VOLCANOPLOT")
# Heatmap Plot
groupComparisonPlotsLiP(MSstatsLiP_model, type = "HEATMAP")
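How sig, FCcutoff, and logBase.pvalue interact in a volcano plot can be sketched on a toy results table. The classification rule below (adjusted p-value below sig and absolute log2 fold change at least log2(FCcutoff)) is an assumption about how such cutoffs are typically applied, not a statement of the package's internals; the values are invented.

```r
# Toy model results (invented values).
res <- data.frame(
  log2FC = c(2.1, -1.7, 0.3, 1.5),
  adj.pvalue = c(0.001, 0.03, 0.0005, 0.2)
)

sig <- 0.05       # FDR cutoff on adjusted p-values
FCcutoff <- 2     # fold change cutoff on the original (non-log) scale

# y-axis of the volcano plot with logBase.pvalue = 10.
res$neg_log10_p <- -log10(res$adj.pvalue)

# A point is highlighted only if it passes both cutoffs.
passes <- res$adj.pvalue < sig & abs(res$log2FC) >= log2(FCcutoff)
res$status <- ifelse(!passes, "No regulation",
                     ifelse(res$log2FC > 0, "Up-regulated", "Down-regulated"))
```

Note that with FCcutoff = 1 the fold change condition is always met under this rule, since log2(1) is 0.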
Example of input LiP dataset.
LiPRawData
A data.table consisting of 546 rows and 29 columns. Raw LiP data for use in testing and examples.
Input to MSstatsLiP converter SpectronauttoMSstatsLiPFormat. Contains the following columns:
R.Condition : Label of conditions (EG Disease/Control)
R.FileName : Name of spectral processing run
R.Replicate : Name of biological replicate
PG.ProteinAccessions : Protein name
PG.ProteinGroups : Protein name, can be multiple
PG.Quantity : Protein Quantity
PEP.GroupingKey : Peptide grouping
PEP.StrippedSequence : Peptide sequence
PEP.Quantity : Peptide quantity
EG.iRTPredicted : Predicted value
EG.Library : Name of library
EG.ModifiedSequence : Peptide sequence including any post-translational modifications
EG.PrecursorId : Peptide sequence with modifications including charge
EG.Qvalue : Qvalue
FG.Charge : Identified Ion charge
FG.Id : Peptide sequence with charge
FG.PrecMz : Prec Mz reading
FG.Quantity : Initial quantity reading
F.Charge : F.Charge
F.FrgIon : Fragment ion
F.FrgLossType : Label for loss type
F.FrgMz : Mz reading
F.FrgNum : numeric Frg
F.FrgType : character label for Frg
F.ExcludedFromQuantification : True/False boolean for if to exclude
F.NormalizedPeakArea : Normalized peak intensity
F.NormalizedPeakHeight : Normalized peak height
F.PeakArea : Unnormalized peak area
F.PeakHeight : Unnormalized peak height
head(LiPRawData)
locateMod
locates modified sites within a peptide.
locateMod(peptide, aaStart, residueSymbol)
peptide |
A string. Peptide sequence. |
aaStart |
An integer. Starting index of the peptide. |
residueSymbol |
A string. Modification residue and denoted symbol. |
A string.
locateMod("P*EP*TIDE", 3, "\\*")
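The indexing involved in locating modified sites can be worked through in base R. The helper locate_sites() below is hypothetical and is not the package's locateMod; it assumes a single-character modification symbol that directly follows the modified residue, and it returns a data frame of its own design rather than mimicking the package's return value.

```r
# Hypothetical helper: find modified residues in a peptide and map them to
# protein coordinates, given the protein position of the peptide's first
# residue.
locate_sites <- function(peptide, aa_start, mod_symbol = "\\*") {
  symbol_pos <- as.integer(gregexpr(mod_symbol, peptide)[[1]])
  stripped <- gsub(mod_symbol, "", peptide)
  if (symbol_pos[1] == -1) {
    return(data.frame(residue = character(0), position = numeric(0),
                      stringsAsFactors = FALSE))
  }
  # Each symbol sits right after its residue, so subtracting the number of
  # symbols seen so far gives the residue index in the stripped sequence.
  idx <- symbol_pos - seq_along(symbol_pos)
  data.frame(residue = substring(stripped, idx, idx),
             position = aa_start + idx - 1,
             stringsAsFactors = FALSE)
}

sites <- locate_sites("P*EP*TIDE", 3)
```

For the peptide in the example above, the two marked residues are the first and third residues of PEPTIDE, which map to protein positions 3 and 5 when the peptide starts at position 3.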
PTMlocate
annotates modified sites with associated peptides.
locatePTM(peptide, uniprot, fasta, modResidue, modSymbol, rmConfound = FALSE)
peptide |
A string vector of peptide sequences. The peptide sequence does not include its preceding and following AAs. |
uniprot |
A string vector of Uniprot identifiers of the peptides' originating proteins. UniProtKB entry isoform sequence is used. |
fasta |
A tibble with FASTA information. Output of |
modResidue |
A string. Modifiable amino acid residues. |
modSymbol |
A string. Symbol of a modified site. |
rmConfound |
A logical. |
A data frame with three columns: uniprot_iso, peptide, site.
fasta <- tidyFasta(system.file("extdata", "O13297.fasta", package="MSstatsLiP"))
locatePTM("DRVSYIHNDSC*TR", "O13297", fasta, "C", "\\*")
A set of tools for detecting differentially abundant LiP peptides in shotgun mass spectrometry-based proteomic experiments. The package includes tools to convert raw data from different spectral processing tools, summarize feature intensities, and fit a linear mixed effects model. If overall protein abundance changes are included, the package will also adjust the LiP peptide fold change for changes in overall protein abundance. Additionally, the package includes functionality to plot a variety of data visualizations.
SpectronauttoMSstatsLiPFormat : Generates MSstatsLiP required input format for Spectronaut outputs.
trypticHistogramLiP : Histogram of Half vs Fully tryptic peptides. Calculates proteotypicity, and then uses calculations in histogram.
correlationPlotLiP : Plot run correlation for provided LiP and TrP experiment.
dataSummarizationLiP : Summarizes PSM level quantification to peptide (LiP) and protein level quantification.
dataProcessPlotsLiP : Visualization for explanatory data analysis. Specifically gives ability to plot Profile and Quality Control plots.
PCAPlotLiP : Visualize PCA analysis for LiP and TrP datasets. Specifically gives ability to plot explained variance per component, Protein/Peptide PCA, and Condition PCA.
groupComparisonLiP : Tests for significant changes in LiP and protein abundance across conditions. Adjusts LiP fold change for changes in protein abundance.
groupComparisonPlotsLiP : Visualization for model-based analysis and summarization.
PCAPlotLiP : Runs PCA on the summarized data. Can visualize the PCA analysis in three different plots.
StructuralBarcodePlotLiP : Shows protein coverage of LiP modified peptides. Shows significant, insignificant, and missing coverage.
Example output of MSstatsLiP converter functions.
MSstatsLiP_data
A data.table consisting of 546 rows and 29 columns. Raw TrP data for use in testing and examples.
Example output of MSstatsLiP converter functions. (Eg. SpectronauttoMSstatsLiPFormat). A list containing two data.tables named LiP and TrP corresponding to the processed LiP and TrP data now in MSstatsLiP format. The data.tables contain the following columns:
ProteinName : Character column of protein names
PeptideSequence : Character column of peptide sequence name
PrecursorCharge : Numeric charge feature
FragmentIon : Character fragment ion feature
ProductCharge : Numeric charge of product
IsotopeLabelType : Character label type
Condition : Character label for condition (Eg. Disease/Control)
BioReplicate : Name of biological replicate
Run : Name of run
Fraction : Fraction number if fractionation is present
Intensity : Unnormalized feature intensity
FULL_PEPTIDE(LiP data only) : Combined protein name and peptide sequence. Used for LiP data only because LiP is summarized to peptide level (not protein)
head(MSstatsLiP_data$LiP)
head(MSstatsLiP_data$TrP)
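As a rough illustration of the FULL_PEPTIDE column described above, the sketch below builds a toy LiP table in base R and combines ProteinName and PeptideSequence into one identifier. The underscore separator and all values are assumptions made for this example; they are not taken from the package.

```r
# Toy LiP rows; protein IDs, sequences and intensities are made up
lip <- data.frame(
  ProteinName = c("P14164", "P14164", "P17891"),
  PeptideSequence = c("ILQNDLK", "SHLQIHIR", "ALQLINQDDADIIGGR"),
  Intensity = c(1500.2, 820.7, 4310.0),
  stringsAsFactors = FALSE
)

# Assumed separator "_": one identifier per protein/peptide pair, so that
# LiP data can be summarized at the peptide level rather than per protein
lip$FULL_PEPTIDE <- paste(lip$ProteinName, lip$PeptideSequence, sep = "_")
print(lip$FULL_PEPTIDE)
```

Because each identifier carries the protein name, two proteins sharing a peptide sequence would still yield distinct keys.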
Example output of the groupComparisonLiP function.
MSstatsLiP_model
A data.table consisting of 546 rows and 29 columns. Raw TrP data for use in testing and examples.
Example output of MSstatsLiP groupComparisonLiP function. A list containing three data.tables corresponding to unadjusted LiP, TrP, and adjusted LiP models. The data.tables contain the following columns:
ProteinName : Character column of protein names
PeptideSequence : Character column of peptide sequence name
Label : Condition comparison (Eg. Disease vs Control)
log2FC : Fold Change output results of model
SE : Standard error output of model
Tvalue : Tvalue output of model
DF : Degrees of Freedom output of model
pvalue : Pvalue result of model (unadjusted)
adj.pvalue : Adjusted Pvalue, generally BH adjustment is used
issue : Issue in model if any is reported
MissingPercentage : Percent of missing values in specific model
ImputationPercentage : Percent of values that needed to be imputed
fully_TRI: Boolean indicating if Peptide is fully tryptic
NSEMI_TRI: Boolean indicating if Peptide is NSEMI tryptic
CSEMI_TRI: Boolean indicating if Peptide is CSEMI tryptic
CTERMINUS: Boolean indicating if Peptide is CTERMINUS tryptic
NTERMINUS: Boolean indicating if Peptide is NTERMINUS tryptic
StartPos: Start position of peptide sequence
EndPos: End position of peptide sequence
FULL_PEPTIDE(LiP data only) : Combined protein name and peptide sequence. Used for LiP data only because LiP is summarized to peptide level (not protein)
head(MSstatsLiP_model$LiP.Model)
head(MSstatsLiP_model$TrP.Model)
head(MSstatsLiP_model$Adjusted.LiP.Model)
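The columns above are typically used to pick out peptides that change between conditions. The following base R sketch filters a toy model table on adj.pvalue and log2FC. The table, the cutoff values, and the choice of strict inequalities are assumptions for illustration; the package's own plotting functions may apply their cutoffs differently.

```r
# Toy model output with the same column names as the model tables above
model <- data.frame(
  FULL_PEPTIDE = c("P1_AAAK", "P1_CCCR", "P2_DDDK", "P2_EEER"),
  log2FC = c(1.8, -0.2, -2.1, 0.9),
  adj.pvalue = c(0.001, 0.60, 0.03, NA),
  stringsAsFactors = FALSE
)

adj.pvalue.cutoff <- 0.05  # assumed alpha level
FC.cutoff <- 1             # assumed minimum absolute log2 fold change

# Rows with a missing adjusted p-value are treated as not significant
model$significant <- !is.na(model$adj.pvalue) &
  model$adj.pvalue < adj.pvalue.cutoff &
  abs(model$log2FC) > FC.cutoff

print(model[model$significant, ])
```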
Example output of MSstatsLiP summarization function dataSummarizationLiP.
MSstatsLiP_Summarized
A list containing two lists of summarization information for LiP and TrP data.
Example output of MSstatsLiP summarization function dataSummarizationLiP. A list containing two lists named LiP and TrP containing summarization information for LiP and TrP data. Each of LiP and TrP contains elements named FeatureLevelData, ProteinLevelData, SummaryMethod, ModelQC, and PredictBySurvival. The two main data.tables (FeatureLevelData and ProteinLevelData) are described below:
FeatureLevelData :
PROTEIN : Protein ID with modification site mapped in. Ex. Protein_1002_S836
FULL_PEPTIDE (LiP Only) : Combined name of protein and peptide sequence
PEPTIDE : Full peptide with charge
TRANSITION: Charge
FEATURE : Combination of Protein, Peptide, and Transition columns
LABEL :
GROUP : Condition (ex. Healthy, Cancer, Time0)
RUN : Unique ID for technical replicate of one TMT mixture.
SUBJECT : Unique ID for biological subject.
FRACTION : Unique Fraction ID
originalRUN : Run name
censored :
INTENSITY : Original intensity value
ABUNDANCE : Log adjusted intensity value
newABUNDANCE : Normalized abundance column
ProteinLevelData :
RUN : MS run ID
FULL_PEPTIDE (LiP Only) : Combined name of protein and peptide sequence
Protein : Protein ID with modification site mapped in. Ex. Protein_1002_S836
LogIntensities: Protein-level summarized abundance
originalRUN : Run name
GROUP : Condition (ex. Healthy, Cancer, Time0)
SUBJECT : Unique ID for biological subject.
TotalGroupMeasurements : Total number of feature measurements in the condition (group)
NumMeasuredFeature : Number of features measured in the run
MissingPercentage : Percentage of missing feature values in the run
more50missing : Logical, TRUE if more than 50% of the features are missing in the run
NumImputedFeature : Number of feature values imputed in the run
head(MSstatsLiP_Summarized$LiP$FeatureLevelData)
head(MSstatsLiP_Summarized$LiP$ProteinLevelData)
head(MSstatsLiP_Summarized$TrP$FeatureLevelData)
head(MSstatsLiP_Summarized$TrP$ProteinLevelData)
Takes as input LiP and TrP data from the summarization function dataSummarizationLiP. Runs PCA on the summarized data. Can visualize the PCA analysis in three different plots: (1) BarPlot (specify "bar.plot = TRUE"), a bar plot showing the explained variance per PCA component; (2) Peptide/Protein PCA (specify "protein.pca = TRUE"), a dot plot with PCA components 1 and 2 on the axes, for different peptides and proteins; (3) Comparison PCA (specify "comparison.pca = TRUE"), an arrow plot with PCA components 1 and 2 on the axes, for different comparisons.
PCAPlotLiP(
  data,
  center.pca = TRUE,
  scale.pca = TRUE,
  n.components = 10,
  bar.plot = TRUE,
  protein.pca = TRUE,
  comparison.pca = FALSE,
  which.pep = "all",
  which.comparison = "all",
  width = 10,
  height = 10,
  address = ""
)
data |
name of the list with LiP and (optionally) Protein data, which can be the output of the MSstatsLiP. |
center.pca |
a logical value indicating whether the variables should be shifted to be zero centered. Alternately, a vector of length equal the number of columns of x can be supplied. The value is passed to scale |
scale.pca |
a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is TRUE; in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale. |
n.components |
an integer of PCA components to be returned. Default is 10. |
bar.plot |
a logical value indicating if to visualize PCA bar plot |
protein.pca |
a logical value indicating if to visualize PCA peptide plot |
comparison.pca |
a logical value indicating if to visualize PCA comparison plot |
which.pep |
a list of peptides to be visualized. Default is "all". If too many peptides are plotted the names can overlap. |
which.comparison |
a list of comparisons to be visualized. Default is "all". |
width |
width of the saved file. Default is 10. |
height |
height of the saved file. Default is 10. |
address |
the name of the folder that will store the results. Default folder is the current working directory. Any other assigned folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "VolcanoPlot.pdf" or "Heatmap.pdf". The address argument can specify where to store the file as well as how to modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but shown in the plotting window. |
plot or pdf
# Use output of dataSummarizationLiP function

# BarPlot
PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = TRUE, protein.pca = FALSE)

# Protein/Peptide PCA Plot
PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = FALSE, protein.pca = TRUE)

# Condition PCA Plot
PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = FALSE, protein.pca = FALSE, comparison.pca = TRUE)
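To make the quantities behind these plots concrete, here is a minimal base R sketch that runs PCA on a toy abundance matrix with stats::prcomp and computes the explained variance per component. Whether PCAPlotLiP calls prcomp internally is an assumption; the toy matrix and object names are hypothetical.

```r
set.seed(1)

# Toy matrix: 6 runs (rows) by 5 peptides (columns) of log2 abundances
abund <- matrix(
  rnorm(30, mean = 20, sd = 2),
  nrow = 6,
  dimnames = list(paste0("run", 1:6), paste0("pep", 1:5))
)

# center and scale. play the role of center.pca and scale.pca
pca <- prcomp(abund, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component (bar plot content)
explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(explained, 3))

# Scores on the first two components (dot plot content)
print(pca$x[, 1:2])
```

The proportions always sum to one and are ordered from the largest component to the smallest, which is why a bar plot of them is a quick check on how many components carry signal.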
A different example of input LiP dataset.
raw_lip
A data.table consisting of 6,944 rows and 29 columns. Raw LiP data for use in testing and examples.
Input to MSstatsLiP converter SpectronauttoMSstatsLiPFormat. Contains the following columns:
R.Condition : Label of conditions (EG Disease/Control)
R.FileName : Name of spectral processing run
R.Replicate : Name of biological replicate
PG.ProteinAccessions : Protein name
PG.ProteinGroups : Protein name, can be multiple
PG.Quantity : Protein Quantity
PEP.GroupingKey : Peptide grouping
PEP.StrippedSequence : Peptide sequence
PEP.Quantity : Peptide quantity
EG.iRTPredicted : Predicted value
EG.Library : Name of library
EG.ModifiedSequence : Peptide sequence including any post-translational modifications
EG.PrecursorId : Peptide sequence with modifications including charge
EG.Qvalue : Qvalue
FG.Charge : Identified Ion charge
FG.Id : Peptide sequence with charge
FG.PrecMz : Prec Mz reading
FG.Quantity : Initial quantity reading
F.Charge : F.Charge
F.FrgIon : Fragment ion
F.FrgLossType : Label for loss type
F.FrgMz : Mz reading
F.FrgNum : numeric Frg
F.FrgType : character label for Frg
F.ExcludedFromQuantification : True/False boolean for if to exclude
F.NormalizedPeakArea : Normalized peak intensity
F.NormalizedPeakHeight : Normalized peak height
F.PeakArea : Unnormalized peak area
F.PeakHeight : Unnormalized peak height
head(raw_lip)
Example of input TrP dataset.
raw_prot
A data.table consisting of 9,120 rows and 29 columns. Raw TrP data for use in testing and examples.
Input to MSstatsLiP converter SpectronauttoMSstatsLiPFormat. Contains the following columns:
R.Condition : Label of conditions (EG Disease/Control)
R.FileName : Name of spectral processing run
R.Replicate : Name of biological replicate
PG.ProteinAccessions : Protein name
PG.ProteinGroups : Protein name, can be multiple
PG.Quantity : Protein Quantity
PEP.GroupingKey : Peptide grouping
PEP.StrippedSequence : Peptide sequence
PEP.Quantity : Peptide quantity
EG.iRTPredicted : Predicted value
EG.Library : Name of library
EG.ModifiedSequence : Peptide sequence including any post-translational modifications
EG.PrecursorId : Peptide sequence with modifications including charge
EG.Qvalue : Qvalue
FG.Charge : Identified Ion charge
FG.Id : Peptide sequence with charge
FG.PrecMz : Prec Mz reading
FG.Quantity : Initial quantity reading
F.Charge : F.Charge
F.FrgIon : Fragment ion
F.FrgLossType : Label for loss type
F.FrgMz : Mz reading
F.FrgNum : numeric Frg
F.FrgType : character label for Frg
F.ExcludedFromQuantification : True/False boolean for if to exclude
F.NormalizedPeakArea : Normalized peak intensity
F.NormalizedPeakHeight : Normalized peak height
F.PeakArea : Unnormalized peak area
F.PeakHeight : Unnormalized peak height
head(raw_prot)
Proteolytic Resistance Barcode plot. Shows accessibility score of different fully tryptic peptides in a protein.
ResistanceBarcodePlotLiP(
  data,
  fasta_file,
  which.prot = "all",
  which.condition = "all",
  differential_analysis = FALSE,
  which.comp = "all",
  adj.pvalue.cutoff = 0.05,
  FC.cutoff = 0,
  width = 12,
  height = 4,
  address = ""
)
data |
list of data.tables containing LiP and TrP data in MSstatsLiP format. Should be the output of a summarization function. |
fasta_file |
A string of path to a FASTA file |
which.prot |
a list of proteins to be visualized. Default is "all" which will plot a separate barcode plot for each protein. |
which.condition |
a list of conditions to be visualized. Default is "all" which will plot all conditions for a single protein in the same barcode plot. |
differential_analysis |
a boolean indicating if a barcode plot showing the differential analysis should be plotted. If this is selected you must have performed differential analysis on the proteolytic data. |
which.comp |
a list of comparisons to be visualized when differential_analysis = TRUE. Default is "all" which will plot a separate barcode plot for each comparison and protein. |
adj.pvalue.cutoff |
Default is .05. Alpha value for testing significance of model output. |
FC.cutoff |
Default is 0. Minimum absolute FC before a comparison will be considered significant. |
width |
width of the saved file. Default is 12. |
height |
height of the saved file. Default is 4. |
address |
the name of the folder that will store the results. Default folder is the current working directory. Any other assigned folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "VolcanoPlot.pdf" or "Heatmap.pdf". The address argument can specify where to store the file as well as how to modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but shown in the plotting window. |
plot or pdf
# Specify Fasta path
fasta_path <- system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP")

# Use model data to create Barcode Plot
#ResistanceBarcodePlotLiP(MSstatsLiP_model, fasta_path)
Example of input data from Skyline.
SkylineTest
A data.table consisting of 2115 rows and 13 columns. Raw data for use in testing and examples.
Input to MSstatsLiP converter SkylinetoMSstatsLiPFormat. Contains the following columns:
Protein.Name : Name of Proteins identified by Skyline
Peptide.Modified.Sequence : Peptide sequence
Precursor.Charge : Charge of ion
Fragment.Ion : Fragment ion
Product.Charge : Identified Ion charge
Isotope.Label.Type : Label Type
Condition : Name of condition
BioReplicate : name of bioreplicate annotated to data
File.Name : Name of spectral processing run
Area : Abundance area
Standard.Type : Type name for row
Truncated : Boolean if row was truncated
head(SkylineTest)
Takes as input both raw LiP and TrP outputs from Skyline.
SkylinetoMSstatsLiPFormat(
  LiP.data,
  TrP.data = NULL,
  annotation = NULL,
  msstats_format = FALSE,
  removeiRT = TRUE,
  filter_with_Qvalue = TRUE,
  qvalue_cutoff = 0.01,
  useUniquePeptide = TRUE,
  removeFewMeasurements = TRUE,
  removeOxidationMpeptides = FALSE,
  removeProtein_with1Feature = FALSE,
  use_log_file = FALSE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL
)
LiP.data |
name of LiP Skyline output, which is long-format. |
TrP.data |
name of TrP Skyline output, which is long-format. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If annotation is already complete in Skyline, use annotation=NULL (default). It will use the annotation information from input. |
msstats_format |
logical indicating how the data was output from Skyline. FALSE (default) indicates that standard Skyline output was selected. TRUE should be selected if the Skyline data was output using the MSstats format option in Skyline. |
removeiRT |
TRUE (default) will remove the proteins or peptides which are labeled 'iRT' in the 'StandardType' column. FALSE will keep them. |
filter_with_Qvalue |
TRUE (default) will filter out the intensities whose DetectionQValue is greater than qvalue_cutoff. Those intensities will be replaced with zero and will be considered as censored missing values for imputation purposes. |
qvalue_cutoff |
Cutoff for DetectionQValue. Default is 0.01. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume the use of unique peptides for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides |
TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
a list of two data.frames in MSstatsLiP format
## Output will be in format
head(MSstatsLiP_data[["LiP"]])
head(MSstatsLiP_data[["TrP"]])
Takes as input both raw LiP and TrP outputs from Spectronaut.
SpectronauttoMSstatsLiPFormat(
  LiP.data,
  fasta,
  Trp.data = NULL,
  annotation = NULL,
  intensity = "PeakArea",
  filter_with_Qvalue = TRUE,
  qvalue_cutoff = 0.01,
  useUniquePeptide = TRUE,
  removeFewMeasurements = TRUE,
  removeProtein_with1Feature = FALSE,
  removeNonUniqueProteins = TRUE,
  removeModifications = TRUE,
  removeiRT = TRUE,
  summaryforMultipleRows = max,
  which.Conditions = "all",
  use_log_file = FALSE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL,
  base = "MSstatsLiP_log_"
)
LiP.data |
name of LiP Spectronaut output, which is long-format. |
fasta |
A string of path to a FASTA file, used to match LiP peptides. |
Trp.data |
name of TrP Spectronaut output, which is long-format. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If annotation is already complete in Spectronaut, use annotation=NULL (default). It will use the annotation information from input. |
intensity |
'PeakArea' (default) uses the unnormalized peak area. 'NormalizedPeakArea' uses the peak area normalized by Spectronaut. |
filter_with_Qvalue |
TRUE (default) will filter out the intensities whose EG.Qvalue is greater than qvalue_cutoff. Those intensities will be replaced with zero and will be considered as censored missing values for imputation purposes. |
qvalue_cutoff |
Cutoff for EG.Qvalue. Default is 0.01. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume the use of unique peptides for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
removeNonUniqueProteins |
TRUE will remove proteins that were not uniquely identified, i.e. if the protein column contains multiple proteins separated by ";". TRUE is default. |
removeModifications |
TRUE will remove peptides that contain a modification. A modification must be indicated by "[". TRUE is default. |
removeiRT |
TRUE will remove proteins that contain iRT. TRUE is default. |
summaryforMultipleRows |
max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. |
which.Conditions |
list of conditions to format into MSstatsLiP format. If "all" all conditions will be used. Default is "all". |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. |
base |
start of the file name. |
a list of two data.frames in MSstatsLiP format
# Output datasets of Spectronaut
head(LiPRawData)
head(TrPRawData)

fasta_path <- system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP")

MSstatsLiP_data <- SpectronauttoMSstatsLiPFormat(LiPRawData, fasta_path, TrPRawData)
head(MSstatsLiP_data[["LiP"]])
head(MSstatsLiP_data[["TrP"]])
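The filter_with_Qvalue and summaryforMultipleRows arguments describe two simple row-level operations. The base R sketch below mimics them on a toy table: intensities whose q-value exceeds the cutoff are set to zero, and duplicate feature/run rows are collapsed with max. The column names and values are hypothetical, and this is an illustration of the described behavior rather than the converter's actual code.

```r
# Toy long-format rows; f1 has two measurements in run r1
raw <- data.frame(
  feature = c("f1", "f1", "f2", "f2", "f3"),
  run = c("r1", "r1", "r1", "r2", "r1"),
  intensity = c(100, 250, 400, 380, 90),
  qvalue = c(0.001, 0.002, 0.005, 0.20, 0.004),
  stringsAsFactors = FALSE
)

qvalue_cutoff <- 0.01

# Intensities above the cutoff are replaced with zero (censored), not dropped
raw$intensity[raw$qvalue > qvalue_cutoff] <- 0

# Collapse multiple rows per feature and run; max is used here, sum is the
# other option named in the argument description
collapsed <- aggregate(intensity ~ feature + run, data = raw, FUN = max)
print(collapsed)
```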
Barcode plot. Shows protein coverage of LiP modified peptides.
StructuralBarcodePlotLiP(
  data,
  fasta,
  model_type = "Adjusted",
  which.prot = "all",
  which.comp = "all",
  adj.pvalue.cutoff = 0.05,
  FC.cutoff = 0,
  FT.only = FALSE,
  width = 12,
  height = 4,
  address = ""
)
data |
list of data.tables containing LiP and TrP data in MSstatsLiP format. Should be the output of a modeling function. |
fasta |
A string of path to a FASTA file |
model_type |
A string of either "Adjusted" or "Unadjusted", indicating whether to plot the adjusted or unadjusted models. Default is "Adjusted". |
which.prot |
a list of proteins to be visualized. Default is "all" which will plot a separate barcode plot for each protein. |
which.comp |
a list of comparisons to be visualized. Default is "all" which will plot a separate barcode plot for each comparison and protein. |
adj.pvalue.cutoff |
Default is .05. Alpha value for testing significance of model output. |
FC.cutoff |
Default is 0. Minimum absolute FC before a comparison will be considered significant. |
FT.only |
FALSE plots all FT and HT peptides, TRUE plots FT peptides only. Default is FALSE. |
width |
width of the saved file. Default is 12. |
height |
height of the saved file. Default is 4. |
address |
the name of the folder that will store the results. Default folder is the current working directory. Any other assigned folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "VolcanoPlot.pdf" or "Heatmap.pdf". The address argument can specify where to store the file as well as how to modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but shown in the plotting window. |
plot or pdf
# Specify Fasta path
fasta_path <- system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP")

# Use model data to create Barcode Plot
StructuralBarcodePlotLiP(MSstatsLiP_model, fasta_path, model_type = "Adjusted", address=FALSE)
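The idea of a coverage barcode can be sketched in base R by labelling every residue of a protein as significant, insignificant, or missing, based on the peptides that span it. The protein length, peptide positions, and cutoffs below are made up, and the per-residue labelling is an assumed simplification of what the plot displays.

```r
protein_length <- 30  # hypothetical protein of 30 residues

# Toy peptides with positions and model results
peptides <- data.frame(
  StartPos = c(3, 12, 21),
  EndPos = c(9, 18, 27),
  adj.pvalue = c(0.01, 0.40, 0.02),
  log2FC = c(1.5, 0.3, -0.1),
  stringsAsFactors = FALSE
)

adj.pvalue.cutoff <- 0.05
FC.cutoff <- 1  # toy value; the documented default is 0

# Start with no coverage, then fill in the residues spanned by each peptide
coverage <- rep("missing", protein_length)
for (i in seq_len(nrow(peptides))) {
  sig <- peptides$adj.pvalue[i] < adj.pvalue.cutoff &&
    abs(peptides$log2FC[i]) > FC.cutoff
  idx <- peptides$StartPos[i]:peptides$EndPos[i]
  coverage[idx] <- if (sig) "significant" else "insignificant"
}

print(table(coverage))
```

Raising FC.cutoff moves peptides with small fold changes from the significant to the insignificant group without affecting which residues count as covered.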
Reads and tidies a FASTA file.
tidyFasta(path)
path |
a string of path pointing towards a fasta file |
a tibble of formatted FASTA information
tidyFasta(system.file("extdata", "O13297.fasta", package="MSstatsLiP"))
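For readers unfamiliar with the FASTA layout, the following base R sketch shows one way to turn FASTA text into a table with one row per protein. It assumes UniProt-style headers of the form `>db|accession|entry_name description`; the toy entries and the output column names are hypothetical and need not match what tidyFasta returns.

```r
# Toy FASTA content; the first sequence is wrapped over two lines
fasta_lines <- c(
  ">sp|P00001|TEST1_YEAST Toy protein one",
  "MKTAYIAKQR",
  "QISFVKSHFS",
  ">sp|P00002|TEST2_YEAST Toy protein two",
  "MSDNELLKR"
)

is_header <- startsWith(fasta_lines, ">")

# Each header starts a new record; sequence lines inherit the record index
record <- cumsum(is_header)
sequences <- tapply(fasta_lines[!is_header], record[!is_header],
                    paste, collapse = "")

# Assumed header layout: the accession is the second "|"-separated field
headers <- fasta_lines[is_header]
accession <- vapply(strsplit(headers, "|", fixed = TRUE), `[`, character(1), 2)

fasta <- data.frame(
  accession = accession,
  sequence = as.character(sequences),
  length = nchar(as.character(sequences)),
  stringsAsFactors = FALSE
)
print(fasta)
```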
Example of input TrP dataset.
TrPRawData
A data.table consisting of 4692 rows and 29 columns. Raw TrP data for use in testing and examples.
Input to MSstatsLiP converter SpectronauttoMSstatsLiPFormat. Contains the following columns:
R.Condition : Label of conditions (EG Disease/Control)
R.FileName : Name of spectral processing run
R.Replicate : Name of biological replicate
PG.ProteinAccessions : Protein name
PG.ProteinGroups : Protein name, can be multiple
PG.Quantity : Protein Quantity
PEP.GroupingKey : Peptide grouping
PEP.StrippedSequence : Peptide sequence
PEP.Quantity : Peptide quantity
EG.iRTPredicted : Predicted value
EG.Library : Name of library
EG.ModifiedSequence : Peptide sequence including any post-translational modifications
EG.PrecursorId : Peptide sequence with modifications including charge
EG.Qvalue : Qvalue
FG.Charge : Identified Ion charge
FG.Id : Peptide sequence with charge
FG.PrecMz : Prec Mz reading
FG.Quantity : Initial quantity reading
F.Charge : F.Charge
F.FrgIon : Fragment ion
F.FrgLossType : Label for loss type
F.FrgMz : Mz reading
F.FrgNum : numeric Frg
F.FrgType : character label for Frg
F.ExcludedFromQuantification : True/False boolean for if to exclude
F.NormalizedPeakArea : Normalized peak intensity
F.NormalizedPeakHeight : Normalized peak height
F.PeakArea : Unnormalized peak area
F.PeakHeight : Unnormalized peak height
head(TrPRawData)
Histogram of half vs fully tryptic peptides. Calculates proteotypicity, and then uses the calculations in the histogram.
trypticHistogramLiP(
  data,
  fasta,
  x.axis.size = 10,
  y.axis.size = 10,
  legend.size = 10,
  width = 12,
  height = 4,
  color_scale = "bright",
  address = ""
)
data |
output of MSstatsLiP converter function. Must include at least ProteinName, PeptideSequence, BioReplicate, and Condition columns |
fasta |
A string of path to a FASTA file, used to match LiP peptides. |
x.axis.size |
size of x-axis labeling for plot. Default is 10. |
y.axis.size |
size of y-axis labeling for plot. Default is 10. |
legend.size |
size of feature legend for half vs fully tryptic peptides below graph. Default is 10. |
width |
Width of final pdf to be plotted |
height |
Height of final pdf to be plotted |
color_scale |
colors of bar chart. Must be one of "bright" or "grey". Default is "bright". |
address |
the name of the folder that will store the results. Default folder is the current working directory. Any other assigned folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "TyrpticPlot.pdf". If address=FALSE, the plot is not saved as a pdf file but shown in the plotting window. |
plot or pdf
# Use output of summarization function
trypticHistogramLiP(MSstatsLiP_Summarized,
                    system.file("extdata", "ExampleFastaFile.fasta", package="MSstatsLiP"),
                    color_scale = "bright", address = FALSE)
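The distinction between half and fully tryptic peptides can be illustrated with a small base R helper. The function classify_tryptic is hypothetical and uses a simplified rule: an end counts as tryptic if it coincides with a protein terminus or a cleavage after K or R. The proline exception and the package's own proteotypicity calculation are not modelled here.

```r
# Hypothetical helper: classify a peptide by how many of its ends are tryptic
classify_tryptic <- function(protein_seq, peptide) {
  start <- regexpr(peptide, protein_seq, fixed = TRUE)[1]
  if (start == -1) return(NA_character_)  # peptide not found in protein
  end <- start + nchar(peptide) - 1
  # N-terminal end: peptide starts the protein or follows K/R
  n_ok <- start == 1 ||
    substr(protein_seq, start - 1, start - 1) %in% c("K", "R")
  # C-terminal end: peptide ends the protein or ends in K/R
  c_ok <- end == nchar(protein_seq) ||
    substr(protein_seq, end, end) %in% c("K", "R")
  if (n_ok && c_ok) "fully tryptic" else if (n_ok || c_ok) "half tryptic" else "non tryptic"
}

protein <- "MKTAYIAKQRQISFVKSHFSR"  # toy sequence
peps <- c("TAYIAK", "AYIAK", "QISFVK", "ISFV")
labels <- vapply(peps, function(p) classify_tryptic(protein, p), character(1))
print(table(labels))
```

Counting these labels per run or condition gives the kind of tallies a half vs fully tryptic histogram would display.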