Title: | Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments |
---|---|
Description: | A set of tools for statistical relative protein significance analysis in DDA, SRM and DIA experiments. |
Authors: | Meena Choi [aut, cre], Mateusz Staniak [aut], Tsung-Heng Tsai [aut], Ting Huang [aut], Olga Vitek [aut] |
Maintainer: | Meena Choi <[email protected]> |
License: | Artistic-2.0 |
Version: | 4.15.0 |
Built: | 2024-10-30 08:22:29 UTC |
Source: | https://github.com/bioc/MSstats |
Check if data represents repeated measurements design
checkRepeatedDesign(summarization_output)
summarization_output |
output of the dataProcess function |
This extracts information required by the group comparison workflow
logical, TRUE if data represent repeated measurements design
QuantData1 <- dataProcess(SRMRawData, use_log_file = FALSE)
checkRepeatedDesign(QuantData1)
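As an informal illustration of what a repeated measurements design means, the base-R sketch below treats a design as repeated when at least one subject is measured under more than one condition. The helper name `is_repeated_design` and the toy data are hypothetical, and the column names GROUP and SUBJECT are an assumption about the protein-level table; this is not the package's implementation.

```r
# Hypothetical helper (not part of MSstats): a design is treated as
# "repeated measurements" if any subject appears in more than one group.
is_repeated_design <- function(protein_data) {
  # Count the distinct groups observed for each subject
  groups_per_subject <- tapply(
    as.character(protein_data$GROUP),
    as.character(protein_data$SUBJECT),
    function(g) length(unique(g))
  )
  any(groups_per_subject > 1)
}

# Toy data: each subject is measured at two time points (repeated design)
time_course <- data.frame(
  GROUP   = c("T1", "T2", "T1", "T2"),
  SUBJECT = c("S1", "S1", "S2", "S2")
)

# Toy data: different subjects in each condition (not repeated)
case_control <- data.frame(
  GROUP   = c("Control", "Control", "Disease", "Disease"),
  SUBJECT = c("S1", "S2", "S3", "S4")
)

is_repeated_design(time_course)
is_repeated_design(case_control)
```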
Process MS data: clean, normalize and summarize before differential analysis
dataProcess( raw, logTrans = 2, normalization = "equalizeMedians", nameStandards = NULL, featureSubset = "all", remove_uninformative_feature_outlier = FALSE, min_feature_count = 2, n_top_feature = 3, summaryMethod = "TMP", equalFeatureVar = TRUE, censoredInt = "NA", MBimpute = TRUE, remove50missing = FALSE, fix_missing = NULL, maxQuantileforCensored = 0.999, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL, numberOfCores = 1 )
raw |
name of the raw (input) data set. |
logTrans |
base of logarithm transformation: 2 (default) or 10. |
normalization |
normalization to remove systematic bias between MS runs. Three normalization methods are supported: 'equalizeMedians' (default) performs constant normalization (equalizing the medians) based on reference signals; 'quantile' performs quantile normalization based on reference signals; 'globalStandards' performs normalization with global standards. If FALSE, no normalization is performed. |
nameStandards |
optional vector of global standard peptide names. Required only for normalization with global standard peptides. |
featureSubset |
"all" (default) uses all features in the data set. "top3" uses the 3 features with the highest average log-intensity across runs. "topN" uses the N features with the highest average log-intensity across runs; N is set with the n_top_feature option. "highQuality" flags uninformative features and outliers. |
remove_uninformative_feature_outlier |
optional. Only required if featureSubset = "highQuality". TRUE removes 1) noisy features (flagged with "Uninformative" in the column feature_quality) and 2) outliers (flagged with TRUE in the column is_outlier) before run-level summarization. FALSE (default) uses all features and intensities for run-level summarization. |
min_feature_count |
optional. Only required if featureSubset = "highQuality". Defines a minimum number of informative features a protein needs to be considered in the feature selection algorithm. |
n_top_feature |
optional. Only required if featureSubset = 'topN'. In that case, it specifies the number of top features that will be used. Default is 3, which means the top 3 features are used. |
summaryMethod |
"TMP" (default) means Tukey's median polish, which is a robust estimation method. "linear" uses a linear mixed model. |
equalFeatureVar |
only for summaryMethod = "linear". Logical variable for whether the model should assume equal variance among intensities from different features. TRUE (default) assumes equal variance among intensities from features. FALSE means that equal variance cannot be assumed, so the model accounts for heterogeneous variation among features. |
censoredInt |
Specifies whether missing values are censored or missing at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. The output from Skyline should use '0'. NULL assumes that all NA intensities are missing at random. |
MBimpute |
only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) using an accelerated failure model. FALSE uses the values assigned by cutoffCensored. |
remove50missing |
only for summaryMethod = "TMP". TRUE removes the proteins where every run has at least 50% missing values for each peptide. FALSE is default. |
fix_missing |
Optional, same as the 'fix_missing' parameter in MSstatsConvert::MSstatsBalancedDesign function |
maxQuantileforCensored |
Maximum quantile for deciding censored missing values, default is 0.999 |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
numberOfCores |
Number of cores for parallel processing. When > 1, a logfile named 'MSstats_dataProcess_log_progress.log' is created to track progress. Only works for Linux & Mac OS. Default is 1. |
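To make the idea behind constant normalization concrete, the base-R sketch below shifts every run by a constant so that all run medians coincide. It is a simplified illustration on toy data, not the package's code: the column names RUN and ABUNDANCE are assumptions, and all signals are used here, whereas the option described above is based on reference signals.

```r
# Toy feature-level data: log2 intensities from three MS runs
toy <- data.frame(
  RUN       = rep(c("run1", "run2", "run3"), each = 3),
  ABUNDANCE = c(10, 11, 12,   # run1
                12, 13, 14,   # run2
                 9, 10, 11)   # run3
)

# Median of each run, and the median of the run medians as the common target
run_medians <- tapply(toy$ABUNDANCE, toy$RUN, median)
target <- median(run_medians)

# Shift every run by a constant so that its median equals the target
shift <- as.numeric(run_medians[as.character(toy$RUN)])
toy$NORMALIZED <- toy$ABUNDANCE - shift + target

# After the shift, all runs share the same median
tapply(toy$NORMALIZED, toy$RUN, median)
```

Because each run is shifted by a single constant, differences between features within a run are left unchanged.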
# Consider raw data (i.e. SRMRawData) for a label-based SRM experiment from a yeast study
# with ten time points (T1-T10) of interest and three biological replicates.
# It is a time course experiment. The goal is to detect protein abundance changes
# across time points.
head(SRMRawData)

# Log2 transformation and normalization are applied (default)
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
head(QuantData$FeatureLevelData)

# Log10 transformation and normalization are applied
QuantData1 <- dataProcess(SRMRawData, logTrans = 10, use_log_file = FALSE)
head(QuantData1$FeatureLevelData)

# Log2 transformation and no normalization are applied
QuantData2 <- dataProcess(SRMRawData, normalization = FALSE, use_log_file = FALSE)
head(QuantData2$FeatureLevelData)
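The "TMP" summarization option refers to Tukey's median polish. As a rough sketch of the idea, and not of the package's internal code, base R's `stats::medpolish` can be applied to a toy features-by-runs matrix of log2 intensities for one protein; the run-level summary is then taken as the overall effect plus each run (column) effect. The matrix and its names are made up for illustration.

```r
# Toy log2 intensities for one protein: rows are features, columns are runs
log_int <- matrix(
  c(20, 21, 23,
    18, 19, 21,
    22, 23, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat1", "feat2", "feat3"), c("run1", "run2", "run3"))
)

# Median polish decomposes the table into
# overall + row (feature) effects + column (run) effects + residuals
fit <- medpolish(log_int, na.rm = TRUE, trace.iter = FALSE)

# Run-level summary: overall effect plus the effect of each run
run_summary <- fit$overall + fit$col
run_summary
```

Because medians are used in place of means, a single outlying feature intensity has limited influence on the run-level summary.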
To illustrate the quantitative data after data preprocessing and
quality control of MS runs, dataProcessPlots takes the quantitative data from
the dataProcess function as input and automatically generates
three types of figures in PDF files as output:
(1) profile plot (specify "ProfilePlot" in option type),
to identify the potential sources of variation for each protein;
(2) quality control plot (specify "QCPlot" in option type),
to evaluate the systematic bias between MS runs;
(3) mean plot for conditions (specify "ConditionPlot" in option type),
to illustrate mean and variability of each condition per protein.
dataProcessPlots( data, type, featureName = "Transition", ylimUp = FALSE, ylimDown = FALSE, scale = FALSE, interval = "CI", x.axis.size = 10, y.axis.size = 10, text.size = 4, text.angle = 0, legend.size = 7, dot.size.profile = 2, dot.size.condition = 3, width = 800, height = 600, which.Protein = "all", originalPlot = TRUE, summaryPlot = TRUE, save_condition_plot_result = FALSE, remove_uninformative_feature_outlier = FALSE, address = "", isPlotly = FALSE )
data |
name of the (output of dataProcess function) data set. |
type |
choice of visualization. "ProfilePlot" represents profile plot of log intensities across MS runs. "QCPlot" represents quality control plot of log intensities across MS runs. "ConditionPlot" represents mean plot of log ratios (Light/Heavy) across conditions. |
featureName |
for "ProfilePlot" only, "Transition" (default) means printing feature legend in transition-level; "Peptide" means printing feature legend in peptide-level; "NA" means no feature legend printing. |
ylimUp |
upper limit for y-axis in the log scale. With FALSE (default), Profile Plot and QC Plot use the rounded maximum of log2(intensities) after normalization + 3 as the upper limit, and Condition Plot uses the maximum of log ratio + SD or CI. |
ylimDown |
lower limit for y-axis in the log scale. With FALSE (default), Profile Plot and QC Plot use 0, and Condition Plot uses the minimum of log ratio - SD or CI. |
scale |
for "ConditionPlot" only, FALSE(default) means each conditional level is not scaled at x-axis according to its actual value (equal space at x-axis). TRUE means each conditional level is scaled at x-axis according to its actual value (unequal space at x-axis). |
interval |
for "ConditionPlot" only, "CI" (default) uses the 95% confidence interval for the width of the error bar. "SD" uses the standard deviation for the width of the error bar. |
x.axis.size |
size of x-axis labeling for "Run" in Profile Plot and QC Plot, and "Condition" in Condition Plot. Default is 10. |
y.axis.size |
size of y-axis labels. Default is 10. |
text.size |
size of the labels representing each condition at the top of the graph in Profile Plot and QC Plot. Default is 4. |
text.angle |
angle of the labels representing each condition at the top of the graph in Profile Plot and QC Plot, or of the x-axis labels in Condition Plot. Default is 0. |
legend.size |
size of feature legend (transition-level or peptide-level) above graph in Profile Plot. Default is 7. |
dot.size.profile |
size of dots in profile plot. Default is 2. |
dot.size.condition |
size of dots in condition plot. Default is 3. |
width |
width of the saved file. Default is 800. |
height |
height of the saved file. Default is 600. |
which.Protein |
Protein list to draw plots. List can be names of Proteins or order numbers of Proteins from levels(data$FeatureLevelData$PROTEIN). Default is "all", which generates all plots for each protein. For QC plot, "allonly" will generate one QC plot with all proteins. |
originalPlot |
TRUE(default) draws original profile plots. |
summaryPlot |
TRUE(default) draws profile plots with summarization for run levels. |
save_condition_plot_result |
TRUE saves the table with values using condition plots. Default is FALSE. |
remove_uninformative_feature_outlier |
Only takes effect after featureSubset="highQuality" was used in dataProcess. TRUE removes from profile plots 1) features flagged with feature_quality="Uninformative", i.e. features of bad quality, and 2) outliers flagged with is_outlier=TRUE. FALSE (default) shows all features and intensities in profile plots. |
address |
prefix for the filename that will store the results. |
isPlotly |
Parameter to use Plotly or ggplot2. If set to TRUE, MSstats will save Plotly plots as HTML files. If set to FALSE, MSstats will save ggplot2 plots as PDF files. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output file is automatically created with the default name "ProfilePlot.pdf", "QCplot.pdf", "ConditionPlot.pdf" or "ConditionPlot_value.csv". The address option specifies where to store the file and how to modify the beginning of the file name. If address=FALSE, the plot is not saved as a PDF file but shown in a window. |
Profile Plot : identify the potential sources of variation of each protein. QuantData$FeatureLevelData is used for plots. X-axis is run. Y-axis is log-intensities of transitions. Reference/endogenous signals are in the left/right panel. Line colors indicate peptides and line types indicate transitions. In summarization plots, gray dots and lines are the same as original profile plots with QuantData$FeatureLevelData. Dark dots and lines are for summarized intensities from QuantData$ProteinLevelData.
QC Plot : illustrate the systematic bias between MS runs. After normalization, the reference signals for all proteins should be stable across MS runs. QuantData$FeatureLevelData is used for plots. X-axis is run. Y-axis is log-intensities of transitions. Reference/endogenous signals are in the left/right panel. The pdf file contains (1) a QC plot for all proteins and (2) QC plots for each protein separately.
Condition Plot : illustrate the systematic difference between conditions. Summarized intensities from QuantData$ProteinLevelData are used for plots. X-axis is condition. Y-axis is summarized log-transformed intensity. If scale is TRUE, the levels of conditions are scaled at the x-axis according to their actual values. Red points indicate the mean for each condition. If interval is "CI", blue error bars indicate the 95% confidence interval for each condition. If interval is "SD", blue error bars indicate the standard deviation for each condition. The interval is not related to the model-based analysis.
The input of this function is the quantitative data from function dataProcess.
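The two error-bar choices of the condition plot can be illustrated with a small base-R sketch that computes, per condition, the mean together with either a confidence-interval half-width or the standard deviation. The helper `summarize_condition`, the toy data and the column names are hypothetical, and the t-based 95% interval is an assumption; the package may compute its interval differently.

```r
# Toy run-level summaries (log2 intensities) for one protein
toy <- data.frame(
  GROUP        = rep(c("T1", "T2"), each = 3),
  LogIntensity = c(10, 11, 12, 13, 15, 17)
)

# Hypothetical helper: mean and error-bar half-width for one condition
summarize_condition <- function(x, interval = c("CI", "SD")) {
  interval <- match.arg(interval)
  n <- length(x)
  s <- sd(x)
  half_width <- if (interval == "CI") {
    # t-based 95% confidence interval for the mean (an assumption)
    qt(0.975, df = n - 1) * s / sqrt(n)
  } else {
    s
  }
  c(mean = mean(x), half_width = half_width)
}

by_group <- split(toy$LogIntensity, toy$GROUP)
ci_bars <- sapply(by_group, summarize_condition, interval = "CI")
sd_bars <- sapply(by_group, summarize_condition, interval = "SD")
ci_bars
sd_bars
```

Both variants describe the spread of the summarized intensities only, which is consistent with the note above that the interval is not related to the model-based analysis.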
# Consider quantitative data (i.e. QuantData) from a yeast study with ten time points of interest,
# three biological replicates, and no technical replicates, which is a time-course experiment.
# The goal is to provide pre-analysis visualization by automatically generating three types of figures
# in separate pdf files.
# Protein IDHC (gene name IDP2) is differentially expressed in time point 1 and time point 7,
# whereas Protein PMG2 (gene name GPM2) is not.
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
head(QuantData$FeatureLevelData)

# Profile plot
dataProcessPlots(data = QuantData, type = "ProfilePlot")

# Quality control plot
dataProcessPlots(data = QuantData, type = "QCPlot")

# Quantification plot for conditions
dataProcessPlots(data = QuantData, type = "ConditionPlot")
This data set was obtained from a published study (Mueller et al., 2007): a controlled spike-in experiment, where 6 proteins (horse myoglobin, bovine carbonic anhydrase, horse Cytochrome C, chicken lysozyme, yeast alcohol dehydrogenase, rabbit aldolase A) were spiked into a complex background in known concentrations in a Latin square design. The experiment contained 6 mixtures, and each mixture was analyzed in label-free LC-MS mode with 3 technical replicates (resulting in a total of 18 runs). Each protein was represented by 7-21 peptides, and each peptide was represented by 1-5 transitions.
DDARawData
data.frame
The raw data (input data for MSstats) must contain the variables ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run and Intensity. The variable names are fixed.
If the information for one or more columns is not available in the original raw data, retain the columns and fill them with a fixed value. For example, if the original raw data does not contain PrecursorCharge and ProductCharge, keep the columns PrecursorCharge and ProductCharge and enter NA for all transitions in RawData.
The variable Intensity must be the original signal without any log transformation. It can be specified as the peak height or the peak area under the curve.
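The column requirements above can be checked with a few lines of base R. The sketch below builds a toy input table in which unavailable columns are kept and filled with NA, then reports which required columns are missing. The helper `missing_columns`, the toy values and the label value "L" are hypothetical and only serve as illustration.

```r
# Column names required by the MSstats input format (from the text above)
required_columns <- c(
  "ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
  "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
  "Run", "Intensity"
)

# Hypothetical helper: names of required columns that are absent from `raw`
missing_columns <- function(raw) setdiff(required_columns, colnames(raw))

# Toy input: unavailable columns are retained and filled with NA;
# Intensity holds untransformed signal (no log transformation)
raw <- data.frame(
  ProteinName      = "P1",
  PeptideSequence  = c("AAK", "AAK", "LLR", "LLR"),
  PrecursorCharge  = NA,
  FragmentIon      = NA,
  ProductCharge    = NA,
  IsotopeLabelType = "L",   # assumed label value, for illustration only
  Condition        = c("A", "B", "A", "B"),
  BioReplicate     = c(1, 2, 1, 2),
  Run              = c("run1", "run2", "run1", "run2"),
  Intensity        = c(15000, 22000, 8000, 9500)
)

missing_columns(raw)                                    # nothing missing
missing_columns(raw[, setdiff(colnames(raw), "Run")])   # reports "Run"
```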
data.frame with the required format of MSstats.
Meena Choi, Olga Vitek.
Maintainer: Meena Choi ([email protected])
Meena Choi, Ching-Yun Chang, Timothy Clough, Daniel Broudy, Trevor Killeen, Brendan MacLean and Olga Vitek. "MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments" Bioinformatics, 30(17):2524-2526, 2014.
Timothy Clough, Safia Thaminy, Susanne Ragg, Ruedi Aebersold, Olga Vitek. "Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs" BMC Bioinformatics, 13(Suppl 16):S6, 2012.
Mueller, L. N., Rinner, O., Schmidt, A., Letarte, S., Bodenmiller, B., Brusniak, M., Vitek, O., Aebersold, R., and Muller, M. (2007). SuperHirn - a novel tool for high resolution LC-MS based peptide/protein profiling. Proteomics, 7, 3470-3480.
head(DDARawData)
This data set was obtained from a published study (Mueller et al., 2007): a controlled spike-in experiment, where 6 proteins (horse myoglobin, bovine carbonic anhydrase, horse Cytochrome C, chicken lysozyme, yeast alcohol dehydrogenase, rabbit aldolase A) were spiked into a complex background in known concentrations in a Latin square design. The experiment contained 6 mixtures, and each mixture was analyzed in label-free LC-MS mode with 3 technical replicates (resulting in a total of 18 runs). Each protein was represented by 7-21 peptides, and each peptide was represented by 1-5 transitions. Skyline was used for processing.
DDARawData.Skyline
data.frame
The raw data (input data for MSstats) must contain the variables ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run and Intensity. The variable names are fixed.
This is the 'MSstats input' format from Skyline used by 'MSstats_report.skyr'. The column names 'FileName' and 'Area' should be changed to 'Run' and 'Intensity'. There are two extra columns called 'StandardType' and 'Truncated'. The 'StandardType' column can be used for normalization='globalStandards' in dataProcess. The 'Truncated' column can be used to remove the truncated peaks with skylineReport=TRUE in dataProcess.
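The renaming step described above can be done in base R as sketched below. The data frame is a made-up stand-in for a Skyline export, reduced to the four columns discussed here; the values are placeholders and do not reflect how Skyline encodes them.

```r
# Toy stand-in for a Skyline 'MSstats input' export (values are made up)
skyline <- data.frame(
  FileName     = c("run1.raw", "run2.raw"),
  Area         = c(1500, 0),
  StandardType = c(NA, NA),
  Truncated    = c(FALSE, FALSE)
)

# Rename 'FileName' to 'Run' and 'Area' to 'Intensity'
names(skyline)[names(skyline) == "FileName"] <- "Run"
names(skyline)[names(skyline) == "Area"] <- "Intensity"

names(skyline)
```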
If the information for one or more columns is not available in the original raw data, retain the columns and fill them with a fixed value. For example, if the original raw data does not contain PrecursorCharge and ProductCharge, keep the columns PrecursorCharge and ProductCharge and enter NA for all transitions in RawData.
The variable Intensity must be the original signal without any log transformation. It can be specified as the peak height or the peak area under the curve.
data.frame with the required format of MSstats.
Meena Choi, Olga Vitek.
Maintainer: Meena Choi ([email protected])
Meena Choi, Ching-Yun Chang, Timothy Clough, Daniel Broudy, Trevor Killeen, Brendan MacLean and Olga Vitek. "MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments" Bioinformatics, 30(17):2524-2526, 2014.
Timothy Clough, Safia Thaminy, Susanne Ragg, Ruedi Aebersold, Olga Vitek. "Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs" BMC Bioinformatics, 13(Suppl 16):S6, 2012.
head(DDARawData.Skyline)
Calculate sample size for future Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS) experiments based on an intensity-based linear model. Two options are available for the calculation: (1) number of biological replicates per condition, (2) power.
designSampleSize( data, desiredFC, FDR = 0.05, numSample = TRUE, power = 0.9, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL )
data |
'FittedModel' in testing output from function groupComparison. |
desiredFC |
the range of a desired fold change which includes the lower and upper values of the desired fold change. |
FDR |
a pre-specified false discovery rate (FDR) to control the overall false positive rate. Default is 0.05. |
numSample |
minimal number of biological replicates per condition. TRUE means that the sample size is to be calculated; otherwise, supply the exact number of biological replicates. |
power |
a pre-specified statistical power, defined as the probability of detecting a true fold change. TRUE means that the power is to be calculated; otherwise, supply the average power you expect. Default is 0.9. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
The function fits the model and uses variance components to calculate the sample size. The underlying model is an intensity-based linear model with technical MS run replication. The estimated sample size is rounded to 0 decimal places. The function can calculate only one of the categories (numSample, numPep, numTran, power) at a time.
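To give a feel for how a variance estimate and a desired fold change translate into a sample size, the sketch below uses the textbook normal-approximation formula for comparing two group means on the log2 scale. This is a swapped-in, simplified calculation and not the exact MSstats procedure: the helper `n_per_group` is hypothetical, `sigma2` stands for a single combined variance, `alpha` is a plain two-sided type I error rather than an FDR, and the result is rounded up.

```r
# Textbook two-group sample size on the log2 scale (NOT the MSstats formula):
# n = 2 * sigma^2 * (z_{1 - alpha/2} + z_{power})^2 / log2(FC)^2
n_per_group <- function(fold_change, sigma2, alpha = 0.05, power = 0.9) {
  delta   <- log2(fold_change)       # effect size on the log2 scale
  z_alpha <- qnorm(1 - alpha / 2)    # two-sided type I error quantile
  z_beta  <- qnorm(power)            # quantile for the desired power
  ceiling(2 * sigma2 * (z_alpha + z_beta)^2 / delta^2)
}

# Replicates per condition for a range of desired fold changes
fold_changes <- seq(1.25, 1.75, by = 0.25)
replicates <- sapply(fold_changes, n_per_group, sigma2 = 0.1)
data.frame(desiredFC = fold_changes, numSample = replicates)
```

The required number of replicates falls as the desired fold change grows, which is the relationship that the sample size plots visualize.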
data.frame - sample size calculation results including the variables desiredFC, numSample, FDR, and power.
Meena Choi, Ching-Yun Chang, Olga Vitek.
# Consider quantitative data (i.e. QuantData) from a yeast study.
# A time course study with ten time points of interest and three biological replicates.
QuantData <- dataProcess(SRMRawData)
head(QuantData$FeatureLevelData)

## based on multiple comparisons (T1 vs T3; T1 vs T7; T1 vs T9)
comparison1 <- matrix(c(-1,0,1,0,0,0,0,0,0,0), nrow=1)
comparison2 <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
comparison3 <- matrix(c(-1,0,0,0,0,0,0,0,1,0), nrow=1)
comparison <- rbind(comparison1, comparison2, comparison3)
row.names(comparison) <- c("T3-T1","T7-T1","T9-T1")
colnames(comparison) <- unique(QuantData$ProteinLevelData$GROUP)

testResultMultiComparisons <- groupComparison(contrast.matrix=comparison, data=QuantData)

## Calculate sample size for future experiments:
# (1) Minimal number of biological replicates per condition
designSampleSize(data=testResultMultiComparisons$FittedModel, numSample=TRUE,
                 desiredFC=c(1.25,1.75), FDR=0.05, power=0.8)

# (2) Power calculation
designSampleSize(data=testResultMultiComparisons$FittedModel, numSample=2,
                 desiredFC=c(1.25,1.75), FDR=0.05, power=TRUE)
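The contrast rows in the example are typed out by hand, which is easy to get wrong with ten conditions. As a minimal sketch, the same matrix can be built by name with a small helper. `make_contrast` is hypothetical and not part of MSstats, and the `groups` vector is assumed to list the conditions in the same order as the columns expected by the comparison.

```r
# Hypothetical helper: one contrast row for `numerator` minus `denominator`
make_contrast <- function(groups, numerator, denominator) {
  stopifnot(numerator %in% groups, denominator %in% groups)
  row <- setNames(numeric(length(groups)), groups)
  row[numerator] <- 1      # condition of interest
  row[denominator] <- -1   # reference condition
  row
}

# Ten time points, assumed to be in the order used by the data
groups <- paste0("T", 1:10)

comparison <- rbind(
  "T3-T1" = make_contrast(groups, "T3", "T1"),
  "T7-T1" = make_contrast(groups, "T7", "T1"),
  "T9-T1" = make_contrast(groups, "T9", "T1")
)
comparison
```

Each row sums to zero, and the row and column names document which conditions are being compared.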
To illustrate the relationship between the desired fold change and the calculated
minimal sample size, which is one of
(1) number of biological replicates per condition,
(2) number of peptides per protein,
(3) number of transitions per peptide, and
(4) power.
The input is the result from function designSampleSize.
designSampleSizePlots(data, isPlotly = FALSE)
data |
output from function designSampleSize. |
isPlotly |
Parameter to use Plotly or ggplot2. If set to TRUE, MSstats will save Plotly plots as HTML files. If set to FALSE, MSstats will save ggplot2 plots as PDF files. |
Data in the example is based on the results of the sample size calculation from function designSampleSize.
Plot for estimated sample size with assigned variable.
Meena Choi, Ching-Yun Chang, Olga Vitek.
# Based on the results of sample size calculation from function designSampleSize,
# we generate a series of sample size plots for number of biological replicates, or peptides,
# or transitions or power plot.
QuantData <- dataProcess(SRMRawData)
head(QuantData$FeatureLevelData)

## based on multiple comparisons (T1 vs T3; T1 vs T7; T1 vs T9)
comparison1 <- matrix(c(-1,0,1,0,0,0,0,0,0,0), nrow=1)
comparison2 <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
comparison3 <- matrix(c(-1,0,0,0,0,0,0,0,1,0), nrow=1)
comparison <- rbind(comparison1, comparison2, comparison3)
row.names(comparison) <- c("T3-T1","T7-T1","T9-T1")
colnames(comparison) <- unique(QuantData$ProteinLevelData$GROUP)

testResultMultiComparisons <- groupComparison(contrast.matrix=comparison, data=QuantData)

# plot the calculated sample sizes for future experiments:
# (1) Minimal number of biological replicates per condition
result.sample <- designSampleSize(data=testResultMultiComparisons$FittedModel, numSample=TRUE,
                                  desiredFC=c(1.25,1.75), FDR=0.05, power=0.8)
designSampleSizePlots(data=result.sample)

# (2) Power
result.power <- designSampleSize(data=testResultMultiComparisons$FittedModel, numSample=2,
                                 desiredFC=c(1.25,1.75), FDR=0.05, power=TRUE)
designSampleSizePlots(data=result.power)
Import DIA-NN files
DIANNtoMSstatsFormat( input, annotation = NULL, global_qvalue_cutoff = 0.01, qvalue_cutoff = 0.01, pg_qvalue_cutoff = 0.01, useUniquePeptide = TRUE, removeFewMeasurements = TRUE, removeOxidationMpeptides = TRUE, removeProtein_with1Feature = TRUE, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL, MBR = TRUE, ... )
input |
name of MSstats input report from DIA-NN, which includes feature-level data. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. |
global_qvalue_cutoff |
The global qvalue cutoff |
qvalue_cutoff |
local qvalue cutoff for library |
pg_qvalue_cutoff |
local qvalue cutoff for protein groups. Run should be the same as the file name. |
useUniquePeptide |
logical. If TRUE (default), peptides assigned to more than one protein are removed, so that only unique peptides are used. |
removeFewMeasurements |
should proteins with few measurements be removed |
removeOxidationMpeptides |
should peptides with oxidation be removed |
removeProtein_with1Feature |
should proteins with a single feature be removed |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
MBR |
TRUE if the analysis was done with match between runs (MBR). |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
Elijah Willie
## Not run:
library(data.table)  # provides fread()
input = fread('diann_pooled_report.tsv')
annot = fread('Annotation.csv')
colnames(annot) = c('Condition', 'Run', 'BioReplicate')
input = DIANNtoMSstatsFormat(input, annotation = annot, MBR = FALSE)
head(input)
## End(Not run)
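Since the Run values in the annotation should match the file names in the report, a quick consistency check before conversion can save a failed import. The base-R sketch below uses toy stand-ins for both tables; the report column name `file_name` is hypothetical and should be replaced by whichever column identifies the run in your report.

```r
# Toy stand-ins for the report and the annotation (all values are made up)
report <- data.frame(
  file_name = c("sampleA.raw", "sampleA.raw", "sampleB.raw")  # hypothetical column
)
annot <- data.frame(
  Condition    = c("Control", "Treated"),
  Run          = c("sampleA.raw", "sampleC.raw"),
  BioReplicate = c(1, 2)
)

# Runs present in the report but absent from the annotation
missing_from_annotation <- setdiff(unique(report$file_name), annot$Run)

# Annotation rows that do not correspond to any run in the report
unused_in_annotation <- setdiff(annot$Run, unique(report$file_name))

missing_from_annotation
unused_in_annotation
```

Both vectors should be empty before the annotation is passed to the converter.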
This example dataset was obtained from a group comparison study of S. pyogenes. Two conditions, S. pyogenes with 0% and 10% of human plasma added (denoted Strep 0% and Strep 10%), were profiled in two replicates, in the label-free mode, with a SWATH-MS-enabled AB SCIEX TripleTOF 5600 System. The identification and quantification of spectral peaks was assisted by a spectral library, and was performed using OpenSWATH software (http://proteomics.ethz.ch/openswath.html). For reasons of space, the example dataset only contains two proteins from this study. Protein FabG shows strong evidence of differential abundance, while protein Probable RNA helicase exp9 only shows moderate evidence of differential abundance between conditions.
DIARawData
data.frame
The raw data (input data for MSstats) must contain the variables ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run and Intensity. The variable names are fixed.
If the information for one or more columns is not available in the original raw data, retain the columns and fill them with a fixed value. For example, if the original raw data does not contain PrecursorCharge and ProductCharge information, keep the PrecursorCharge and ProductCharge columns and enter NA for all transitions in RawData.
The Intensity variable must be the original signal without any log transformation. It can be specified as the peak height or the peak area under the curve.
data.frame with the required format of MSstats.
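The format requirements above can be checked before running an analysis. The following is a minimal base R sketch, not part of MSstats: the helper name `check_msstats_input` and the toy table are hypothetical, and the non-negativity test is only a crude heuristic for spotting log-transformed intensities.

```r
# Column names required by the MSstats input format, as listed above.
required_cols <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
                   "FragmentIon", "ProductCharge", "IsotopeLabelType",
                   "Condition", "BioReplicate", "Run", "Intensity")

# Hypothetical helper (not an MSstats function): report missing columns and
# whether the intensities are numeric and non-negative.
check_msstats_input <- function(df) {
  missing_cols <- setdiff(required_cols, colnames(df))
  intensity_ok <- "Intensity" %in% colnames(df) &&
    is.numeric(df$Intensity) &&
    all(df$Intensity >= 0, na.rm = TRUE)
  list(missing = missing_cols, intensity_ok = intensity_ok)
}

# Toy table in which the charge columns are unknown and therefore set to NA.
toy <- data.frame(
  ProteinName = "P1", PeptideSequence = "AAAK",
  PrecursorCharge = NA, FragmentIon = c("y3", "y4"), ProductCharge = NA,
  IsotopeLabelType = "L", Condition = "A", BioReplicate = 1, Run = 1,
  Intensity = c(15230.5, 820.1)
)
check <- check_msstats_input(toy)
check$missing       # character(0) when every required column is present
check$intensity_ok
```

An empty `missing` vector only confirms that the column names are present. It does not validate their contents.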
Meena Choi, Olga Vitek.
Maintainer: Meena Choi ([email protected])
head(DIARawData)
Import DIA-Umpire files
DIAUmpiretoMSstatsFormat( raw.frag, raw.pep, raw.pro, annotation, useSelectedFrag = TRUE, useSelectedPep = TRUE, removeFewMeasurements = TRUE, removeProtein_with1Feature = FALSE, summaryforMultipleRows = max, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL, ... )
raw.frag |
name of FragSummary_date.xls data, which includes feature-level data. |
raw.pep |
name of PeptideSummary_date.xls data, which includes selected fragments information. |
raw.pro |
name of ProteinSummary_date.xls data, which includes selected peptides information. |
annotation |
name of annotation data which includes Condition, BioReplicate, Run information. |
useSelectedFrag |
TRUE will use the selected fragment for each peptide. 'Selected_fragments' column is required. |
useSelectedPep |
TRUE will use the selected peptide for each protein. 'Selected_peptides' column is required. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
summaryforMultipleRows |
max (default) or sum. When a feature has multiple measurements in a run, use the highest intensity or the sum of the intensities. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
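The effect of summaryforMultipleRows and removeFewMeasurements can be illustrated on a toy table. This is a base R sketch of the idea only, not the converter's implementation; the column names `Feature`, `Run` and `Intensity` and the data are made up for the illustration.

```r
# Toy feature-level table: feature "f1" was measured twice in run "r1".
toy <- data.frame(
  Feature   = c("f1", "f1", "f1", "f2", "f2", "f2", "f3"),
  Run       = c("r1", "r1", "r2", "r1", "r2", "r3", "r1"),
  Intensity = c(100, 250, 300, 80, 90, 70, 500)
)

# Collapse duplicated (Feature, Run) rows with max or with sum.
by_max <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = max)
by_sum <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = sum)

# Count measurements per feature across runs and keep only features
# with more than two measurements.
n_runs <- table(by_max$Feature)
kept <- names(n_runs)[n_runs > 2]
kept
```

With max, the duplicated measurement of "f1" in run "r1" becomes 250; with sum it becomes 350. Only "f2" has more than two measurements across runs, so it is the only feature kept.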
Meena Choi, Olga Vitek
diau_frag = system.file("tinytest/raw_data/DIAUmpire/dia_frag.csv", package = "MSstatsConvert")
diau_pept = system.file("tinytest/raw_data/DIAUmpire/dia_pept.csv", package = "MSstatsConvert")
diau_prot = system.file("tinytest/raw_data/DIAUmpire/dia_prot.csv", package = "MSstatsConvert")
annot = system.file("tinytest/annotations/annot_diau.csv", package = "MSstats")
diau_frag = data.table::fread(diau_frag)
diau_pept = data.table::fread(diau_pept)
diau_prot = data.table::fread(diau_prot)
annot = data.table::fread(annot)
diau_frag = diau_frag[, lapply(.SD, function(x) if (is.integer(x)) as.numeric(x) else x)]
# In case numeric columns are not interpreted correctly
diau_imported = DIAUmpiretoMSstatsFormat(diau_frag, diau_pept, diau_prot, annot, use_log_file = FALSE)
head(diau_imported)
An example SDRF file which is used to store metadata for MS-based proteomics experiments.
example_SDRF
data.frame
An example SDRF file which is used to store metadata for MS-based proteomics experiments.
data.frame example of an SDRF file.
Mateusz Staniak, Devon Kohler, Olga Vitek.
head(example_SDRF)
Extract experimental design from MSstats format into SDRF format
extractSDRF( data, run_name = "comment[data file]", condition_name = "characteristics[disease]", biological_replicate = "characteristics[biological replicate]", fraction = NULL, meta_data = NULL )
data |
MSstats formatted data that is the output of a dedicated converter, such as 'MaxQtoMSstatsFormat', 'SkylinetoMSstatsFormat', etc. |
run_name |
Run column name in SDRF data |
condition_name |
Condition column name in SDRF data |
biological_replicate |
Biological replicate column name in SDRF data |
fraction |
Fraction column name in SDRF data (if applicable). Default is 'NULL'. If there are no fractions keep 'NULL'. |
meta_data |
A data.frame including any additional meta data for the SDRF file that is not included in MSstats. This meta data will be added into the final SDRF file. Please ensure the run names in the meta data match the run names in the MSstats data. |
mq_ev = data.table::fread(system.file("tinytest/raw_data/MaxQuant/mq_ev.csv", package = "MSstatsConvert"))
mq_pg = data.table::fread(system.file("tinytest/raw_data/MaxQuant/mq_pg.csv", package = "MSstatsConvert"))
annot = data.table::fread(system.file("tinytest/raw_data/MaxQuant/annotation.csv", package = "MSstatsConvert"))
maxq_imported = MaxQtoMSstatsFormat(mq_ev, annot, mq_pg, use_log_file = FALSE)
head(maxq_imported)
SDRF_file = extractSDRF(maxq_imported)
Import FragPipe files
FragPipetoMSstatsFormat( input, useUniquePeptide = TRUE, removeFewMeasurements = TRUE, removeProtein_with1Feature = FALSE, summaryforMultipleRows = max, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL, ... )
input |
name of FragPipe msstats.csv export. ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, Intensity are required. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. Each protein is assumed to be represented by its unique peptides. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
summaryforMultipleRows |
max (default) or sum. When a feature has multiple measurements in a run, use the highest intensity or the sum of the intensities. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
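The useUniquePeptide filter described above amounts to dropping peptides shared between proteins. A minimal base R sketch of that idea follows; it is an illustration on made-up data, not the code used inside FragPipetoMSstatsFormat.

```r
# Toy table: peptide "CCCR" is assigned to both P1 and P2.
toy <- data.frame(
  ProteinName     = c("P1", "P1", "P2", "P2", "P3"),
  PeptideSequence = c("AAAK", "CCCR", "CCCR", "DDDK", "EEEK")
)

# Number of distinct proteins each peptide is assigned to.
pairs <- unique(toy[, c("ProteinName", "PeptideSequence")])
n_proteins <- table(pairs$PeptideSequence)

# Keep only peptides that map to exactly one protein.
unique_peptides <- names(n_proteins)[n_proteins == 1]
filtered <- toy[toy$PeptideSequence %in% unique_peptides, ]
filtered
```

Both rows of the shared peptide are removed, so neither P1 nor P2 is quantified from it.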
Devon Kohler
fragpipe_raw = system.file("tinytest/raw_data/FragPipe/fragpipe_input.csv", package = "MSstatsConvert")
fragpipe_raw = data.table::fread(fragpipe_raw)
fragpipe_imported = FragPipetoMSstatsFormat(fragpipe_raw, use_log_file = FALSE)
head(fragpipe_imported)
Get feature-level data to be used in the MSstatsSummarizationOutput function
getProcessed(input)
input |
data.table processed by dataProcess subfunctions |
data.table processed by dataProcess subfunctions
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
input_all = MSstatsSelectFeatures(input, "all") # all features
input_5 = MSstatsSelectFeatures(data.table::copy(input), "topN", top_n = 5) # top 5 features
proc1 = getProcessed(input_all)
proc2 = getProcessed(input_5)
proc1
proc2
Get information about number of measurements for each group
getSamplesInfo(summarization_output)
summarization_output |
output of the dataProcess function |
This function extracts information required to compute percentages of missing and imputed values in group comparison.
data.table
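The kind of per-group count this function provides can be sketched in base R. The example below assumes that "number of measurements" means the number of distinct runs per group. The column names `GROUP`, `RUN` and `NumRuns` and the toy data are assumptions made for the illustration, not a description of the function's actual output.

```r
# Toy protein-level table: group "A" has 2 distinct runs, group "B" has 3.
toy <- data.frame(
  GROUP = c("A", "A", "A", "B", "B", "B", "B"),
  RUN   = c(1, 1, 2, 3, 3, 4, 5)
)

# Count distinct runs within each group.
runs_per_group <- aggregate(RUN ~ GROUP, data = toy,
                            FUN = function(x) length(unique(x)))
names(runs_per_group)[2] <- "NumRuns"
runs_per_group
```

Counts of this kind serve as the denominators when percentages of missing or imputed values are computed per group.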
QuantData <- dataProcess(DDARawData, use_log_file = FALSE)
samples_info <- getSamplesInfo(QuantData)
samples_info
Get proteins based on names or integer IDs
getSelectedProteins(chosen_proteins, all_proteins)
chosen_proteins |
protein names or integer IDs |
all_proteins |
all unique proteins |
character
Whole plot testing
groupComparison( contrast.matrix, data, save_fitted_models = TRUE, log_base = 2, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL, numberOfCores = 1 )
contrast.matrix |
comparison between conditions of interest. |
data |
name of the data set (output of the dataProcess function). |
save_fitted_models |
logical, if TRUE, fitted models will be added to the output. |
log_base |
base of the logarithm used in dataProcess. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
numberOfCores |
Number of cores for parallel processing. When > 1, a logfile named 'MSstats_groupComparison_log_progress.log' is created to track progress. Only works for Linux & Mac OS. Default is 1. |
contrast.matrix : comparison of interest. Based on the levels of conditions, specify 1 or -1 for the conditions of interest and 0 otherwise. The levels of conditions are sorted alphabetically. The command levels(QuantData$FeatureLevelData$GROUP_ORIGINAL) shows the actual order of the levels of conditions. The underlying model fitting functions are lm and lmer for the fixed effects model and mixed effects model, respectively. The input of this function is the quantitative data from the dataProcess function.
list that consists of three elements: "ComparisonResult" - data.frame with results of statistical testing, "ModelQC" - data.frame with data used to fit models for group comparison and "FittedModel" - list of fitted models.
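Writing contrast rows by hand is error-prone when there are many conditions. The base R sketch below builds them by name instead. The helper `make_contrast` and the condition labels are hypothetical; in practice the labels would come from the levels of the GROUP column of the dataProcess output.

```r
# Hypothetical condition labels, sorted alphabetically as described above.
groups <- sort(c("Control", "TreatmentA", "TreatmentB"))

# Hypothetical helper (not an MSstats function): one row with 1 for the
# numerator condition, -1 for the denominator condition and 0 elsewhere.
make_contrast <- function(numerator, denominator, groups) {
  row <- matrix(0, nrow = 1, ncol = length(groups),
                dimnames = list(paste0(numerator, "-", denominator), groups))
  row[1, numerator] <- 1
  row[1, denominator] <- -1
  row
}

comparison <- rbind(
  make_contrast("TreatmentA", "Control", groups),
  make_contrast("TreatmentB", "Control", groups)
)
comparison

# Each pairwise contrast sums to zero across conditions.
rowSums(comparison)
```

Naming the columns after the condition levels makes it easy to confirm that each 1 and -1 sits under the intended condition before the matrix is passed as contrast.matrix.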
# Consider quantitative data (i.e. QuantData) from yeast study with ten time points of interests,
# three biological replicates, and no technical replicates.
# It is a time-course experiment and we attempt to compare differential abundance
# between time 1 and 7 in a set of targeted proteins.
# In this label-based SRM experiment, MSstats uses the fitted model with expanded scope of
# Biological replication.
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
head(QuantData$FeatureLevelData)
levels(QuantData$ProteinLevelData$GROUP)
comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1)
row.names(comparison) <- "T7-T1"
groups = levels(QuantData$ProteinLevelData$GROUP)
colnames(comparison) <- groups[order(as.numeric(groups))]
# Tests for differentially abundant proteins with models:
# label-based SRM experiment with expanded scope of biological replication.
testResultOneComparison <- groupComparison(contrast.matrix=comparison, data=QuantData, use_log_file = FALSE)
# table for result
testResultOneComparison$ComparisonResult
To summarize the results of log-fold changes and adjusted p-values for differentially abundant proteins, groupComparisonPlots takes testing results from function (groupComparison) as input and automatically generates three types of figures in pdf files as output:
(1) volcano plot (specify "VolcanoPlot" in option type) for each comparison separately;
(2) heatmap (specify "Heatmap" in option type) for multiple comparisons;
(3) comparison plot (specify "ComparisonPlot" in option type) for multiple comparisons per protein.
groupComparisonPlots( data, type, sig = 0.05, FCcutoff = FALSE, logBase.pvalue = 10, ylimUp = FALSE, ylimDown = FALSE, xlimUp = FALSE, x.axis.size = 10, y.axis.size = 10, dot.size = 3, text.size = 4, text.angle = 0, legend.size = 13, ProteinName = TRUE, colorkey = TRUE, numProtein = 100, clustering = "both", width = 800, height = 600, which.Comparison = "all", which.Protein = "all", address = "", isPlotly = FALSE )
data |
'ComparisonResult' in testing output from function groupComparison. |
type |
choice of visualization. "VolcanoPlot" represents volcano plot of log fold changes and adjusted p-values for each comparison separately. "Heatmap" represents heatmap of adjusted p-values for multiple comparisons. "ComparisonPlot" represents comparison plot of log fold changes for multiple comparisons per protein. |
sig |
FDR cutoff for the adjusted p-values in heatmap and volcano plot. Level of significance for comparison plot: a 100(1-sig)% confidence interval will be drawn. sig=0.05 is default. |
FCcutoff |
for volcano plot or heatmap, whether to apply a fold change cutoff. FALSE (default) means no fold change cutoff is applied for significance analysis. FCcutoff = specific value means that fold change cutoff is applied. |
logBase.pvalue |
for volcano plot or heatmap, (-) logarithm transformation of adjusted p-value with base 2 or 10(default). |
ylimUp |
for all three plots, upper limit for y-axis. FALSE (default) for volcano plot/heatmap use maximum of -log2 (adjusted p-value) or -log10 (adjusted p-value). FALSE (default) for comparison plot uses maximum of log-fold change + CI. |
ylimDown |
for all three plots, lower limit for y-axis. FALSE (default) for volcano plot/heatmap use minimum of -log2 (adjusted p-value) or -log10 (adjusted p-value). FALSE (default) for comparison plot uses minimum of log-fold change - CI. |
xlimUp |
for volcano plot, the limit for x-axis. FALSE (default) uses the maximum absolute value of log-fold change, or 3 if that maximum is less than 3. |
x.axis.size |
size of axes labels, e.g. name of the comparisons in heatmap, and in comparison plot. Default is 10. |
y.axis.size |
size of axes labels, e.g. name of targeted proteins in heatmap. Default is 10. |
dot.size |
size of dots in volcano plot and comparison plot. Default is 3. |
text.size |
size of ProteinName label in the graph for Volcano Plot. Default is 4. |
text.angle |
angle of the x-axis labels representing each comparison at the bottom of the graph in comparison plot. Default is 0. |
legend.size |
size of legend for color at the bottom of volcano plot. Default is 13. |
ProteinName |
for volcano plot only, whether to display protein names. TRUE (default) means the names of significant proteins are displayed next to the points. FALSE means no protein names are displayed. |
colorkey |
TRUE(default) shows colorkey. |
numProtein |
For ggplot2: The number of proteins which will be presented in each heatmap. Default is 100. Maximum possible number of protein for one heatmap is 180. For Plotly: use this parameter to adjust the number of proteins to be displayed on the heatmap |
clustering |
Determines how to order proteins and comparisons. Hierarchical cluster analysis with Ward method (minimum variance) is performed. 'protein' means that protein dendrogram is computed and reordered based on protein means (the order of rows is changed). 'comparison' means comparison dendrogram is computed and reordered based on comparison means (the order of comparisons is changed). 'both' means to reorder both protein and comparison. Default is 'both'. |
width |
width of the saved file. Default is 800. |
height |
height of the saved file. Default is 600. |
which.Comparison |
list of comparisons to draw plots. List can be labels of comparisons or order numbers of comparisons from levels(data$Label), such as levels(testResultMultiComparisons$ComparisonResult$Label). Default is "all", which generates all plots for each protein. |
which.Protein |
Protein list to draw comparison plots. List can be names of Proteins or order numbers of Proteins from levels(testResultMultiComparisons$ComparisonResult$Protein). Default is "all", which generates all comparison plots for each protein. |
address |
the name of the folder that will store the results. Default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is automatically created with the default name of "VolcanoPlot.pdf" or "Heatmap.pdf" or "ComparisonPlot.pdf". The address argument specifies where to store the file as well as how to modify the beginning of the file name. If address=FALSE, the plot will not be saved as a pdf file but shown in a window. |
isPlotly |
This parameter is for the MSstatsShiny application for plotly rendering. It cannot be used for saving PDF files, as plotly does not currently support PDFs. address and isPlotly cannot be set as TRUE at the same time. |
Volcano plot : illustrates actual log-fold changes and adjusted p-values for each comparison separately with all proteins. The x-axis is the log fold change. The base of logarithm transformation is the same as specified in "logTrans" from dataProcess. The y-axis is the negative log2 or log10 adjusted p-values. The horizontal dashed line represents the FDR cutoff. The points below the FDR cutoff line are non-significantly abundant proteins (colored in black). The points above the FDR cutoff line are significantly abundant proteins (colored in red/blue for up-/down-regulated). If fold change cutoff is specified (FCcutoff = specific value), the points above the FDR cutoff line but within the FC cutoff line are non-significantly abundant proteins (colored in black).
Heatmap : illustrates up-/down-regulated proteins for multiple comparisons with all proteins. Each column represents a comparison of interest. Each row represents a protein. Red/blue indicates that the protein is significantly up-regulated/down-regulated in that comparison, given the FDR cutoff and/or FC cutoff. The color scheme shows the strength of the evidence: the darker the color, the stronger the evidence of significance. Gold indicates proteins that are not significantly different in abundance.
Comparison plot : illustrates the log-fold change and its variation across multiple comparisons for a single protein. The x-axis is the comparison of interest. The y-axis is the log fold change. The red points are the estimated log fold changes from the model. The blue error bars are the confidence intervals for the log fold change at the 100(1-sig)% confidence level (95% with the default sig=0.05). This interval is only based on the standard error, which is estimated from the model.
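The coloring rule of the volcano plot and the error bars of the comparison plot can be sketched in base R on a toy results table. Two things here are assumptions rather than documented behavior: that the fold change cutoff is given on the original scale and compared with the log2 fold change as log2(FCcutoff), and that the confidence interval uses a t quantile with the model's degrees of freedom. The column names of the toy table are also chosen for this illustration only.

```r
# Toy testing results for four proteins in one comparison.
toy <- data.frame(
  Protein    = c("P1", "P2", "P3", "P4"),
  log2FC     = c(2.5, -1.8, 0.4, 3.0),
  SE         = c(0.4, 0.3, 0.2, 1.5),
  DF         = c(10, 10, 10, 10),
  adj.pvalue = c(0.001, 0.01, 0.02, 0.30)
)
sig <- 0.05
FCcutoff <- 2   # set to FALSE to disable the fold change cutoff

# y-axis of the volcano plot: negative log10 of the adjusted p-value.
toy$neg_log10_p <- -log10(toy$adj.pvalue)

# Assumption: the cutoff is compared on the log2 scale.
passes_fc <- if (isFALSE(FCcutoff)) TRUE else abs(toy$log2FC) >= log2(FCcutoff)
toy$status <- ifelse(toy$adj.pvalue < sig & passes_fc,
                     ifelse(toy$log2FC > 0, "up", "down"),
                     "not significant")

# Assumption: 100(1 - sig)% interval from a t quantile and the standard error.
half_width <- qt(1 - sig / 2, df = toy$DF) * toy$SE
toy$ci_lower <- toy$log2FC - half_width
toy$ci_upper <- toy$log2FC + half_width
toy
```

In this toy table P3 passes the FDR cutoff but falls inside the fold change cutoff, so it is labelled "not significant", which mirrors the black points described for the volcano plot.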
QuantData<-dataProcess(SRMRawData, use_log_file = FALSE)
head(QuantData$FeatureLevelData)
## based on multiple comparisons (T1 vs T3; T1 vs T7; T1 vs T9)
comparison1<-matrix(c(-1,0,1,0,0,0,0,0,0,0),nrow=1)
comparison2<-matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1)
comparison3<-matrix(c(-1,0,0,0,0,0,0,0,1,0),nrow=1)
comparison<-rbind(comparison1,comparison2, comparison3)
row.names(comparison)<-c("T3-T1","T7-T1","T9-T1")
groups = levels(QuantData$ProteinLevelData$GROUP)
colnames(comparison) <- groups[order(as.numeric(groups))]
testResultMultiComparisons<-groupComparison(contrast.matrix=comparison, data=QuantData, use_log_file = FALSE)
testResultMultiComparisons$ComparisonResult
# Volcano plot with FDR cutoff = 0.05 and no FC cutoff
groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="VolcanoPlot", logBase.pvalue=2, address="Ex1_")
# Volcano plot with FDR cutoff = 0.05, FC cutoff = 70, upper y-axis limit = 100,
# and no protein name displayed
# FCcutoff=70 is for demonstration purpose
groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="VolcanoPlot", FCcutoff=70, logBase.pvalue=2, ylimUp=100, ProteinName=FALSE,address="Ex2_")
# Heatmap with FDR cutoff = 0.05
groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="Heatmap", logBase.pvalue=2, address="Ex1_")
# Heatmap with FDR cutoff = 0.05 and FC cutoff = 70
# FCcutoff=70 is for demonstration purpose
groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="Heatmap", FCcutoff=70, logBase.pvalue=2, address="Ex2_")
# Comparison Plot
groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="ComparisonPlot", address="Ex1_")
# Comparison Plot
groupComparisonPlots(data=testResultMultiComparisons$ComparisonResult, type="ComparisonPlot", ylimUp=8, ylimDown=-1, address="Ex2_")
To check the assumptions of the linear model for whole plot inference, groupComparisonQCPlots takes the results after fitting models from function (groupComparison) as input and automatically generates two types of figures in pdf files as output:
(1) normal quantile-quantile plot (specify "QQPlots" in option type) for checking normally distributed errors;
(2) residual plot (specify "ResidualPlots" in option type).
groupComparisonQCPlots( data, type, axis.size = 10, dot.size = 3, width = 10, height = 10, which.Protein = "all", address = "" )
data |
output from function groupComparison. |
type |
choice of visualization. "QQPlots" represents normal quantile-quantile plot for each protein after fitting models. "ResidualPlots" represents a plot of residuals versus fitted values for each protein in the dataset. |
axis.size |
size of axes labels. Default is 10. |
dot.size |
size of points in the graph for residual plots and QQ plots. Default is 3. |
width |
width of the saved file. Default is 10. |
height |
height of the saved file. Default is 10. |
which.Protein |
Protein list to draw plots. List can be names of Proteins or order numbers of Proteins from levels(testResultOneComparison$ComparisonResult$Protein). Default is "all", which generates all plots for each protein. |
address |
name that will serve as a prefix to the name of output file. |
Results based on statistical models for whole plot level inference are accurate as long as the assumptions of the model are met. The model assumes that the measurement errors are normally distributed with mean 0 and constant variance. The assumption of a constant variance can be checked by examining the residuals from the model.
QQPlots : a normal quantile-quantile plot for each protein is generated in order to check whether the errors are well approximated by a normal distribution. If points fall approximately along a straight line, then the assumption is appropriate for that protein. Only large deviations from the line are problematic.
ResidualPlots : The plots of residuals against predicted(fitted) values. If it shows a random scatter, then the assumption is appropriate.
Produces a pdf file.
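The two diagnostics described above can be reproduced for any linear model with base R. The sketch below uses simulated data and a plain lm fit as a stand-in. It is not the model that groupComparison fits, and the column names `GROUP` and `LogIntensity` are made up for the illustration.

```r
set.seed(1)

# Simulated log-intensities for one protein: three groups, five runs each.
toy <- data.frame(
  GROUP = factor(rep(c("A", "B", "C"), each = 5)),
  LogIntensity = c(rnorm(5, 20, 0.5), rnorm(5, 21, 0.5), rnorm(5, 22, 0.5))
)

fit <- lm(LogIntensity ~ GROUP, data = toy)
res <- residuals(fit)
fitted_values <- fitted(fit)

# Coordinates of a normal quantile-quantile plot (plot.it = FALSE skips drawing).
qq <- qqnorm(res, plot.it = FALSE)

# To draw the plots interactively:
# qqnorm(res); qqline(res)                          # points close to the line
# plot(fitted_values, res); abline(h = 0, lty = 2)  # random scatter around 0
```

Points that follow the reference line in the quantile-quantile plot support the normality assumption, and a residual plot without a funnel or trend supports the constant variance assumption.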
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
head(QuantData$FeatureLevelData)
levels(QuantData$FeatureLevelData$GROUP)
comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0),nrow=1)
row.names(comparison) <- "T7-T1"
colnames(comparison) <- unique(QuantData$ProteinLevelData$GROUP)
# Tests for differentially abundant proteins with models:
# label-based SRM experiment with expanded scope of biological replication.
testResultOneComparison <- groupComparison(contrast.matrix=comparison, data=QuantData, use_log_file = FALSE)
# normal quantile-quantile plots
groupComparisonQCPlots(data=testResultOneComparison, type="QQPlots", address="")
# residual plots
groupComparisonQCPlots(data=testResultOneComparison, type="ResidualPlots", address="")
Prepare a peptides dictionary for global standards normalization
makePeptidesDictionary(input, normalization)
input |
'data.table' in MSstats standard format |
normalization |
normalization method |
This function extracts information required to perform normalization with global standards. It is useful for running the summarization workflow outside of the dataProcess function.
input = data.table::as.data.table(DDARawData)
peptides_dict = makePeptidesDictionary(input, "GLOBALSTANDARDS")
head(peptides_dict) # ready to be passed to the MSstatsNormalize function
Import MaxQuant files
MaxQtoMSstatsFormat( evidence, annotation, proteinGroups, proteinID = "Proteins", useUniquePeptide = TRUE, summaryforMultipleRows = max, removeFewMeasurements = TRUE, removeMpeptides = FALSE, removeOxidationMpeptides = FALSE, removeProtein_with1Peptide = FALSE, use_log_file = TRUE, append = FALSE, verbose = TRUE, log_file_path = NULL, ... )
evidence |
name of 'evidence.txt' data, which includes feature-level data. |
annotation |
name of 'annotation.txt' data which includes Raw.file, Condition, BioReplicate, Run, IsotopeLabelType information. |
proteinGroups |
name of 'proteinGroups.txt' data. It is needed to match protein group IDs. If proteinGroups=NULL, the 'Proteins' column in 'evidence.txt' is used. |
proteinID |
'Proteins'(default) or 'Leading.razor.protein' for Protein ID. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
summaryforMultipleRows |
max (default) or sum - when there are multiple measurements for a given feature and run, use the highest intensity or the sum of the intensities. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeMpeptides |
TRUE will remove the peptides including 'M' sequence. FALSE is default. |
removeOxidationMpeptides |
TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Peptide |
TRUE will remove the proteins which have only 1 peptide and charge. FALSE is default. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
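The effect of summaryforMultipleRows can be pictured with a small base-R sketch. The toy table `ev` and its column names are made up for illustration, and this is not the converter's internal code. It only shows how duplicated measurements of one feature in one run collapse under max versus sum.

```r
# Toy evidence-like table (hypothetical columns): feature "pepA_2" was
# measured twice in run "r1".
ev <- data.frame(
  Feature   = c("pepA_2", "pepA_2", "pepA_2", "pepB_3"),
  Run       = c("r1", "r1", "r2", "r1"),
  Intensity = c(100, 250, 80, 40)
)

# Collapse duplicates per feature and run, once with max and once with sum.
by_max <- aggregate(Intensity ~ Feature + Run, data = ev, FUN = max)
by_sum <- aggregate(Intensity ~ Feature + Run, data = ev, FUN = sum)

by_max
by_sum
```

With max, the duplicated pair keeps only the larger intensity; with sum, both measurements contribute to the reported value.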
data.frame in the MSstats required format.
Warning: MSstats does not support metabolic labeling or iTRAQ experiments.
Meena Choi, Olga Vitek.
mq_ev = data.table::fread(system.file("tinytest/raw_data/MaxQuant/mq_ev.csv", package = "MSstatsConvert"))
mq_pg = data.table::fread(system.file("tinytest/raw_data/MaxQuant/mq_pg.csv", package = "MSstatsConvert"))
annot = data.table::fread(system.file("tinytest/raw_data/MaxQuant/annotation.csv", package = "MSstatsConvert"))
maxq_imported = MaxQtoMSstatsFormat(mq_ev, annot, mq_pg, use_log_file = FALSE)
head(maxq_imported)
To check the assumptions of the linear model for whole plot inference, modelBasedQCPlots takes the results after fitting models from the function groupComparison as input and automatically generates two types of figures in pdf files as output:
(1) normal quantile-quantile plot (specify "QQPlots" in option type) for checking normally distributed errors;
(2) residual plot (specify "ResidualPlots" in option type).
modelBasedQCPlots( data, type, axis.size = 10, dot.size = 3, width = 10, height = 10, which.Protein = "all", address = "", displayDeprecationMessage = TRUE )
data |
output from function groupComparison. |
type |
choice of visualization. "QQPlots" represents normal quantile-quantile plot for each protein after fitting models. "ResidualPlots" represents a plot of residuals versus fitted values for each protein in the dataset. |
axis.size |
size of axes labels. Default is 10. |
dot.size |
size of points in the graph for residual plots and QQ plots. Default is 3. |
width |
width of the saved file. Default is 10. |
height |
height of the saved file. Default is 10. |
which.Protein |
Protein list to draw plots. List can be names of Proteins or order numbers of Proteins from levels(testResultOneComparison$ComparisonResult$Protein). Default is "all", which generates all plots for each protein. |
address |
name that will serve as a prefix to the name of output file. |
Results based on statistical models for whole plot level inference are accurate as long as the assumptions of the model are met. The model assumes that the measurement errors are normally distributed with mean 0 and constant variance. The assumption of a constant variance can be checked by examining the residuals from the model.
QQPlots : a normal quantile-quantile plot for each protein is generated in order to check whether the errors are well approximated by a normal distribution. If points fall approximately along a straight line, then the assumption is appropriate for that protein. Only large deviations from the line are problematic.
ResidualPlots : plots of residuals against predicted (fitted) values. If a plot shows a random scatter, then the assumption of constant variance is appropriate for that protein.
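The two diagnostics described above can be reproduced by hand for a single model. The following is a minimal base-R sketch on simulated data: the data frame `toy` and the simple `lm` fit are assumptions made for illustration and do not reproduce the models fitted by groupComparison.

```r
set.seed(1)

# Simulated protein-level abundances for two groups (hypothetical data).
toy <- data.frame(
  GROUP = factor(rep(c("A", "B"), each = 30)),
  ABUNDANCE = c(rnorm(30, mean = 20), rnorm(30, mean = 21))
)
fit <- lm(ABUNDANCE ~ GROUP, data = toy)

# Coordinates of the normal quantile-quantile plot, computed without drawing.
qq <- qqnorm(residuals(fit), plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)  # close to 1 when residuals look normal

# Coordinates of the residual plot: residuals against fitted values.
res_plot <- data.frame(fitted = fitted(fit), residual = residuals(fit))

# Uncomment to draw both diagnostics on the active graphics device:
# qqnorm(residuals(fit)); qqline(residuals(fit))
# plot(res_plot$fitted, res_plot$residual); abline(h = 0, lty = 2)
```

Points close to the reference line in the first plot and a structureless band around zero in the second are the patterns the text describes as acceptable.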
produce a pdf file
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
head(QuantData$FeatureLevelData)
levels(QuantData$FeatureLevelData$GROUP)
comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
row.names(comparison) <- "T7-T1"
colnames(comparison) <- unique(QuantData$ProteinLevelData$GROUP)
# Tests for differentially abundant proteins with models:
# label-based SRM experiment with expanded scope of biological replication.
testResultOneComparison <- groupComparison(contrast.matrix=comparison, data=QuantData, use_log_file = FALSE)
# normal quantile-quantile plots
modelBasedQCPlots(data=testResultOneComparison, type="QQPlots", address="")
# residual plots
modelBasedQCPlots(data=testResultOneComparison, type="ResidualPlots", address="")
Create a contrast matrix for groupComparison function
MSstatsContrastMatrix(contrasts, conditions, labels = NULL)
contrasts |
One of the following: i) list of lists. Each sub-list consists of two vectors that name conditions that will be compared. See the details section for more information. ii) matrix. In this case, its correctness will be checked. iii) "pairwise". In this case, a pairwise comparison matrix will be generated. iv) data.frame. In this case, the input will be converted to a matrix. |
conditions |
unique condition labels |
labels |
labels for contrasts (row.names of the contrast matrix) |
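For orientation, a contrast matrix has one row per comparison and one column per condition, as in the hand-built "T7-T1" matrices used in the examples of this manual. The base-R sketch below builds all pairwise rows by hand. The condition names, the sign convention and the row labels are illustrative assumptions; the matrix produced by the "pairwise" option may be laid out differently.

```r
conditions <- c("Control", "Low", "High")  # hypothetical condition labels

# One row per comparison, one column per condition.
# Each row puts +1 on the first condition of the pair and -1 on the second.
pairs <- utils::combn(conditions, 2)  # all unordered pairs, one per column
contrast <- matrix(0, nrow = ncol(pairs), ncol = length(conditions),
                   dimnames = list(NULL, conditions))
for (i in seq_len(ncol(pairs))) {
  contrast[i, pairs[1, i]] <- 1
  contrast[i, pairs[2, i]] <- -1
}
rownames(contrast) <- paste(pairs[1, ], "vs", pairs[2, ])

contrast
```

Every row sums to zero, which is what makes each row a comparison between conditions rather than an estimate of a single condition mean.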
Group comparison
MSstatsGroupComparison( summarized_list, contrast_matrix, save_fitted_models, repeated, samples_info, numberOfCores = 1 )
summarized_list |
output of MSstatsPrepareForGroupComparison |
contrast_matrix |
contrast matrix |
save_fitted_models |
if TRUE, fitted models will be included in the output |
repeated |
logical, output of checkRepeatedDesign function |
samples_info |
data.table, output of getSamplesInfo function |
numberOfCores |
Number of cores for parallel processing. When > 1, a logfile named 'MSstats_groupComparison_log_progress.log' is created to track progress. Only works for Linux & Mac OS. |
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
group_comparison_input = MSstatsPrepareForGroupComparison(QuantData)
levels(QuantData$ProteinLevelData$GROUP)
comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
row.names(comparison) <- "T7-T1"
groups = levels(QuantData$ProteinLevelData$GROUP)
colnames(comparison) <- groups[order(as.numeric(groups))]
samples_info = getSamplesInfo(QuantData)
repeated = checkRepeatedDesign(QuantData)
group_comparison = MSstatsGroupComparison(group_comparison_input, comparison, FALSE, repeated, samples_info)
length(group_comparison) # list of length equal to number of proteins
group_comparison[[1]][[1]] # data used to fit linear model
group_comparison[[1]][[2]] # comparison result
group_comparison[[2]][[3]] # NULL, because we set save_fitted_models to FALSE
Create output of group comparison based on results for individual proteins
MSstatsGroupComparisonOutput(input, summarization_output, log_base = 2)
input |
output of MSstatsGroupComparison function |
summarization_output |
output of dataProcess function |
log_base |
base of the logarithm used in fold-change calculation |
list, same as the output of 'groupComparison'
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
group_comparison_input = MSstatsPrepareForGroupComparison(QuantData)
levels(QuantData$ProteinLevelData$GROUP)
comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
row.names(comparison) <- "T7-T1"
groups = levels(QuantData$ProteinLevelData$GROUP)
colnames(comparison) <- groups[order(as.numeric(groups))]
samples_info = getSamplesInfo(QuantData)
repeated = checkRepeatedDesign(QuantData)
group_comparison = MSstatsGroupComparison(group_comparison_input, comparison, FALSE, repeated, samples_info)
group_comparison_final = MSstatsGroupComparisonOutput(group_comparison, QuantData)
group_comparison_final[["ComparisonResult"]]
Group comparison for a single protein
MSstatsGroupComparisonSingleProtein( single_protein, contrast_matrix, repeated, groups, samples_info, save_fitted_models, has_imputed )
single_protein |
data.table with summarized data for a single protein |
contrast_matrix |
contrast matrix |
repeated |
if TRUE, repeated measurements will be modeled |
groups |
unique labels of experimental conditions |
samples_info |
number of runs per group |
save_fitted_models |
if TRUE, fitted model will be saved. If not, it will be replaced with NULL |
has_imputed |
TRUE if missing values have been imputed |
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
group_comparison_input <- MSstatsPrepareForGroupComparison(QuantData)
levels(QuantData$ProteinLevelData$GROUP)
comparison <- matrix(c(-1,0,0,0,0,0,1,0,0,0), nrow=1)
row.names(comparison) <- "T7-T1"
groups = levels(QuantData$ProteinLevelData$GROUP)
colnames(comparison) <- groups[order(as.numeric(groups))]
samples_info <- getSamplesInfo(QuantData)
repeated <- checkRepeatedDesign(QuantData)
single_output <- MSstatsGroupComparisonSingleProtein(
  group_comparison_input[[1]], comparison, repeated, groups, samples_info, FALSE, TRUE)
single_output # same as a single element of MSstatsGroupComparison output
Handle censored missing values
MSstatsHandleMissing( input, summary_method, impute, missing_symbol, censored_cutoff )
input |
'data.table' in MSstats data format |
summary_method |
summarization method ('summaryMethod' parameter to 'dataProcess') |
impute |
if TRUE, missing values are supposed to be imputed ('MBimpute' parameter to 'dataProcess') |
missing_symbol |
'censoredInt' parameter to 'dataProcess' |
censored_cutoff |
'maxQuantileforCensored' parameter to 'dataProcess' |
data.table
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
head(input)
Re-format the data before feature selection
MSstatsMergeFractions(input)
input |
'data.table' in MSstats format |
data.table
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
head(input)
Normalize MS data
MSstatsNormalize( input, normalization_method, peptides_dict = NULL, standards = NULL )
input |
data.table in MSstats format |
normalization_method |
name of a chosen normalization method: "NONE" or "FALSE" for no normalization, "EQUALIZEMEDIANS" for median normalization, "QUANTILE" for quantile normalization from the 'preprocessCore' package, "GLOBALSTANDARDS" for normalization based on selected peptides or proteins. |
peptides_dict |
'data.table' of names of peptides and their corresponding features. |
standards |
character vector with names of standards, required if "GLOBALSTANDARDS" method was selected. |
data.table
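As a rough picture of what median normalization does, the base-R sketch below shifts each run by a constant so that all run medians coincide. It assumes that the common reference is the median of the run medians and ignores labels and fractions; the toy data and that choice of reference are illustrative assumptions, not a description of the MSstatsNormalize internals.

```r
# Toy log2 intensities for three runs (hypothetical data).
toy <- data.frame(
  RUN       = rep(c("r1", "r2", "r3"), each = 4),
  ABUNDANCE = c(20, 21, 22, 23,  21, 22, 23, 24,  19, 20, 21, 22)
)

# Median of each run and a common reference (median of the run medians).
run_medians <- tapply(toy$ABUNDANCE, toy$RUN, median, na.rm = TRUE)
reference   <- median(run_medians)

# Shift every run by a constant so that its median matches the reference.
shift <- reference - run_medians[as.character(toy$RUN)]
toy$ABUNDANCE_NORM <- toy$ABUNDANCE + as.numeric(shift)

tapply(toy$ABUNDANCE_NORM, toy$RUN, median)
```

Because each run is only shifted, differences between features within a run are preserved.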
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS") # median normalization
head(input)
Prepare data for processing by 'dataProcess' function
MSstatsPrepareForDataProcess(input, log_base, fix_missing)
input |
'data.table' in MSstats format |
log_base |
base of the logarithm to transform intensities |
fix_missing |
str, optional. Defaults to NULL, which means no action. If not NULL, must be one of the options: "zero_to_na" or "na_to_zero". If "zero_to_na", Intensity values equal exactly to 0 will be converted to NA. If "na_to_zero", missing values will be replaced by zeros. |
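The two fix_missing options and the log transformation can be sketched in base R as follows. The helper `fix_missing_sketch` is hypothetical and written only to mirror the description above; it is not part of MSstats.

```r
# Hypothetical helper that mirrors the two documented fix_missing options.
fix_missing_sketch <- function(intensity, fix_missing = NULL) {
  if (is.null(fix_missing)) {
    return(intensity)                      # NULL: leave values untouched
  }
  fix_missing <- match.arg(fix_missing, c("zero_to_na", "na_to_zero"))
  if (fix_missing == "zero_to_na") {
    intensity[!is.na(intensity) & intensity == 0] <- NA
  } else {
    intensity[is.na(intensity)] <- 0
  }
  intensity
}

raw_intensity <- c(1024, 0, NA, 4096)
log_base <- 2

cleaned   <- fix_missing_sketch(raw_intensity, "zero_to_na")
abundance <- log(cleaned, base = log_base)   # NA stays NA
abundance
```

Converting zeros to NA before the transformation avoids taking the logarithm of zero.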
data.table
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
head(input)
Prepare output for dataProcess for group comparison
MSstatsPrepareForGroupComparison(summarization_output)
summarization_output |
output of dataProcess |
list of run-level data for each protein in the input. This list has a "has_imputed" attribute that indicates if missing values were imputed in the input dataset.
QuantData <- dataProcess(SRMRawData, use_log_file = FALSE)
group_comparison_input = MSstatsPrepareForGroupComparison(QuantData)
length(group_comparison_input) # list of length equal to number of proteins
# in protein-level data of QuantData
head(group_comparison_input[[1]])
Prepare feature-level data for protein-level summarization
MSstatsPrepareForSummarization( input, method, impute, censored_symbol, remove_uninformative_feature_outlier )
input |
feature-level data processed by dataProcess subfunctions |
method |
summarization method - 'summaryMethod' parameter of the dataProcess function |
impute |
if TRUE, censored missing values will be imputed - 'MBimpute' parameter of the dataProcess function |
censored_symbol |
censored missing value indicator - 'censoredInt' parameter of the dataProcess function |
remove_uninformative_feature_outlier |
if TRUE, features labeled as outlier or uninformative by the MSstatsSelectFeatures function will not be used in summarization |
data.table
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
head(input)
Feature selection before feature-level data summarization
MSstatsSelectFeatures(input, method, top_n = 3, min_feature_count = 2)
input |
data.table |
method |
"all" / "highQuality", "topN" |
top_n |
number of features to use for "topN" method |
min_feature_count |
number of quality features for "highQuality" method |
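The idea behind the "topN" method can be sketched in base R. The ranking criterion used here, mean log-intensity per feature, is an assumption made for illustration, since this page does not state how features are ranked; the toy data are also made up.

```r
# Toy feature-level log2 intensities for one protein (hypothetical data).
toy <- data.frame(
  FEATURE   = rep(c("f1", "f2", "f3", "f4"), each = 3),
  RUN       = rep(c("r1", "r2", "r3"), times = 4),
  ABUNDANCE = c(25, 26, 24,  18, 19, 17,  22, 23, 21,  20, NA, 19)
)

top_n <- 2

# Rank features by mean abundance (an assumed criterion) and keep the top_n.
feature_means <- tapply(toy$ABUNDANCE, toy$FEATURE, mean, na.rm = TRUE)
keep <- names(sort(feature_means, decreasing = TRUE))[seq_len(top_n)]

selected <- toy[toy$FEATURE %in% keep, ]
unique(selected$FEATURE)
```

In a full data set the same selection would be repeated separately for every protein.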
data.table
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
input_all = MSstatsSelectFeatures(input, "all") # all features
input_5 = MSstatsSelectFeatures(data.table::copy(input), "topN", top_n = 5) # top 5 features
input_informative = MSstatsSelectFeatures(input, "highQuality") # feature selection
head(input_all)
head(input_5)
head(input_informative)
Post-processing output from MSstats summarization
MSstatsSummarizationOutput( input, summarized, processed, method, impute, censored_symbol )
input |
'data.table' in MSstats format |
summarized |
output of the 'MSstatsSummarizeWithSingleCore' function |
processed |
output of MSstatsSelectFeatures |
method |
name of the summarization method ('summaryMethod' parameter to 'dataProcess') |
impute |
if TRUE, censored missing values were imputed ('MBimpute' parameter to 'dataProcess') |
censored_symbol |
censored missing value indicator ('censoredInt' parameter to 'dataProcess') |
list that consists of the following elements:
FeatureLevelData - feature-level data after processing
ProteinLevelData - protein-level (summarized) data
SummaryMethod (string) - name of summarization method that was used
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
input = MSstatsSelectFeatures(input, "all")
processed = getProcessed(input)
input = MSstatsPrepareForSummarization(input, method, impute, cens, FALSE)
summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE)
output = MSstatsSummarizationOutput(input, summarized, processed, method, impute, cens)
Feature-level data summarization
MSstatsSummarize( proteins_list, method, impute, censored_symbol, remove50missing, equal_variance )
proteins_list |
list of processed feature-level data |
method |
summarization method: "linear" or "TMP" |
impute |
only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) using an accelerated failure model. FALSE uses the values assigned by cutoffCensored. |
censored_symbol |
Indicates whether missing values are censored or missing at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. The output from Skyline should use '0'. NULL assumes that all NA intensities are randomly missing. |
remove50missing |
only for summaryMethod = "TMP". TRUE removes the proteins where every run has at least 50% missing values for each peptide. FALSE is default. |
equal_variance |
only for summaryMethod = "linear". Logical variable for whether the model should account for heterogeneous variation among intensities from different features. Default is TRUE, which assumes equal variance among intensities from features. FALSE means that equal variance cannot be assumed, so heterogeneous variation among features is accounted for. |
list of length one with run-level data.
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
input = MSstatsSelectFeatures(input, "all")
processed = getProcessed(input)
input = MSstatsPrepareForSummarization(input, method, impute, cens, FALSE)
input_split = split(input, input$PROTEIN)
summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, TRUE)
length(summarized) # list of summarization outputs for each protein
head(summarized[[1]][[1]]) # run-level summary
Linear model-based summarization for a single protein
MSstatsSummarizeSingleLinear(single_protein, equal_variances = TRUE)
single_protein |
feature-level data for a single protein |
equal_variances |
if TRUE, observations are assumed to be homoskedastic |
list with protein-level data
raw = DDARawData
method = "linear"
cens = NULL
impute = FALSE
# currently, MSstats only supports MBimpute = FALSE for linear summarization
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
input = MSstatsSelectFeatures(input, "all")
input = MSstatsPrepareForSummarization(input, method, impute, cens, FALSE)
input_split = split(input, input$PROTEIN)
single_protein_summary = MSstatsSummarizeSingleLinear(input_split[[1]])
head(single_protein_summary[[1]])
Tukey Median Polish summarization for a single protein
MSstatsSummarizeSingleTMP( single_protein, impute, censored_symbol, remove50missing )
single_protein |
feature-level data for a single protein |
impute |
only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) using an accelerated failure model. FALSE uses the values assigned by cutoffCensored. |
censored_symbol |
Indicates whether missing values are censored or missing at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. The output from Skyline should use '0'. NULL assumes that all NA intensities are randomly missing. |
remove50missing |
only for summaryMethod = "TMP". TRUE removes the proteins where every run has at least 50% missing values for each peptide. FALSE is default. |
list of two data.tables: one with fitted survival model, the other with protein-level data
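Tukey median polish itself is available in base R as stats::medpolish. The sketch below applies it to a toy feature-by-run matrix and reads the run-level summary as the overall level plus the run (column) effect. The toy matrix is made up, and the imputation of censored values that MSstats can perform beforehand is not shown, so this illustrates the technique rather than the function's full behavior.

```r
# Toy log2 intensities: rows are features, columns are runs (hypothetical).
feature_effect <- c(f1 = -1, f2 = 0, f3 = 1)
run_effect     <- c(r1 = 0, r2 = 0.5, r3 = 1, r4 = -0.5)
abundance <- 20 + outer(feature_effect, run_effect, "+")

# Tukey median polish splits the matrix into overall, row (feature) and
# column (run) effects plus residuals.
fit <- stats::medpolish(abundance, na.rm = TRUE, trace.iter = FALSE)

# Run-level summary: overall level plus the effect of each run.
run_summary <- fit$overall + fit$col
run_summary
```

Because medians are used at every step, a single outlying feature in one run has little influence on that run's summary.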
raw = DDARawData
method = "TMP"
cens = "NA"
impute = TRUE
MSstatsConvert::MSstatsLogsSettings(FALSE)
input = MSstatsPrepareForDataProcess(raw, 2, NULL)
input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
input = MSstatsMergeFractions(input)
input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
input = MSstatsSelectFeatures(input, "all")
input = MSstatsPrepareForSummarization(input, method, impute, cens, FALSE)
input_split = split(input, input$PROTEIN)
single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]], impute, cens, FALSE)
head(single_protein_summary[[1]])
Feature-level data summarization with multiple cores
MSstatsSummarizeWithMultipleCores( input, method, impute, censored_symbol, remove50missing, equal_variance, numberOfCores = 1 )
input |
feature-level data processed by dataProcess subfunctions |
method |
summarization method: "linear" or "TMP" |
impute |
only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) using an accelerated failure model. FALSE uses the values assigned by cutoffCensored. |
censored_symbol |
Indicates whether missing values are censored or missing at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. The output from Skyline should use '0'. NULL assumes that all NA intensities are randomly missing. |
remove50missing |
only for summaryMethod = "TMP". TRUE removes the proteins where every run has at least 50% missing values for each peptide. FALSE is default. |
equal_variance |
only for summaryMethod = "linear". Logical variable for whether the model should account for heterogeneous variation among intensities from different features. Default is TRUE, which assumes equal variance among intensities from features. FALSE means that equal variance cannot be assumed, so heterogeneous variation among features is accounted for. |
numberOfCores |
Number of cores for parallel processing. When > 1, a logfile named 'MSstats_dataProcess_log_progress.log' is created to track progress. Only works for Linux & Mac OS. Default is 1. |
list of length one with run-level data.
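The split-then-summarize idea behind this function can be illustrated with a small base-R sketch. This is not the MSstats implementation: it assumes that "TMP" refers to Tukey's median polish, uses `stats::medpolish` on toy data, and distributes proteins over cores with `parallel::mclapply`. The data, the column names and the helper `summarize_one` are hypothetical.

```r
# Toy feature-level data: 2 proteins, 3 features each, 4 runs (log2 scale).
# All names and values are made up for illustration.
set.seed(1)
toy <- expand.grid(
  RUN = paste0("run", 1:4),
  FEATURE = paste0("f", 1:3),
  PROTEIN = c("P1", "P2"),
  stringsAsFactors = FALSE
)
toy$ABUNDANCE <- round(rnorm(nrow(toy), mean = 20, sd = 1), 2)

# Hypothetical helper: run-level summary of one protein via median polish.
summarize_one <- function(d) {
  # Rows are features, columns are runs.
  wide <- tapply(d$ABUNDANCE, list(d$FEATURE, d$RUN), mean)
  fit <- stats::medpolish(wide, na.rm = TRUE, trace.iter = FALSE)
  data.frame(
    PROTEIN = d$PROTEIN[1],
    RUN = colnames(wide),
    LogIntensities = unname(fit$overall + fit$col)
  )
}

# Forking is not available on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
pieces <- split(toy, toy$PROTEIN)
summarized <- parallel::mclapply(pieces, summarize_one, mc.cores = n_cores)
run_level <- do.call(rbind, summarized)
head(run_level)
```

Each protein is independent of the others, which is why the work can be split across cores without changing the per-protein result.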
Feature-level data summarization with 1 core
    MSstatsSummarizeWithSingleCore(
      input,
      method,
      impute,
      censored_symbol,
      remove50missing,
      equal_variance
    )
input |
feature-level data processed by dataProcess subfunctions |
method |
summarization method: "linear" or "TMP" |
impute |
only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on the censoredInt option) using an accelerated failure time model. FALSE uses the values assigned by cutoffCensored. |
censored_symbol |
Whether missing values are censored or missing at random. 'NA' (default) assumes that all 'NA's in the 'Intensity' column are censored. '0' treats zero intensities as censored; in this case, NA intensities are missing at random. The output from Skyline should use '0'. NULL assumes that all NA intensities are randomly missing. |
remove50missing |
only for summaryMethod = "TMP". TRUE removes the proteins where every run has at least 50% missing values for each peptide. FALSE is default. |
equal_variance |
only for summaryMethod = "linear". Logical variable for whether the model should assume equal variance among intensities from different features. TRUE (default) assumes equal variance among intensities from features. FALSE does not assume equal variance and accounts for heterogeneous variation among features. |
list of length one with run-level data.
    raw = DDARawData
    method = "TMP"
    cens = "NA"
    impute = TRUE
    MSstatsConvert::MSstatsLogsSettings(FALSE)
    input = MSstatsPrepareForDataProcess(raw, 2, NULL)
    input = MSstatsNormalize(input, "EQUALIZEMEDIANS")
    input = MSstatsMergeFractions(input)
    input = MSstatsHandleMissing(input, "TMP", TRUE, "NA", 0.999)
    input = MSstatsSelectFeatures(input, "all")
    processed = getProcessed(input)
    input = MSstatsPrepareForSummarization(input, method, impute, cens, FALSE)
    summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE)
    length(summarized) # list of summarization outputs for each protein
    head(summarized[[1]][[1]]) # run-level summary
Import OpenMS files
    OpenMStoMSstatsFormat(
      input,
      annotation = NULL,
      useUniquePeptide = TRUE,
      removeFewMeasurements = TRUE,
      removeProtein_with1Feature = FALSE,
      summaryforMultipleRows = max,
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL,
      ...
    )
input |
name of MSstats input report from OpenMS, which includes feature(peptide ion)-level data. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. Run should be the same as filename. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
summaryforMultipleRows |
max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
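The effect of `summaryforMultipleRows` and `removeFewMeasurements` can be sketched in base R on a toy table. This only illustrates the documented behaviour and is not the converter's code; the table and its column names are made up.

```r
# Toy long-format table; feature f1 was measured twice in run1.
toy <- data.frame(
  Feature = c("f1", "f1", "f1", "f1", "f2", "f2"),
  Run = c("run1", "run1", "run2", "run3", "run1", "run2"),
  Intensity = c(100, 150, 120, 110, 80, 90)
)

# summaryforMultipleRows = max: keep the highest intensity per feature and run.
# Replacing `max` with `sum` would add the duplicated measurements instead.
collapsed <- aggregate(Intensity ~ Feature + Run, data = toy, FUN = max)

# removeFewMeasurements = TRUE: drop features with only 1 or 2 measurements.
n_measurements <- table(collapsed$Feature)
keep <- names(n_measurements)[n_measurements > 2]
kept <- collapsed[collapsed$Feature %in% keep, ]
kept
```

In this toy table, f1 survives with three measurements, while f2 is dropped because it was measured in only two runs.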
Meena Choi, Olga Vitek.
    openms_raw = data.table::fread(system.file("tinytest/raw_data/OpenMS/openms_input.csv",
                                               package = "MSstatsConvert"))
    openms_imported = OpenMStoMSstatsFormat(openms_raw, use_log_file = FALSE)
    head(openms_imported)
Import OpenSWATH files
    OpenSWATHtoMSstatsFormat(
      input,
      annotation,
      filter_with_mscore = TRUE,
      mscore_cutoff = 0.01,
      useUniquePeptide = TRUE,
      removeFewMeasurements = TRUE,
      removeProtein_with1Feature = FALSE,
      summaryforMultipleRows = max,
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL,
      ...
    )
input |
name of MSstats input report from OpenSWATH, which includes feature-level data. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. Run should be the same as filename. |
filter_with_mscore |
TRUE (default) removes the features whose value in the m_score column is greater than mscore_cutoff. |
mscore_cutoff |
Cutoff for m_score. Default is 0.01. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
summaryforMultipleRows |
max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
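As a rough illustration of `filter_with_mscore` and `mscore_cutoff`, the following base-R sketch drops rows whose `m_score` exceeds the cutoff. Only the `m_score` column name comes from the argument description; the other columns and all values are hypothetical, and the real converter may handle edge cases differently.

```r
# Toy OpenSWATH-like rows (made-up values).
toy <- data.frame(
  Protein = c("P1", "P1", "P2", "P2"),
  Peptide = c("pepA", "pepB", "pepC", "pepD"),
  m_score = c(0.001, 0.020, 0.010, 0.500),
  Intensity = c(1200, 300, 950, 40)
)

mscore_cutoff <- 0.01

# Features with m_score greater than the cutoff are removed entirely.
filtered <- toy[toy$m_score <= mscore_cutoff, ]
filtered
```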
Meena Choi, Olga Vitek.
    os_raw = system.file("tinytest/raw_data/OpenSWATH/openswath_input.csv",
                         package = "MSstatsConvert")
    annot = system.file("tinytest/annotations/annot_os.csv",
                        package = "MSstats")
    os_raw = data.table::fread(os_raw)
    annot = data.table::fread(annot)
    os_imported = OpenSWATHtoMSstatsFormat(os_raw, annot, use_log_file = FALSE)
    head(os_imported)
Import Proteome Discoverer files
    PDtoMSstatsFormat(
      input,
      annotation,
      useNumProteinsColumn = FALSE,
      useUniquePeptide = TRUE,
      summaryforMultipleRows = max,
      removeFewMeasurements = TRUE,
      removeOxidationMpeptides = FALSE,
      removeProtein_with1Peptide = FALSE,
      which.quantification = "Precursor.Area",
      which.proteinid = "Protein.Group.Accessions",
      which.sequence = "Sequence",
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL,
      ...
    )
input |
PD report or a path to it. |
annotation |
name of 'annotation.txt' or 'annotation.csv' data which includes Condition, BioReplicate, Run information. 'Run' will be matched with 'Spectrum.File'. |
useNumProteinsColumn |
TRUE removes peptides which have more than 1 in # Proteins column of PD output. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
summaryforMultipleRows |
max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides |
TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Peptide |
TRUE will remove the proteins which have only 1 peptide and charge. FALSE is default. |
which.quantification |
Use 'Precursor.Area'(default) column for quantified intensities. 'Intensity' or 'Area' can be used instead. |
which.proteinid |
Column to use for the protein name. The usage above shows 'Protein.Group.Accessions' as the default; 'Protein.Accessions' or 'Master.Protein.Accessions' can be used instead. |
which.sequence |
Use 'Sequence'(default) column for peptide sequence. 'Annotated.Sequence' can be used instead. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
Meena Choi, Olga Vitek
    pd_raw = system.file("tinytest/raw_data/PD/pd_input.csv",
                         package = "MSstatsConvert")
    annot = system.file("tinytest/annotations/annot_pd.csv",
                        package = "MSstats")
    pd_raw = data.table::fread(pd_raw)
    annot = data.table::fread(annot)
    pd_imported = PDtoMSstatsFormat(pd_raw, annot, use_log_file = FALSE)
    head(pd_imported)
Import Progenesis files
    ProgenesistoMSstatsFormat(
      input,
      annotation,
      useUniquePeptide = TRUE,
      summaryforMultipleRows = max,
      removeFewMeasurements = TRUE,
      removeOxidationMpeptides = FALSE,
      removeProtein_with1Peptide = FALSE,
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL,
      ...
    )
input |
name of Progenesis output, which is wide-format. 'Accession', 'Sequence', 'Modification', 'Charge' and one column for each run are required. |
annotation |
name of 'annotation.txt' or 'annotation.csv' data which includes Condition, BioReplicate, Run information. It will be matched with the column name of input for MS runs. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
summaryforMultipleRows |
max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides |
TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Peptide |
TRUE will remove the proteins which have only 1 peptide and charge. FALSE is default. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
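Because Progenesis output is wide (one column per run), a converter has to reshape it to long format and attach the annotation by run name. A minimal base-R sketch of that reshaping follows. It is not the converter's implementation: the toy table contains only some of the required columns, and the output column names are chosen for illustration.

```r
# Toy wide table with one intensity column per MS run (made-up values).
wide <- data.frame(
  Accession = c("P1", "P2"),
  Sequence = c("PEPA", "PEPB"),
  Charge = c(2, 3),
  run1 = c(100, 200),
  run2 = c(110, 0),
  stringsAsFactors = FALSE
)
run_cols <- c("run1", "run2")

# Stack the run columns: one row per peptide ion and run.
long <- data.frame(
  ProteinName = rep(wide$Accession, times = length(run_cols)),
  PeptideSequence = rep(wide$Sequence, times = length(run_cols)),
  PrecursorCharge = rep(wide$Charge, times = length(run_cols)),
  Run = rep(run_cols, each = nrow(wide)),
  Intensity = unlist(wide[run_cols], use.names = FALSE)
)

# The annotation is matched on the run column names of the input.
annotation <- data.frame(
  Run = run_cols,
  Condition = c("A", "B"),
  BioReplicate = c(1, 2)
)
long <- merge(long, annotation, by = "Run")
long
```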
Meena Choi, Olga Vitek, Ulrich Omasits
    progenesis_raw = system.file("tinytest/raw_data/Progenesis/progenesis_input.csv",
                                 package = "MSstatsConvert")
    annot = system.file("tinytest/raw_data/Progenesis/progenesis_annot.csv",
                        package = "MSstatsConvert")
    progenesis_raw = data.table::fread(progenesis_raw)
    annot = data.table::fread(annot)
    progenesis_imported = ProgenesistoMSstatsFormat(progenesis_raw, annot,
                                                    use_log_file = FALSE)
    head(progenesis_imported)
Model-based quantification for each condition or for each biological sample per protein in a targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), or Data-Independent Acquisition (DIA or SWATH-MS) experiment. Quantification takes the data set processed by dataProcess as input and automatically generates the quantification results (data.frame) in a long or matrix format.
    quantification(
      data,
      type = "Sample",
      format = "matrix",
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL
    )
data |
name of the (processed) data set. |
type |
choice of quantification. "Sample" or "Group" for protein sample quantification or group quantification. |
format |
choice of returned format. "long" for long format which has the columns named Protein, Condition, LogIntensities (and BioReplicate if it is subject quantification), NumFeature for number of transitions for a protein, and NumPeaks for number of observed peak intensities for a protein. "matrix" for data matrix format which has the rows for Protein and the columns, which are Groups(or Conditions) for group quantification or the combinations of BioReplicate and Condition (labeled by "BioReplicate"_"Condition") for sample quantification. Default is "matrix" |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
Sample quantification: quantification of each individual biological sample for each protein. The label of each biological sample is a combination of the corresponding group and the sample ID. If there are no technical or experimental replicates per sample, sample quantification is the same as the run summarization from dataProcess. If there are technical or experimental replicates, sample quantification is the median of the run-level quantifications over the corresponding MS runs.
Group quantification: quantification of each individual group or condition per protein. It is the median of the sample quantifications.
The quantification for endogenous samples is based on run summarization from the subplot model, with TMP robust estimation.
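The two levels of aggregation described above can be sketched on toy run-level values with base R. This is an illustration only, not the package code; the table below is made up and stands in for run-level summaries produced by dataProcess.

```r
# Toy run-level summaries for one protein: two runs per biological sample.
run_level <- data.frame(
  Protein = "P1",
  Condition = c("T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2"),
  BioReplicate = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  Run = paste0("run", 1:8),
  LogIntensities = c(10, 12, 11, 13, 14, 16, 15, 19)
)

# Sample quantification: median over the runs of each biological sample.
sample_quant <- aggregate(
  LogIntensities ~ Protein + Condition + BioReplicate,
  data = run_level, FUN = median
)

# Group quantification: median over the sample-level values of each condition.
group_quant <- aggregate(
  LogIntensities ~ Protein + Condition,
  data = sample_quant, FUN = median
)
group_quant
```

Here the sample medians are 11 and 12 for T1 and 15 and 17 for T2, so the group values are 11.5 and 16.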
data.frame as described in details.
    # Consider quantitative data (i.e. QuantData) from a yeast study with ten time points of
    # interest, three biological replicates, and no technical replicates, which is
    # a time-course experiment.
    # Sample quantification shows model-based estimation of protein abundance in each biological
    # replicate within each time point.
    # Group quantification shows model-based estimation of protein abundance in each time point.
    QuantData<-dataProcess(SRMRawData, use_log_file = FALSE)
    head(QuantData$FeatureLevelData)
    # Sample quantification
    sampleQuant<-quantification(QuantData, use_log_file = FALSE)
    head(sampleQuant)
    # Group quantification
    groupQuant<-quantification(QuantData, type="Group", use_log_file = FALSE)
    head(groupQuant)
Save a plot to pdf file
savePlot(name_base, file_name, width, height)
name_base |
path to a folder (or "" for working directory) |
file_name |
name of a file to save. If this file already exists, an integer will be appended to this name |
width |
width of a plot |
height |
height of a plot |
Takes an SDRF file and outputs an MSstats annotation file. The information in the SDRF file must be annotated correctly so that MSstats can identify the experimental design. In particular, the biological replicates must be annotated correctly: in group comparison experiments, each BioReplicate must have a unique ID. For more information, see the supplementary material of the most recent MSstats paper.
    SDRFtoAnnotation(
      data,
      run_name = "comment[data file]",
      condition_name = "characteristics[disease]",
      biological_replicate = "characteristics[biological replicate]",
      fraction = NULL
    )
data |
SDRF annotation file |
run_name |
Column name in SDRF file which contains the name of the MS run. The information in this column must match exactly with the run names in the PSM file |
condition_name |
Column name in SDRF file which contains information on the conditions in the data. |
biological_replicate |
Column name in SDRF file which contains the identifier for the biological replicate. Note that MSstats uses this column to determine if the experiment is a repeated measures design. BioReplicate IDs should only be reused if the replicate was measured multiple times. |
fraction |
Column name in SDRF file which contains information on the fractionation in the data. Only required if data contains fractions. Default is 'NULL'. |
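Conceptually, the arguments above name the SDRF columns that become the Run, Condition and BioReplicate columns of the annotation. A minimal sketch of that mapping in base R is shown below. The helper `sdrf_to_annotation_sketch` and the toy SDRF rows are hypothetical, and the output column name `Fraction` is an assumption; the real function may perform additional checks.

```r
# Toy SDRF-like table using the default column names from the usage above.
sdrf <- data.frame(
  "comment[data file]" = c("a.raw", "b.raw", "c.raw", "d.raw"),
  "characteristics[disease]" = c("control", "control", "cancer", "cancer"),
  "characteristics[biological replicate]" = c(1, 2, 3, 4),
  check.names = FALSE
)

# Hypothetical helper mirroring the argument names of SDRFtoAnnotation.
sdrf_to_annotation_sketch <- function(
    data,
    run_name = "comment[data file]",
    condition_name = "characteristics[disease]",
    biological_replicate = "characteristics[biological replicate]",
    fraction = NULL) {
  out <- data.frame(
    Run = data[[run_name]],
    Condition = data[[condition_name]],
    BioReplicate = data[[biological_replicate]]
  )
  # The fraction column is only added when the caller names one.
  if (!is.null(fraction)) out$Fraction <- data[[fraction]]
  out
}

annotation <- sdrf_to_annotation_sketch(sdrf)
annotation
```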
    head(example_SDRF)
    msstats_annotation = SDRFtoAnnotation(example_SDRF)
    head(msstats_annotation)
Import Skyline files
    SkylinetoMSstatsFormat(
      input,
      annotation = NULL,
      removeiRT = TRUE,
      filter_with_Qvalue = TRUE,
      qvalue_cutoff = 0.01,
      useUniquePeptide = TRUE,
      removeFewMeasurements = TRUE,
      removeOxidationMpeptides = FALSE,
      removeProtein_with1Feature = FALSE,
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL,
      ...
    )
input |
name of MSstats input report from Skyline, which includes feature-level data. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If annotation is already complete in Skyline, use annotation=NULL (default). It will use the annotation information from input. |
removeiRT |
TRUE (default) will remove the proteins or peptides which are labeled 'iRT' in 'StandardType' column. FALSE will keep them. |
filter_with_Qvalue |
TRUE (default) filters out the intensities whose value in the DetectionQValue column is greater than qvalue_cutoff. Those intensities are replaced with zero and are treated as censored missing values for imputation. |
qvalue_cutoff |
Cutoff for DetectionQValue. default is 0.01. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeOxidationMpeptides |
TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
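Unlike a filter that drops rows, `filter_with_Qvalue` keeps the rows and sets low-confidence intensities to zero so that they can later be treated as censored. A base-R sketch of that idea on toy data follows. It is not the converter's code; apart from `DetectionQValue`, the column names and values are made up, and the `censored` flag is only there to make the effect visible.

```r
# Toy Skyline-like rows (made-up values).
toy <- data.frame(
  PeptideSequence = c("pepA", "pepA", "pepB", "pepB"),
  Run = c("run1", "run2", "run1", "run2"),
  DetectionQValue = c(0.001, 0.200, 0.005, 0.030),
  Intensity = c(5000, 700, 3200, 150)
)

qvalue_cutoff <- 0.01

# Intensities with a q-value above the cutoff are replaced with zero.
low_confidence <- toy$DetectionQValue > qvalue_cutoff
toy$Intensity[low_confidence] <- 0

# With censoredInt = "0" in dataProcess, zeros are treated as censored values.
toy$censored <- toy$Intensity == 0
toy
```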
Meena Choi, Olga Vitek
    skyline_raw = system.file("tinytest/raw_data/Skyline/skyline_input.csv",
                              package = "MSstatsConvert")
    skyline_raw = data.table::fread(skyline_raw)
    skyline_imported = SkylinetoMSstatsFormat(skyline_raw)
    head(skyline_imported)
Import Spectronaut files
    SpectronauttoMSstatsFormat(
      input,
      annotation = NULL,
      intensity = "PeakArea",
      filter_with_Qvalue = TRUE,
      qvalue_cutoff = 0.01,
      useUniquePeptide = TRUE,
      removeFewMeasurements = TRUE,
      removeProtein_with1Feature = FALSE,
      summaryforMultipleRows = max,
      use_log_file = TRUE,
      append = FALSE,
      verbose = TRUE,
      log_file_path = NULL,
      ...
    )
input |
name of Spectronaut output, which is long-format. ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, Intensity, F.ExcludedFromQuantification are required. Rows with F.ExcludedFromQuantification=True will be removed. |
annotation |
name of 'annotation.txt' data which includes Condition, BioReplicate, Run. If annotation is already complete in Spectronaut, use annotation=NULL (default). It will use the annotation information from input. |
intensity |
'PeakArea' (default) uses the peak area without normalization. 'NormalizedPeakArea' uses the peak area normalized by Spectronaut. |
filter_with_Qvalue |
TRUE (default) filters out the intensities whose value in the EG.Qvalue column is greater than qvalue_cutoff. Those intensities are replaced with zero and are treated as censored missing values for imputation. |
qvalue_cutoff |
Cutoff for EG.Qvalue. default is 0.01. |
useUniquePeptide |
TRUE (default) removes peptides that are assigned to more than one protein. We assume that unique peptides are used for each protein. |
removeFewMeasurements |
TRUE (default) will remove the features that have 1 or 2 measurements across runs. |
removeProtein_with1Feature |
TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default. |
summaryforMultipleRows |
max(default) or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. |
use_log_file |
logical. If TRUE, information about data processing will be saved to a file. |
append |
logical. If TRUE, information about data processing will be added to an existing log file. |
verbose |
logical. If TRUE, information about data processing will be printed to the console. |
log_file_path |
character. Path to a file to which information about data processing will be saved. If not provided, such a file will be created automatically. If 'append = TRUE', has to be a valid path to a file. |
... |
additional parameters to 'data.table::fread'. |
data.frame in the MSstats required format.
Meena Choi, Olga Vitek
    spectronaut_raw = system.file("tinytest/raw_data/Spectronaut/spectronaut_input.csv",
                                  package = "MSstatsConvert")
    spectronaut_raw = data.table::fread(spectronaut_raw)
    spectronaut_imported = SpectronauttoMSstatsFormat(spectronaut_raw, use_log_file = FALSE)
    head(spectronaut_imported)
This is a partial data set obtained from a published study (Picotti et al., 2009). The experiment targeted 45 proteins in the glycolysis/gluconeogenesis/TCA cycle/glyoxylate cycle network, which spans the range of protein abundance from less than 128 to 10E6 copies per cell. Three biological replicates were analyzed at ten time points (T1-T10), while yeast cells transited through exponential growth in a glucose-rich medium (T1-T4), diauxic shift (T5-T6), post-diauxic phase (T7-T9), and stationary phase (T10). Prior to trypsinization, the samples were mixed with an equal amount of proteins from the same N15-labeled yeast sample, which was used as a reference. Each sample was profiled in a single mass spectrometry run, where each protein was represented by up to two peptides and each peptide by up to three transitions. The goal of this study is to detect significant changes in protein abundance across time points. Transcriptional activity under the same experimental conditions has been previously investigated (DeRisi et al., 1997). Genes coding for 29 of the proteins are differentially expressed between conditions similar to those represented by T7 and T1 and could be treated as external sources to validate the proteomics analysis. In this example data set, two of the targeted proteins are selected and validated with the gene expression study: protein IDHC (gene name IDP2) is differentially expressed between time point 1 and time point 7, whereas protein PMG2 (gene name GPM2) is not. The protein names are based on Swiss-Prot names.
SRMRawData
data.frame
The raw data (input data for MSstats) must contain the variables ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run and Intensity. The variable names are fixed.
If the information for one or more columns is not available in the original raw data, retain the column and fill it with a fixed value. For example, if the original raw data does not contain ProductCharge information, keep the ProductCharge column and enter NA for all transitions in RawData.
The column BioReplicate should contain a unique ID for each patient (i.e., measurements from the same patient should carry the same ID).
The variable Intensity must be the original signal without any log transformation. It can be specified as the peak height or the peak area under the curve.
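The format rules above can be checked before calling dataProcess. The following base-R sketch is not part of MSstats and uses a made-up two-row table: it adds any missing required column filled with NA, as described for ProductCharge, and puts the columns in the documented order.

```r
# Required column names as listed above.
required <- c("ProteinName", "PeptideSequence", "PrecursorCharge", "FragmentIon",
              "ProductCharge", "IsotopeLabelType", "Condition", "BioReplicate",
              "Run", "Intensity")

# Toy raw data without ProductCharge information (made-up values).
raw <- data.frame(
  ProteinName = "P1", PeptideSequence = "PEPTIDEK", PrecursorCharge = 2,
  FragmentIon = "y5", IsotopeLabelType = "L", Condition = "T1",
  BioReplicate = 1, Run = 1, Intensity = c(1500, 2300)
)

# Retain unavailable columns and fill them with NA.
missing_cols <- setdiff(required, names(raw))
for (col in missing_cols) raw[[col]] <- NA
raw <- raw[, required]

# Intensities should be on the original scale, so negative values are suspect.
stopifnot(all(raw$Intensity >= 0 | is.na(raw$Intensity)))
str(raw)
```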
data.frame with the required format of MSstats.
Meena Choi, Olga Vitek.
Maintainer: Meena Choi ([email protected])
Ching-Yun Chang, Paola Picotti, Ruth Huttenhain, Viola Heinzelmann-Schwarz, Marko Jovanovic, Ruedi Aebersold, Olga Vitek. Protein significance analysis in selected reaction monitoring (SRM) measurements. Molecular & Cellular Proteomics, 11:M111.014662, 2012.
head(SRMRawData)
Theme for MSstats plots
    theme_msstats(
      type,
      x.axis.size = 10,
      y.axis.size = 10,
      legend_size = 13,
      strip_background = element_rect(fill = "gray95"),
      strip_text_x = element_text(colour = c("black"), size = 14),
      legend_position = "top",
      legend_box = "vertical",
      text_angle = 0,
      text_hjust = NULL,
      text_vjust = NULL,
      ...
    )
type |
type of a plot |
x.axis.size |
size of text on the x axis |
y.axis.size |
size of text on the y axis |
legend_size |
size of the legend |
strip_background |
background of facet |
strip_text_x |
size of text on facets |
legend_position |
position of the legend |
legend_box |
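arrangement of multiple legends ("horizontal" or "vertical"), as in the legend.box argument of ggplot2::theme() |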
text_angle |
angle of text on the x axis (for condition and comparison plots) |
text_hjust |
hjust parameter for x axis text (for condition and comparison plots) |
text_vjust |
vjust parameter for x axis text (for condition and comparison plots) |
... |
additional parameters passed on to ggplot2::theme() |
Check if annotation matches intended experimental design
    validateAnnotation(msstats_table, design_type = "group comparison")
msstats_table |
output of a converter function |
design_type |
character, "group comparison" or "repeated measures" |
TRUE if the annotation is consistent with the intended experimental design. Otherwise, an error is thrown.
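One way to think about the two design types is to ask whether any BioReplicate ID appears under more than one Condition. The sketch below implements that rule in base R. It is an assumption about the underlying logic, based on the note that BioReplicate IDs should only be reused when a replicate was measured multiple times, and the helper `is_repeated_sketch` is hypothetical rather than part of MSstats.

```r
# Hypothetical helper: TRUE if some BioReplicate occurs in several conditions.
is_repeated_sketch <- function(annotation) {
  conditions_per_subject <- tapply(
    annotation$Condition, annotation$BioReplicate,
    function(x) length(unique(x))
  )
  any(conditions_per_subject > 1)
}

# Group comparison: every BioReplicate belongs to exactly one condition.
group_design <- data.frame(
  Condition = c("A", "A", "B", "B"),
  BioReplicate = c(1, 2, 3, 4)
)

# Repeated measures: the same BioReplicate is measured under both conditions.
repeated_design <- data.frame(
  Condition = c("A", "B", "A", "B"),
  BioReplicate = c(1, 1, 2, 2)
)

is_repeated_sketch(group_design)
is_repeated_sketch(repeated_design)
```

Under this rule the first table is a group comparison design and the second is a repeated measures design.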