| Title: | Robust statistical inference for quantitative LC-MS proteomics |
|---|---|
| Description: | msqrob2 provides a robust linear mixed model framework for assessing differential abundance in MS-based Quantitative proteomics experiments. Our workflows can start from raw peptide intensities or summarised protein expression values. The model parameter estimates can be stabilized by ridge regression, empirical Bayes variance estimation and robust M-estimation. msqrob2's hurde workflow can handle missing data without having to rely on hard-to-verify imputation assumptions, and, outcompetes state-of-the-art methods with and without imputation for both high and low missingness. It builds on QFeature infrastructure for quantitative mass spectrometry data to store the model results together with the raw data and preprocessed data. |
| Authors: | Lieven Clement [aut, cre] (ORCID: <https://orcid.org/0000-0002-9050-4370>), Laurent Gatto [aut] (ORCID: <https://orcid.org/0000-0002-1520-2268>), Oliver M. Crook [aut] (ORCID: <https://orcid.org/0000-0001-5669-8506>), Adriaan Sticker [ctb], Ludger Goeminne [ctb], Milan Malfait [ctb] (ORCID: <https://orcid.org/0000-0001-9144-3701>), Stijn Vandenbulcke [aut] |
| Maintainer: | Lieven Clement <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.21.0 |
| Built: | 2026-05-30 09:40:35 UTC |
| Source: | https://github.com/bioc/msqrob2 |
Helper function to calculate sample-specific normalization factors on the log2 scale using conventional median normalisation
.computeNfLogMedian(mat, na.rm = TRUE).computeNfLogMedian(mat, na.rm = TRUE)
mat |
A |
na.rm |
Logical; indicates if missing values are to be removed.
Default is |
Implementation assumes that the assay values are already on the log scale. The normalization factors are computed on the log scale.
A numeric vector of log-scale normalization factors, one per sample (column).
Helper function to calculate sample-specific normalization factors on the log2 scale using a median-of-ratios approach similar to that used in DESeq2 for bulk RNA-seq data.
The method proceeds as follows:
A pseudo-reference sample is constructed as the row-wise mean of the log2 intensities (equivalent to the log2-transformed geometric mean).
For each sample, log2 ratios relative to pseudo-reference are computed.
The normalization factor for each sample is obtained as the median of these log2 ratios (column-wise median).
.computeNfLogMedianOfRatios(mat, na.rm = TRUE).computeNfLogMedianOfRatios(mat, na.rm = TRUE)
mat |
A |
na.rm |
Logical; indicates if missing values are to be removed.
Default is |
This implementation assumes that assay values are already on the log scale. The normalization factors are computed on the log scale.
A numeric vector of log2-scale normalization factors, one per sample (column).
Construct all contrasts for all pairwise comparisons between all levels of a factor.
createPairwiseContrasts( formula, coldata, var, ridge = FALSE, nullHypothesis = " = 0" )createPairwiseContrasts( formula, coldata, var, ridge = FALSE, nullHypothesis = " = 0" )
formula |
Model formula. The model is built based on the covariates in the 'coldata' object. |
coldata |
data.frame or DFrame with information on the design. |
var |
object of type 'character' with the name of the factor in the formula for which the pairwise comparisons will be made. |
ridge |
logical indicating if the msqrob models are fitted with or without ridge regression. The default is 'ridge = FALSE' |
nullHypothesis |
object of type character that specifies the value of the contrast under the null hypothesis. The default is ' = 0'. |
Vector of type character with nulhypotheses for all contrasts.
# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Model with intercept (conditiona is reference class) createPairwiseContrasts(~ condition, colData(pe), "condition") # Model without intercept createPairwiseContrasts(~ -1 + condition, colData(pe), "condition")# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Model with intercept (conditiona is reference class) createPairwiseContrasts(~ condition, colData(pe), "condition") # Model without intercept createPairwiseContrasts(~ -1 + condition, colData(pe), "condition")
Methods for StatModel class
to calculate contrasts of the model parameters
to calculate the variance-covariance matrix of the contrasts
## S4 method for signature 'StatModel' getContrast(object, L) ## S4 method for signature 'StatModel' varContrast(object, L)## S4 method for signature 'StatModel' getContrast(object, L) ## S4 method for signature 'StatModel' varContrast(object, L)
object |
A list with elements of the class |
L |
contrast |
A matrix with the calculated contrasts or variance-covariance matrix of contrasts
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Define contrast for log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) getContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L) varContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L)data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Define contrast for log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) getContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L) varContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L)
Accessor functions for StatModel class
to get model
to get the parameter estimation method
to get the parameter estimates of the mean model
to get the residual degrees of freedom of the model
to get the residual variance of the model
to get the residual standard deviation of the model
to get the degrees of freedom of the empirical Bayes variance estimator
to get the empirical Bayes variance
to get the empirical Bayes standard deviation
to get the unscaled variance covariance matrix of the model parameters
## S4 method for signature 'StatModel' getModel(object) ## S4 method for signature 'StatModel' getFitMethod(object) ## S4 method for signature 'StatModel' getCoef(object) ## S4 method for signature 'StatModel' getDfPosterior(object) ## S4 method for signature 'StatModel' getVarPosterior(object) ## S4 method for signature 'StatModel' getSigmaPosterior(object) ## S4 method for signature 'StatModel' getDF(object) ## S4 method for signature 'StatModel' getVar(object) ## S4 method for signature 'StatModel' getSigma(object) ## S4 method for signature 'StatModel' getVcovUnscaled(object)## S4 method for signature 'StatModel' getModel(object) ## S4 method for signature 'StatModel' getFitMethod(object) ## S4 method for signature 'StatModel' getCoef(object) ## S4 method for signature 'StatModel' getDfPosterior(object) ## S4 method for signature 'StatModel' getVarPosterior(object) ## S4 method for signature 'StatModel' getSigmaPosterior(object) ## S4 method for signature 'StatModel' getDF(object) ## S4 method for signature 'StatModel' getVar(object) ## S4 method for signature 'StatModel' getSigma(object) ## S4 method for signature 'StatModel' getVcovUnscaled(object)
object |
|
The requested parameter of the StatModel object
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) getModel(rowData(pe[["protein"]])$msqrobModels[[1]]) getFitMethod(rowData(pe[["protein"]])$msqrobModels[[1]]) # Similar for the remaining accessorsdata(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) getModel(rowData(pe[["protein"]])$msqrobModels[[1]]) getFitMethod(rowData(pe[["protein"]])$msqrobModels[[1]]) # Similar for the remaining accessors
Summary table of the estimates for differential expression of features
## S4 method for signature 'SummarizedExperiment' hypothesisTest( object, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'SummarizedExperiment' hypothesisTestHurdle( object, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTest( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTestHurdle( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE )## S4 method for signature 'SummarizedExperiment' hypothesisTest( object, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'SummarizedExperiment' hypothesisTestHurdle( object, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTest( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTestHurdle( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE )
object |
|
contrast |
|
adjust.method |
|
modelColumn |
|
resultsColumnNamePrefix |
|
overwrite |
|
i |
|
A SummarizedExperiment or a QFeatures instance augmented with the test
results.
Lieven Clement
# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b L <- makeContrast( "conditionc - conditionb=0", c("conditionb", "conditionc") ) # example SummarizedExperiment instance se <- pe[["protein"]] se <- hypothesisTest(se, L) head(rowData(se)$"conditionc - conditionb", 10) # Volcano plot plot(-log10(pval) ~ logFC, rowData(se)$"conditionc - conditionb", col = (adjPval < 0.05) + 1 ) # Example for QFeatures instance # Assess log2 fold change between condition b and condition a (reference class), # condition c and condition a, and, condition c and condition b. L <- makeContrast( c( "conditionb=0", "conditionc=0", "conditionc - conditionb=0" ), c("conditionb", "conditionc") ) pe <- hypothesisTest(pe, i = "protein", L) head(rowData(pe[["protein"]])$"conditionb", 10) # Volcano plots par(mfrow = c(1, 3)) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC b-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc", col = (adjPval < 0.05) + 1, main = "log2 FC c-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc - conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC c-b" ) # Hurdle method pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) pe <- hypothesisTestHurdle(pe, i = "protein", L) head(rowData(pe[["protein"]])$"hurdle_conditionb", 10)# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b L <- makeContrast( "conditionc - conditionb=0", c("conditionb", "conditionc") ) # example SummarizedExperiment instance se <- pe[["protein"]] se <- hypothesisTest(se, L) head(rowData(se)$"conditionc - conditionb", 10) # Volcano plot plot(-log10(pval) ~ logFC, rowData(se)$"conditionc - conditionb", col = (adjPval < 0.05) + 1 ) # Example for QFeatures instance # Assess log2 fold change between condition b and condition a (reference class), # condition c and condition a, and, condition c and condition b. L <- makeContrast( c( "conditionb=0", "conditionc=0", "conditionc - conditionb=0" ), c("conditionb", "conditionc") ) pe <- hypothesisTest(pe, i = "protein", L) head(rowData(pe[["protein"]])$"conditionb", 10) # Volcano plots par(mfrow = c(1, 3)) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC b-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc", col = (adjPval < 0.05) + 1, main = "log2 FC c-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc - conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC c-b" ) # Hurdle method pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) pe <- hypothesisTestHurdle(pe, i = "protein", L) head(rowData(pe[["protein"]])$"hurdle_conditionb", 10)
Construct the contrast matrix corresponding to specified contrasts of a set of parameters.
makeContrast(contrasts, parameterNames)makeContrast(contrasts, parameterNames)
contrasts |
character vector specifying contrasts, i.e. the linear combination of the modelparameters that equals to zero. |
parameterNames |
character vector specifying the model parameters that are involved in the contrasts, e.g if we model data of three conditions using a factor condition with three levels a, b and c then our model will have 3 mean parameters named (Intercept), conditionb and conditionc. Hence the log2 fold change between b and a is conditionb. Under the null hypothesis the log2 fold change equals 0. Which is to be encoded as "conditionb=0". If we would like to test for log2 fold change between condition c and b we assess if the log2 fold change conditionc-conditionb equals 0, encoded as "condtionb-conditionc=0". |
A numeric contrast matrix with rownames that equal the model parameters that are involved in the contrasts
makeContrast(c("conditionb = 0"), parameterNames = c( "(Intercept)", "conditionb", "conditionc" ) ) makeContrast(c("conditionc=0"), parameterNames = c("conditionc") ) makeContrast(c( "conditionb=0", "conditionc=0", "conditionc-conditionb=0" ), parameterNames = c( "conditionb", "conditionc" ) )makeContrast(c("conditionb = 0"), parameterNames = c( "(Intercept)", "conditionb", "conditionc" ) ) makeContrast(c("conditionc=0"), parameterNames = c("conditionc") ) makeContrast(c( "conditionb=0", "conditionc=0", "conditionc-conditionb=0" ), parameterNames = c( "conditionb", "conditionc" ) )
Parameter estimation of msqrob models for QFeatures
and SummarizedExperiment instance.
## S4 method for signature 'SummarizedExperiment' msqrob( object, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) ) ## S4 method for signature 'QFeatures' msqrob( object, i, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )## S4 method for signature 'SummarizedExperiment' msqrob( object, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) ) ## S4 method for signature 'QFeatures' msqrob( object, i, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
object |
|
formula |
Model formula. The model is built based on the covariates in the data object. |
modelColumnName |
|
overwrite |
|
robust |
|
ridge |
|
maxitRob |
|
tol |
|
doQR |
|
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
i |
|
A SummarizedExperiment or a QFeatures instance with the models.
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust linear regression upon summarization of # peptide intensities into protein expression values. # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrob(se, formula = ~condition, modelColumnName = "rlm") getCoef(rowData(se)$rlm[[1]]) # For features object pe <- msqrob(pe, i = "protein", formula = ~condition, modelColumnName = "rlm") # with ridge regression (slower) pe <- msqrob(pe, i = "protein", formula = ~condition, ridge = TRUE, modelColumnName = "ridge") # compare for human protein (no DE)==> large shrinkage to zero cbind(getCoef(rowData(pe[["protein"]])$rlm[[1]]), getCoef(rowData(pe[["protein"]])$ridge[[1]])) # compare for ecoli protein (DE)==> almost no shrinkage to zero cbind( getCoef(rowData(pe[["protein"]])$rlm[["P00956"]]), getCoef(rowData(pe[["protein"]])$ridge[["P00956"]]) )# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust linear regression upon summarization of # peptide intensities into protein expression values. # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrob(se, formula = ~condition, modelColumnName = "rlm") getCoef(rowData(se)$rlm[[1]]) # For features object pe <- msqrob(pe, i = "protein", formula = ~condition, modelColumnName = "rlm") # with ridge regression (slower) pe <- msqrob(pe, i = "protein", formula = ~condition, ridge = TRUE, modelColumnName = "ridge") # compare for human protein (no DE)==> large shrinkage to zero cbind(getCoef(rowData(pe[["protein"]])$rlm[[1]]), getCoef(rowData(pe[["protein"]])$ridge[[1]])) # compare for ecoli protein (DE)==> almost no shrinkage to zero cbind( getCoef(rowData(pe[["protein"]])$rlm[["P00956"]]), getCoef(rowData(pe[["protein"]])$ridge[["P00956"]]) )
Parameter estimation of msqrob models for QFeaturesinstance.
The method aggregates features within the model e.g. from peptides to proteins.
It provides fold change estimates and their associated uncertainty at the aggregated
level (e.g. protein level) while correcting for the peptide species that are observed
in each sample. It also addresses the correlation in the data, e.g. the peptide data
for the same protein in a sample are correlate because they originate from the same
protein pool. The method however does not return aggregated expression values for each sample.
For visualisation purposes aggregated expression values are provide by the aggregateFeatures
function from the QFeatures Package
## S4 method for signature 'SummarizedExperiment' msqrobAggregate( object, formula, fcol, aggregateFun = MsCoreUtils::robustSummary, modelColumnName = "msqrobModels", robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) ) ## S4 method for signature 'QFeatures' msqrobAggregate( object, formula, i, fcol, name = "msqrobAggregate", aggregateFun = MsCoreUtils::robustSummary, modelColumnName = "msqrobModels", robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )## S4 method for signature 'SummarizedExperiment' msqrobAggregate( object, formula, fcol, aggregateFun = MsCoreUtils::robustSummary, modelColumnName = "msqrobModels", robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) ) ## S4 method for signature 'QFeatures' msqrobAggregate( object, formula, i, fcol, name = "msqrobAggregate", aggregateFun = MsCoreUtils::robustSummary, modelColumnName = "msqrobModels", robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
object |
|
formula |
Model formula. The model is built based on the covariates in the data object. |
fcol |
The feature variable of assay ‘i’ defining how to summarise the features. |
aggregateFun |
A function used for quantitative feature aggregation.
Details can be found in the documentation of the |
modelColumnName |
|
robust |
|
ridge |
|
maxitRob |
|
tol |
|
doQR |
|
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
i |
|
name |
A ‘character(1)’ naming the new assay. Default is ‘newAssay’. Note that the function will fail if there's already an assay with ‘name’. |
A ‘QFeatures’ object with an additional assay.
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. colData(pe)$samples <- rownames(colData(pe)) pe <- msqrobAggregate(pe, i = "peptide", fcol = "Proteins", formula = ~condition + (1|samples) + (1|Sequence), ridge = TRUE) getCoef(rowData(pe[["msqrobAggregate"]])$msqrobModels[["P00956"]]) ## Same but on a SummarizedExperiment object se <- getWithColData(pe, "peptide") se <- msqrobAggregate(se, fcol = "Proteins", formula = ~condition + (1|samples) + (1|Sequence), ridge = TRUE) getCoef(rowData(se)$msqrobModels[["P00956"]])# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. colData(pe)$samples <- rownames(colData(pe)) pe <- msqrobAggregate(pe, i = "peptide", fcol = "Proteins", formula = ~condition + (1|samples) + (1|Sequence), ridge = TRUE) getCoef(rowData(pe[["msqrobAggregate"]])$msqrobModels[["P00956"]]) ## Same but on a SummarizedExperiment object se <- getWithColData(pe, "peptide") se <- msqrobAggregate(se, fcol = "Proteins", formula = ~condition + (1|samples) + (1|Sequence), ridge = TRUE) getCoef(rowData(se)$msqrobModels[["P00956"]])
Function to collect the inference tables generated by the msqrob2 statistical inference workflow.
msqrobCollect(object, contrast, resultsColumnNamePrefix = "", combine = TRUE)msqrobCollect(object, contrast, resultsColumnNamePrefix = "", combine = TRUE)
object |
A |
contrast |
A numeric |
resultsColumnNamePrefix |
A |
combine |
A |
Result tables for contrasts combined in a single table or list
## Load example data ## The data are a Feature object containing ## a SummarizedExperiment named "peptide" with MaxQuant peptide intensities ## The data are a subset of spike-in the human-ecoli study ## The variable condition in the colData of the Feature object ## contains information on the spike in condition a-e (from low to high) data(pe) ## Aggregate peptide intensities in protein expression values ## Note that the peptide intensities were already normalised! pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") ## Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) ## Define contrast design <- model.matrix(~condition, data = colData(pe)) ## Assess log2 fold change between reference condition a and condition b L <- makeContrast( contrasts = c("conditionb = 0", "conditionc = 0", "conditionc - conditionb = 0"), parameterNames = colnames(design)) pe <- hypothesisTest(pe, i = "protein", L) ## Extract Results inference <- msqrobCollect(pe[["protein"]], L) head(inference)## Load example data ## The data are a Feature object containing ## a SummarizedExperiment named "peptide" with MaxQuant peptide intensities ## The data are a subset of spike-in the human-ecoli study ## The variable condition in the colData of the Feature object ## contains information on the spike in condition a-e (from low to high) data(pe) ## Aggregate peptide intensities in protein expression values ## Note that the peptide intensities were already normalised! pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") ## Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) ## Define contrast design <- model.matrix(~condition, data = colData(pe)) ## Assess log2 fold change between reference condition a and condition b L <- makeContrast( contrasts = c("conditionb = 0", "conditionc = 0", "conditionc - conditionb = 0"), parameterNames = colnames(design)) pe <- hypothesisTest(pe, i = "protein", L) ## Extract Results inference <- msqrobCollect(pe[["protein"]], L) head(inference)
Low-level function for parameter estimation with msqrob by modeling peptide counts using quasibinomial glm
msqrobGlm(y, npep, formula, data, priorCount = 0.1, binomialBound = TRUE)msqrobGlm(y, npep, formula, data, priorCount = 0.1, binomialBound = TRUE)
y |
A |
npep |
A vector with number of peptides per protein. It has as length the number of rows of y. The counts are equal or larger than the largest peptide count in y. |
formula |
Model formula. The model is built based on the covariates in the data object. |
data |
A |
priorCount |
A 'numeric(1)', which is a prior count to be added to the observations to shrink the estimated log-fold-changes towards zero. |
binomialBound |
logical, if ‘TRUE’ then the quasibinomial variance estimator will be never smaller than 1 (no underdispersion). |
A list of objects of the StatModel class.
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobGlm( aggcounts(pe[["protein"]]), rowData(pe[["protein"]])[[".n"]], ~condition, colData(pe) ) getCoef(models[[1]])# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobGlm( aggcounts(pe[["protein"]]), rowData(pe[["protein"]])[[".n"]], ~condition, colData(pe) ) getCoef(models[[1]])
Fitting a hurdle msqrob model with an intensity component for assessing differential abundance and an count component that is modeling peptide counts using quasibinomial glm for differential detection of the number of features that were not missing and used for aggregation
## S4 method for signature 'SummarizedExperiment' msqrobHurdle( object, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobHurdle( object, i, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE )## S4 method for signature 'SummarizedExperiment' msqrobHurdle( object, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobHurdle( object, i, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE )
object |
|
formula |
Model formula. Both model components are built based on the covariates in the data object. |
modelColumnName |
|
overwrite |
|
robust |
|
ridge |
|
maxitRob |
|
tol |
|
doQR |
|
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
priorCount |
A 'numeric(1)', which is a prior count to be added to the observations to shrink the estimated odds ratios of the count component towards zero. Default is 0.1. |
binomialBound |
logical, if ‘TRUE’ then the quasibinomial variance estimator will be never smaller than 1 (no underdispersion). Default is TRUE. |
i |
|
SummarizedExperiment or QFeatures instance
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities to protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit Hurdle MSqrob model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobHurdle(se, formula = ~condition) getCoef(rowData(se)$msqrobHurdleIntensity[[1]]) getCoef(rowData(se)$msqrobHurdleCount[[1]]) # For features object pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobHurdleIntensity[[1]]) getCoef(rowData(pe[["protein"]])$msqrobHurdleCount[[1]])# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities to protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit Hurdle MSqrob model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobHurdle(se, formula = ~condition) getCoef(rowData(se)$msqrobHurdleIntensity[[1]]) getCoef(rowData(se)$msqrobHurdleCount[[1]]) # For features object pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobHurdleIntensity[[1]]) getCoef(rowData(pe[["protein"]])$msqrobHurdleCount[[1]])
Low-level function for parameter estimation with msqrob using the ordinary least squares or robust regression base on the MASS::rlm function.
msqrobLm(y, formula, data, robust = TRUE, maxitRob = 5)msqrobLm(y, formula, data, robust = TRUE, maxitRob = 5)
y |
A |
formula |
Model formula. The model is built based on the covariates in the data object. |
data |
A |
robust |
|
maxitRob |
|
A list of objects of the StatModel class.
Lieven Clement, Oliver M. Crook
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobLm(assay(pe[["protein"]]), ~condition, colData(pe)) #' getCoef(models[[1]])# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobLm(assay(pe[["protein"]]), ~condition, colData(pe)) #' getCoef(models[[1]])
Low-level function for parameter estimation with msqrob using the robust ridge regression. The models can be fitted for each feature (e.g. summarised protein expression values) or multiple features belonging to the same accession can be modelled simultaneously e.g. peptide-based models where all peptide intensities for the same protein are modelled simultaneously. The fold changes and uncertainty estimates are then calculated at the protein level while correcting for peptide species and within sample correlation.
msqrobLmer( y, formula, data, rowdata = NULL, tol = 1e-06, robust = TRUE, ridge = FALSE, maxitRob = 1, doQR = TRUE, featureGroups = NULL, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )msqrobLmer( y, formula, data, rowdata = NULL, tol = 1e-06, robust = TRUE, ridge = FALSE, maxitRob = 1, doQR = TRUE, featureGroups = NULL, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
y |
A |
formula |
Model formula. The model is built based on the covariates in the data object. |
data |
A |
rowdata |
A |
tol |
|
robust |
|
ridge |
|
maxitRob |
|
doQR |
|
featureGroups |
vector of type |
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
A list of objects of the StatModel class.
Lieven Clement, Oliver M. Crook
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust ridge regression upon summarization of # peptide intensities into protein expression values modelsRidge <- msqrobLmer(assay(pe[["protein"]]), ~condition, data = colData(pe), ridge = TRUE) getCoef(modelsRidge[[1]]) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. # Add the samples variable to colData colData(pe)$samples <- rownames(colData(pe)) modelsPepBased <- msqrobLmer(assay(pe[["peptide"]]), formula = ~condition + (1|samples) + (1|Sequence), data = colData(pe), rowdata = rowData(pe[["peptide"]]), featureGroups = rowData(pe[["peptide"]])$Proteins, ridge = TRUE) getCoef(modelsPepBased[[1]])# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust ridge regression upon summarization of # peptide intensities into protein expression values modelsRidge <- msqrobLmer(assay(pe[["protein"]]), ~condition, data = colData(pe), ridge = TRUE) getCoef(modelsRidge[[1]]) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. # Add the samples variable to colData colData(pe)$samples <- rownames(colData(pe)) modelsPepBased <- msqrobLmer(assay(pe[["peptide"]]), formula = ~condition + (1|samples) + (1|Sequence), data = colData(pe), rowdata = rowData(pe[["peptide"]]), featureGroups = rowData(pe[["peptide"]])$Proteins, ridge = TRUE) getCoef(modelsPepBased[[1]])
Low-level function for parameter estimation with msqrob by modeling peptide counts using quasibinomial glm
## S4 method for signature 'SummarizedExperiment' msqrobQB( object, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobQB( object, i, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE )## S4 method for signature 'SummarizedExperiment' msqrobQB( object, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobQB( object, i, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE )
object |
|
formula |
Model formula. The model is built based on the covariates in the data object. |
modelColumnName |
|
overwrite |
|
priorCount |
A 'numeric(1)', which is a prior count to be added to the observations to shrink the estimated log-fold-changes towards zero. Default is 0.1. |
binomialBound |
logical, if ‘TRUE’ then the quasibinomial variance estimator will be never smaller than 1 (no underdispersion). Default is TRUE. |
i |
|
SummarizedExperiment or QFeatures instance
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate by counting how many peptide we observe for each protein pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model to peptide counts using a quasi-binomial model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobQB(se, formula = ~condition) getCoef(rowData(se)$msqrobQbModels[[1]]) # For features object pe <- msqrobQB(pe, i = "protein", formula = ~condition)# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate by counting how many peptide we observe for each protein pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model to peptide counts using a quasi-binomial model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobQB(se, formula = ~condition) getCoef(rowData(se)$msqrobQbModels[[1]]) # For features object pe <- msqrobQB(pe, i = "protein", formula = ~condition)
Methods to computes sample-specific normalization factors on the log scale using conventional median summarisation.
## S4 method for signature 'matrix' nfLogMedian(object, na.rm = TRUE) ## S4 method for signature 'SummarizedExperiment' nfLogMedian(object, i, na.rm = TRUE) ## S4 method for signature 'QFeatures' nfLogMedian(object, i, na.rm = TRUE)## S4 method for signature 'matrix' nfLogMedian(object, na.rm = TRUE) ## S4 method for signature 'SummarizedExperiment' nfLogMedian(object, i, na.rm = TRUE) ## S4 method for signature 'QFeatures' nfLogMedian(object, i, na.rm = TRUE)
object |
|
na.rm |
Logical; indicates if missing values are to be removed.
Default is |
i |
An integer or character specifying which assay to use,
only needed when object is |
Implementation assumes that the assay values are already on the log scale. The normalization factors are computed on the log scale.
A numeric vector of log2-scale normalization factors, one per sample (column).
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) ############################### ### Example on QFeatures object ############################### data(pe) # Calculate log2 norm factor nf_log <- nfLogMedian(pe, i="peptide") nf_log # Normalise peptide level data pe <- sweep(pe, MARGIN = 2, STATS = nf_log, i = "peptide", name = "peptide_norm") # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(pe,"peptide")) boxplot(assay(pe,"peptide_norm")) ############################### ### Example on SummarizedExperiment object ############################### data(pe) # Extract a summarised experiment from QFeatures object pe se <- getWithColData(pe, i="peptide") # Calculate log2 norm factor nf_log <- nfLogMedian(se, i=1) nf_log # Normalise peptide level data and store it as a new assay in # the SummarizedExperiment object assays(se)[["peptide_norm"]] <- sweep(assay(se), MARGIN = 2, STATS = nf_log) # Also give first assay a name assayNames(se)[1] <- "peptide" # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(se,"peptide")) boxplot(assay(se,"peptide_norm")) ############################### ### Example on matrix object ############################### data(pe) # Extract log2 transformed intensity matrix from QFeatures object pe mat <- assay(pe,"peptide") # Calculate Norm factors nf <- nfLogMedian(mat) nf # Normalise peptide level data matnorm <- sweep(mat, MARGIN = 2, STATS = nf_log) # Evaluate normalisation par(mfrow=c(1,2)) boxplot(mat) boxplot(matnorm)# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) ############################### ### Example on QFeatures object ############################### data(pe) # Calculate log2 norm factor nf_log <- nfLogMedian(pe, i="peptide") nf_log # Normalise peptide level data pe <- sweep(pe, MARGIN = 2, STATS = nf_log, i = "peptide", name = "peptide_norm") # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(pe,"peptide")) boxplot(assay(pe,"peptide_norm")) ############################### ### Example on SummarizedExperiment object ############################### data(pe) # Extract a summarised experiment from QFeatures object pe se <- getWithColData(pe, i="peptide") # Calculate log2 norm factor nf_log <- nfLogMedian(se, i=1) nf_log # Normalise peptide level data and store it as a new assay in # the SummarizedExperiment object assays(se)[["peptide_norm"]] <- sweep(assay(se), MARGIN = 2, STATS = nf_log) # Also give first assay a name assayNames(se)[1] <- "peptide" # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(se,"peptide")) boxplot(assay(se,"peptide_norm")) ############################### ### Example on matrix object ############################### data(pe) # Extract log2 transformed intensity matrix from QFeatures object pe mat <- assay(pe,"peptide") # Calculate Norm factors nf <- nfLogMedian(mat) nf # Normalise peptide level data matnorm <- sweep(mat, MARGIN = 2, STATS = nf_log) # Evaluate normalisation par(mfrow=c(1,2)) boxplot(mat) boxplot(matnorm)
Computes sample-specific normalization factors on the log2 scale using a median-of-ratios approach similar to that used in DESeq2 for bulk RNA-seq data.
The method proceeds as follows:
A pseudo-reference sample is constructed as the row-wise mean of the log2 intensities (equivalent to the log2-transformed geometric mean).
For each sample, log2 ratios relative to pseudo-reference are computed.
The normalization factor for each sample is obtained as the median of these log2 ratios (column-wise median).
## S4 method for signature 'matrix' nfLogMedianOfRatios(object, na.rm = TRUE) ## S4 method for signature 'SummarizedExperiment' nfLogMedianOfRatios(object, i, na.rm = TRUE) ## S4 method for signature 'QFeatures' nfLogMedianOfRatios(object, i, na.rm = TRUE)## S4 method for signature 'matrix' nfLogMedianOfRatios(object, na.rm = TRUE) ## S4 method for signature 'SummarizedExperiment' nfLogMedianOfRatios(object, i, na.rm = TRUE) ## S4 method for signature 'QFeatures' nfLogMedianOfRatios(object, i, na.rm = TRUE)
object |
|
na.rm |
Logical; indicates if missing values are to be removed.
Default is |
i |
An integer or character specifying which assay to use,
only needed when object is |
This implementation assumes that the assay values are already on the log scale. The normalization factors are computed on the log scale.
A numeric vector of log2-scale normalization factors, one per sample (column).
Love, M.I., Huber, W., Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550.
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) ############################### ### Example on QFeatures object ############################### data(pe) # Calculate log2 norm factor nf_log <- nfLogMedianOfRatios(pe, i="peptide") nf_log # Normalise peptide level data pe <- sweep(pe, MARGIN = 2, STATS = nf_log, i = "peptide", name = "peptide_norm") # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(pe,"peptide")) boxplot(assay(pe,"peptide_norm")) ############################### ### Example on SummarizedExperiment object ############################### data(pe) # Extract a summarised experiment from QFeatures object pe se <- getWithColData(pe, i="peptide") # Calculate log2 norm factor nf_log <- nfLogMedianOfRatios(se, i=1) nf_log # Normalise peptide level data and store it as a new assay in # the SummarizedExperiment object assays(se)[["peptide_norm"]] <- sweep(assay(se), MARGIN = 2, STATS = nf_log) # Also give first assay a name assayNames(se)[1] <- "peptide" # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(se,"peptide")) boxplot(assay(se,"peptide_norm")) ############################### ### Example on matrix object ############################### data(pe) mat <- assay(pe,"peptide") nf <- nfLogMedianOfRatios(mat) nf # Normalise peptide level data matnorm <- sweep(mat, MARGIN = 2, STATS = nf_log) # Evaluate normalisation par(mfrow=c(1,2)) boxplot(mat) boxplot(matnorm)# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) ############################### ### Example on QFeatures object ############################### data(pe) # Calculate log2 norm factor nf_log <- nfLogMedianOfRatios(pe, i="peptide") nf_log # Normalise peptide level data pe <- sweep(pe, MARGIN = 2, STATS = nf_log, i = "peptide", name = "peptide_norm") # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(pe,"peptide")) boxplot(assay(pe,"peptide_norm")) ############################### ### Example on SummarizedExperiment object ############################### data(pe) # Extract a summarised experiment from QFeatures object pe se <- getWithColData(pe, i="peptide") # Calculate log2 norm factor nf_log <- nfLogMedianOfRatios(se, i=1) nf_log # Normalise peptide level data and store it as a new assay in # the SummarizedExperiment object assays(se)[["peptide_norm"]] <- sweep(assay(se), MARGIN = 2, STATS = nf_log) # Also give first assay a name assayNames(se)[1] <- "peptide" # Evaluate normalisation par(mfrow=c(1,2)) boxplot(assay(se,"peptide")) boxplot(assay(se,"peptide_norm")) ############################### ### Example on matrix object ############################### data(pe) mat <- assay(pe,"peptide") nf <- nfLogMedianOfRatios(mat) nf # Normalise peptide level data matnorm <- sweep(mat, MARGIN = 2, STATS = nf_log) # Evaluate normalisation par(mfrow=c(1,2)) boxplot(mat) boxplot(matnorm)
Subset of peptides from 100 proteins from a quantitative mass spectrometry based proteomics dataset (PRIDE identifier: PXD003881 Shen et al. (2018)). E. Coli lysates were spiked at five different concentrations (3%, 4.5%, 6%, 7.5% and 9%wt/wt) in a stable human background (4 repl. per treatment). The twenty resulting samples were run on an Orbitrap Fusion mass spectrometer. Raw data files were processed with MaxQuant (version 1.6.1.0, Cox and Mann (2008)) using default search settings unless otherwise noted. Spectra were searched against the UniProtKB/SwissProt human and E. Coli reference proteome databases (07/06/2018), concatenated with the default Maxquant contaminant database. Carbamidomethylation of Cystein was set as a fixed modification, and oxidation of Methionine and acetylation of the protein amino-terminus were allowed as variable modifications. In silico cleavage was set to use trypsin/P, allowing two miscleavages. Match between runs was also enabled using default settings. The resulting peptide-to-spectrum matches (PSMs) were filtered by MaxQuant at 1% FDR.
data(pe)data(pe)
Feature set with an instance "peptide":
contains the raw peptide intensities
contains a variable "Proteins" with the protein accession and an variable ecoli to indicate if the protein is a spikin.
contains a factor condition indicating the spike-in condition
data(pe) head(colData(pe)) head(rowData(pe)) head(assay(pe))data(pe) head(colData(pe)) head(rowData(pe)) head(assay(pe))
plot_volcano generates a volcano plot for the results of hypothesis
tests generated with the msqrob2 hypothesisTest method.
plotVolcano( resultsTable, significanceLevel = 0.05, significance_colors = c(`FALSE` = "black", `TRUE` = "red"), opacity = 0.5 )plotVolcano( resultsTable, significanceLevel = 0.05, significance_colors = c(`FALSE` = "black", `TRUE` = "red"), opacity = 0.5 )
resultsTable |
data frame returned by the msqrob2 hypothesisTest method |
significanceLevel |
nominal FDR-level at which features are considered significant (value between 0 and 1) |
significance_colors |
a named vector with two colors, one for non-significant features named 'FALSE' and one for significant features named 'TRUE'. The default is 'c('FALSE' = 'black', 'TRUE' = 'red')'. |
opacity |
refers to the opacity of the dots in the plot.
Values of opacity range from 0 to 1, with lower values corresponding to more
transparent colors. The default is |
A ggplot object with the volcano plot.
# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values # Note that the peptide intensities were already normalised! pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast design <- model.matrix(~condition, data = colData(pe)) # Assess log2 fold change between reference condition a and condition b L <- makeContrast( contrasts = c("conditionb = 0", "conditionc = 0", "conditionc - conditionb = 0"), parameterNames = colnames(design)) pe <- hypothesisTest(pe, i = "protein", L) # Volcano plots library(ggplot2) inference <- msqrobCollect(pe[["protein"]], L) plotVolcano(inference) + facet_wrap(~contrast) plotVolcano(rowData(pe[["protein"]])$`conditionc`) plotVolcano(rowData(pe[["protein"]])$`conditionc - conditionb`)# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values # Note that the peptide intensities were already normalised! pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast design <- model.matrix(~condition, data = colData(pe)) # Assess log2 fold change between reference condition a and condition b L <- makeContrast( contrasts = c("conditionb = 0", "conditionc = 0", "conditionc - conditionb = 0"), parameterNames = colnames(design)) pe <- hypothesisTest(pe, i = "protein", L) # Volcano plots library(ggplot2) inference <- msqrobCollect(pe[["protein"]], L) plotVolcano(inference) + facet_wrap(~contrast) plotVolcano(rowData(pe[["protein"]])$`conditionc`) plotVolcano(rowData(pe[["protein"]])$`conditionc - conditionb`)
For a given vector of protein group names, outputs the names of those protein groups for which none of its member proteins is present in a smaller protein group.
smallestUniqueGroups(proteins, split = ";")smallestUniqueGroups(proteins, split = ";")
proteins |
A vector of characters or factors containing single proteins and/or protein groups (i.e. proteins separated by a separator symbol). |
split |
The character string that is used to separate the indivudual protein names in each protein group. |
A character vector containing the names of the protein groups for which none of its proteins is present in a smaller protein group.
data(pe) smallestUniqueGroups(rowData(pe[["peptide"]])$Proteins)data(pe) smallestUniqueGroups(rowData(pe[["peptide"]])$Proteins)
The StatModel class contains a statistical model as applied on a
feature.
Models are created by the dedicated user-level functions
(msqrob(), mqrobAggregate()) or manually, using the
StatModel() constructor. In the former case, each quantitative
feature is assigned its statistical model and the models are stored
as a variable in a DataFrame object, as illustred in the example
below.
Function for constructing a new StatModel object.
## S4 method for signature 'StatModel' show(object) StatModel( type = "fitError", params = list(), varPosterior = NA_real_, dfPosterior = NA_real_ )## S4 method for signature 'StatModel' show(object) StatModel( type = "fitError", params = list(), varPosterior = NA_real_, dfPosterior = NA_real_ )
object |
|
type |
default set to fit-error, can be a "lm", "rlm" (robust lm with M estimation), "lmer" (when mixed models or ridge regression is adopted), "quasibinomial" (when peptide counts are fitted) |
params |
A list containing the parameters of the fitted model |
varPosterior |
Numeric, posterior variance, default is NA |
dfPosterior |
Numeric, posterior degrees of freedom, default is NA |
A StatModel object
typecharacter(1) defining type of the used model. Default
is "fitError", i.e. a error model. Other include "lm",
"rlm", ...
paramsA list() containing information of the used model.
varPosteriornumeric() of posterior variance.
dfPosteriornumeric() of posterior degrees of freedom.
Oliver M. Crook, Laurent Gatto, Lieven Clement
## A fully specified dummy model myModel <- StatModel( type = "rlm", params = list(x = 3, y = 7, b = 4), varPosterior = c(0.1, 0.2, 0.3), dfPosterior = c(6, 7, 8) ) myModel ## A collection of models stored as a variable in a DataFrame mod1 <- StatModel(type = "rlm") mod2 <- StatModel(type = "lm") df <- DataFrame(x = 1:2) df$mods <- c(mod1, mod2) df # TODO## A fully specified dummy model myModel <- StatModel( type = "rlm", params = list(x = 3, y = 7, b = 4), varPosterior = c(0.1, 0.2, 0.3), dfPosterior = c(6, 7, 8) ) myModel ## A collection of models stored as a variable in a DataFrame mod1 <- StatModel(type = "rlm") mod2 <- StatModel(type = "lm") df <- DataFrame(x = 1:2) df$mods <- c(mod1, mod2) df # TODO
Summary table of the differentially expressed Features
topFeatures(models, contrast, adjust.method = "BH", sort = TRUE, alpha = 1)topFeatures(models, contrast, adjust.method = "BH", sort = TRUE, alpha = 1)
models |
A list with elements of the class |
contrast |
|
adjust.method |
|
sort |
|
alpha |
|
A dataframe with log2 fold changes (logFC), standard errors (se), degrees of freedom of the test (df), t-test statistic (t), p-values (pval) and adjusted pvalues (adjPval) using the specified adjust.method in the p.adjust function of the stats package.
Lieven Clement
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) topDeProteins <- topFeatures(rowData(pe[["protein"]])$msqrobModels, L)data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) topDeProteins <- topFeatures(rowData(pe[["protein"]])$msqrobModels, L)