Title: | Robust statistical inference for quantitative LC-MS proteomics |
---|---|
Description: | msqrob2 provides a robust linear mixed model framework for assessing differential abundance in MS-based Quantitative proteomics experiments. Our workflows can start from raw peptide intensities or summarised protein expression values. The model parameter estimates can be stabilized by ridge regression, empirical Bayes variance estimation and robust M-estimation. msqrob2's hurde workflow can handle missing data without having to rely on hard-to-verify imputation assumptions, and, outcompetes state-of-the-art methods with and without imputation for both high and low missingness. It builds on QFeature infrastructure for quantitative mass spectrometry data to store the model results together with the raw data and preprocessed data. |
Authors: | Lieven Clement [aut, cre] , Laurent Gatto [aut] , Oliver M. Crook [aut] , Adriaan Sticker [ctb], Ludger Goeminne [ctb], Milan Malfait [ctb] , Stijn Vandenbulcke [aut] |
Maintainer: | Lieven Clement <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.15.0 |
Built: | 2024-10-30 08:59:30 UTC |
Source: | https://github.com/bioc/msqrob2 |
Methods for StatModel class
to calculate contrasts of the model parameters
to calculate the variance-covariance matrix of the contrasts
## S4 method for signature 'StatModel' getContrast(object, L) ## S4 method for signature 'StatModel' varContrast(object, L)
## S4 method for signature 'StatModel' getContrast(object, L) ## S4 method for signature 'StatModel' varContrast(object, L)
object |
A list with elements of the class |
L |
contrast |
A matrix with the calculated contrasts or variance-covariance matrix of contrasts
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Define contrast for log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) getContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L) varContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L)
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Define contrast for log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) getContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L) varContrast(rowData(pe[["protein"]])$msqrobModels[[1]], L)
Accessor functions for StatModel class
to get model
to get the parameter estimation method
to get the parameter estimates of the mean model
to get the residual degrees of freedom of the model
to get the residual variance of the model
to get the residual standard deviation of the model
to get the degrees of freedom of the empirical Bayes variance estimator
to get the empirical Bayes variance
to get the empirical Bayes standard deviation
to get the unscaled variance covariance matrix of the model parameters
## S4 method for signature 'StatModel' getModel(object) ## S4 method for signature 'StatModel' getFitMethod(object) ## S4 method for signature 'StatModel' getCoef(object) ## S4 method for signature 'StatModel' getDfPosterior(object) ## S4 method for signature 'StatModel' getVarPosterior(object) ## S4 method for signature 'StatModel' getSigmaPosterior(object) ## S4 method for signature 'StatModel' getDF(object) ## S4 method for signature 'StatModel' getVar(object) ## S4 method for signature 'StatModel' getSigma(object) ## S4 method for signature 'StatModel' getVcovUnscaled(object)
## S4 method for signature 'StatModel' getModel(object) ## S4 method for signature 'StatModel' getFitMethod(object) ## S4 method for signature 'StatModel' getCoef(object) ## S4 method for signature 'StatModel' getDfPosterior(object) ## S4 method for signature 'StatModel' getVarPosterior(object) ## S4 method for signature 'StatModel' getSigmaPosterior(object) ## S4 method for signature 'StatModel' getDF(object) ## S4 method for signature 'StatModel' getVar(object) ## S4 method for signature 'StatModel' getSigma(object) ## S4 method for signature 'StatModel' getVcovUnscaled(object)
object |
|
The requested parameter of the StatModel object
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) getModel(rowData(pe[["protein"]])$msqrobModels[[1]]) getFitMethod(rowData(pe[["protein"]])$msqrobModels[[1]]) # Similar for the remaining accessors
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) getModel(rowData(pe[["protein"]])$msqrobModels[[1]]) getFitMethod(rowData(pe[["protein"]])$msqrobModels[[1]]) # Similar for the remaining accessors
Summary table of the estimates for differential expression of features
## S4 method for signature 'SummarizedExperiment' hypothesisTest( object, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'SummarizedExperiment' hypothesisTestHurdle( object, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTest( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTestHurdle( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE )
## S4 method for signature 'SummarizedExperiment' hypothesisTest( object, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'SummarizedExperiment' hypothesisTestHurdle( object, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTest( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobModels", resultsColumnNamePrefix = "", overwrite = FALSE ) ## S4 method for signature 'QFeatures' hypothesisTestHurdle( object, i, contrast, adjust.method = "BH", modelColumn = "msqrobHurdle", resultsColumnNamePrefix = "hurdle_", overwrite = FALSE )
object |
|
contrast |
|
adjust.method |
|
modelColumn |
|
resultsColumnNamePrefix |
|
overwrite |
|
i |
|
A SummarizedExperiment or a QFeatures
instance augmented with the test
results.
Lieven Clement
# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b L <- makeContrast( "conditionc - conditionb=0", c("conditionb", "conditionc") ) # example SummarizedExperiment instance se <- pe[["protein"]] se <- hypothesisTest(se, L) head(rowData(se)$"conditionc - conditionb", 10) # Volcano plot plot(-log10(pval) ~ logFC, rowData(se)$"conditionc - conditionb", col = (adjPval < 0.05) + 1 ) # Example for QFeatures instance # Assess log2 fold change between condition b and condition a (reference class), # condition c and condition a, and, condition c and condition b. L <- makeContrast( c( "conditionb=0", "conditionc=0", "conditionc - conditionb=0" ), c("conditionb", "conditionc") ) pe <- hypothesisTest(pe, i = "protein", L) head(rowData(pe[["protein"]])$"conditionb", 10) # Volcano plots par(mfrow = c(1, 3)) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC b-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc", col = (adjPval < 0.05) + 1, main = "log2 FC c-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc - conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC c-b" ) # Hurdle method pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) pe <- hypothesisTestHurdle(pe, i = "protein", L) head(rowData(pe[["protein"]])$"hurdle_conditionb", 10)
# Load example data # The data are a Feature object containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b L <- makeContrast( "conditionc - conditionb=0", c("conditionb", "conditionc") ) # example SummarizedExperiment instance se <- pe[["protein"]] se <- hypothesisTest(se, L) head(rowData(se)$"conditionc - conditionb", 10) # Volcano plot plot(-log10(pval) ~ logFC, rowData(se)$"conditionc - conditionb", col = (adjPval < 0.05) + 1 ) # Example for QFeatures instance # Assess log2 fold change between condition b and condition a (reference class), # condition c and condition a, and, condition c and condition b. L <- makeContrast( c( "conditionb=0", "conditionc=0", "conditionc - conditionb=0" ), c("conditionb", "conditionc") ) pe <- hypothesisTest(pe, i = "protein", L) head(rowData(pe[["protein"]])$"conditionb", 10) # Volcano plots par(mfrow = c(1, 3)) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC b-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc", col = (adjPval < 0.05) + 1, main = "log2 FC c-a" ) plot(-log10(pval) ~ logFC, rowData(pe[["protein"]])$"conditionc - conditionb", col = (adjPval < 0.05) + 1, main = "log2 FC c-b" ) # Hurdle method pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) pe <- hypothesisTestHurdle(pe, i = "protein", L) head(rowData(pe[["protein"]])$"hurdle_conditionb", 10)
Construct the contrast matrix corresponding to specified contrasts of a set of parameters.
makeContrast(contrasts, parameterNames)
makeContrast(contrasts, parameterNames)
contrasts |
character vector specifying contrasts, i.e. the linear combination of the modelparameters that equals to zero. |
parameterNames |
character vector specifying the model parameters that are involved in the contrasts, e.g if we model data of three conditions using a factor condition with three levels a, b and c then our model will have 3 mean parameters named (Intercept), conditionb and conditionc. Hence the log2 fold change between b and a is conditionb. Under the null hypothesis the log2 fold change equals 0. Which is to be encoded as "conditionb=0". If we would like to test for log2 fold change between condition c and b we assess if the log2 fold change conditionc-conditionb equals 0, encoded as "condtionb-conditionc=0". |
A numeric contrast matrix with rownames that equal the model parameters that are involved in the contrasts
makeContrast(c("conditionb = 0"), parameterNames = c( "(Intercept)", "conditionb", "conditionc" ) ) makeContrast(c("conditionc=0"), parameterNames = c("conditionc") ) makeContrast(c( "conditionb=0", "conditionc=0", "conditionc-conditionb=0" ), parameterNames = c( "conditionb", "conditionc" ) )
makeContrast(c("conditionb = 0"), parameterNames = c( "(Intercept)", "conditionb", "conditionc" ) ) makeContrast(c("conditionc=0"), parameterNames = c("conditionc") ) makeContrast(c( "conditionb=0", "conditionc=0", "conditionc-conditionb=0" ), parameterNames = c( "conditionb", "conditionc" ) )
Parameter estimation of msqrob models for QFeatures
and SummarizedExperiment
instance.
## S4 method for signature 'SummarizedExperiment' msqrob( object, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) ) ## S4 method for signature 'QFeatures' msqrob( object, i, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
## S4 method for signature 'SummarizedExperiment' msqrob( object, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) ) ## S4 method for signature 'QFeatures' msqrob( object, i, formula, modelColumnName = "msqrobModels", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
object |
|
formula |
Model formula. The model is built based on the covariates in the data object. |
modelColumnName |
|
overwrite |
|
robust |
|
ridge |
|
maxitRob |
|
tol |
|
doQR |
|
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
i |
|
A SummarizedExperiment or a QFeatures
instance with the models.
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust linear regression upon summarization of # peptide intensities into protein expression values. # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrob(se, formula = ~condition, modelColumnName = "rlm") getCoef(rowData(se)$rlm[[1]]) # For features object pe <- msqrob(pe, i = "protein", formula = ~condition, modelColumnName = "rlm") # with ridge regression (slower) pe <- msqrob(pe, i = "protein", formula = ~condition, ridge = TRUE, modelColumnName = "ridge") # compare for human protein (no DE)==> large shrinkage to zero cbind(getCoef(rowData(pe[["protein"]])$rlm[[1]]), getCoef(rowData(pe[["protein"]])$ridge[[1]])) # compare for ecoli protein (DE)==> almost no shrinkage to zero cbind( getCoef(rowData(pe[["protein"]])$rlm[["P00956"]]), getCoef(rowData(pe[["protein"]])$ridge[["P00956"]]) )
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust linear regression upon summarization of # peptide intensities into protein expression values. # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrob(se, formula = ~condition, modelColumnName = "rlm") getCoef(rowData(se)$rlm[[1]]) # For features object pe <- msqrob(pe, i = "protein", formula = ~condition, modelColumnName = "rlm") # with ridge regression (slower) pe <- msqrob(pe, i = "protein", formula = ~condition, ridge = TRUE, modelColumnName = "ridge") # compare for human protein (no DE)==> large shrinkage to zero cbind(getCoef(rowData(pe[["protein"]])$rlm[[1]]), getCoef(rowData(pe[["protein"]])$ridge[[1]])) # compare for ecoli protein (DE)==> almost no shrinkage to zero cbind( getCoef(rowData(pe[["protein"]])$rlm[["P00956"]]), getCoef(rowData(pe[["protein"]])$ridge[["P00956"]]) )
Parameter estimation of msqrob models for QFeatures
instance.
The method aggregates features within the model e.g. from peptides to proteins.
It provides fold change estimates and their associated uncertainty at the aggregated
level (e.g. protein level) while correcting for the peptide species that are observed
in each sample. It also addresses the correlation in the data, e.g. the peptide data
for the same protein in a sample are correlate because they originate from the same
protein pool. The method however does not return aggregated expression values for each sample.
For visualisation purposes aggregated expression values are provide by the aggregateFeatures
function from the QFeatures
Package
## S4 method for signature 'QFeatures' msqrobAggregate( object, formula, i, fcol, name = "msqrobAggregate", aggregateFun = MsCoreUtils::robustSummary, modelColumnName = "msqrobModels", robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
## S4 method for signature 'QFeatures' msqrobAggregate( object, formula, i, fcol, name = "msqrobAggregate", aggregateFun = MsCoreUtils::robustSummary, modelColumnName = "msqrobModels", robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
object |
|
formula |
Model formula. The model is built based on the covariates in the data object. |
i |
|
fcol |
The feature variable of assay ‘i’ defining how to summerise the features. |
name |
A ‘character(1)’ naming the new assay. Default is ‘newAssay’. Note that the function will fail if there's already an assay with ‘name’. |
aggregateFun |
A function used for quantitative feature aggregation.
Details can be found in the documentation of the |
modelColumnName |
|
robust |
|
ridge |
|
maxitRob |
|
tol |
|
doQR |
|
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
A ‘QFeatures’ object with an additional assay.
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. colData(pe)$samples <- rownames(colData(pe)) pe <- msqrobAggregate(pe, i = "peptide", fcol = "Proteins", formula = ~condition + (1|samples) + (1|Sequence), ridge = TRUE) getCoef(rowData(pe[["msqrobAggregate"]])$msqrobModels[["P00956"]])
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. colData(pe)$samples <- rownames(colData(pe)) pe <- msqrobAggregate(pe, i = "peptide", fcol = "Proteins", formula = ~condition + (1|samples) + (1|Sequence), ridge = TRUE) getCoef(rowData(pe[["msqrobAggregate"]])$msqrobModels[["P00956"]])
Low-level function for parameter estimation with msqrob by modeling peptide counts using quasibinomial glm
msqrobGlm(y, npep, formula, data, priorCount = 0.1, binomialBound = TRUE)
msqrobGlm(y, npep, formula, data, priorCount = 0.1, binomialBound = TRUE)
y |
A |
npep |
A vector with number of peptides per protein. It has as length the number of rows of y. The counts are equal or larger than the largest peptide count in y. |
formula |
Model formula. The model is built based on the covariates in the data object. |
data |
A |
priorCount |
A 'numeric(1)', which is a prior count to be added to the observations to shrink the estimated log-fold-changes towards zero. |
binomialBound |
logical, if ‘TRUE’ then the quasibinomial variance estimator will be never smaller than 1 (no underdispersion). |
A list of objects of the StatModel
class.
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobGlm( aggcounts(pe[["protein"]]), rowData(pe[["protein"]])[[".n"]], ~condition, colData(pe) ) getCoef(models[[1]])
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobGlm( aggcounts(pe[["protein"]]), rowData(pe[["protein"]])[[".n"]], ~condition, colData(pe) ) getCoef(models[[1]])
Fitting a hurdle msqrob model with an intensity component for assessing differential abundance and an count component that is modeling peptide counts using quasibinomial glm for differential detection of the number of features that were not missing and used for aggregation
## S4 method for signature 'SummarizedExperiment' msqrobHurdle( object, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobHurdle( object, i, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE )
## S4 method for signature 'SummarizedExperiment' msqrobHurdle( object, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobHurdle( object, i, formula, modelColumnName = "msqrobHurdle", overwrite = FALSE, robust = TRUE, ridge = FALSE, maxitRob = 1, tol = 1e-06, doQR = TRUE, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)), priorCount = 0.1, binomialBound = TRUE )
object |
|
formula |
Model formula. Both model components are built based on the covariates in the data object. |
modelColumnName |
|
overwrite |
|
robust |
|
ridge |
|
maxitRob |
|
tol |
|
doQR |
|
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
priorCount |
A 'numeric(1)', which is a prior count to be added to the observations to shrink the estimated odds ratios of the count component towards zero. Default is 0.1. |
binomialBound |
logical, if ‘TRUE’ then the quasibinomial variance estimator will be never smaller than 1 (no underdispersion). Default is TRUE. |
i |
|
SummarizedExperiment or QFeatures instance
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities to protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit Hurdle MSqrob model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobHurdle(se, formula = ~condition) getCoef(rowData(se)$msqrobHurdleIntensity[[1]]) getCoef(rowData(se)$msqrobHurdleCount[[1]]) # For features object pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobHurdleIntensity[[1]]) getCoef(rowData(pe[["protein"]])$msqrobHurdleCount[[1]])
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities to protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit Hurdle MSqrob model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobHurdle(se, formula = ~condition) getCoef(rowData(se)$msqrobHurdleIntensity[[1]]) getCoef(rowData(se)$msqrobHurdleCount[[1]]) # For features object pe <- msqrobHurdle(pe, i = "protein", formula = ~condition) getCoef(rowData(pe[["protein"]])$msqrobHurdleIntensity[[1]]) getCoef(rowData(pe[["protein"]])$msqrobHurdleCount[[1]])
Low-level function for parameter estimation with msqrob using the ordinary least squares or robust regression base on the MASS::rlm function.
msqrobLm(y, formula, data, robust = TRUE, maxitRob = 5)
msqrobLm(y, formula, data, robust = TRUE, maxitRob = 5)
y |
A |
formula |
Model formula. The model is built based on the covariates in the data object. |
data |
A |
robust |
|
maxitRob |
|
A list of objects of the StatModel
class.
Lieven Clement, Oliver M. Crook
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobLm(assay(pe[["protein"]]), ~condition, colData(pe)) #' getCoef(models[[1]])
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") pe # Fit MSqrob model using robust regression with the MASS rlm function models <- msqrobLm(assay(pe[["protein"]]), ~condition, colData(pe)) #' getCoef(models[[1]])
Low-level function for parameter estimation with msqrob using the robust ridge regression. The models can be fitted for each feature (e.g. summarised protein expression values) or multiple features belonging to the same accession can be modelled simultaneously e.g. peptide-based models where all peptide intensities for the same protein are modelled simultaneously. The fold changes and uncertainty estimates are then calculated at the protein level while correcting for peptide species and within sample correlation.
msqrobLmer( y, formula, data, rowdata = NULL, tol = 1e-06, robust = TRUE, ridge = FALSE, maxitRob = 1, doQR = TRUE, featureGroups = NULL, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
msqrobLmer( y, formula, data, rowdata = NULL, tol = 1e-06, robust = TRUE, ridge = FALSE, maxitRob = 1, doQR = TRUE, featureGroups = NULL, lmerArgs = list(control = lmerControl(calc.derivs = FALSE)) )
y |
A |
formula |
Model formula. The model is built based on the covariates in the data object. |
data |
A |
rowdata |
A |
tol |
|
robust |
|
ridge |
|
maxitRob |
|
doQR |
|
featureGroups |
vector of type |
lmerArgs |
a list (of correct class, resulting from ‘lmerControl()’
containing control parameters, including the nonlinear optimizer to be used
and parameters to be passed through to the nonlinear optimizer, see the
‘lmerControl’ documentation of the lme4 package for more details.
Default is |
A list of objects of the StatModel
class.
Lieven Clement, Oliver M. Crook
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust ridge regression upon summarization of # peptide intensities into protein expression values modelsRidge <- msqrobLmer(assay(pe[["protein"]]), ~condition, data = colData(pe), ridge = TRUE) getCoef(modelsRidge[[1]]) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. # Add the samples variable to colData colData(pe)$samples <- rownames(colData(pe)) modelsPepBased <- msqrobLmer(assay(pe[["peptide"]]), formula = ~condition + (1|samples) + (1|Sequence), data = colData(pe), rowdata = rowData(pe[["peptide"]]), featureGroups = rowData(pe[["peptide"]])$Proteins, ridge = TRUE) getCoef(modelsPepBased[[1]])
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model using robust ridge regression upon summarization of # peptide intensities into protein expression values modelsRidge <- msqrobLmer(assay(pe[["protein"]]), ~condition, data = colData(pe), ridge = TRUE) getCoef(modelsRidge[[1]]) # Fit MSqrob model using robust ridge regression starting from peptide intensities # The fold changes are calculated at the protein level while correcting for # the different peptide species in each sample and the correlation between # peptide intensities of peptides of the same protein in the same sample. # Add the samples variable to colData colData(pe)$samples <- rownames(colData(pe)) modelsPepBased <- msqrobLmer(assay(pe[["peptide"]]), formula = ~condition + (1|samples) + (1|Sequence), data = colData(pe), rowdata = rowData(pe[["peptide"]]), featureGroups = rowData(pe[["peptide"]])$Proteins, ridge = TRUE) getCoef(modelsPepBased[[1]])
Low-level function for parameter estimation with msqrob by modeling peptide counts using quasibinomial glm
## S4 method for signature 'SummarizedExperiment' msqrobQB( object, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobQB( object, i, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE )
## S4 method for signature 'SummarizedExperiment' msqrobQB( object, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE ) ## S4 method for signature 'QFeatures' msqrobQB( object, i, formula, modelColumnName = "msqrobQbModels", overwrite = FALSE, priorCount = 0.1, binomialBound = TRUE )
object |
|
formula |
Model formula. The model is built based on the covariates in the data object. |
modelColumnName |
|
overwrite |
|
priorCount |
A 'numeric(1)', which is a prior count to be added to the observations to shrink the estimated log-fold-changes towards zero. Default is 0.1. |
binomialBound |
logical, if ‘TRUE’ then the quasibinomial variance estimator will be never smaller than 1 (no underdispersion). Default is TRUE. |
i |
|
SummarizedExperiment or QFeatures instance
Lieven Clement
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate by counting how many peptide we observe for each protein pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model to peptide counts using a quasi-binomial model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobQB(se, formula = ~condition) getCoef(rowData(se)$msqrobQbModels[[1]]) # For features object pe <- msqrobQB(pe, i = "protein", formula = ~condition)
# Load example data # The data are a Feature object with containing # a SummarizedExperiment named "peptide" with MaxQuant peptide intensities # The data are a subset of spike-in the human-ecoli study # The variable condition in the colData of the Feature object # contains information on the spike in condition a-e (from low to high) data(pe) # Aggregate by counting how many peptide we observe for each protein pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit MSqrob model to peptide counts using a quasi-binomial model # For summarized SummarizedExperiment se <- pe[["protein"]] se colData(se) <- colData(pe) se <- msqrobQB(se, formula = ~condition) getCoef(rowData(se)$msqrobQbModels[[1]]) # For features object pe <- msqrobQB(pe, i = "protein", formula = ~condition)
Subset of peptides from 100 proteins from a quantitative mass spectrometry based proteomics dataset (PRIDE identifier: PXD003881 Shen et al. (2018)). E. Coli lysates were spiked at five different concentrations (3%, 4.5%, 6%, 7.5% and 9%wt/wt) in a stable human background (4 repl. per treatment). The twenty resulting samples were run on an Orbitrap Fusion mass spectrometer. Raw data files were processed with MaxQuant (version 1.6.1.0, Cox and Mann (2008)) using default search settings unless otherwise noted. Spectra were searched against the UniProtKB/SwissProt human and E. Coli reference proteome databases (07/06/2018), concatenated with the default Maxquant contaminant database. Carbamidomethylation of Cystein was set as a fixed modification, and oxidation of Methionine and acetylation of the protein amino-terminus were allowed as variable modifications. In silico cleavage was set to use trypsin/P, allowing two miscleavages. Match between runs was also enabled using default settings. The resulting peptide-to-spectrum matches (PSMs) were filtered by MaxQuant at 1% FDR.
data(pe)
data(pe)
Feature set with an instance "peptide":
contains the raw peptide intensities
contains a variable "Proteins" with the protein accession and an variable ecoli to indicate if the protein is a spikin.
contains a factor condition indicating the spike-in condition
data(pe) head(colData(pe)) head(rowData(pe)) head(assay(pe))
data(pe) head(colData(pe)) head(rowData(pe)) head(assay(pe))
For a given vector of protein group names, outputs the names of those protein groups for which none of its member proteins is present in a smaller protein group.
smallestUniqueGroups(proteins, split = ";")
smallestUniqueGroups(proteins, split = ";")
proteins |
A vector of characters or factors containing single proteins and/or protein groups (i.e. proteins separated by a separator symbol). |
split |
The character string that is used to separate the indivudual protein names in each protein group. |
A character vector containing the names of the protein groups for which none of its proteins is present in a smaller protein group.
data(pe) smallestUniqueGroups(rowData(pe[["peptide"]])$Proteins)
data(pe) smallestUniqueGroups(rowData(pe[["peptide"]])$Proteins)
The StatModel
class contains a statistical model as applied on a
feature.
Models are created by the dedicated user-level functions
(msqrob()
, mqrobAggregate()
) or manually, using the
StatModel()
constructor. In the former case, each quantitative
feature is assigned its statistical model and the models are stored
as a variable in a DataFrame
object, as illustred in the example
below.
Function for constructing a new StatModel
object.
## S4 method for signature 'StatModel' show(object) StatModel( type = "fitError", params = list(), varPosterior = NA_real_, dfPosterior = NA_real_ )
## S4 method for signature 'StatModel' show(object) StatModel( type = "fitError", params = list(), varPosterior = NA_real_, dfPosterior = NA_real_ )
object |
|
type |
default set to fit-error, can be a "lm", "rlm" (robust lm with M estimation), "lmer" (when mixed models or ridge regression is adopted), "quasibinomial" (when peptide counts are fitted) |
params |
A list containing the parameters of the fitted model |
varPosterior |
Numeric, posterior variance, default is NA |
dfPosterior |
Numeric, posterior degrees of freedom, default is NA |
A StatModel object
type
character(1)
defining type of the used model. Default
is "fitError"
, i.e. a error model. Other include "lm"
,
"rlm"
, ...
params
A list()
containing information of the used model.
varPosterior
numeric()
of posterior variance.
dfPosterior
numeric()
of posterior degrees of freedom.
Oliver M. Crook, Laurent Gatto, Lieven Clement
## A fully specified dummy model myModel <- StatModel( type = "rlm", params = list(x = 3, y = 7, b = 4), varPosterior = c(0.1, 0.2, 0.3), dfPosterior = c(6, 7, 8) ) myModel ## A collection of models stored as a variable in a DataFrame mod1 <- StatModel(type = "rlm") mod2 <- StatModel(type = "lm") df <- DataFrame(x = 1:2) df$mods <- c(mod1, mod2) df # TODO
## A fully specified dummy model myModel <- StatModel( type = "rlm", params = list(x = 3, y = 7, b = 4), varPosterior = c(0.1, 0.2, 0.3), dfPosterior = c(6, 7, 8) ) myModel ## A collection of models stored as a variable in a DataFrame mod1 <- StatModel(type = "rlm") mod2 <- StatModel(type = "lm") df <- DataFrame(x = 1:2) df$mods <- c(mod1, mod2) df # TODO
Summary table of the differentially expressed Features
topFeatures(models, contrast, adjust.method = "BH", sort = TRUE, alpha = 1)
topFeatures(models, contrast, adjust.method = "BH", sort = TRUE, alpha = 1)
models |
A list with elements of the class |
contrast |
|
adjust.method |
|
sort |
|
alpha |
|
A dataframe with log2 fold changes (logFC), standard errors (se), degrees of freedom of the test (df), t-test statistic (t), p-values (pval) and adjusted pvalues (adjPval) using the specified adjust.method in the p.adjust function of the stats package.
Lieven Clement
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) topDeProteins <- topFeatures(rowData(pe[["protein"]])$msqrobModels, L)
data(pe) # Aggregate peptide intensities in protein expression values pe <- aggregateFeatures(pe, i = "peptide", fcol = "Proteins", name = "protein") # Fit msqrob model pe <- msqrob(pe, i = "protein", formula = ~condition) # Define contrast getCoef(rowData(pe[["protein"]])$msqrobModels[[1]]) # Assess log2 fold change between condition c and condition b: L <- makeContrast("conditionc - conditionb=0", c("conditionb", "conditionc")) topDeProteins <- topFeatures(rowData(pe[["protein"]])$msqrobModels, L)