Title: | Speedy quality assurance via lasso labeling for LC-MS data |
---|---|
Description: | squallms is a Bioconductor R package that implements a "semi-labeled" approach to untargeted mass spectrometry data. It pulls in raw data from mass-spec files to calculate several metrics that are then used to label MS features in bulk as high or low quality. These metrics of peak quality are then passed to a simple logistic model that produces a fully-labeled dataset suitable for downstream analysis. |
Authors: | William Kumler [aut, cre, cph] |
Maintainer: | William Kumler <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2025-01-22 06:16:20 UTC |
Source: | https://github.com/bioc/squallms |
This function takes flat-form XC-MS data (i.e. the format produced by
'makeXcmsObjectFlat') and calculates metrics of peak shape and
similarity for downstream processing (e.g. by updateXcmsObjFeats
).
The core metrics are those described in https://doi.org/10.1186/s12859-023-05533-4,
corresponding to peak shape (correlation coefficient between the raw data
and an idealized bell curve, beta_cor) and within-peak signal-to-noise (
maximum intensity divided by the standard deviation of the residuals after
the best-fitting bell is subtracted from the raw data). Additionally,
the function interpolates the raw data to fixed retention time intervals
and performs a principal components analysis (PCA) across a file-by-RT matrix
which typically extracts good-looking peaks in the first component (PC1).
These functions are calculated on the raw MS data as obtained via
RaMS, so either the filepaths must be included in the peak_data object
or supplied as ms1_data.
extractChromMetrics( peak_data, recalc_betas = FALSE, ms1_data = NULL, verbosity = 0 )
extractChromMetrics( peak_data, recalc_betas = FALSE, ms1_data = NULL, verbosity = 0 )
peak_data |
Flat-form XC-MS data with columns for the bounding box of a chromatographic peak (mzmin, mzmax, rtmin, rtmax) as grouped by a feature ID. Must be provided WITHOUT retention time correction for proper matching to the values in the raw data. |
recalc_betas |
Scalar boolean controlling whether existing beta values
(as calculated in the XCMS object when peakpicked with |
ms1_data |
Optional data.table object produced by RaMS containing MS1 data with columns for filename, rt, mz, and int. If not provided, the files are detected from the filepath column in peak_data. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing, 1 should return text-based progress markers, and 2 will return diagnostic plots if available. |
A data.frame containing one row for each feature in peak_data, with columns containing the median peak shape value (med_cor), and the median SNR value (med_snr).
library(xcms) library(MSnbase) library(dplyr) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, recalc_betas = TRUE)
library(xcms) library(MSnbase) library(dplyr) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, recalc_betas = TRUE)
This function interpolates multi-file chromatograms to a shared set of retention time points then performs a PCA to place similar chromatograms near each other in a reduced dimensionality space. Features can then be labeled in groups instead of one at a time, massively reducing the burden of creating a high-quality dataset. This implementation relies on R's shiny package to provide interactive support in a browser environment and the plotly package for selection tools. Classified features are then returned as a simple R object for downstream use.
labelFeatsLasso( peak_data, ms1_data = NULL, rt_window_width = 1, ppm_window_width = 5, verbosity = 1 )
labelFeatsLasso( peak_data, ms1_data = NULL, rt_window_width = 1, ppm_window_width = 5, verbosity = 1 )
peak_data |
Flat-form XC-MS data with columns for the bounding box of a chromatographic peak (mzmin, mzmax, rtmin, rtmax) as grouped by a feature ID. Must be provided WITHOUT retention time correction for proper matching to the values in the raw data. |
ms1_data |
Optional data.table object produced by RaMS containing MS1 data with columns for filename, rt, mz, and int. If not provided, the files are detected from the filepath column in peak_data. |
rt_window_width |
The width of the retention time window that should be used for PCA construction, in minutes. |
ppm_window_width |
The width of the m/z window that should be used for PCA construction, in parts per million. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing, 1 should return text-based progress markers, and 2 will return diagnostic plots if available. |
A character vector named with feature IDs containing the classifications of each peak that was viewed during the interactive phase. NA values indicate those features that were not classified.
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) if (interactive()) { lasso_labels <- labelFeatsLasso(peak_data) }
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) if (interactive()) { lasso_labels <- labelFeatsLasso(peak_data) }
This function allows the user to view and label individual chromatographic features using the keyboard. Running labelFeatsManual launches a browser which shows a single feature extracted from multiple files. The feature can then be classified by pressing a key (defaults are left=bad, right=good) on the keyboard which is recorded and a new feature is automatically shown.This implementation relies on R's shiny package to provide interactive support in a browser environment and the keys package for key binding within a browser. Classified features are then returned as a simple R object for downstream use.
labelFeatsManual(peak_data, ms1_data = NULL, verbosity = 1)
labelFeatsManual(peak_data, ms1_data = NULL, verbosity = 1)
peak_data |
Flat-form XC-MS data with columns for the bounding box of a chromatographic peak (mzmin, mzmax, rtmin, rtmax) as grouped by a feature ID. Must be provided WITHOUT retention time correction for proper matching to the values in the raw data. |
ms1_data |
Optional data.table object produced by RaMS containing MS1 data with columns for filename, rt, mz, and int. If not provided, the files are detected from the filepath column in peak_data. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing while 1 will report diagnostic messages. |
A character vector named with feature IDs containing the classifications of each peak that was viewed during the interactive phase. NA values indicate those features that were not classified.
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) if (interactive()) { manual_labels <- labelFeatsManual(peak_data) }
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) if (interactive()) { manual_labels <- labelFeatsManual(peak_data) }
After metrics have been extracted (typically with 'extractChromMetrics') and labeling has occurred (typically with 'labelFeatsManual' or 'labelFeatsLasso'), a model can be built to robustly classify the peaks that were not labeled based on the metrics. The default regression equation fits the med_cor and med_snr columns in feature_metrics to the feature_labels, but additional metrics can be passed as well (but be careful of overfitting!).
logModelFeatProb( feature_metrics, feature_labels, log_formula = feat_class ~ med_cor + med_snr, verbosity = 2 )
logModelFeatProb( feature_metrics, feature_labels, log_formula = feat_class ~ med_cor + med_snr, verbosity = 2 )
feature_metrics |
A data.frame with columns used to construct the feature quality model. |
feature_labels |
A character vector named with feature IDs and entries corresponding to the peak quality (either "Good", "Bad", or NA). |
log_formula |
The formula to use when predicting feature quality from the feature metrics. This formula is passed to 'glm' as-is, so make sure that the predictive features exist in the feature_metrics data.frame. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing, 1 should return text-based progress markers, and 2 will return diagnostic plots if available. |
A numeric vector of probabilities returned by the logistic model named by feature ID
library(xcms) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") |> list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files |> readMSData(msLevel. = 1, mode = "onDisk") |> findChromPeaks(cwp) |> adjustRtime(obp) |> groupChromPeaks(pdp) |> fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, verbosity = 0) # Load demo labels previously assigned using the lasso method lasso_classes <- readRDS(system.file("extdata", "intro_lasso_labels.rds", package = "squallms")) feat_probs <- logModelFeatProb(feat_metrics, lasso_classes)
library(xcms) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") |> list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files |> readMSData(msLevel. = 1, mode = "onDisk") |> findChromPeaks(cwp) |> adjustRtime(obp) |> groupChromPeaks(pdp) |> fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, verbosity = 0) # Load demo labels previously assigned using the lasso method lasso_classes <- readRDS(system.file("extdata", "intro_lasso_labels.rds", package = "squallms")) feat_probs <- logModelFeatProb(feat_metrics, lasso_classes)
This function wraps 'logModelFeatProb' for modeling of feature quality to estimate the quality of every feature in the dataset and classify them as good/bad based on whether they exceed the provided 'likelihood_threshold'. Lower likelihood thresholds (0.01, 0.1) will produce more false positives (noise peaks included when they shouldn't be). Higher likelihood thresholds (0.9, 0.99) will produce more false negatives (good peaks removed when they shouldn't be).
logModelFeatQuality( feature_metrics, feature_labels, log_formula = feat_class ~ med_cor + med_snr, likelihood_threshold = 0.5, verbosity = 2 )
logModelFeatQuality( feature_metrics, feature_labels, log_formula = feat_class ~ med_cor + med_snr, likelihood_threshold = 0.5, verbosity = 2 )
feature_metrics |
A data.frame with columns used to construct the feature quality model. |
feature_labels |
A character vector named with feature IDs and entries corresponding to the peak quality (either "Good", "Bad", or NA). |
log_formula |
The formula to use when predicting feature quality from the feature metrics. This formula is passed to 'glm' as-is, so make sure that the predictive features exist in the feature_metrics data.frame. |
likelihood_threshold |
A scalar numeric above which features will be kept if their predicted probability exceeds. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing, 1 should return text-based progress markers, and 2 will return diagnostic plots if available. |
A character vector of feature quality assessments returned by the logistic model and named by feature ID
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, verbosity = 0) # Load demo labels previously assigned using the lasso method lasso_classes <- readRDS(system.file("extdata", "intro_lasso_labels.rds", package = "squallms")) feat_classes <- logModelFeatQuality(feat_metrics, lasso_classes)
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, verbosity = 0) # Load demo labels previously assigned using the lasso method lasso_classes <- readRDS(system.file("extdata", "intro_lasso_labels.rds", package = "squallms")) feat_classes <- logModelFeatQuality(feat_metrics, lasso_classes)
XCMSnExp objects are complicated S4 objects that make it difficult to access information about the features and their associated peaks. This function turns the output into a "flat" file format (a data.frame) so it can interact with tidyverse functions more easily.
makeXcmsObjFlat(xcms_obj, revert_rts = TRUE, verbosity = 0)
makeXcmsObjFlat(xcms_obj, revert_rts = TRUE, verbosity = 0)
xcms_obj |
A XCMSnExp object produced by the typical XCMS workflow which should include retention time correction and peak correspondence and filling |
revert_rts |
Scalar boolean controlling whether the adjusted retention times found in the XCMS object are propagated or returned as-is |
verbosity |
Scalar numberic. Will report XCMS info if greater than zero or run silently if not (the default). |
A data.frame with columns for feature information (from featureDefinitions: feature, feat_mzmed, feat_rtmed, feat_npeaks, and peakidx) and peak information (from chromPeaks: mz, mzmin, mzmax, rt, rtmin, rtmax, into, intb, maxo, sn, sample) as well as the full path to the associated file and the file name alone (filepath and filename).
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) makeXcmsObjFlat(xcms_filled)
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) makeXcmsObjFlat(xcms_filled)
Internal function, mostly.
pickyPCA( peak_data, ms1_data, rt_window_width = NULL, ppm_window_width = NULL, verbosity = 1 )
pickyPCA( peak_data, ms1_data, rt_window_width = NULL, ppm_window_width = NULL, verbosity = 1 )
peak_data |
Flat-form XC-MS data with columns for the bounding box of a chromatographic peak (mzmin, mzmax, rtmin, rtmax) as grouped by a feature ID. Must be provided WITHOUT retention time correction for proper matching to the values in the raw data. |
ms1_data |
Optional data.table object produced by RaMS containing MS1 data with columns for filename, rt, mz, and int. If not provided, the files are detected from the filepath column in peak_data. |
rt_window_width |
The width of the retention time window that should be used for PCA construction, in minutes. |
ppm_window_width |
The width of the m/z window that should be used for PCA construction, in parts per million. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing, 1 should return text-based progress markers, and 2 will return diagnostic plots if available. |
A list with two named components, interp_df and pcamat. interp_df is the MS1 data associated with each peak interpolated to a retention time spacing shared across the feature. pcamat is the result of cleaning this object up, pivoting it wider, and converting it to a matrix so that a PCA can be performed.
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) msdata <- RaMS::grabMSdata(unique(peak_data$filepath), grab_what = "MS1", verbosity = 0) pixel_pca <- pickyPCA(peak_data, msdata$MS1)
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) msdata <- RaMS::grabMSdata(unique(peak_data$filepath), grab_what = "MS1", verbosity = 0) pixel_pca <- pickyPCA(peak_data, msdata$MS1)
After metrics have been extracted (typically with 'extractChromMetrics') and labeling has occurred (typically with 'labelFeatsManual' or 'labelFeatsLasso'), low quality features can be removed from the XCMS object to improve downstream processing. This function wraps 'logModelFeatQuality' and automatically edits the features within the provided XCMS object.
updateXcmsObjFeats( xcms_obj, feature_metrics, feature_labels, log_formula = feat_class ~ med_cor + med_snr, likelihood_threshold = 0.5, verbosity = 2 )
updateXcmsObjFeats( xcms_obj, feature_metrics, feature_labels, log_formula = feat_class ~ med_cor + med_snr, likelihood_threshold = 0.5, verbosity = 2 )
xcms_obj |
The XCMS object from which features were initially extracted, usually with 'extractChromMetrics' |
feature_metrics |
A data.frame with columns used to construct the feature quality model. |
feature_labels |
A character vector named with feature IDs and entries corresponding to the peak quality (either "Good", "Bad", or NA). |
log_formula |
The formula to use when predicting feature quality from the feature metrics. This formula is passed to 'glm' as-is, so make sure that the predictive features exist in the feature_metrics data.frame. |
likelihood_threshold |
A scalar numeric above which features will be kept if their predicted probability exceeds. |
verbosity |
Scalar value between zero and two determining how much diagnostic information is produced. 0 should return nothing, 1 should return text-based progress markers, and 2 will return diagnostic plots if available. |
The initial xcms_obj but only containing features that exceed the provided quality threshold (as established in likelihood space)
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, verbosity = 0) lasso_classes <- readRDS(system.file("extdata", "intro_lasso_labels.rds", package = "squallms")) xcms_filled <- updateXcmsObjFeats(xcms_filled, feat_metrics, lasso_classes)
library(xcms) library(dplyr) library(MSnbase) mzML_files <- system.file("extdata", package = "RaMS") %>% list.files(full.names = TRUE, pattern = "[A-F].mzML") register(BPPARAM = SerialParam()) cwp <- CentWaveParam(snthresh = 0, extendLengthMSW = TRUE, integrate = 2) obp <- ObiwarpParam(binSize = 0.1, response = 1, distFun = "cor_opt") pdp <- PeakDensityParam( sampleGroups = 1:3, bw = 12, minFraction = 0, binSize = 0.001, minSamples = 0 ) xcms_filled <- mzML_files %>% readMSData(msLevel. = 1, mode = "onDisk") %>% findChromPeaks(cwp) %>% adjustRtime(obp) %>% groupChromPeaks(pdp) %>% fillChromPeaks(FillChromPeaksParam(ppm = 5)) peak_data <- makeXcmsObjFlat(xcms_filled) feat_metrics <- extractChromMetrics(peak_data, verbosity = 0) lasso_classes <- readRDS(system.file("extdata", "intro_lasso_labels.rds", package = "squallms")) xcms_filled <- updateXcmsObjFeats(xcms_filled, feat_metrics, lasso_classes)