The hdxmsqc
package is a quality control assessment
package from hydrogen-deuterium exchange mass-spectrometry (HDX-MS)
data. The functions look for outliers in retention time and ion
mobility. They also examine missing values, mass errors, intensity based
outliers, deviations of the data from monotonicity, the correlation of
charge states, whether uptake values are coherent based on overlapping
peptides and finally the similarity of the observed to the theoretical
spectra observed. This package is designed to help those performing
iterative quality control through manual inspection but also a set of
metric and visualizations by which practitioners can use to demonstrate
they have high quality data.
The packages required are the following.
suppressMessages(require(hdxmsqc))
require(S4Vectors)
suppressMessages(require(dplyr))
require(tidyr)
## Loading required package: tidyr
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:S4Vectors':
##
## expand
## Loading required package: RColorBrewer
## Loading required package: ggplot2
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: pheatmap
## Loading required package: patchwork
##
## Attaching package: 'patchwork'
## The following object is masked from 'package:MASS':
##
## area
We first load the data, as exported from HDExaminer.
BRD4uncurated <- data.frame(read.csv(system.file("extdata", "ELN55049_AllResultsTables_Uncurated.csv", package = "hdxmsqc", mustWork = TRUE)))
The following code chunk tidies dataset, which improves the formatting and converts to wide format. It will also note the number of states, timepoints and peptides.
## Number of peptide sequence: 167
## Number of timepoints: 7
## Number of Protein States: 2
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
The next code chunk extracts the columns with the quantitative data.
We now parse the object into an object of class
Qfeatures
. This standardises the formatting of the
data.
BRD4df <- readQFeatures(assayData = BRD4uncurated_wide,
ecol = i,
names = "Deuteration",
fnames = "fnames")
## Checking arguments.
## Warning in .checkWarnEcol(quantCols, ecol): 'ecol' is deprecated, use
## 'quantCols' instead.
## Loading data as a 'SummarizedExperiment' object.
## Formatting sample annotations (colData).
## Formatting data as a 'QFeatures' object.
A simple heatmap of our data can give us a sense of it.
Here, we can plot where the missing values are:
Here, we can filter data that is not missing at random:
## 'mnar' found in 1 out of 1 assay(s)
## Number of peptides filtered:10
We can then replot missing-ness:
The values that are missing are all at the zero time-points where deuterium uptake should be 0, we can simply impute these values.
massError <- computeMassError(object = BRD4df_filtered_imputed)
plotMassError(object = BRD4df_filtered_imputed)
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_point()`).
Using linear-model based outlier detection we see whether there are Spectra that have variable intensity based on their mean intensity. A linear model is fitted to the log-mean and log-variance of the intensities. These should follow a linear trend. Cook’s distance is used to determine outliers are consider if their distance is greater than 2/$\sqrt(n)$, where n is the number of peptides.
Retention time outlier detection looks at the usual variability of retention time search window and the actual left/right windows of the retention time. Outliers are flagged if their retention time falls outside 1.5 * interquartile range.
dfrt <- rTimeOutliers(object = BRD4df_filtered_imputed)
plotrTimeOutliers(object = BRD4df_filtered_imputed)
## $leftRTgg
##
## $rightRTgg
This uses a statistic to detect differences from monotonic behavior. First, we need to specify the experimental design and the timepoints used.
The monotonicity statistic measure the deviation from monotoncity. Whilst some deviation is expected from random fluctuations, it is worth double checking those that are strong deviates compare to the rest of the data.
monoStat <- computeMonotoneStats(object = BRD4df_filtered_imputed,
experiment = experiment,
timepoints = timepoints)
out <- plotMonotoneStat(object = BRD4df_filtered_imputed,
experiment = experiment,
timepoints = timepoints)
out
## [[1]]
##
## [[2]]
In a similar analysis to the retention time analysis, for ion mobility time we can also see whether there are random deviation in the ion mobility windows. Again, we define outliers that deviate outside the typical 1.5 * IQR.
imTimeOut <- imTimeOutlier(object = BRD4df_filtered_imputed)
plotImTimeOutlier(object = BRD4df_filtered_imputed)
## $leftIMSgg
##
## $rightIMSgg
We check that charge states are correlated. Whilst we don’t expect exactly the same before - low correlation maybe concerning.
csCor <- chargeCorrelationHdx(object = BRD4df_filtered_imputed,
experiment = experiment,
timepoints = timepoints)
csCor
## $wt
## LNLPDYYKIIKTPMDMGTIKKR LNLPDYYKIIKTPMDMGTIKKRLENN
## 1 1.0000000 1.0000000
## 2 0.9248554 0.9953514
## 3 0.9248554 0.9953514
## 4 1.0000000 1.0000000
##
## $iBET
## LNLPDYYKIIKTPMDMGTIKKR LNLPDYYKIIKTPMDMGTIKKRLENN
## 1 1.0000000 1.0000000
## 2 0.8947091 0.9919185
## 3 0.8947091 0.9919185
## 4 1.0000000 1.0000000
We can also check whether uptakes are compatible with overlapping peptides. The difference in uptake cannot be more different than the difference in the number of exchangeable amides. The default methodology only checks whether sequence with up-to 5 different exchangeable amides are compatible to keep run-times lower. Larger difference may indicate different chemical changes or back-exchange properties.
In this section, we can directly examine the differences between the theoretical spectra one would expect from the computed deuterium uptake and the actual observed spectra. Deviations observed in the spectra could suggest contamination, false identifications or poor quality spectra. A score is generated using the cosine similarity between the spectra - which is equivalent to the normalized dot product. The spectra pairs can be also be visualized.
Load in some Spectra from HDsite which should match those of HDExaminer
hdxsite <- data.frame(read.csv(system.file("extdata", "BRD4_RowChecked_20220628_HDsite.csv",
package = "hdxmsqc", mustWork = TRUE),
header = TRUE, fileEncoding = 'UTF-8-BOM'))
BRD4matched <- read.csv(system.file("extdata", "BRD4_RowChecked_20220628_HDE.csv",
package = "hdxmsqc", mustWork = TRUE),
header = TRUE, fileEncoding = 'UTF-8-BOM')
spectraCompare <- spectraSimilarity(peaks = hdxsite,
object = BRD4matched,
experiment = experiment,
numSpectra = NULL)
The scores can be accesses as follows:
## [1] 0.8878149 0.8492476 0.8093282 0.6894649 0.5788934 0.7713417
To visualise these spectra we can use the following function
Finally, a summarise quality control table can be produced and saved in a .csv file if desired.
qctable <- qualityControl(object = BRD4df_filtered_imputed,
massError = massError,
intensityOutlier = intensityOutlier,
retentionOutlier = dfrt,
monotonicityStat = monoStat,
mobilityOutlier = imTimeOut,
chargeCorrelation = csCor,
replicateCorrelation = replicateVar,
replicateOutlier = replicateOut,
sequenceCheck = tocheck,
spectraCheck = spectraCompare,
experiment = experiment,
timepoints = timepoints )
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] patchwork_1.3.0 pheatmap_1.0.12
## [3] MASS_7.3-61 ggplot2_3.5.1
## [5] RColorBrewer_1.1-3 tidyr_1.3.1
## [7] dplyr_1.1.4 hdxmsqc_1.3.0
## [9] Spectra_1.17.1 BiocParallel_1.41.0
## [11] QFeatures_1.17.0 MultiAssayExperiment_1.33.1
## [13] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [15] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
## [17] IRanges_2.41.1 S4Vectors_0.45.2
## [19] BiocGenerics_0.53.3 generics_0.1.3
## [21] MatrixGenerics_1.19.0 matrixStats_1.4.1
## [23] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 farver_2.1.2 fastmap_1.2.0
## [4] lazyeval_0.2.2 digest_0.6.37 lifecycle_1.0.4
## [7] cluster_2.1.6 ProtGenerics_1.39.0 magrittr_2.0.3
## [10] compiler_4.4.2 rlang_1.1.4 sass_0.4.9
## [13] tools_4.4.2 igraph_2.1.1 utf8_1.2.4
## [16] yaml_2.3.10 knitr_1.49 labeling_0.4.3
## [19] S4Arrays_1.7.1 DelayedArray_0.33.2 plyr_1.8.9
## [22] abind_1.4-8 withr_3.0.2 purrr_1.0.2
## [25] sys_3.4.3 grid_4.4.2 fansi_1.0.6
## [28] colorspace_2.1-1 scales_1.3.0 cli_3.6.3
## [31] rmarkdown_2.29 crayon_1.5.3 httr_1.4.7
## [34] reshape2_1.4.4 BiocBaseUtils_1.9.0 cachem_1.1.0
## [37] stringr_1.5.1 zlibbioc_1.52.0 parallel_4.4.2
## [40] AnnotationFilter_1.31.0 BiocManager_1.30.25 XVector_0.47.0
## [43] vctrs_0.6.5 Matrix_1.7-1 jsonlite_1.8.9
## [46] clue_0.3-66 maketools_1.3.1 jquerylib_0.1.4
## [49] glue_1.8.0 codetools_0.2-20 stringi_1.8.4
## [52] gtable_0.3.6 UCSC.utils_1.3.0 munsell_0.5.1
## [55] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.8.1
## [58] GenomeInfoDbData_1.2.13 R6_2.5.1 evaluate_1.0.1
## [61] lattice_0.22-6 bslib_0.8.0 MetaboCoreUtils_1.15.0
## [64] Rcpp_1.0.13-1 SparseArray_1.7.2 xfun_0.49
## [67] MsCoreUtils_1.19.0 fs_1.6.5 buildtools_1.0.0
## [70] pkgconfig_2.0.3