Title: | Dynamic Programming Based Alignment of MS2 Chromatograms |
---|---|
Description: | To obtain unbiased proteome coverage from a biological sample, mass-spectrometer is operated in Data Independent Acquisition (DIA) mode. Alignment of these DIA runs establishes consistency and less missing values in complete data-matrix. This package implements dynamic programming with affine gap penalty based approach for pair-wise alignment of analytes. A hybrid approach of global alignment (through MS2 features) and local alignment (with MS2 chromatograms) is implemented in this tool. |
Authors: | Shubham Gupta [aut, cre] , Hannes Rost [aut] , Justin Sing [aut] |
Maintainer: | Shubham Gupta <[email protected]> |
License: | GPL-3 |
Version: | 2.15.0 |
Built: | 2024-11-18 04:12:03 UTC |
Source: | https://github.com/bioc/DIAlignR |
s is a point-wise similarity matrix between signalA and signalB. Intermediate matrices M,A,B are calculated from s for affine-alignment. Each cell of the Traceback matrix has coordinate of its parent cell. path matrix is a binary matrix with ones indicating path of maximum cumulative score. GapOpen and GapExten are gap-opening and gap-extension penalties used by affine alignment algorithm. indexA_aligned and indexB_aligned are aligned indices of signalA and SignalB. The cumulative alignment score is in score vector.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
indexA_aligned and indexB_aligned are aligned indices of signalA and SignalB. The cumulative alignment score is in score vector.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
s is a point-wise similarity matrix between signalA and signalB. path matrix is a binary matrix with ones indicating path of maximum cumulative score. GapOpen and GapExten are gap-opening and gap-extension penalties used by affine alignment algorithm. indexA_aligned and indexB_aligned are aligned indices of signalA and SignalB. The cumulative alignment score is in score vector.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
Aligns MS2 extracted-ion chromatograms(XICs) pair.
alignChromatogramsCpp( l1, l2, alignType, tA, tB, normalization, simType, B1p = 0, B2p = 0, noBeef = 0L, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100, objType = "heavy" )
alignChromatogramsCpp( l1, l2, alignType, tA, tB, normalization, simType, B1p = 0, B2p = 0, noBeef = 0L, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100, objType = "heavy" )
l1 |
(list) A list of numeric vectors. l1 and l2 should have same length. |
l2 |
(list) A list of numeric vectors. l1 and l2 should have same length. |
alignType |
(char) A character string. Available alignment methods are "global", "local" and "hybrid". |
tA |
(numeric) A numeric vector. This vector has equally spaced timepoints of XIC A. |
tB |
(numeric) A numeric vector. This vector has equally spaced timepoints of XIC B. |
normalization |
(char) A character string. Normalization must be selected from (L2, mean or none). |
simType |
(char) A character string. Similarity type must be selected from (dotProductMasked, dotProduct, cosineAngle, cosine2Angle, euclideanDist, covariance, correlation, crossCorrelation). |
B1p |
(numeric) Timepoint mapped by global fit for tA[1]. |
B2p |
(numeric) Timepoint mapped by global fit for tA[length(tA)]. |
noBeef |
(integer) It defines the distance from the global fit, upto which no penalization is performed. |
goFactor |
(numeric) Penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty. |
geFactor |
(numeric) Penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty. |
cosAngleThresh |
(numeric) In simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
dotProdThresh |
(numeric) In simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
gapQuantile |
(numeric) Must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
hardConstrain |
(logical) if false; indices farther from noBeef distance are filled with distance from linear fit line. |
samples4gradient |
(numeric) This parameter modulates penalization of masked indices. |
objType |
(char) A character string. Must be either light, medium or heavy. |
affineAlignObj (S4class) A S4class object from C++ AffineAlignObj struct.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR data(oswFiles_DIAlignR, package="DIAlignR") oswFiles <- oswFiles_DIAlignR XICs.ref <- XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] XICs.eXp <- XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] tVec.ref <- XICs.ref[[1]][["time"]] # Extracting time component tVec.eXp <- XICs.eXp[[1]][["time"]] # Extracting time component B1p <- 4964.752 B2p <- 5565.462 noBeef <- 77.82315/3.414 l1 <- lapply(XICs.ref, `[[`, 2) l2 <- lapply(XICs.eXp, `[[`, 2) AlignObj <- alignChromatogramsCpp(l1, l2, alignType = "hybrid", tA = tVec.ref, tB = tVec.eXp, normalization = "mean", simType = "dotProductMasked", B1p = B1p, B2p = B2p, noBeef = noBeef, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100, objType = "light")
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR data(oswFiles_DIAlignR, package="DIAlignR") oswFiles <- oswFiles_DIAlignR XICs.ref <- XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] XICs.eXp <- XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] tVec.ref <- XICs.ref[[1]][["time"]] # Extracting time component tVec.eXp <- XICs.eXp[[1]][["time"]] # Extracting time component B1p <- 4964.752 B2p <- 5565.462 noBeef <- 77.82315/3.414 l1 <- lapply(XICs.ref, `[[`, 2) l2 <- lapply(XICs.eXp, `[[`, 2) AlignObj <- alignChromatogramsCpp(l1, l2, alignType = "hybrid", tA = tVec.ref, tB = tVec.eXp, normalization = "mean", simType = "dotProductMasked", B1p = B1p, B2p = B2p, noBeef = noBeef, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100, objType = "light")
Aligned XICs of peptide (ID = 4618) 14299_QFNNTDIVLLEDFQK/3 across two SWATH runs:
run1 : hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
run2 : hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
alignObj_DIAlignR
alignObj_DIAlignR
A S4 object of 16 slots:
similarity score matrix.
Match or Mismatch matrix, residues of A and B are aligned without a gap. M(i,j) = Best score upto (i,j) given Ai is aligned to Bj.
Insert in sequence A, residue in A is aligned to gap in B. A(i,j) is the best score given that Ai is aligned to a gap in B.
Insert in sequence B, residue in B is aligned to gap in A. B(i,j) is the best score given that Bj is aligned to a gap in A.
Traceback matrices store source matrix name and direction as matrices are filled with dynamic programming.
Path matrix would represent alignment path through similarity matrix as binary-hot encoding.
Number of data-points in signal A.
Number of data-points in signal B.
Penalty for Gap opening. For n consecutive gaps: Penalty = GapOpen + (n-1)*GapExten.
Penalty for Gap extension. For n consecutive gaps: Penalty = GapOpen + (n-1)*GapExten.
True for Overlap alignment.
Aligned signalA indices after affine alignment.
Aligned signalB indices after affine alignment.
Cumulative score along the aligned path.
Not needed, will be removed.
Total number of gaps in the alignment path.
C++ code is exaplained at DIAlign namespace. File test_GenerateData.R has source code to generate the example data.
s is a point-wise similarity matrix between signalA and signalB. Intermediate matrices M is calculated from s for alignment. Each cell of the Traceback matrix has coordinate of its parent cell. path matrix is a binary matrix with ones indicating path of maximum cumulative score. GapOpen and GapExten are gap-opening and gap-extension penalties used by alignment algorithm. They must be the same. indexA_aligned and indexB_aligned are aligned indices of signalA and SignalB. The cumulative alignment score is in score vector.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
This function expects osw and xics directories at dataPath. It first reads osw files and fetches chromatogram indices for each analyte. It then align XICs of its reference XICs. Best peak, which has lowest m-score, about the aligned retention time is picked for quantification.
alignTargetedRuns( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, runs = NULL, peps = NULL, refRun = NULL, applyFun = lapply, saveAlignedPeaks = FALSE )
alignTargetedRuns( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, runs = NULL, peps = NULL, refRun = NULL, applyFun = lapply, saveAlignedPeaks = FALSE )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
scoreFile |
(string) path to the peptide score file, needed when oswMerged is FALSE. |
runs |
(string) names of xics file without extension. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
refRun |
(string) reference for alignment. If no run is provided, m-score is used to select reference run. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
saveAlignedPeaks |
(logical) Save a mapping table to track aligned feature ids against reference feature id |
An output table with following columns: precursor, run, intensity, RT, leftWidth, rightWidth, peak_group_rank, m_score, alignment_rank, peptide_id, sequence, charge, group_label.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
Gupta S, Ahadi S, Zhou W, Röst H. "DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics." Mol Cell Proteomics. 2019 Apr;18(4):806-817. doi: https://doi.org/10.1074/mcp.TIR118.001132 Epub 2019 Jan 31.
getRunNames, getFeatures, setAlignmentRank, getMultipeptide
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) alignTargetedRuns(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply)
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) alignTargetedRuns(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply)
This is needed when leaves of the tree are directly aligned to the root
alignToRoot4( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
alignToRoot4( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
params |
(list) parameters are entered as list. Output of the |
outFile |
(string) name of the output file. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-09-25
This function sums all the intensities between left-index and right-index.
areaIntegrator( l1, l2, left, right, integrationType, baselineType, fitEMG, baseSubtraction, kernelLen = 0L, polyOrd = 3L )
areaIntegrator( l1, l2, left, right, integrationType, baselineType, fitEMG, baseSubtraction, kernelLen = 0L, polyOrd = 3L )
l1 |
(list) A list of time vectors. |
l2 |
(list) A list of intensity vectors. |
left |
(numeric) left boundary of the peak. |
right |
(numeric) right boundary of the peak. |
integrationType |
(string) method to ompute the area of a peak contained in XICs. Must be from "intensity_sum", "trapezoid", "simpson". |
baselineType |
(string) method to estimate the background of a peak contained in XICs. Must be from "base_to_base", "vertical_division_min", "vertical_division_max". |
fitEMG |
(logical) enable/disable exponentially modified gaussian peak model fitting. |
baseSubtraction |
(logical) TRUE: remove background from peak signal using estimated noise levels. |
kernelLen |
(integer) length of filter. Must be an odd number. |
polyOrd |
(integer) TRUE: remove background from peak signal using estimated noise levels. |
area (numeric).
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR", package = "DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] l1 <- lapply(XICs, `[[`, 1) # time l2 <- lapply(XICs, `[[`, 2) # intensity areaIntegrator(l1, l2, left = 5203.7, right = 5268.5, "intensity_sum", "base_to_base", FALSE, TRUE) # 66.10481 69.39996 46.53095 16.34266 13.13564 13.42331 areaIntegrator(l1, l2, left = 5203.7, right = 5268.5, kernelLen = 9L, "intensity_sum", "base_to_base", FALSE, TRUE) # 65.01449 71.74432 52.73518 23.84420 17.61869 16.48190
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR", package = "DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] l1 <- lapply(XICs, `[[`, 1) # time l2 <- lapply(XICs, `[[`, 2) # intensity areaIntegrator(l1, l2, left = 5203.7, right = 5268.5, "intensity_sum", "base_to_base", FALSE, TRUE) # 66.10481 69.39996 46.53095 16.34266 13.13564 13.42331 areaIntegrator(l1, l2, left = 5203.7, right = 5268.5, kernelLen = 9L, "intensity_sum", "base_to_base", FALSE, TRUE) # 65.01449 71.74432 52.73518 23.84420 17.61869 16.48190
Converts instances of class AffineAlignObj into list
## S4 method for signature 'AffineAlignObj' as.list(x)
## S4 method for signature 'AffineAlignObj' as.list(x)
x |
An object of class AffineAlignObj. |
list
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2020-03-31
Converts instances of class AffineAlignObjLight into list
## S4 method for signature 'AffineAlignObjLight' as.list(x)
## S4 method for signature 'AffineAlignObjLight' as.list(x)
x |
An object of class AffineAlignObjLight |
list
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2020-03-31
Converts instances of class AffineAlignObjMedium into list
## S4 method for signature 'AffineAlignObjMedium' as.list(x)
## S4 method for signature 'AffineAlignObjMedium' as.list(x)
x |
An object of class AffineAlignObjMedium |
list
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2020-03-31
Converts instances of class AlignObj into list
## S4 method for signature 'AlignObj' as.list(x)
## S4 method for signature 'AlignObj' as.list(x)
x |
An object of class AlignObj |
list
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2020-03-31
Get child chromatograms from parents
childXICs( XICs.ref, XICs.eXp, alignedIndices, method = "spline", polyOrd = 4, kernelLen = 9, splineMethod = "fmm", wRef = 0.5, mergeStrategy = "avg", keepFlanks = TRUE )
childXICs( XICs.ref, XICs.eXp, alignedIndices, method = "spline", polyOrd = 4, kernelLen = 9, splineMethod = "fmm", wRef = 0.5, mergeStrategy = "avg", keepFlanks = TRUE )
XICs.ref |
(list of data-frames) extracted ion chromatograms from reference run. |
XICs.eXp |
(list of data-frames) extracted ion chromatograms from experiment run. |
alignedIndices |
(data-frame) must have two columns "indexAligned.ref" and "indexAligned.eXp". |
method |
(string) must be either "spline", "sgolay" or "linear". |
polyOrd |
(integer) must be less than kernelLen. |
kernelLen |
(integer) must be an odd integer. |
splineMethod |
(string) must be either "fmm" or "natural". |
wRef |
(numeric) Weight of the reference XIC. Must be between 0 and 1. |
mergeStrategy |
(string) must be either ref, avg, refStart or refEnd. |
keepFlanks |
(logical) TRUE: Flanking chromatogram is not removed. |
(list) the first element is a list of chromatograms. The second element is aligned parent time-vectors.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-05-23
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(alignObj_DIAlignR, package="DIAlignR") XICs.ref <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] XICs.eXp <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] alignedIndices <- cbind(alignObj_DIAlignR@indexA_aligned, alignObj_DIAlignR@indexB_aligned) colnames(alignedIndices) <- c("indexAligned.ref", "indexAligned.eXp") alignedIndices[, 1:2][alignedIndices[, 1:2] == 0] <- NA_integer_ newXICs <- childXICs(XICs.ref, XICs.eXp, alignedIndices)[[1]] plotXICgroup(XICs.ref) plotXICgroup(newXICs)
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(alignObj_DIAlignR, package="DIAlignR") XICs.ref <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] XICs.eXp <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] alignedIndices <- cbind(alignObj_DIAlignR@indexA_aligned, alignObj_DIAlignR@indexB_aligned) colnames(alignedIndices) <- c("indexAligned.ref", "indexAligned.eXp") alignedIndices[, 1:2][alignedIndices[, 1:2] == 0] <- NA_integer_ newXICs <- childXICs(XICs.ref, XICs.eXp, alignedIndices)[[1]] plotXICgroup(XICs.ref) plotXICgroup(newXICs)
Constrain similarity matrix with a mask
constrainSimCpp(sim, MASK, samples4gradient = 100)
constrainSimCpp(sim, MASK, samples4gradient = 100)
sim |
(matrix) A numeric matrix. Input similarity matrix. |
MASK |
(matrix) A numeric matrix. Masked indices have non-zero values. |
samples4gradient |
(numeric) This paarameter modulates penalization of masked indices. |
s_new (matrix) A constrained similarity matrix.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
sim <- matrix(c(-2, 10, -2, -2, -2, -2, 10, -2, 10, -2, -2, -2, -2, -2, -2, 10, 10, -2,-2, -2), 4, 5, byrow = FALSE) MASK <- matrix(c(0.000, 0.000, 0.707, 1.414, 0.000, 0.000, 0.000, 0.707, 0.707, 0.000, 0.000, 0.000, 1.414, 0.707, 0, 0, 2.121, 1.414, 0, 0), 4, 5, byrow = FALSE) constrainSimCpp(sim, MASK, 10) matrix(c(-2, 10, -3.414, -4.828, -2, -2, 10, -3.414, 8.586, -2, -2, -2, -4.828, -3.414, -2, 10, 5.758, -4.828, -2, -2), 4, 5, byrow = FALSE)
sim <- matrix(c(-2, 10, -2, -2, -2, -2, 10, -2, 10, -2, -2, -2, -2, -2, -2, 10, 10, -2,-2, -2), 4, 5, byrow = FALSE) MASK <- matrix(c(0.000, 0.000, 0.707, 1.414, 0.000, 0.000, 0.000, 0.707, 0.707, 0.000, 0.000, 0.000, 1.414, 0.707, 0, 0, 2.121, 1.414, 0, 0), 4, 5, byrow = FALSE) constrainSimCpp(sim, MASK, 10) matrix(c(-2, 10, -3.414, -4.828, -2, -2, 10, -3.414, 8.586, -2, -2, -2, -4.828, -3.414, -2, 10, 5.758, -4.828, -2, -2), 4, 5, byrow = FALSE)
Writes an mzML file having chromatograms and their native IDs.
createMZML(ropenms, filename, XICs, transitionIDs)
createMZML(ropenms, filename, XICs, transitionIDs)
ropenms |
(pyopenms module) get this python module through get_ropenms(). |
filename |
(string) name of the mzML file to be written. Extension should be .chrom.mzML. |
XICs |
(list of list of data-frames) list of extracted ion chromatograms of all precursors. |
transitionIDs |
(list of integer) length must be the same as of XICs. |
(None)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-06-06
dataPath <- system.file("extdata", package = "DIAlignR") filename <- paste0(dataPath, "/xics/hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML") data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR) XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]] nativeIds <- list(27706:27711) ## Not run: ropenms <- get_ropenms(condaEnv = "envName") createMZML(ropenms, "testfile.chrom.mzML", XICs, nativeIds) mzR::chromatogramHeader(mzR::openMSfile("testfile.chrom.mzML", backend = "pwiz")) file.remove("testfile.chrom.mzML") ## End(Not run)
dataPath <- system.file("extdata", package = "DIAlignR") filename <- paste0(dataPath, "/xics/hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML") data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR) XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]] nativeIds <- list(27706:27711) ## Not run: ropenms <- get_ropenms(condaEnv = "envName") createMZML(ropenms, "testfile.chrom.mzML", XICs, nativeIds) mzR::chromatogramHeader(mzR::openMSfile("testfile.chrom.mzML", backend = "pwiz")) file.remove("testfile.chrom.mzML") ## End(Not run)
Writes a sqMass file having chromatograms and their native IDs.
createSqMass(filename, XICs, transitionIDs, lossy)
createSqMass(filename, XICs, transitionIDs, lossy)
filename |
(string) name of the mzML file to be written. Extension should be .chrom.sqMass. |
XICs |
(list of list of data-frames) list of extracted ion chromatograms of all precursors. |
transitionIDs |
(list of integer) length must be the same as of XICs. |
lossy |
(logical) if TRUE, time and intensity are lossy-compressed. |
- compression is one of 0 = no, 1 = zlib, 2 = np-linear, 3 = np-slof,
4 = np-pic, 5 = np-linear + zlib, 6 = np-slof + zlib, 7 = np-pic + zlib
- data_type is one of 0 = mz, 1 = int, 2 = rt
- data contains the raw (blob) data for a single data array
(None)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-01-16
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR) XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]] XICs <- list(XICs[[1]], XICs[[1]]) nativeIds <- list(27706:27711, 1:6) sqName <- "testfile.chrom.sqMass" ## Not run: createSqMass(sqName, XICs, nativeIds, TRUE) con <- DBI::dbConnect(RSQLite::SQLite(), dbname = sqName) XIC_group <- extractXIC_group2(con, 0:5) DBI::dbDisconnect(con) file.remove(sqName) ## End(Not run)
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR) XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]] XICs <- list(XICs[[1]], XICs[[1]]) nativeIds <- list(27706:27711, 1:6) sqName <- "testfile.chrom.sqMass" ## Not run: createSqMass(sqName, XICs, nativeIds, TRUE) con <- DBI::dbConnect(RSQLite::SQLite(), dbname = sqName) XIC_group <- extractXIC_group2(con, 0:5) DBI::dbDisconnect(con) file.remove(sqName) ## End(Not run)
This package implements dynamic programming with affine gap penalty to find a highest-scoring scoring path. A hybrid approach of global alignment through MS2 features and local alignment with MS2 chromatograms is implemented in this tool.
Shubham Gupta, Hannes Rost
Perform affine global and overlap alignment on a similarity matrix
doAffineAlignmentCpp(sim, go, ge, OverlapAlignment)
doAffineAlignmentCpp(sim, go, ge, OverlapAlignment)
sim |
(NumericMatrix) A numeric matrix with similarity values of two sequences or signals. |
go |
(numeric) Penalty for introducing first gap in alignment. |
ge |
(numeric) Penalty for introducing subsequent gaps in alignment. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
affineAlignObj (S4class) An object from C++ class of AffineAlignObj.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
# Get sequence similarity of two DNA strings Match=10; MisMatch=-2 seq1 = "GCAT"; seq2 = "CAGTG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) objAffine_Global <- doAffineAlignmentCpp(s, 22, 7, FALSE) slot(objAffine_Global, "score") # -2 -4 -6 4 -18 objAffine_Olap <- doAffineAlignmentCpp(s, 22, 7, TRUE) slot(objAffine_Olap, "score") # 0 10 20 18 18 18 Match=10; MisMatch=-2 seq1 = "CAT"; seq2 = "CAGTG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) objAffine_Global <- doAffineAlignmentCpp(s, 22, 7, FALSE) slot(objAffine_Global, "score") # 10 20 -2 -9 -11 objAffine_Olap <- doAffineAlignmentCpp(s, 22, 7, TRUE) slot(objAffine_Olap, "score") # 10 20 18 18 18 Match=10; MisMatch=-2 seq1 = "CA"; seq2 = "AG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) objAffine_Global <- doAffineAlignmentCpp(s, 22, 7, FALSE) slot(objAffine_Global, "simScore_forw") # -4
# Get sequence similarity of two DNA strings Match=10; MisMatch=-2 seq1 = "GCAT"; seq2 = "CAGTG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) objAffine_Global <- doAffineAlignmentCpp(s, 22, 7, FALSE) slot(objAffine_Global, "score") # -2 -4 -6 4 -18 objAffine_Olap <- doAffineAlignmentCpp(s, 22, 7, TRUE) slot(objAffine_Olap, "score") # 0 10 20 18 18 18 Match=10; MisMatch=-2 seq1 = "CAT"; seq2 = "CAGTG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) objAffine_Global <- doAffineAlignmentCpp(s, 22, 7, FALSE) slot(objAffine_Global, "score") # 10 20 -2 -9 -11 objAffine_Olap <- doAffineAlignmentCpp(s, 22, 7, TRUE) slot(objAffine_Olap, "score") # 10 20 18 18 18 Match=10; MisMatch=-2 seq1 = "CA"; seq2 = "AG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) objAffine_Global <- doAffineAlignmentCpp(s, 22, 7, FALSE) slot(objAffine_Global, "simScore_forw") # -4
Perform non-affine global and overlap alignment on a similarity matrix
doAlignmentCpp(sim, gap, OverlapAlignment)
doAlignmentCpp(sim, gap, OverlapAlignment)
sim |
(NumericMatrix) A numeric matrix with similarity values of two sequences or signals. |
gap |
(double) Penalty for introducing gaps in alignment. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
AlignObj (S4class) An object from C++ class of AlignObj.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
# Get sequence similarity of two DNA strings Match=10; MisMatch=-2 seq1 = "GCAT"; seq2 = "CAGTG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) obj_Global <- doAlignmentCpp(s, 22, FALSE) slot(obj_Global, "score") # -2 -4 -6 4 -18 obj_Olap <- doAlignmentCpp(s, 22, TRUE) slot(obj_Olap, "score") # 0 10 20 18 18 18 Match=1; MisMatch=-1 seq1 = "TTTC"; seq2 = "TGC" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) obj_Global <- doAlignmentCpp(s, 2, FALSE) slot(obj_Global, "optionalPaths") matrix(data = c(1,1,1,1,1,1,1,1,1,2,1,2,1,3,3,1,1,3,6,3), nrow = 5, ncol =4, byrow = TRUE) slot(obj_Global, "M_forw") matrix(data = c(0,-2,-4,-6,-2,-7,-22,-45,-4,-20,-72,-184,-6,-41,-178,-547,-8,-72,-366,-1274), nrow = 5, ncol =4, byrow = TRUE)
# Get sequence similarity of two DNA strings Match=10; MisMatch=-2 seq1 = "GCAT"; seq2 = "CAGTG" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) obj_Global <- doAlignmentCpp(s, 22, FALSE) slot(obj_Global, "score") # -2 -4 -6 4 -18 obj_Olap <- doAlignmentCpp(s, 22, TRUE) slot(obj_Olap, "score") # 0 10 20 18 18 18 Match=1; MisMatch=-1 seq1 = "TTTC"; seq2 = "TGC" s <- getSeqSimMatCpp(seq1, seq2, Match, MisMatch) obj_Global <- doAlignmentCpp(s, 2, FALSE) slot(obj_Global, "optionalPaths") matrix(data = c(1,1,1,1,1,1,1,1,1,2,1,2,1,3,3,1,1,3,6,3), nrow = 5, ncol =4, byrow = TRUE) slot(obj_Global, "M_forw") matrix(data = c(0,-2,-4,-6,-2,-7,-22,-45,-4,-20,-72,-184,-6,-41,-178,-547,-8,-72,-366,-1274), nrow = 5, ncol =4, byrow = TRUE)
Python path can also be set using Sys.setenv(RETICULATE_PYTHON = pythonPath). Also, remove .Rhistory file to avoid conflict with previously used python version.
get_ropenms(pythonPath = NULL, condaEnv = NULL, useConda = TRUE)
get_ropenms(pythonPath = NULL, condaEnv = NULL, useConda = TRUE)
pythonPath |
(string) path of the python program that has pyopenms module. |
condaEnv |
(string) name of the conda environment that has pyopenms module. |
useConda |
(logical) TRUE: Use conda environment. FALSE: Use python through pythonPath. |
(pyopenms module)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-06-06
## Not run: ropenms <- get_ropenms(condaEnv = "envName", useConda=TRUE) ## End(Not run)
## Not run: ropenms <- get_ropenms(condaEnv = "envName", useConda=TRUE) ## End(Not run)
This function aligns XICs of reference and experiment runs. It produces aligned retention times between refernce run and experiment run.
getAlignedTimes( XICs.ref, XICs.eXp, globalFit, alignType, adaptiveRT, normalization, simMeasure, goFactor, geFactor, cosAngleThresh, OverlapAlignment, dotProdThresh, gapQuantile, kerLen, hardConstrain, samples4gradient, objType = "light" )
getAlignedTimes( XICs.ref, XICs.eXp, globalFit, alignType, adaptiveRT, normalization, simMeasure, goFactor, geFactor, cosAngleThresh, OverlapAlignment, dotProdThresh, gapQuantile, kerLen, hardConstrain, samples4gradient, objType = "light" )
XICs.ref |
List of extracted ion chromatograms from reference run. |
XICs.eXp |
List of extracted ion chromatograms from experiment run. |
globalFit |
Linear or loess fit object between reference and experiment run. |
alignType |
Available alignment methods are "global", "local" and "hybrid". |
adaptiveRT |
(numeric) Similarity matrix is not penalized within adaptive RT. |
normalization |
(character) Must be selected from "mean", "l2". |
simMeasure |
(string) Must be selected from dotProduct, cosineAngle, crossCorrelation, cosine2Angle, dotProductMasked, euclideanDist, covariance and correlation. |
goFactor |
(numeric) Penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty. |
geFactor |
(numeric) Penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty. |
cosAngleThresh |
(numeric) In simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
dotProdThresh |
(numeric) In simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
gapQuantile |
(numeric) Must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
hardConstrain |
(logical) If FALSE; indices farther from noBeef distance are filled with distance from linear fit line. |
samples4gradient |
(numeric) This parameter modulates penalization of masked indices. |
objType |
(char) Must be selected from light, medium and heavy. |
(list) the first element corresponds to the aligned reference time, the second element is the aligned experiment time.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
alignChromatogramsCpp, getAlignObj
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(oswFiles_DIAlignR, package="DIAlignR") run1 <- "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" run2 <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" XICs.ref <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run1]][["4618"]] XICs.eXp <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run2]][["4618"]] RUNS_RT <- getRTdf(oswFiles_DIAlignR, ref = "run1", eXp = "run2", maxFdrGlobal = 0.05) globalFit <- loess(RT.eXp ~ RT.ref, data = RUNS_RT, span = 0.1, control=loess.control(surface="direct")) adaptiveRT <- 77.82315 #3.5*globalFit$s getAlignedTimes(XICs.ref, XICs.eXp, globalFit, alignType = "hybrid", adaptiveRT = adaptiveRT, normalization = "mean", simMeasure = "dotProductMasked", goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100)
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(oswFiles_DIAlignR, package="DIAlignR") run1 <- "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" run2 <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" XICs.ref <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run1]][["4618"]] XICs.eXp <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run2]][["4618"]] RUNS_RT <- getRTdf(oswFiles_DIAlignR, ref = "run1", eXp = "run2", maxFdrGlobal = 0.05) globalFit <- loess(RT.eXp ~ RT.ref, data = RUNS_RT, span = 0.1, control=loess.control(surface="direct")) adaptiveRT <- 77.82315 #3.5*globalFit$s getAlignedTimes(XICs.ref, XICs.eXp, globalFit, alignType = "hybrid", adaptiveRT = adaptiveRT, normalization = "mean", simMeasure = "dotProductMasked", goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100)
Get aligned indices from MS2 extracted-ion chromatograms(XICs) pair.
getAlignedTimesCpp( l1, l2, kernelLen, polyOrd, alignType, adaptiveRT, normalization, simType, Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100 )
getAlignedTimesCpp( l1, l2, kernelLen, polyOrd, alignType, adaptiveRT, normalization, simType, Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100 )
l1 |
(list) A list of numeric matrix of two columns. l1 and l2 should have same length. |
l2 |
(list) A list of numeric matrix of two columns. l1 and l2 should have same length. |
kernelLen |
(integer) length of filter. Must be an odd number. |
polyOrd |
(integer) TRUE: remove background from peak signal using estimated noise levels. |
alignType |
(char) A character string. Available alignment methods are "global", "local" and "hybrid". |
adaptiveRT |
(numeric) Similarity matrix is not penalized within adaptive RT. |
normalization |
(char) A character string. Normalization must be selected from (L2, mean or none). |
simType |
(char) A character string. Similarity type must be selected from (dotProductMasked, dotProduct, cosineAngle, cosine2Angle, euclideanDist, covariance, correlation, crossCorrelation). |
Bp |
(numeric) Timepoint mapped by global fit for tA. |
goFactor |
(numeric) Penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty. |
geFactor |
(numeric) Penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty. |
cosAngleThresh |
(numeric) In simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
dotProdThresh |
(numeric) In simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
gapQuantile |
(numeric) Must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
hardConstrain |
(logical) if false; indices farther from noBeef distance are filled with distance from linear fit line. |
samples4gradient |
(numeric) This parameter modulates penalization of masked indices. |
NumericMatrix Aligned indices of l1 and l2.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR XICs.ref <- lapply(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) XICs.eXp <- lapply(XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) Bp <- seq(4964.752, 5565.462, length.out = nrow(XICs.ref[[1]])) time <- getAlignedTimesCpp(XICs.ref, XICs.eXp, 11, 4, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", Bp = Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100)
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR XICs.ref <- lapply(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) XICs.eXp <- lapply(XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) Bp <- seq(4964.752, 5565.462, length.out = nrow(XICs.ref[[1]])) time <- getAlignedTimesCpp(XICs.ref, XICs.eXp, 11, 4, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", Bp = Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100)
This function aligns XICs of reference and experiment runs. It produces aligned retention times between refernce run and experiment run.
getAlignedTimesFast(XICs.ref, XICs.eXp, globalFit, adaptiveRT, params)
getAlignedTimesFast(XICs.ref, XICs.eXp, globalFit, adaptiveRT, params)
XICs.ref |
List of extracted ion chromatograms from reference run. |
XICs.eXp |
List of extracted ion chromatograms from experiment run. |
globalFit |
Linear or loess fit object between reference and experiment run. |
adaptiveRT |
(numeric) Similarity matrix is not penalized within adaptive RT. |
params |
(list) parameters are entered as list. Output of the |
(matrix) the first column corresponds to the aligned reference time, the second column is the aligned experiment time.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-01-02
alignChromatogramsCpp, getAlignObj
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(oswFiles_DIAlignR, package="DIAlignR") run1 <- "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" run2 <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" XICs.ref <- lapply(XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run1]][["4618"]], as.matrix) XICs.eXp <- lapply(XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run2]][["4618"]], as.matrix) params <- paramsDIAlignR() params[["globalAlignment"]] <- "linear" globalFit <- getGlobalAlignment(oswFiles_DIAlignR, ref = "run2", eXp = "run0", fitType = params[["globalAlignment"]], maxFdrGlobal = 0.05, spanvalue = 0.1) adaptiveRT <- 77.82315 #3.5*getRSE(globalFit, params[["globalAlignment"]]) globalFit <- coef(globalFit) getAlignedTimesFast(XICs.ref, XICs.eXp, globalFit, adaptiveRT, params)
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(oswFiles_DIAlignR, package="DIAlignR") run1 <- "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" run2 <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" XICs.ref <- lapply(XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run1]][["4618"]], as.matrix) XICs.eXp <- lapply(XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run2]][["4618"]], as.matrix) params <- paramsDIAlignR() params[["globalAlignment"]] <- "linear" globalFit <- getGlobalAlignment(oswFiles_DIAlignR, ref = "run2", eXp = "run0", fitType = params[["globalAlignment"]], maxFdrGlobal = 0.05, spanvalue = 0.1) adaptiveRT <- 77.82315 #3.5*getRSE(globalFit, params[["globalAlignment"]]) globalFit <- coef(globalFit) getAlignedTimesFast(XICs.ref, XICs.eXp, globalFit, adaptiveRT, params)
Outputs AlignObj from an alignment of two XIC-groups
getAlignObj( XICs.ref, XICs.eXp, globalFit, alignType, adaptiveRT, normalization, simType, goFactor, geFactor, cosAngleThresh, OverlapAlignment, dotProdThresh, gapQuantile, kerLen, hardConstrain, samples4gradient, objType = "light" )
getAlignObj( XICs.ref, XICs.eXp, globalFit, alignType, adaptiveRT, normalization, simType, goFactor, geFactor, cosAngleThresh, OverlapAlignment, dotProdThresh, gapQuantile, kerLen, hardConstrain, samples4gradient, objType = "light" )
XICs.ref |
List of extracted ion chromatograms from reference run. |
XICs.eXp |
List of extracted ion chromatograms from experiment run. |
globalFit |
Linear or loess fit object between reference and experiment run. |
alignType |
Available alignment methods are "global", "local" and "hybrid". |
adaptiveRT |
(numeric) Similarity matrix is not penalized within adaptive RT. |
normalization |
(character) Must be selected from "mean", "l2". |
simType |
(string) Must be selected from dotProduct, cosineAngle, crossCorrelation, cosine2Angle, dotProductMasked, euclideanDist, covariance and correlation. |
goFactor |
(numeric) Penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty. |
geFactor |
(numeric) Penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty. |
cosAngleThresh |
(numeric) In simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
dotProdThresh |
(numeric) In simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
gapQuantile |
(numeric) Must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
hardConstrain |
(logical) If FALSE; indices farther from noBeef distance are filled with distance from linear fit line. |
samples4gradient |
(numeric) This parameter modulates penalization of masked indices. |
objType |
(char) Must be selected from light, medium and heavy. |
A S4 object. Three most-important slots are:
indexA_aligned |
(integer) aligned indices of reference run. |
indexB_aligned |
(integer) aligned indices of experiment run. |
score |
(numeric) cumulative score of alignment. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(oswFiles_DIAlignR, package="DIAlignR") run1 <- "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" run2 <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" XICs.ref <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run1]][["4618"]] XICs.eXp <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run2]][["4618"]] RUNS_RT <- getRTdf(oswFiles_DIAlignR, ref = "run1", eXp = "run2", maxFdrGlobal = 0.05) globalFit <- loess(RT.eXp ~ RT.ref, data = RUNS_RT, span = 0.1, control=loess.control(surface="direct")) AlignObj <- getAlignObj(XICs.ref, XICs.eXp, globalFit, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100, objType = "light")
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") data(oswFiles_DIAlignR, package="DIAlignR") run1 <- "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt" run2 <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" XICs.ref <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run1]][["4618"]] XICs.eXp <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[[run2]][["4618"]] RUNS_RT <- getRTdf(oswFiles_DIAlignR, ref = "run1", eXp = "run2", maxFdrGlobal = 0.05) globalFit <- loess(RT.eXp ~ RT.ref, data = RUNS_RT, span = 0.1, control=loess.control(surface="direct")) AlignObj <- getAlignObj(XICs.ref, XICs.eXp, globalFit, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100, objType = "light")
This function expects osw and xics directories at dataPath. It first reads osw files and fetches chromatogram indices for each requested analyte. It then align XICs of each analyte to its reference XICs. AlignObj is returned which contains aligned indices and cumulative score along the alignment path.
getAlignObjs( analytes, runs, dataPath = ".", refRun = NULL, oswMerged = TRUE, params = paramsDIAlignR(), objType = "light" )
getAlignObjs( analytes, runs, dataPath = ".", refRun = NULL, oswMerged = TRUE, params = paramsDIAlignR(), objType = "light" )
analytes |
(vector of integers) transition_group_ids for which features are to be extracted. |
runs |
(string) names of xics file without extension. |
dataPath |
(string) path to xics and osw directory. |
refRun |
(string) reference for alignment. If no run is provided, m-score is used to select reference run. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
params |
(list) parameters are entered as list. Output of the |
objType |
(char) Must be selected from light, medium and heavy. |
A list of fileInfo and AlignObjs. Each AlignObj is an S4 object. Three most-important slots are:
indexA_aligned |
(integer) aligned indices of reference run. |
indexB_aligned |
(integer) aligned indices of experiment run. |
score |
(numeric) cumulative score of alignment. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
Gupta S, Ahadi S, Zhou W, Röst H. "DIAlignR Provides Precise Retention Time Alignment Across Distant Runs in DIA and Targeted Proteomics." Mol Cell Proteomics. 2019 Apr;18(4):806-817. doi: https://doi.org/10.1074/mcp.TIR118.001132 Epub 2019 Jan 31.
plotAlignedAnalytes, getRunNames, getFeatures, getXICs4AlignObj, getAlignObj
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" runs <- c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") analytes <- c(32L, 898L, 2474L) AlignObjOutput <- getAlignObjs(analytes, runs, dataPath = dataPath) plotAlignedAnalytes(AlignObjOutput)
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" runs <- c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") analytes <- c(32L, 898L, 2474L) AlignObjOutput <- getAlignObjs(analytes, runs, dataPath = dataPath) plotAlignedAnalytes(AlignObjOutput)
This function outputs base gap-penalty depending on SimType used. In case of getting base gap-penalty from similarity matrix distribution, gapQuantile will be used to pick the value.
getBaseGapPenaltyCpp(sim, SimType, gapQuantile = 0.5)
getBaseGapPenaltyCpp(sim, SimType, gapQuantile = 0.5)
sim |
(matrix) A numeric matrix. Input similarity matrix. |
SimType |
(char) A character string. Similarity type must be selected from (dotProductMasked, dotProduct, cosineAngle, cosine2Angle, euclideanDist, covariance, correlation, crossCorrelation). |
gapQuantile |
(numeric) Must be between 0 and 1. |
baseGapPenalty (numeric).
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
sim <- matrix(c(-12, 1.0, 12, -2.3, -2, -2, 1.07, -2, 1.80, 2, 22, 42, -2, -1.5, -2, 10), 4, 4, byrow = FALSE) getBaseGapPenaltyCpp(sim, "dotProductMasked", 0.5) # -0.25
sim <- matrix(c(-12, 1.0, 12, -2.3, -2, -2, 1.07, -2, 1.80, 2, 22, 42, -2, -1.5, -2, 10), 4, 4, byrow = FALSE) getBaseGapPenaltyCpp(sim, "dotProductMasked", 0.5) # -0.25
Get child chromatogram from two parent chromatogram
getChildXICpp( l1, l2, kernelLen, polyOrd, alignType, adaptiveRT, normalization, simType, Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100, wRef = 0.5, splineMethod = "natural", mergeStrategy = "avg", keepFlanks = TRUE )
getChildXICpp( l1, l2, kernelLen, polyOrd, alignType, adaptiveRT, normalization, simType, Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, kerLen = 9L, hardConstrain = FALSE, samples4gradient = 100, wRef = 0.5, splineMethod = "natural", mergeStrategy = "avg", keepFlanks = TRUE )
l1 |
(list) A list of numeric matrix of two columns. l1 and l2 should have same length. |
l2 |
(list) A list of numeric matrix of two columns. l1 and l2 should have same length. |
kernelLen |
(integer) length of filter. Must be an odd number. |
polyOrd |
(integer) TRUE: remove background from peak signal using estimated noise levels. |
alignType |
(char) A character string. Available alignment methods are "global", "local" and "hybrid". |
adaptiveRT |
(numeric) Similarity matrix is not penalized within adaptive RT. |
normalization |
(char) A character string. Normalization must be selected from (L2, mean or none). |
simType |
(char) A character string. Similarity type must be selected from (dotProductMasked, dotProduct, cosineAngle, cosine2Angle, euclideanDist, covariance, correlation, crossCorrelation). |
Bp |
(numeric) Timepoint mapped by global fit for tA. |
goFactor |
(numeric) Penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty. |
geFactor |
(numeric) Penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty. |
cosAngleThresh |
(numeric) In simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
OverlapAlignment |
(logical) An input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
dotProdThresh |
(numeric) In simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
gapQuantile |
(numeric) Must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
hardConstrain |
(logical) if false; indices farther from noBeef distance are filled with distance from linear fit line. |
samples4gradient |
(numeric) This parameter modulates penalization of masked indices. |
wRef |
(numeric) Weight of the reference XIC. Must be between 0 and 1. |
splineMethod |
(string) must be either "fmm" or "natural". |
mergeStrategy |
(string) must be either ref, avg, refStart or refEnd. |
keepFlanks |
(logical) TRUE: Flanking chromatogram is not removed. |
(List) of chromatograms and their aligned time vectors.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2021) + MIT Date: 2021-01-08
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR XICs.ref <- lapply(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) XICs.eXp <- lapply(XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) Bp <- seq(4964.752, 5565.462, length.out = nrow(XICs.ref[[1]])) chrom <- getChildXICpp(XICs.ref, XICs.eXp, 11L, 4L, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", Bp = Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100, wRef = 0.5, keepFlanks= TRUE)
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR XICs.ref <- lapply(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) XICs.eXp <- lapply(XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) Bp <- seq(4964.752, 5565.462, length.out = nrow(XICs.ref[[1]])) chrom <- getChildXICpp(XICs.ref, XICs.eXp, 11L, 4L, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", Bp = Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100, wRef = 0.5, keepFlanks= TRUE)
This function performs the chromatogram alignment of all precursors across runA and runB. Aligned chromatograms are merged into a child chromatogram. Aligned time vector and resulting child time vector for each precursor is also returned.
getChildXICs( runA, runB, fileInfo, features, mzPntrs, precursors, prec2chromIndex, refRun, peptideScores, params, applyFun = lapply )
getChildXICs( runA, runB, fileInfo, features, mzPntrs, precursors, prec2chromIndex, refRun, peptideScores, params, applyFun = lapply )
runA |
(string) name of a run to be merged with runB. Must be in the rownames of fileInfo. |
runB |
(string) name of a run to be merged with runA. Must be in the rownames of fileInfo. |
fileInfo |
(data-frame) output of |
features |
(list of data-frames) contains features and their properties identified in each run. |
mzPntrs |
(list) a list of mzRpwiz. |
precursors |
(data-frame) atleast two columns transition_group_id and transition_ids are required. |
prec2chromIndex |
(list) a list of dataframes having following columns: |
refRun |
(integer) must be of the same length as of precursors. 1: reference is runA, 2: reference is runB. |
peptideScores |
(list of data-frames) each dataframe has scores of a peptide across all runs. |
params |
(list) parameters are entered as list. Output of the |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(list) has three elements. The first element has child XICs for all the precursors. The second element has corresponding aligned time vectors. Third element contains Residual Standard Errors (RSE) of global fits amongst runA and runB.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-06-06
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() fileInfo <- DIAlignR::getRunNames(dataPath = dataPath) mzPntrs <- getMZMLpointers(fileInfo) features <- getFeatures(fileInfo, maxFdrQuery = 1.00, runType = "DIA_Proteomics") precursors <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) precursors <- dplyr::arrange(precursors, .data$peptide_id, .data$transition_group_id) peptideIDs <- unique(precursors$peptide_id) peptideScores <- getPeptideScores(fileInfo, peptideIDs, oswMerged = TRUE, params[["runType"]], params[["context"]]) peptideScores <- lapply(peptideIDs, function(pep) dplyr::filter(peptideScores, .data$peptide_id == pep)) names(peptideScores) <- as.character(peptideIDs) prec2chromIndex <- getChromatogramIndices(fileInfo, precursors, mzPntrs) var2 <- as.character(sapply(peptideIDs, function(p) precursors$transition_group_id[which(precursors$peptide_id == p)[1]])) refRun <- data.frame(rep(1L, length(peptideIDs)), var2) mergedXICs <- getChildXICs(runA = "run1", runB = "run2", fileInfo, features, mzPntrs, precursors, prec2chromIndex, refRun, peptideScores, params) for(con in mzPntrs) DBI::dbDisconnect(con)
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() fileInfo <- DIAlignR::getRunNames(dataPath = dataPath) mzPntrs <- getMZMLpointers(fileInfo) features <- getFeatures(fileInfo, maxFdrQuery = 1.00, runType = "DIA_Proteomics") precursors <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) precursors <- dplyr::arrange(precursors, .data$peptide_id, .data$transition_group_id) peptideIDs <- unique(precursors$peptide_id) peptideScores <- getPeptideScores(fileInfo, peptideIDs, oswMerged = TRUE, params[["runType"]], params[["context"]]) peptideScores <- lapply(peptideIDs, function(pep) dplyr::filter(peptideScores, .data$peptide_id == pep)) names(peptideScores) <- as.character(peptideIDs) prec2chromIndex <- getChromatogramIndices(fileInfo, precursors, mzPntrs) var2 <- as.character(sapply(peptideIDs, function(p) precursors$transition_group_id[which(precursors$peptide_id == p)[1]])) refRun <- data.frame(rep(1L, length(peptideIDs)), var2) mergedXICs <- getChildXICs(runA = "run1", runB = "run2", fileInfo, features, mzPntrs, precursors, prec2chromIndex, refRun, peptideScores, params) for(con in mzPntrs) DBI::dbDisconnect(con)
This function reads the header of chromatogram files. It then fetches chromatogram indices by matching transition_id(osw) with chromatogramID(xics).
getChromatogramIndices(fileInfo, precursors, mzPntrs, applyFun = lapply)
getChromatogramIndices(fileInfo, precursors, mzPntrs, applyFun = lapply)
fileInfo |
(data-frame) Output of getRunNames function. |
precursors |
(data-frame) Atleast two columns transition_group_id and transition_ids are required. |
mzPntrs |
A list of mzRpwiz. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(list) A list of dataframes having following columns:
transition_group_id |
(string) it is PRECURSOR.ID from osw file. |
chromatogramIndex |
(integer) index of chromatogram in mzML file. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-04-07
chromatogramIdAsInteger, mapPrecursorToChromIndices
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") mzPntrs <- getMZMLpointers(fileInfo) prec2chromIndex <- getChromatogramIndices(fileInfo, precursors, mzPntrs) for(mz in mzPntrs) DBI::dbDisconnect(mz)
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") mzPntrs <- getMZMLpointers(fileInfo) prec2chromIndex <- getChromatogramIndices(fileInfo, precursors, mzPntrs) for(mz in mzPntrs) DBI::dbDisconnect(mz)
Calculates similarity matrix of two fragment-ion chromatogram groups or extracted-ion chromatograms(XICs)
getChromSimMatCpp( l1, l2, normalization, simType, cosAngleThresh = 0.3, dotProdThresh = 0.96, kerLen = 9L )
getChromSimMatCpp( l1, l2, normalization, simType, cosAngleThresh = 0.3, dotProdThresh = 0.96, kerLen = 9L )
l1 |
(list) A list of vectors. Length should be same as of l2. |
l2 |
(list) A list of vectors. Length should be same as of l1. |
normalization |
(char) A character string. Normalization must be selected from (L2, mean or none). |
simType |
(char) A character string. Similarity type must be selected from (dotProductMasked, dotProduct, cosineAngle, cosine2Angle, euclideanDist, covariance, correlation, crossCorrelation). |
cosAngleThresh |
(numeric) In simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
dotProdThresh |
(numeric) In simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
s (matrix) Numeric similarity matrix. Rows and columns expresses seq1 and seq2, respectively.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-05
# Get similarity matrix of dummy chromatograms r1 <- list(c(1.0,3.0,2.0,4.0), c(0.0,0.0,0.0,1.0), c(4.0,4.0,4.0,5.0)) r2 <- list(c(1.4,2.0,1.5,4.0), c(0.0,0.5,0.0,0.0), c(2.0,3.0,4.0,0.9)) round(getChromSimMatCpp(r1, r2, "L2", "dotProductMasked"), 3) matrix(c(0.125, 0.162, 0.144, 0.208, 0.186, 0.240, 0.213, 0.313, 0.233, 0.273, 0.253, 0.346, 0.101, 0.208, 0.154, 0.273), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "dotProduct"), 3) matrix(c(0.125, 0.162, 0.144, 0.208, 0.186,0.240, 0.213, 0.313, 0.233, 0.273, 0.253, 0.346, 0.101, 0.208, 0.154, 0.273), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "cosineAngle"), 3) matrix(c(0.934, 0.999, 0.989, 0.986, 0.933, 0.989, 0.983, 0.996, 0.994, 0.960, 0.995, 0.939, 0.450, 0.761, 0.633, 0.772), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "cosine2Angle"), 3) matrix(c(0.744, 0.998, 0.957, 0.944, 0.740, 0.956, 0.932, 0.985, 0.974, 0.842, 0.978, 0.764, -0.596, 0.158, -0.200, 0.190), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "mean", "euclideanDist"), 3) matrix(c(0.608, 0.614, 0.680, 0.434, 0.530, 0.742, 0.659, 0.641, 0.520, 0.541, 0.563, 0.511, 0.298, 0.375, 0.334, 0.355), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "covariance"), 3) matrix(c(0.025, 0.028, 0.027, 0.028, 0.032, 0.034, 0.033, 0.034, 0.055, 0.051, 0.053, 0.051, -0.004, 0.028, 0.012, 0.028), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "correlation"), 3) matrix(c(0.874, 0.999, 0.974, 0.999, 0.923, 0.986, 0.993, 0.986, 0.991, 0.911, 0.990, 0.911, -0.065, 0.477, 0.214, 0.477), 4, 4, byrow = FALSE)
# Get similarity matrix of dummy chromatograms r1 <- list(c(1.0,3.0,2.0,4.0), c(0.0,0.0,0.0,1.0), c(4.0,4.0,4.0,5.0)) r2 <- list(c(1.4,2.0,1.5,4.0), c(0.0,0.5,0.0,0.0), c(2.0,3.0,4.0,0.9)) round(getChromSimMatCpp(r1, r2, "L2", "dotProductMasked"), 3) matrix(c(0.125, 0.162, 0.144, 0.208, 0.186, 0.240, 0.213, 0.313, 0.233, 0.273, 0.253, 0.346, 0.101, 0.208, 0.154, 0.273), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "dotProduct"), 3) matrix(c(0.125, 0.162, 0.144, 0.208, 0.186,0.240, 0.213, 0.313, 0.233, 0.273, 0.253, 0.346, 0.101, 0.208, 0.154, 0.273), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "cosineAngle"), 3) matrix(c(0.934, 0.999, 0.989, 0.986, 0.933, 0.989, 0.983, 0.996, 0.994, 0.960, 0.995, 0.939, 0.450, 0.761, 0.633, 0.772), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "cosine2Angle"), 3) matrix(c(0.744, 0.998, 0.957, 0.944, 0.740, 0.956, 0.932, 0.985, 0.974, 0.842, 0.978, 0.764, -0.596, 0.158, -0.200, 0.190), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "mean", "euclideanDist"), 3) matrix(c(0.608, 0.614, 0.680, 0.434, 0.530, 0.742, 0.659, 0.641, 0.520, 0.541, 0.563, 0.511, 0.298, 0.375, 0.334, 0.355), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "covariance"), 3) matrix(c(0.025, 0.028, 0.027, 0.028, 0.032, 0.034, 0.033, 0.034, 0.055, 0.051, 0.053, 0.051, -0.004, 0.028, 0.012, 0.028), 4, 4, byrow = FALSE) round(getChromSimMatCpp(r1, r2, "L2", "correlation"), 3) matrix(c(0.874, 0.999, 0.974, 0.999, 0.923, 0.986, 0.993, 0.986, 0.991, 0.911, 0.990, 0.911, -0.065, 0.477, 0.214, 0.477), 4, 4, byrow = FALSE)
Get a list of data-frame of OpenSwath features that contains retention time, intensities, boundaries etc.
getFeatures( fileInfo, maxFdrQuery = 0.05, maxIPFFdrQuery = 0.05, runType = "DIA_Proteomics", applyFun = lapply )
getFeatures( fileInfo, maxFdrQuery = 0.05, maxIPFFdrQuery = 0.05, runType = "DIA_Proteomics", applyFun = lapply )
fileInfo |
(data-frame) output of |
maxFdrQuery |
(numeric) a numeric value between 0 and 1. It is used to filter features from osw file which have SCORE_MS2.QVALUE less than itself. |
maxIPFFdrQuery |
(numeric) A numeric value between 0 and 1. It is used to filter features from osw file which have SCORE_IPF.QVALUE less than itself. (For PTM IPF use) |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_IPF", "DIA_Metabolomics". |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(list of dataframes) each dataframe has following columns:
transition_group_id |
(integer) a unique id for each precursor. |
RT |
(numeric) retention time as in FEATURE.EXP_RT of osw files. |
Intensity |
(numeric) peak intensity as in FEATURE_MS2.AREA_INTENSITY of osw files. |
leftWidth |
(numeric) as in FEATURE.LEFT_WIDTH of osw files. |
rightWidth |
(numeric) as in FEATURE.RIGHT_WIDTH of osw files. |
peak_group_rank |
(integer) rank of each feature associated with transition_group_id. |
m_score |
(numeric) q-value of each feature associated with transition_group_id. (If using 'DIA_IPF' runType, this will represent IPF's QVALUE) |
ms2_m_score |
(numeric) MS2 q-value of each feature associated with transition_group_id. (Will only be present if using 'DIA_IPF' runType) |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-04-06
getRunNames, fetchPrecursorsInfo
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) features <- getFeatures(fileInfo, maxFdrQuery = 1.00, runType = "DIA_Proteomics") dim(features[[2]]) # 938 8
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) features <- getFeatures(fileInfo, maxFdrQuery = 1.00, runType = "DIA_Proteomics") dim(features[[2]]) # 938 8
This function takes in timeVectors from both runs, global-fit mapped values of end-points of first time vector and sample-length of window of no constraining. Outside of window, all elements of matrix are either equally weighted or weighted proportional to distance from window-boundry.
getGlobalAlignMaskCpp(tA, tB, tBp, noBeef = 50L, hardConstrain = FALSE)
getGlobalAlignMaskCpp(tA, tB, tBp, noBeef = 50L, hardConstrain = FALSE)
tA |
(numeric) This vector has equally spaced timepoints of XIC A. |
tB |
(numeric) This vector has equally spaced timepoints of XIC B. |
tBp |
(numeric) mapping of tA to run B using some global fit. |
noBeef |
(integer) It defines the distance from the global fit, upto which no penalization is performed. |
hardConstrain |
(logical) if false; indices farther from noBeef distance are filled with distance from linear fit line. |
mask (matrix) A numeric matrix.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-08
tA <- c(1707.6, 1711, 1714.5, 1717.9, 1721.3, 1724.7, 1728.1, 1731.5, 1734.9, 1738.4) tB <- c(1765.7, 1769.1, 1772.5, 1775.9, 1779.3, 1782.7, 1786.2, 1789.6, 1793, 1796.4) tBp <- c(1786.9, 1790.35, 1793.9, 1797.36, 1800.81, 1804.26, 1807.71, 1811.17, 1814.62, 1818.17) noBeef <- 1 mask <- getGlobalAlignMaskCpp(tA, tB, tBp, noBeef, FALSE) round(mask, 3) matrix(c( 5.215,4.218,4.221,2.225,1.228,0,0,0,0.788, 1.785, 6.226,5.230,4.233,3.236,2.239,1.243,0,0,0, 0.774), nrow = 2, ncol = 10, byrow = FALSE) #image(mask) # A is on x-axis, B is on y-axis
tA <- c(1707.6, 1711, 1714.5, 1717.9, 1721.3, 1724.7, 1728.1, 1731.5, 1734.9, 1738.4) tB <- c(1765.7, 1769.1, 1772.5, 1775.9, 1779.3, 1782.7, 1786.2, 1789.6, 1793, 1796.4) tBp <- c(1786.9, 1790.35, 1793.9, 1797.36, 1800.81, 1804.26, 1807.71, 1811.17, 1814.62, 1818.17) noBeef <- 1 mask <- getGlobalAlignMaskCpp(tA, tB, tBp, noBeef, FALSE) round(mask, 3) matrix(c( 5.215,4.218,4.221,2.225,1.228,0,0,0,0.788, 1.785, 6.226,5.230,4.233,3.236,2.239,1.243,0,0,0, 0.774), nrow = 2, ncol = 10, byrow = FALSE) #image(mask) # A is on x-axis, B is on y-axis
This function selects features from oswFiles which has m-score < maxFdrLoess. It fits linear/loess regression on these feature. Retention-time mapping is established from reference to experiment run.
getGlobalAlignment( oswFiles, ref, eXp, fitType = "linear", maxFdrGlobal = 0.01, spanvalue = 0.1 )
getGlobalAlignment( oswFiles, ref, eXp, fitType = "linear", maxFdrGlobal = 0.01, spanvalue = 0.1 )
oswFiles |
(list of data-frames) it is output from getFeatures function. |
ref |
(string) Must be a combination of "run" and an iteger e.g. "run2". |
eXp |
(string) Must be a combination of "run" and an iteger e.g. "run2". |
fitType |
(string) Must be from "loess" or "linear". |
maxFdrGlobal |
(numeric) A numeric value between 0 and 1. Features should have m-score lower than this value for participation in global fit. |
spanvalue |
(numeric) Spanvalue for LOESS fit. For targeted proteomics 0.1 could be used. |
An object of class "loess".
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
data(oswFiles_DIAlignR, package="DIAlignR") fit <- getGlobalAlignment(oswFiles = oswFiles_DIAlignR, ref = "run1", eXp = "run2", fitType = "linear", maxFdrGlobal = 0.05, spanvalue = 0.1)
data(oswFiles_DIAlignR, package="DIAlignR") fit <- getGlobalAlignment(oswFiles = oswFiles_DIAlignR, ref = "run1", eXp = "run2", fitType = "linear", maxFdrGlobal = 0.05, spanvalue = 0.1)
Each element of the multipeptide is a collection of features associated with a peptide.
getMultipeptide( precursors, features, runType = "DIA_Proteomics", applyFun = lapply, masters = NULL )
getMultipeptide( precursors, features, runType = "DIA_Proteomics", applyFun = lapply, masters = NULL )
precursors |
(data-frames) Contains precursors and associated transition IDs. |
features |
(list of data-frames) Contains features and their properties identified in each run. |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_IPF", "DIA_Metabolomics". |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
masters |
(characters) names of extra runs. |
(list) of dataframes having following columns:
transition_group_id |
(integer) a unique id for each precursor. |
run |
(string) run identifier. |
RT |
(numeric) retention time as in FEATURE.EXP_RT of osw files. |
Intensity |
(numeric) peak intensity as in FEATURE_MS2.AREA_INTENSITY of osw files. |
leftWidth |
(numeric) as in FEATURE.LEFT_WIDTH of osw files. |
rightWidth |
(numeric) as in FEATURE.RIGHT_WIDTH of osw files. |
peak_group_rank |
(integer) rank of each feature associated with transition_group_id. |
m_score |
(numeric) q-value of each feature associated with transition_group_id. |
alignment_rank |
(integer) rank of each feature post-alignment. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-04-08
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath, oswMerged = TRUE) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") features <- getFeatures(fileInfo, maxFdrQuery = 0.05) multipeptide <- getMultipeptide(precursors, features) multipeptide[["9861"]]
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath, oswMerged = TRUE) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") features <- getFeatures(fileInfo, maxFdrQuery = 0.05) multipeptide <- getMultipeptide(precursors, features) multipeptide[["9861"]]
Returns instantiated mzRpwiz object associated to mzML file.
getMZMLpointers(fileInfo)
getMZMLpointers(fileInfo)
fileInfo |
(data-frame) Output of DIAlignR::getRunNames function |
(A list of mzRpwiz)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) mzPntrs <- getMZMLpointers(fileInfo)
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) mzPntrs <- getMZMLpointers(fileInfo)
Get transition(native) IDs for the peptides.
getNativeIDs(oswIn, peps, params = paramsDIAlignR(), oswMerged = TRUE)
getNativeIDs(oswIn, peps, params = paramsDIAlignR(), oswMerged = TRUE)
oswIn |
(string) path to the osw feature file. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
(integer) a vector of transition IDs.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2022) + GPL-3 Date: 2022-04-19
dataPath <- system.file("extdata", package = "DIAlignR") oswIn <- file.path(dataPath, "osw", "merged.osw") peps <- c(3L, 11L) # No transitions for 3L. params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" getNativeIDs(oswIn, peps, params) # 106468 106469 106470 106471 106472 106473
dataPath <- system.file("extdata", package = "DIAlignR") oswIn <- file.path(dataPath, "osw", "merged.osw") peps <- c(3L, 11L) # No transitions for 3L. params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" getNativeIDs(oswIn, peps, params) # 106468 106469 106470 106471 106472 106473
Get a list of dataframes that contains peptide scores, pvalues, and qvalues across all runs.
getPeptideScores( fileInfo, peptides, oswMerged = TRUE, runType = "DIA_Proteomics", context = "global" )
getPeptideScores( fileInfo, peptides, oswMerged = TRUE, runType = "DIA_Proteomics", context = "global" )
fileInfo |
(data-frame) Output of |
peptides |
(integer) Ids of peptides for which scores are required. |
oswMerged |
(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet. |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_IPF", "DIA_Metabolomics". |
context |
(string) Context used in pyprophet peptide. Must be either "run-specific", "experiment-wide", or "global". |
(list of dataframes) dataframe has following columns:
peptide_id |
(integer) a unique id for each precursor. |
run |
(character) as in SCORE_PEPTIDE.RUN_ID of osw files. |
score |
(numeric) as in SCORE_PEPTIDE.SCORE of osw files. |
pvalue |
(numeric) as in SCORE_PEPTIDE.PVALUE of osw files. |
qvalue |
(numeric) as in SCORE_PEPTIDE.QVALUE of osw files. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-07-01
getRunNames, fetchPeptidesInfo
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursorsInfo <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) peptidesInfo <- getPeptideScores(fileInfo, unique(precursorsInfo$peptide_id)) dim(peptidesInfo) # 684 5
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursorsInfo <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) peptidesInfo <- getPeptideScores(fileInfo, unique(precursorsInfo$peptide_id)) dim(peptidesInfo) # 684 5
Get a data-frame of analytes' transition_group_id, transition_ids, peptide_id and amino-acid sequences.
getPrecursorByID( analytes, fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics" )
getPrecursorByID( analytes, fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics" )
analytes |
(integer) a vector of integers. |
fileInfo |
(data-frame) Output of |
oswMerged |
(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet. |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_IPF", "DIA_Metabolomics". |
(data-frames) A data-frame having following columns:
transition_group_id |
(integer) a unique id for each precursor. |
transition_id |
(list) fragment-ion ID associated with transition_group_id. This is matched with chromatogram ID in mzML file. |
peptide_id |
(integer) a unique id for each peptide. A peptide can have multiple precursors. |
sequence |
(string) amino-acid sequence of the precursor with possible modifications. |
charge |
(integer) charge on the precursor. |
group_label |
(string) TODO Figure it out. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2019-04-06
getRunNames, fetchPrecursorsInfo
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath, oswMerged = TRUE) precursors <- getPrecursorByID(c(32L, 2474L), fileInfo, oswMerged = TRUE)
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath, oswMerged = TRUE) precursors <- getPrecursorByID(c(32L, 2474L), fileInfo, oswMerged = TRUE)
This function reads the header of chromatogram files. It then fetches chromatogram indices by matching transition_group_id(osw) with chromatogramID(xics).
getPrecursorIndices( fileInfo, precursors, mzPntrs, ions = ".*", applyFun = lapply )
getPrecursorIndices( fileInfo, precursors, mzPntrs, ions = ".*", applyFun = lapply )
fileInfo |
(data-frame) Output of getRunNames function. |
precursors |
(data-frame) Atleast two columns transition_group_id and transition_ids are required. |
mzPntrs |
A list of mzRpwiz. |
ions |
(character) Strings to specify ions. eg. i0, i1. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(list) A list of dataframes having following columns:
transition_group_id |
(string) it is PRECURSOR.ID from osw file. |
chromatogramIndex |
(integer) index of MS1 chromatogram in mzML file. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2022) + GPL-3 Date: 2022-01-15
chromatogramIdAsInteger, mapPrecursorToChromIndices
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") mzPntrs <- getMZMLpointers(fileInfo) prec2chromIndex <- getPrecursorIndices(fileInfo, precursors, mzPntrs) for(mz in mzPntrs) DBI::dbDisconnect(mz)
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") mzPntrs <- getMZMLpointers(fileInfo) prec2chromIndex <- getPrecursorIndices(fileInfo, precursors, mzPntrs) for(mz in mzPntrs) DBI::dbDisconnect(mz)
Get a data-frame of analytes' transition_group_id, transition_ids, peptide_id and amino-acid sequences.
getPrecursors( fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "global", maxPeptideFdr = 0.05, level = "Peptide", useIdentifying = FALSE )
getPrecursors( fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "global", maxPeptideFdr = 0.05, level = "Peptide", useIdentifying = FALSE )
fileInfo |
(data-frame) Output of |
oswMerged |
(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet. |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_IPF", "DIA_Metabolomics". |
context |
(string) Context used in pyprophet peptide. Must be either "run-specific", "experiment-wide", or "global". |
maxPeptideFdr |
(numeric) A numeric value between 0 and 1. It is used to filter peptides from osw file which have SCORE_PEPTIDE.QVALUE less than itself. |
level |
(string) Apply maxPeptideFDR on Protein as well if specified as "Protein". Default: "Peptide". |
useIdentifying |
(logical) Set TRUE to use identifying transitions in alignment. (DEFAULT: FALSE) |
(data-frames) A data-frame having following columns:
transition_group_id |
(integer) a unique id for each precursor. |
peptide_id |
(integer) a unique id for each peptide. A peptide can have multiple precursors. |
sequence |
(string) amino-acid sequence of the precursor with possible modifications. |
charge |
(integer) charge on the precursor. |
group_label |
(string) TODO Figure it out. |
transition_ids |
(list) fragment-ion ID associated with transition_group_id. This is matched with chromatogram ID in mzML file. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-04-06
getRunNames, fetchPrecursorsInfo
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursorsInfo <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) dim(precursorsInfo) # 234 6
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursorsInfo <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) dim(precursorsInfo) # 234 6
Construct an alignment feature mapping table for star align method, to map aligned features in experiment runs to the reference run
getRefExpFeatureMap(precursors, features, applyFun = lapply)
getRefExpFeatureMap(precursors, features, applyFun = lapply)
precursors |
(data-frames) Contains precursors and associated transition IDs. |
features |
(list of data-frames) Contains features and their properties identified in each run. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(list) of dataframes per precursor peptide id having following columns:
reference_feature_id |
(integer) id of reference feature. |
experiment_feature_id |
(integer) id of experiment feature aligned to reference feature. |
Justin Sing, [email protected]
ORCID: 0000-0003-0386-0092
License: (c) Author (2020) + GPL-3 Date: 2022-11-07
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath, oswMerged = TRUE) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") features <- getFeatures(fileInfo, maxFdrQuery = 0.05) multiFeatureAlignmentMap <- getRefExpFeatureMap(precursors, features) multiFeatureAlignmentMap[["9861"]]
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath, oswMerged = TRUE) precursors <- getPrecursors(fileInfo, oswMerged = TRUE, context = "experiment-wide") features <- getFeatures(fileInfo, maxFdrQuery = 0.05) multiFeatureAlignmentMap <- getRefExpFeatureMap(precursors, features) multiFeatureAlignmentMap[["9861"]]
Provides the reference run based on lowest p-value.
getRefRun(peptideScores, applyFun = lapply)
getRefRun(peptideScores, applyFun = lapply)
peptideScores |
(list of data-frames) each dataframe has scores of a peptide across all runs. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(dataframe) has two columns:
peptide_id |
(integer) a unique id for each peptide. |
run |
(string) run identifier. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-04-08
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursorsInfo <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) peptideIDs <- unique(precursorsInfo$peptide_id) peptidesInfo <- getPeptideScores(fileInfo, peptideIDs) peptidesInfo <- lapply(peptideIDs, function(pep) peptidesInfo [.(pep),]) names(peptidesInfo) <- as.character(peptideIDs) getRefRun(peptidesInfo)
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) precursorsInfo <- getPrecursors(fileInfo, oswMerged = TRUE, runType = "DIA_Proteomics", context = "experiment-wide", maxPeptideFdr = 0.05) peptideIDs <- unique(precursorsInfo$peptide_id) peptidesInfo <- getPeptideScores(fileInfo, peptideIDs) peptidesInfo <- lapply(peptideIDs, function(pep) peptidesInfo [.(pep),]) names(peptidesInfo) <- as.character(peptideIDs) getRefRun(peptidesInfo)
This function selects features from oswFiles which has m-score < maxFdrLoess. It fits linear/loess regression on these feature. Retention-time mapping is established from reference to experiment run.
getRTdf(features, ref, eXp, maxFdrGlobal)
getRTdf(features, ref, eXp, maxFdrGlobal)
features |
(list of data-frames) it is output from getFeatures function. |
ref |
(string) Must be a combination of "run" and an iteger e.g. "run2". |
eXp |
(string) Must be a combination of "run" and an iteger e.g. "run2". |
maxFdrGlobal |
(numeric) A numeric value between 0 and 1. Features should have m-score lower than this value for participation in global fit. |
A data-frame
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2019-03-01
data(oswFiles_DIAlignR, package="DIAlignR") df <- getRTdf(features = oswFiles_DIAlignR, ref = "run1", eXp = "run2", maxFdrGlobal = 0.05)
data(oswFiles_DIAlignR, package="DIAlignR") df <- getRTdf(features = oswFiles_DIAlignR, ref = "run1", eXp = "run2", maxFdrGlobal = 0.05)
Fetches all osw files, then, keeps only those runs which has corresponding mzML files. mzML file names must match with RUN.FILENAME columns of osw files.
getRunNames(dataPath, oswMerged = TRUE, params = paramsDIAlignR())
getRunNames(dataPath, oswMerged = TRUE, params = paramsDIAlignR())
dataPath |
(char) Path to xics and osw directory. |
oswMerged |
(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet. |
params |
(list) parameters are entered as list. Output of the |
(dataframe) it has five columns:
spectraFile |
(string) as mentioned in RUN table of osw files. |
runName |
(string) contain respective mzML names without extension. |
spectraFileID |
(string) ID in RUN table of osw files. |
featureFile |
(string) Path to the feature file. |
chromatogramFile |
(string) Path to the chromatogram file. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-14
dataPath <- system.file("extdata", package = "DIAlignR") getRunNames(dataPath = dataPath, oswMerged = TRUE)
dataPath <- system.file("extdata", package = "DIAlignR") getRunNames(dataPath = dataPath, oswMerged = TRUE)
Calculates similarity matrix for two sequences
getSeqSimMatCpp(seq1, seq2, match, misMatch)
getSeqSimMatCpp(seq1, seq2, match, misMatch)
seq1 |
(char) A single string. |
seq2 |
(char) A single string. |
match |
(double) Score for character match. |
misMatch |
(double) score for character mismatch. |
s (matrix) Numeric similarity matrix. Rows and columns expresses seq1 and seq2, respectively.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2019) + MIT Date: 2019-03-05
# Get sequence similarity of two DNA strings Match=10; MisMatch=-2 seq1 = "GCAT"; seq2 = "CAGTG" getSeqSimMatCpp(seq1, seq2, Match, MisMatch) matrix(c(-2, 10, -2, -2, -2, -2, 10, -2, 10, -2, -2, -2, -2, -2, -2, 10, 10, -2, -2, -2), 4, 5, byrow = FALSE)
# Get sequence similarity of two DNA strings Match=10; MisMatch=-2 seq1 = "GCAT"; seq2 = "CAGTG" getSeqSimMatCpp(seq1, seq2, Match, MisMatch) matrix(c(-2, 10, -2, -2, -2, -2, 10, -2, 10, -2, -2, -2, -2, -2, -2, 10, 10, -2, -2, -2), 4, 5, byrow = FALSE)
Get a list of data-frame of OpenSwath features that contains retention time, intensities, boundaries etc.
getTransitions( fileInfo, maxFdrQuery = 0.05, runType = "DIA_Proteomics", applyFun = lapply )
getTransitions( fileInfo, maxFdrQuery = 0.05, runType = "DIA_Proteomics", applyFun = lapply )
fileInfo |
(data-frame) output of |
maxFdrQuery |
(numeric) a numeric value between 0 and 1. It is used to filter features from osw file which have SCORE_MS2.QVALUE less than itself. |
runType |
(char) yhis must be one of the strings "DIA_Proteomics", "DIA_Metabolomics". |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(list of dataframes) each dataframe has following columns:
transition_group_id |
(integer) a unique id for each precursor. |
RT |
(numeric) retention time as in FEATURE.EXP_RT of osw files. |
intensity |
(list) of peak intensities as in FEATURE_TRANSITION.AREA_INTENSITY of osw files. |
leftWidth |
(numeric) as in FEATURE.LEFT_WIDTH of osw files. |
rightWidth |
(numeric) as in FEATURE.RIGHT_WIDTH of osw files. |
peak_group_rank |
(integer) rank of each feature associated with transition_group_id. |
m_score |
(numeric) q-value of each feature associated with transition_group_id. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-11-15
getRunNames, fetchTransitionsFromRun
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) transitions <- getTransitions(fileInfo, maxFdrQuery = 1.00, runType = "DIA_Proteomics") dim(transitions[[2]]) # 938 8
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath) transitions <- getTransitions(fileInfo, maxFdrQuery = 1.00, runType = "DIA_Proteomics") dim(transitions[[2]]) # 938 8
For all the analytes requested in runs, it first creates oswFiles, then, fetches chromatogram indices from oswFiles and extract chromatograms from mzML files.
getXICs( analytes, runs, dataPath = ".", maxFdrQuery = 1, runType = "DIA_Proteomics", oswMerged = TRUE, params = paramsDIAlignR() )
getXICs( analytes, runs, dataPath = ".", maxFdrQuery = 1, runType = "DIA_Proteomics", oswMerged = TRUE, params = paramsDIAlignR() )
analytes |
(integer) a vector of precursor IDs. |
runs |
(vector of string) names of mzML files without extension. |
dataPath |
(string) Path to xics and osw directory. |
maxFdrQuery |
(numeric) A numeric value between 0 and 1. It is used to filter features from osw file which have SCORE_MS2.QVALUE less than itself. |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_Metabolomics". |
oswMerged |
(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet. |
params |
(list) parameters are entered as list. Output of the |
A list of list. Each list contains XIC-group for that run. XIC-group is a list of dataframe that has elution time and intensity of fragment-ion XIC.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
getXICs4AlignObj, getRunNames, analytesFromFeatures
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") analytes <- c(32L, 898L, 2474L) XICs <- getXICs(analytes, runs = runs, dataPath = dataPath)
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") analytes <- c(32L, 898L, 2474L) XICs <- getXICs(analytes, runs = runs, dataPath = dataPath)
For all the analytes requested, it fetches chromatogram indices from prec2chromIndex and extracts chromatograms using mzPntrs.
getXICs4AlignObj(mzPntrs, fileInfo, runs, prec2chromIndex, analytes)
getXICs4AlignObj(mzPntrs, fileInfo, runs, prec2chromIndex, analytes)
mzPntrs |
a list of mzRpwiz. |
fileInfo |
(data-frame) output of getRunNames(). |
runs |
(vector of string) names of mzML files without extension. |
prec2chromIndex |
(list of data-frames) output of getChromatogramIndices(). Each dataframe has two columns: transition_group_id and chromatogramIndex. |
analytes |
(integer) a vector of precursor IDs. |
A list of list of data-frames. Each data frame has elution time and intensity of fragment-ion XIC.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
getChromatogramIndices, getRunNames
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt") analytes <- c(32L, 898L, 2474L) params <- paramsDIAlignR() params[["chromFile"]] <- "mzML" fileInfo <- getRunNames(dataPath = dataPath) fileInfo <- updateFileInfo(fileInfo, runs) precursors <- getPrecursorByID(analytes,fileInfo) mzPntrs <- getMZMLpointers(fileInfo) prec2chromIndex <- getChromatogramIndices(fileInfo, precursors, mzPntrs) XICs <- getXICs4AlignObj(mzPntrs, fileInfo, runs, prec2chromIndex, analytes) rm(mzPntrs)
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt") analytes <- c(32L, 898L, 2474L) params <- paramsDIAlignR() params[["chromFile"]] <- "mzML" fileInfo <- getRunNames(dataPath = dataPath) fileInfo <- updateFileInfo(fileInfo, runs) precursors <- getPrecursorByID(analytes,fileInfo) mzPntrs <- getMZMLpointers(fileInfo) prec2chromIndex <- getChromatogramIndices(fileInfo, precursors, mzPntrs) XICs <- getXICs4AlignObj(mzPntrs, fileInfo, runs, prec2chromIndex, analytes) rm(mzPntrs)
Fill missing intensities in a chromatogram
imputeChromatogram( chromatogram, method = "spline", polyOrd = 4, kernelLen = 9, splineMethod = "fmm" )
imputeChromatogram( chromatogram, method = "spline", polyOrd = 4, kernelLen = 9, splineMethod = "fmm" )
chromatogram |
(data-frames) first column is time, second column is intensity. |
method |
(string) must be either "spline", "sgolay" or "linear". |
polyOrd |
(integer) must be less than kernelLen. |
kernelLen |
(integer) must be an odd integer. |
splineMethod |
(string) must be either "fmm" or "natural". |
(dataframe) has two columns:
time |
(numeric) |
intensity |
(numeric) |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-05-21
time <- seq(from = 3003.4, to = 3048, by = 3.4) y <- c(0.2050595, 0.8850070, 2.2068768, 3.7212677, 5.1652605, 5.8288915, 5.5446804, 4.5671360, 3.3213154, 1.9485889, 0.9520709, 0.3294218, 0.2009581, 0.1420923) chrom <- data.frame(time, y) chrom$y[c(1,8, 14)] <- NA imputeChromatogram(chrom, "sgolay") imputeChromatogram(chrom, "spline") imputeChromatogram(chrom, "linear")
time <- seq(from = 3003.4, to = 3048, by = 3.4) y <- c(0.2050595, 0.8850070, 2.2068768, 3.7212677, 5.1652605, 5.8288915, 5.5446804, 4.5671360, 3.3213154, 1.9485889, 0.9520709, 0.3294218, 0.2009581, 0.1420923) chrom <- data.frame(time, y) chrom$y[c(1,8, 14)] <- NA imputeChromatogram(chrom, "sgolay") imputeChromatogram(chrom, "spline") imputeChromatogram(chrom, "linear")
Takes a time vector and index vector of same length. This function create a
new time vector given indices specified in idx
.
mapIdxToTime(timeVec, idx)
mapIdxToTime(timeVec, idx)
timeVec |
A numeric vector |
idx |
An integer vector |
A mutated time vector
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
timeVec <- c(1.3,5.6,7.8) idx <- c(NA, NA, 1L, 2L, NA, NA, 3L, NA) mapIdxToTime(timeVec, idx) # c(NA, NA, 1.3, 5.6, 6.333, 7.067, 7.8, NA)
timeVec <- c(1.3,5.6,7.8) idx <- c(NA, NA, 1L, 2L, NA, NA, 3L, NA) mapIdxToTime(timeVec, idx) # c(NA, NA, 1.3, 5.6, 6.333, 7.067, 7.8, NA)
Created merged XICs of peptide (ID = 4618) 14299_QFNNTDIVLLEDFQK/3 from two SWATH runs:
run1 : hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
run2 : hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
masterXICs_DIAlignR
masterXICs_DIAlignR
The format is similar to the output of childXICs. A list of two elements: First element contains six fragmentation chromatograms. The second element has aligned parent time-vectors and corresponding child time-vector. It has five columns:
(integer) aligned indices of reference run.
(integer) aligned indices of experiment run.
(numeric) aligned time-vector of reference run.
(numeric) aligned time-vector of experiment run.
(numeric) aligned time-vector of master run.
File test_GenerateData.R has source code to generate the example data.
This function expects osw and xics directories at dataPath. It first reads osw files and fetches
chromatogram indices for each analyte. To perform alignment, first a guide-tree is built using mst
which
can also be provided with mstNet parameter. As we traverse from the start node to the other nodes,
runs are aligned pairwise.
mstAlignRuns( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, runs = NULL, peps = NULL, mstNet = NULL, applyFun = lapply )
mstAlignRuns( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, runs = NULL, peps = NULL, mstNet = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
scoreFile |
(string) path to the peptide score file, needed when oswMerged is FALSE. |
runs |
(string) names of xics file without extension. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
mstNet |
(matrix) array of tree-edges. Look up |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(None)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-05-15
alignTargetedRuns, progAlignRuns, getMST
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" mstAlignRuns(dataPath, params = params, outFile = "DIAlignR") ## Not run: y <- strsplit("run0 run1\nrun2 run2", split = '\n')[[1]] y <- cbind(A = strsplit(y[1], " ")[[1]], B = strsplit(y[2], " ")[[1]]) plot(igraph::graph_from_edgelist(y, directed = FALSE)) ## End(Not run)
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" mstAlignRuns(dataPath, params = params, outFile = "DIAlignR") ## Not run: y <- strsplit("run0 run1\nrun2 run2", split = '\n')[[1]] y <- cbind(A = strsplit(y[1], " ")[[1]], B = strsplit(y[2], " ")[[1]]) plot(igraph::graph_from_edgelist(y, directed = FALSE)) ## End(Not run)
Extract features and generate minimum spanning tree.
mstScript1( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, runs = NULL, mstNet = NULL, applyFun = lapply )
mstScript1( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, runs = NULL, mstNet = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
runs |
(string) names of xics file without extension. |
mstNet |
(string) minimum spanning tree in string format. See example of mstScript2. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2022) + GPL-3 Date: 2022-04-19
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) mstScript1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) file.remove(file.path(dataPath, "testDIAlignR_mst1.RData"))
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) mstScript1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) file.remove(file.path(dataPath, "testDIAlignR_mst1.RData"))
Performs alignment using mstScript1 output
mstScript2( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, peps = NULL, mstNet = NULL, applyFun = lapply )
mstScript2( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, peps = NULL, mstNet = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
scoreFile |
(string) path to the peptide score file, needed when oswMerged is FALSE. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
mstNet |
(string) minimum spanning tree in string format. See example of mstScript2. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2022) + GPL-3 Date: 2022-04-19
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) mstScript1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) mstNet <- "run0 run0\nrun1 run2" mstScript2(dataPath, outFile = "testDIAlignR", params = params, mstNet = mstNet, applyFun = lapply) file.remove(file.path(dataPath, "testDIAlignR_mst1.RData")) file.remove("testDIAlignR.tsv")
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) mstScript1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) mstNet <- "run0 run0\nrun1 run2" mstScript2(dataPath, outFile = "testDIAlignR", params = params, mstNet = mstNet, applyFun = lapply) file.remove(file.path(dataPath, "testDIAlignR_mst1.RData")) file.remove("testDIAlignR.tsv")
analytes info from three SWATH runs:
run0 : hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt.chrom.mzML
run1 : hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
run2 : hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
multipeptide_DIAlignR
multipeptide_DIAlignR
A list of 199 elements where each element represents a precursor and consists of a dataframe:
ID of each precursor. Same as the name of the list
Retention time, in sec
Inensity of associated feature
Left width of the peak, in sec
Right width of the peak, in sec
Ranking of associated feature
qvalue of associated feature
Name of the run, feature is from
Rank of the feature after alignment
Raw files are downloaded from Peptide Atlas. File test_GenerateData.R has source code to generate the example data.
analytes info from three SWATH runs:
run0 : hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt.chrom.mzML
run1 : hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
run2 : hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
oswFiles_DIAlignR
oswFiles_DIAlignR
A list of three elements where each element consists of a dataframe:
ID of each peptide
Retention time, in sec
Inensity of associated feature
Left width of the peak, in sec
Right width of the peak, in sec
Ranking of associated feature
qvalue of associated feature
Raw files are downloaded from Peptide Atlas. File test_GenerateData.R has source code to generate the example data.
Get child chromatogram for other precursors using main precursor alignment
otherChildXICpp( l1, l2, kernelLen, polyOrd, mat, childTime, wRef = 0.5, splineMethod = "natural" )
otherChildXICpp( l1, l2, kernelLen, polyOrd, mat, childTime, wRef = 0.5, splineMethod = "natural" )
l1 |
(list) A list of numeric matrix of two columns. l1 and l2 should have same length. |
l2 |
(list) A list of numeric matrix of two columns. l1 and l2 should have same length. |
kernelLen |
(integer) length of filter. Must be an odd number. |
polyOrd |
(integer) TRUE: remove background from peak signal using estimated noise levels. |
mat |
(matrix) aligned time and child time from the main precursor. |
childTime |
(numeric) iime vector from the child chromatogram. |
wRef |
(numeric) Weight of the reference XIC. Must be between 0 and 1. |
splineMethod |
(string) must be either "fmm" or "natural". |
(List) of chromatograms.
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2021) + MIT Date: 2021-01-08
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR XICs.ref <- lapply(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) XICs.eXp <- lapply(XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) Bp <- seq(4964.752, 5565.462, length.out = nrow(XICs.ref[[1]])) chrom <- getChildXICpp(XICs.ref, XICs.eXp, 11L, 4L, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", Bp = Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100, wRef = 0.5, keepFlanks= TRUE) chrom2 <- otherChildXICpp(XICs.ref, XICs.eXp, 11L, 4L, chrom[[2]], chrom[[1]][[1]][,1], 0.5, "natural")
data(XIC_QFNNTDIVLLEDFQK_3_DIAlignR, package="DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR XICs.ref <- lapply(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) XICs.eXp <- lapply(XICs[["hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], as.matrix) Bp <- seq(4964.752, 5565.462, length.out = nrow(XICs.ref[[1]])) chrom <- getChildXICpp(XICs.ref, XICs.eXp, 11L, 4L, alignType = "hybrid", adaptiveRT = 77.82315, normalization = "mean", simType = "dotProductMasked", Bp = Bp, goFactor = 0.125, geFactor = 40, cosAngleThresh = 0.3, OverlapAlignment = TRUE, dotProdThresh = 0.96, gapQuantile = 0.5, hardConstrain = FALSE, samples4gradient = 100, wRef = 0.5, keepFlanks= TRUE) chrom2 <- otherChildXICpp(XICs.ref, XICs.eXp, 11L, 4L, chrom[[2]], chrom[[1]][[1]][,1], 0.5, "natural")
Retention alignment requires OpenSWATH/pyProphet extracted features and chromatograms. This function provides a suite of parameters used for selecting features and manipulating chromatograms. Chromatogram alignment can be performed via reference based or progressively via rooted or unrooted tree. This function provides sensible parameters for these tasks.
paramsDIAlignR()
paramsDIAlignR()
A list of parameters:
runType |
(string) must be one of the strings "DIA_Proteomics", "DIA_IPF", "DIA_Metabolomics". |
chromFile |
(string) must either be "mzML" or "sqMass". |
maxFdrQuery |
(numeric) a numeric value between 0 and 1. It is used to filter peptides from osw file which have SCORE_MS2.QVALUE less than itself. |
maxIPFFdrQuery |
(numeric) A numeric value between 0 and 1. It is used to filter features from osw file which have SCORE_IPF.QVALUE less than itself. (For PTM IPF use) |
maxPeptideFdr |
(numeric) a numeric value between 0 and 1. It is used to filter peptides from osw file which have SCORE_PEPTIDE.QVALUE less than itself. |
analyteFDR |
(numeric) the upper limit of feature FDR to be it considered for building tree. |
treeDist |
(string) the method used to build distance matrix. Must be either "rsquared", "count" or "RSE". |
treeAgg |
(string) the method used for agglomeration while performing hierarchical clustering. Must be either "single", "average" or "complete". |
alignToRoot |
(logical) if TRUE, align leaves to the root in hierarchical clustering, else use already save aligned vectors. |
prefix |
(string) name to be used to define merged runs. |
context |
(string) used in pyprophet peptide. Must be either "run-specific", "experiment-wide", or "global". |
unalignedFDR |
(numeric) must be between 0 and maxFdrQuery. Features below unalignedFDR are considered for quantification even without the RT alignment. |
alignedFDR1 |
(numeric) must be between unalignedFDR and alignedFDR2. Features below alignedFDR1 and aligned to the reference are considered for quantification. |
alignedFDR2 |
(numeric) must be between alignedFDR1 and maxFdrQuery. Features below alignedFDR2 and within certain distance from the aligned time are considered for quantification after the alignment. |
criterion |
(integer) strategy to select peak if found overlapping peaks. 1:intensity, 2: RT overlap, 3: mscore, 4: edge distance |
level |
(string) apply maxPeptideFDR on Protein as well if specified as "Protein". Default: "Peptide". |
integrationType |
(string) method to ompute the area of a peak contained in XICs. Must be from "intensity_sum", "trapezoid", "simpson". |
baseSubtraction |
logical TRUE: remove background from peak signal using estimated noise levels. |
baselineType |
(string) method to estimate the background of a peak contained in XICs. Must be from "none", "base_to_base", "vertical_division_min", "vertical_division_max". |
fitEMG |
(logical) enable/disable exponentially modified gaussian peak model fitting. |
recalIntensity |
(logical) recalculate intensity for all analytes. |
fillMissing |
(logical) calculate intensity for ananlytes for which features are not found. |
XICfilter |
(string) must be either sgolay, boxcar, gaussian, loess or none. |
polyOrd |
(integer) order of the polynomial to be fit in the kernel. |
kernelLen |
(integer) number of data-points to consider in the kernel. |
globalAlignment |
(string) must be either "loess" or "linear". |
globalAlignmentFdr |
(numeric) a numeric value between 0 and 1. Features should have m-score lower than this value for participation in LOESS fit. |
globalAlignmentSpan |
(numeric) spanvalue for LOESS fit. For targeted proteomics 0.1 could be used. |
RSEdistFactor |
(numeric) defines how much distance in the unit of rse remains a noBeef zone. |
normalization |
(string) must be selected from "mean", "l2". |
simMeasure |
(string) must be selected from dotProduct, cosineAngle, crossCorrelation, cosine2Angle, dotProductMasked, euclideanDist, covariance and correlation. |
alignType |
(numeric) available alignment methods are "global", "local" and "hybrid". |
goFactor |
(numeric) penalty for introducing first gap in alignment. This value is multiplied by base gap-penalty. Should be between 10-1000. |
geFactor |
(numeric) penalty for introducing subsequent gaps in alignment. This value is multiplied by base gap-penalty. |
cosAngleThresh |
(numeric) in simType = dotProductMasked mode, angular similarity should be higher than cosAngleThresh otherwise similarity is forced to zero. |
OverlapAlignment |
(logical) an input for alignment with free end-gaps. False: Global alignment, True: overlap alignment. |
dotProdThresh |
(numeric) in simType = dotProductMasked mode, values in similarity matrix higher than dotProdThresh quantile are checked for angular similarity. |
gapQuantile |
(numeric) must be between 0 and 1. This is used to calculate base gap-penalty from similarity distribution. |
kerLen |
(integer) In simType = crossCorrelation, length of the kernel used to sum similarity score. Must be an odd number. |
hardConstrain |
(logical) if FALSE; indices farther from noBeef distance are filled with distance from linear fit line. |
samples4gradient |
(numeric) modulates penalization of masked indices. |
fillMethod |
(string) must be either "spline", "sgolay" or "linear". |
splineMethod |
(string) must be either "fmm" or "natural". |
mergeTime |
(string) must be either "ref", "avg", "refStart" or "refEnd". |
keepFlanks |
(logical) TRUE: Flanking chromatogram is not removed. |
fraction |
(integer) indicates which fraction to align. |
fractionNum |
(integer) Number of fractions to divide the alignment. |
lossy |
(logical) if TRUE, time and intensity are lossy-compressed in generated sqMass file. |
useIdentifying |
(logical) Set TRUE to use identifying transitions in alignment. (DEFAULT: FALSE) |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-07-11
checkParams, alignTargetedRuns
params <- paramsDIAlignR()
params <- paramsDIAlignR()
Plot aligned XICs group for a specific peptide. AlignObjOutput is the output from getAlignObjs fucntion.
plotAlignedAnalytes( AlignObjOutput, plotType = "All", outFile = "AlignedAnalytes.pdf", annotatePeak = FALSE, saveFigs = FALSE )
plotAlignedAnalytes( AlignObjOutput, plotType = "All", outFile = "AlignedAnalytes.pdf", annotatePeak = FALSE, saveFigs = FALSE )
AlignObjOutput |
(list) list contains fileInfo, AlignObj, raw XICs for reference and experiment, and reference-peak label. |
plotType |
(string) must be one of the strings "All", "onlyUnaligned" and "onlyAligned". |
outFile |
(string) name of the output pdf file. |
annotatePeak |
(logical) TRUE: Peak boundaries and apex will be highlighted. |
saveFigs |
(logical) TRUE: Figures will be saved in AlignedAnalytes.pdf . |
A plot to the current device.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") AlignObjOutput <- getAlignObjs(analytes = 4618L, runs, dataPath = dataPath) plotAlignedAnalytes(AlignObjOutput)
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") AlignObjOutput <- getAlignObjs(analytes = 4618L, runs, dataPath = dataPath) plotAlignedAnalytes(AlignObjOutput)
Plot aligned path through the similarity matrix. Reference run has indices on X-axis, eXp run has them on Y-axis. In getAlignObjs function, objType must be set to medium.
plotAlignmentPath(AlignObjOutput)
plotAlignmentPath(AlignObjOutput)
AlignObjOutput |
(list) The list contains AlignObj, raw XICs for reference and experiment, and reference-peak label. |
A plot to the current device.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
library(lattice) dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") AlignObjOutput <- getAlignObjs(analytes = 4618L, runs, dataPath = dataPath, objType = "medium") plotAlignmentPath(AlignObjOutput)
library(lattice) dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") AlignObjOutput <- getAlignObjs(analytes = 4618L, runs, dataPath = dataPath, objType = "medium") plotAlignmentPath(AlignObjOutput)
Plot extracted-ion chromatogram.
plotAnalyteXICs( analyte, run, dataPath = ".", maxFdrQuery = 1, XICfilter = "sgolay", polyOrd = 4, kernelLen = 9, runType = "DIA_Proteomics", oswMerged = TRUE, peakAnnot = NULL, Title = NULL )
plotAnalyteXICs( analyte, run, dataPath = ".", maxFdrQuery = 1, XICfilter = "sgolay", polyOrd = 4, kernelLen = 9, runType = "DIA_Proteomics", oswMerged = TRUE, peakAnnot = NULL, Title = NULL )
analyte |
(integer) an analyte is a PRECURSOR.ID from the osw file. |
run |
(string) Name of a xics file without extension. |
dataPath |
(string) path to xics and osw directory. |
maxFdrQuery |
(numeric) A numeric value between 0 and 1. It is used to filter features from osw file which have SCORE_MS2.QVALUE less than itself. |
XICfilter |
(string) must be either sgolay, boxcar, gaussian, loess or none. |
polyOrd |
(integer) order of the polynomial to be fit in the kernel. |
kernelLen |
(integer) number of data-points to consider in the kernel. |
runType |
(char) This must be one of the strings "DIA_Proteomics", "DIA_Metabolomics". |
oswMerged |
(logical) TRUE for experiment-wide FDR and FALSE for run-specific FDR by pyprophet. |
peakAnnot |
(numeric) Peak-apex time. |
Title |
(logical) TRUE: name of the list will be displayed as title. |
A plot to the current device.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
dataPath <- system.file("extdata", package = "DIAlignR") run <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" plotAnalyteXICs(analyte = 2474L, run, dataPath = dataPath, oswMerged = TRUE, XICfilter = "none") plotAnalyteXICs(analyte = 2474L, run, dataPath = dataPath, oswMerged = TRUE, XICfilter = "sgolay")
dataPath <- system.file("extdata", package = "DIAlignR") run <- "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt" plotAnalyteXICs(analyte = 2474L, run, dataPath = dataPath, oswMerged = TRUE, XICfilter = "none") plotAnalyteXICs(analyte = 2474L, run, dataPath = dataPath, oswMerged = TRUE, XICfilter = "sgolay")
Plot Extracted-ion chromatogram group.
plotXICgroup(XIC_group, peakAnnot = NULL, Title = NULL)
plotXICgroup(XIC_group, peakAnnot = NULL, Title = NULL)
XIC_group |
(list) It is a list of dataframe which has two columns. First column is for time and second column indicates intensity. |
peakAnnot |
(numeric) Peak-apex time. |
Title |
(logical) TRUE: name of the list will be displayed as title. |
A plot to the current device.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2019) + GPL-3 Date: 2019-12-13
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") XICs <- getXICs(analytes = 4618L, runs = runs, dataPath = dataPath, oswMerged = TRUE) plotXICgroup(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]]) XICs <- smoothXICs(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], type = "sgolay", kernelLen = 13, polyOrd = 4) plotXICgroup(XICs, Title = "Precursor 4618 \n run hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt")
dataPath <- system.file("extdata", package = "DIAlignR") runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt") XICs <- getXICs(analytes = 4618L, runs = runs, dataPath = dataPath, oswMerged = TRUE) plotXICgroup(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]]) XICs <- smoothXICs(XICs[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]], type = "sgolay", kernelLen = 13, polyOrd = 4) plotXICgroup(XICs, Title = "Precursor 4618 \n run hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt")
This function expects osw and xics directories at dataPath. It first reads osw files and fetches chromatogram indices for each analyte. To perform alignment, first a crude guide-tree is built which can also be provided with newickTree parameter. As we traverse from the leaf-nodes to the root node, runs are aligned pairwise. The root node is named master1 that has average of all fragment ion chromatograms and identified peak-groups. These features are propagated back to leaf nodes and finally aligned features are written in the output file.
progAlignRuns( dataPath, params, outFile = "DIAlignR", ropenms = NULL, oswMerged = TRUE, scoreFile = NULL, runs = NULL, peps = NULL, newickTree = NULL, applyFun = lapply )
progAlignRuns( dataPath, params, outFile = "DIAlignR", ropenms = NULL, oswMerged = TRUE, scoreFile = NULL, runs = NULL, peps = NULL, newickTree = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
params |
(list) parameters are entered as list. Output of the |
outFile |
(string) name of the output file. |
ropenms |
(pyopenms module) get this python module through |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
scoreFile |
(string) path to the peptide score file, needed when oswMerged is FALSE. |
runs |
(string) names of xics file without extension. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
newickTree |
(string) guidance tree in newick format. Look up |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
(None)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-07-10
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" ## Not run: ropenms <- get_ropenms(condaEnv = "envName") progAlignRuns(dataPath, params = params, outFile = "test3", ropenms = ropenms) # Removing aligned vectors file.remove(list.files(dataPath, pattern = "*_av.rds", full.names = TRUE)) # Removing temporarily created master chromatograms file.remove(list.files(file.path(dataPath, "xics"), pattern = "^master[A-Za-z0-9]+\\.chrom\\.sqMass$", full.names = TRUE)) file.remove(file.path(dataPath, "test3.temp.RData")) file.remove(file.path(dataPath, "master.merged.osw")) ## End(Not run)
dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" ## Not run: ropenms <- get_ropenms(condaEnv = "envName") progAlignRuns(dataPath, params = params, outFile = "test3", ropenms = ropenms) # Removing aligned vectors file.remove(list.files(dataPath, pattern = "*_av.rds", full.names = TRUE)) # Removing temporarily created master chromatograms file.remove(list.files(file.path(dataPath, "xics"), pattern = "^master[A-Za-z0-9]+\\.chrom\\.sqMass$", full.names = TRUE)) file.remove(file.path(dataPath, "test3.temp.RData")) file.remove(file.path(dataPath, "master.merged.osw")) ## End(Not run)
Step 3 for progressive alignment
progComb3( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
progComb3( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
params |
(list) parameters are entered as list. Output of the |
outFile |
(string) name of the output file. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-03-03
Step 2 for progressive alignment
progSplit2( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
progSplit2( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
params |
(list) parameters are entered as list. Output of the |
outFile |
(string) name of the output file. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-03-03
Step 4 for progressive alignment
progSplit4( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
progSplit4( dataPath, params, outFile = "DIAlignR", oswMerged = TRUE, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
params |
(list) parameters are entered as list. Output of the |
outFile |
(string) name of the output file. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-03-03
Step 1 for progressive alignment
progTree1( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), groups = NULL, oswMerged = TRUE, scoreFile = NULL, peps = NULL, runs = NULL, newickTree = NULL, applyFun = lapply )
progTree1( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), groups = NULL, oswMerged = TRUE, scoreFile = NULL, peps = NULL, runs = NULL, newickTree = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
groups |
(data-frame) contains the run names and their categories/batch id to keep them on the same branch of the tree. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
scoreFile |
(string) path to the peptide score file, needed when oswMerged is FALSE. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
runs |
(string) names of xics file without extension. |
newickTree |
(string) guidance tree in newick format. Look up |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-03-03
For the give peak boundary in peakTable, the function extracts raw chromatograms and recalculate intensities.
recalculateIntensity( peakTable, dataPath = ".", oswMerged = TRUE, params = paramsDIAlignR() )
recalculateIntensity( peakTable, dataPath = ".", oswMerged = TRUE, params = paramsDIAlignR() )
peakTable |
(data-table) usually an output of alignTargetedRuns. Must have these columns: precursor, run, intensity, leftWidth, rightWidth. |
dataPath |
(string) path to xics and osw directory. |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
params |
(list) parameters are entered as list. Output of the |
(data-table)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-05-28
alignTargetedRuns, calculateIntensity
library(data.table) peakTable <- data.table(precursor = c(1967L, 1967L, 2474L, 2474L), run = rep(c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"), 2), intensity = c(186.166, 579.832, 47.9525, 3.7413), leftWidth = c(5001.76, 5025.66, 6441.51, 6516.6), rightWidth = c(5076.86, 5121.25, 6475.65, 6554.2)) dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params$smoothPeakArea <- TRUE recalculateIntensity(peakTable, dataPath, params = params) peakTable <- data.table(precursor = c(1967L, 1967L, 2474L, 2474L), run = rep(c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"), 2), intensity = list(NA, NA, NA, NA), leftWidth = c(5001.76, 5025.66, 6441.51, 6516.6), rightWidth = c(5076.86, 5121.25, 6475.65, 6554.2)) params$transitionIntensity <- TRUE recalculateIntensity(peakTable, dataPath, params = params)
library(data.table) peakTable <- data.table(precursor = c(1967L, 1967L, 2474L, 2474L), run = rep(c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"), 2), intensity = c(186.166, 579.832, 47.9525, 3.7413), leftWidth = c(5001.76, 5025.66, 6441.51, 6516.6), rightWidth = c(5076.86, 5121.25, 6475.65, 6554.2)) dataPath <- system.file("extdata", package = "DIAlignR") params <- paramsDIAlignR() params$smoothPeakArea <- TRUE recalculateIntensity(peakTable, dataPath, params = params) peakTable <- data.table(precursor = c(1967L, 1967L, 2474L, 2474L), run = rep(c("hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"), 2), intensity = list(NA, NA, NA, NA), leftWidth = c(5001.76, 5025.66, 6441.51, 6516.6), rightWidth = c(5076.86, 5121.25, 6475.65, 6554.2)) params$transitionIntensity <- TRUE recalculateIntensity(peakTable, dataPath, params = params)
Create a sqMass file that has chromatograms for given native IDs.
reduceXICs(nativeIDs, xicFileIn, xicFileOut)
reduceXICs(nativeIDs, xicFileIn, xicFileOut)
nativeIDs |
(integer) transition IDs to be kept. |
xicFileIn |
(character) name to the current file. |
xicFileOut |
(character) name of the new file. |
(None)
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2022) + GPL-3 Date: 2022-04-19
dataPath <- system.file("extdata", package = "DIAlignR") oswIn <- file.path(dataPath, "osw", "merged.osw") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" ids <- getNativeIDs(oswIn, 1338L, params) xicFileIn <- file.path(dataPath, "xics", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.sqMass") reduceXICs(ids, xicFileIn, xicFileOut = "temp.chrom.sqMass") file.remove("temp.chrom.sqMass")
dataPath <- system.file("extdata", package = "DIAlignR") oswIn <- file.path(dataPath, "osw", "merged.osw") params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" ids <- getNativeIDs(oswIn, 1338L, params) xicFileIn <- file.path(dataPath, "xics", "hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.sqMass") reduceXICs(ids, xicFileIn, xicFileOut = "temp.chrom.sqMass") file.remove("temp.chrom.sqMass")
Extract features and generate pairwise alignments.
script1( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, runs = NULL, applyFun = lapply )
script1( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, runs = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
runs |
(string) names of xics file without extension. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-02-20
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) script1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) file.remove(file.path(dataPath, "testDIAlignR_script1.RData"))
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) script1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) file.remove(file.path(dataPath, "testDIAlignR_script1.RData"))
Performs alignment using script1 output
script2( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, peps = NULL, refRun = NULL, applyFun = lapply )
script2( dataPath, outFile = "DIAlignR", params = paramsDIAlignR(), oswMerged = TRUE, scoreFile = NULL, peps = NULL, refRun = NULL, applyFun = lapply )
dataPath |
(string) path to xics and osw directory. |
outFile |
(string) name of the output file. |
params |
(list) parameters are entered as list. Output of the |
oswMerged |
(logical) TRUE if merged file from pyprophet is used. |
scoreFile |
(string) path to the peptide score file, needed when oswMerged is FALSE. |
peps |
(integer) ids of peptides to be aligned. If NULL, align all peptides. |
refRun |
(string) reference for alignment. If no run is provided, m-score is used to select reference run. |
applyFun |
(function) value must be either lapply or BiocParallel::bplapply. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2021) + GPL-3 Date: 2021-02-20
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) script1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) script2(dataPath, outFile = "testDIAlignR", params = params, applyFun = lapply) file.remove(file.path(dataPath, "testDIAlignR_script1.RData"))
params <- paramsDIAlignR() params[["context"]] <- "experiment-wide" dataPath <- system.file("extdata", package = "DIAlignR") BiocParallel::register(BiocParallel::MulticoreParam(workers = 4, progressbar = TRUE)) script1(dataPath, outFile = "testDIAlignR", params = params, applyFun = BiocParallel::bplapply) script2(dataPath, outFile = "testDIAlignR", params = params, applyFun = lapply) file.remove(file.path(dataPath, "testDIAlignR_script1.RData"))
Smooth chromatogram with savitzky-golay filter.
sgolayCpp(chrom, kernelLen, polyOrd)
sgolayCpp(chrom, kernelLen, polyOrd)
chrom |
(matrix) chromatogram containing time and intensity vectors. |
kernelLen |
(integer) length of filter. Must be an odd number. |
polyOrd |
(integer) TRUE: remove background from peak signal using estimated noise levels. |
(matrix).
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2020) + MIT Date: 2019-12-31
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR", package = "DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] xic <- sgolayCpp(as.matrix(XICs[[1]]), kernelLen = 11L, polyOrd = 4L)
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR", package = "DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt"]][["4618"]] xic <- sgolayCpp(as.matrix(XICs[[1]]), kernelLen = 11L, polyOrd = 4L)
Smoothing methods are Savitzky-Golay, Boxcar, Gaussian kernel and LOESS. Savitzky-Golay smoothing is good at preserving peak-shape compared to gaussian and boxcar smoothing. However, it assumes equidistant points that fortunately is the case for DIA data. This requires a quadratic memory to store the fit and slower than other smoothing methods.
smoothSingleXIC( chromatogram, type, samplingTime = NULL, kernelLen = NULL, polyOrd = NULL )
smoothSingleXIC( chromatogram, type, samplingTime = NULL, kernelLen = NULL, polyOrd = NULL )
chromatogram |
(dataframe) A dataframe of two columns. First column must always be monotonically increasing. |
type |
(char) must be either sgolay, boxcar, gaussian, loess or none. |
samplingTime |
(numeric) Time difference between neighboring points. |
kernelLen |
(integer) Number of data-points to consider in the kernel. |
polyOrd |
(integer) Order of the polynomial to be fit in the kernel. |
Gaussian smoothing uses a gaussian function whose bandwidth is scaled by 0.3706505 to have quartiles at +/- 0.25*bandwidth. The point selection cut-off is also hard at 0.3706505*4*bandwidth.
qnorm(0.75, sd = 0.3706505)
The definition of C_ksmooth can be found using getAnywhere('C_ksmooth') stats:::C_ksmooth
A dataframe with two columns.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL3 Date: 2020-02-21
https://terpconnect.umd.edu/~toh/spectrum/Smoothing.html, https://rafalab.github.io/dsbook/smoothing.html, https://github.com/SurajGupta/r-source/blob/master/src/library/stats/src/ksmooth.c
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR") chrom <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]][["4618"]][[1]] ## Not run: newChrom <- smoothSingleXIC(chrom, type = "sgolay", samplingTime = 3.42, kernelLen = 9, polyOrd = 3) ## End(Not run)
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR") chrom <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]][["4618"]][[1]] ## Not run: newChrom <- smoothSingleXIC(chrom, type = "sgolay", samplingTime = 3.42, kernelLen = 9, polyOrd = 3) ## End(Not run)
Smoothing methods are Savitzky-Golay, Boxcar, Gaussian kernel and LOESS. Savitzky-Golay smoothing is good at preserving peak-shape compared to gaussian and boxcar smoothing. However, it assumes equidistant points that fortunately is the case for DIA data. This requires a quadratic memory to store the fit and slower than other smoothing methods.
smoothXICs( XICs, type = "none", samplingTime = NULL, kernelLen = 9L, polyOrd = NULL )
smoothXICs( XICs, type = "none", samplingTime = NULL, kernelLen = 9L, polyOrd = NULL )
XICs |
(A list) A list of dataframe that consists of two columns. First column must be monotonically increasing. |
type |
(char) must be either sgolay, boxcar, gaussian, loess or none. |
samplingTime |
(numeric) Time difference between neighboring points. |
kernelLen |
(integer) Number of data-points to consider in the kernel. |
polyOrd |
(integer) Order of the polynomial to be fit in the kernel. |
A list.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL3 Date: 2020-02-21
https://terpconnect.umd.edu/~toh/spectrum/Smoothing.html, https://rafalab.github.io/dsbook/smoothing.html
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]][["4618"]] newXICs <- smoothXICs(XICs, type = "sgolay", samplingTime = 3.42, kernelLen = 9, polyOrd = 3)
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]][["4618"]] newXICs <- smoothXICs(XICs, type = "sgolay", samplingTime = 3.42, kernelLen = 9, polyOrd = 3)
Interpolate using spline
splineFillCpp(x, y, xout)
splineFillCpp(x, y, xout)
x |
(numeric) A numeric matrix with similarity values of two sequences or signals. |
y |
(numeric) Penalty for introducing first gap in alignment. |
xout |
(numeric) Penalty for introducing subsequent gaps in alignment. |
(numeric)
Shubham Gupta, [email protected] ORCID: 0000-0003-3500-8152 License: (c) Author (2021) + MIT Date: 2021-01-06
time <- seq(from = 3003.4, to = 3048, by = 3.4) y <- c(0.2050595, 0.8850070, 2.2068768, 3.7212677, 5.1652605, 5.8288915, 5.5446804, 4.5671360, 3.3213154, 1.9485889, 0.9520709, 0.3294218, 0.2009581, 0.1420923) y[c(1,6)] <- NA_real_ idx <- !is.na(y) splineFillCpp(time[idx], y[idx], time[!idx]) zoo::na.spline(zoo::zoo(y[idx], time[idx]), xout = time[!idx], method = "natural")
time <- seq(from = 3003.4, to = 3048, by = 3.4) y <- c(0.2050595, 0.8850070, 2.2068768, 3.7212677, 5.1652605, 5.8288915, 5.5446804, 4.5671360, 3.3213154, 1.9485889, 0.9520709, 0.3294218, 0.2009581, 0.1420923) y[c(1,6)] <- NA_real_ idx <- !is.na(y) splineFillCpp(time[idx], y[idx], time[!idx]) zoo::na.spline(zoo::zoo(y[idx], time[idx]), xout = time[!idx], method = "natural")
This function trims chromatograms from the end-points.
trimXICs(XICs, len = 1)
trimXICs(XICs, len = 1)
XICs |
(A list) A list of dataframe that consists of two columns. First column must be monotonically increasing. |
len |
(numeric) must be between 0.1 and 1. |
A list.
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL3 Date: 2020-04-01
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]][["4618"]] ## Not run: newXICs <- smoothXICs(XICs, len = 0.5) ## End(Not run)
data("XIC_QFNNTDIVLLEDFQK_3_DIAlignR") XICs <- XIC_QFNNTDIVLLEDFQK_3_DIAlignR[["hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt"]][["4618"]] ## Not run: newXICs <- smoothXICs(XICs, len = 0.5) ## End(Not run)
Get intersection of runs and fileInfo
updateFileInfo(fileInfo, runs = NULL)
updateFileInfo(fileInfo, runs = NULL)
fileInfo |
(data-frame) output of getRunNames function. |
runs |
(vector of string) names of mzML files without extension. |
(dataframe) it has five columns:
spectraFile |
(string) as mentioned in RUN table of osw files. |
runName |
(string) contain respective mzML names without extension. |
spectraFileID |
(string) ID in RUN table of osw files. |
featureFile |
(string) path to the feature file. |
chromatogramFile |
(string) path to the chromatogram file. |
Shubham Gupta, [email protected]
ORCID: 0000-0003-3500-8152
License: (c) Author (2020) + GPL-3 Date: 2020-04-15
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath, oswMerged = TRUE) runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt") updateFileInfo(fileInfo, runs)
dataPath <- system.file("extdata", package = "DIAlignR") fileInfo <- getRunNames(dataPath = dataPath, oswMerged = TRUE) runs <- c("hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt", "hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt") updateFileInfo(fileInfo, runs)
XICs of peptide QFNNTDIVLLEDFQK/3 (precursor ID: 4618) from three SWATH runs:
run0 : hroest_K120808_Strep10%PlasmaBiolRepl1_R03_SW_filt.chrom.mzML
run1 : hroest_K120809_Strep0%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
run2 : hroest_K120809_Strep10%PlasmaBiolRepl2_R04_SW_filt.chrom.mzML
XIC_QFNNTDIVLLEDFQK_3_DIAlignR
XIC_QFNNTDIVLLEDFQK_3_DIAlignR
A list of three elements where each element consists of a list of six data frames. Each data frame has two columns:
Retention time of ananlyte in the run, in sec
Intensity of signal for the transition
Raw files are downloaded from Peptide Atlas. File test_GenerateData.R has source code to generate the example data.