| Title: | Compute and analyze the glycan structrual traits from GPSM data |
|---|---|
| Description: | GlycoTraitR is an R package for analyzing glycoproteomics data, particularly glycopeptide-spectrum matches (GPSMs). It supports results generated by the pGlyco3 and Glyco-Decipher search engines. The package parses glycan structures, computes monosaccharide compositions and structural traits, and performs differential analysis of glycan heterogeneity. It constructs trait-by-PSM matrices stored in a SummarizedExperiment object, supports user-defined structural motifs, and provides visualization utilities for interpreting glycan trait changes. |
| Authors: | Bingyuan Zhang [aut, cre] (ORCID: <https://orcid.org/0000-0002-4892-323X>), Koichi Himori [aut], Yusuke Matsui [aut, fnd] |
| Maintainer: | Bingyuan Zhang <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-22 09:49:31 UTC |
| Source: | https://github.com/bioc/glycoTraitR |
Perform group-wise statistical testing on glycan trait matrices stored in a
SummarizedExperiment created by build_trait_se. For each glycan
trait and each site/protein row, the function compares trait intensities
across user-specified experimental groups using Welch’s t-test and
Levene’s variance test.
analyze_trait_changes(trait_se, group_col, group_levels, min_psm = 20)analyze_trait_changes(trait_se, group_col, group_levels, min_psm = 20)
trait_se |
A SummarizedExperiment containing trait matrices (one assay per trait),
typically returned by |
group_col |
The column name in 'colData(trait_se)' defining sample group membership. |
group_levels |
Character vector specifying which values of 'group_col' to compare (e.g., 'c("Control","Treatment")'). |
min_psm |
Minimum required PSM count per group for statistical testing. Default = 20. |
Each assay in 'trait_se' represents a glycan trait matrix. The rows as glycopeptides (site-level) or proteins (protein-level). The columns are GPSM count found in samples. For each trait × feature combination, Extract PSM-level trait intensities for samples belonging to the specified 'group_levels'. Exclude traits where either group has fewer than 'min_psm' GPSMs. Exclude all-zero traits (boolean-like traits) Run Welch two-sample t-test ('t.test') and Levene's variance test ('car::leveneTest', median centered). A result is returned only if either test shows p < 0.05.
A data frame of significant trait–site (or trait–protein) comparisons with:
trait — glycan trait name
level — site/protein identifier
l_pval — Levene test p-value
f_val — Levene test F statistic
t_pval — Welch t-test p-value
t_val — t-statistic
Rows correspond only to significant comparisons (p < 0.05) for 'l_pval' or 't_pval'.
# Load toy pGlyco3 GPSM data included with the package path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm_toyexample <- read_pGlyco3_gpsm(path) # Load accompanying toy metadata data("meta_toyexample") # Build glycan trait SummarizedExperiment at the protein level trait_se <- build_trait_se( gpsm = gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) # Identify glycan traits significantly changed between groups changed_traits <- analyze_trait_changes( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), min_psm = 20 ) changed_traits# Load toy pGlyco3 GPSM data included with the package path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm_toyexample <- read_pGlyco3_gpsm(path) # Load accompanying toy metadata data("meta_toyexample") # Build glycan trait SummarizedExperiment at the protein level trait_se <- build_trait_se( gpsm = gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) # Identify glycan traits significantly changed between groups changed_traits <- analyze_trait_changes( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), min_psm = 20 ) changed_traits
Convert a parsed glycan tree into a directed 'igraph' object with parent–child relationships and residue-level metadata suitable for structural motif detection.
build_glycan_igraph(tree)build_glycan_igraph(tree)
tree |
A parsed glycan tree from |
The resulting graph contains the following vertex attributes:
name — synthetic node label ('"a"', '"b"', ...)
residue — residue type (H, N, A, F, G)
type — identical to residue (for convenience)
color — color encoding of residue type
is_root — TRUE if the vertex is the structural root
A directed 'igraph' object representing the glycan structure.
# Example: parse a pGlyco3 monosaccharide expression into a glycan tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert expression into a parsed tree structure tree <- pGlyco3_to_tree(pGlyco_expr) g <- build_glycan_igraph(tree) g# Example: parse a pGlyco3 monosaccharide expression into a glycan tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert expression into a parsed tree structure tree <- pGlyco3_to_tree(pGlyco_expr) g <- build_glycan_igraph(tree) g
Convert a GPSM table into peptide- or protein-level glycan trait matrices
and store them in a SummarizedExperiment object. Each trait becomes an assay
matrix whose rows represent peptides or proteins, and whose columns
represent individual GPSMs.
This function provides a unified container for downstream analyses such as
differential testing and visualization.
build_trait_se(gpsm, from, motifs = NULL, level, meta)build_trait_se(gpsm, from, motifs = NULL, level, meta)
gpsm |
A GPSM table containing at least: 'Protein', 'Peptide', 'GlycanStructure', 'File', and 'Count'. |
from |
Character; glycan format used in the GPSM input. One of 'decipher' or 'pGlyco3'. |
motifs |
Optional named list of user-defined motif structures passed to compute_glycan_traits. |
level |
Summarization level. Either 'site (peptide)' or '"protein"'. |
meta |
Data frame of sample metadata with a column 'file' matching 'gpsm$File'. |
A 'SummarizedExperiment' where each assay is a glycan-trait matrix (trait × PSM), 'rowData' contains peptide/protein names, and 'colData' contains metadata aligned to PSMs.
# Load toy GPSM table exported by pGlyco3 path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm_toyexample <- read_pGlyco3_gpsm(path) # Load toy metadata for summarization data("meta_toyexample") # Build glycan trait SummarizedExperiment at protein level trait_se <- build_trait_se( gpsm = gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) # Inspect assay names and dimensions SummarizedExperiment::assayNames(trait_se) dim(trait_se)# Load toy GPSM table exported by pGlyco3 path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm_toyexample <- read_pGlyco3_gpsm(path) # Load toy metadata for summarization data("meta_toyexample") # Build glycan trait SummarizedExperiment at protein level trait_se <- build_trait_se( gpsm = gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) # Inspect assay names and dimensions SummarizedExperiment::assayNames(trait_se) dim(trait_se)
Combine residue-level composition traits (see count_residues),
structural traits (see compute_structural_traits), and
user-defined motifs (see compute_userdefined_traits)
into a unified trait vector.
compute_glycan_traits(tree, motifs)compute_glycan_traits(tree, motifs)
tree |
A parsed glycan tree from |
motifs |
Optional named list of user-defined glycan motifs. |
A named list of numeric trait values.
# Example: parse a pGlyco3-style glycan expression into a tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert to glycan tree structure tree <- pGlyco3_to_tree(pGlyco_expr) # Explore parsed nodes and edges tree$node tree$edge # Build igraph representation g <- build_glycan_igraph(tree) plot_glycan_tree(g) # Define user motifs for trait computation user_motifs <- list( LinearH3 = list( node = c("H", "H", "H"), edge = c("a-b", "b-c") ), FucBranch = list( node = c("H", "N", "F"), edge = c("a-b", "b-c") ) ) # Compute glycan structural traits compute_glycan_traits(tree, motifs = user_motifs)# Example: parse a pGlyco3-style glycan expression into a tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert to glycan tree structure tree <- pGlyco3_to_tree(pGlyco_expr) # Explore parsed nodes and edges tree$node tree$edge # Build igraph representation g <- build_glycan_igraph(tree) plot_glycan_tree(g) # Define user motifs for trait computation user_motifs <- list( LinearH3 = list( node = c("H", "H", "H"), edge = c("a-b", "b-c") ), FucBranch = list( node = c("H", "N", "F"), edge = c("a-b", "b-c") ) ) # Compute glycan structural traits compute_glycan_traits(tree, motifs = user_motifs)
A curated reference table mapping glycan IDs to their structures, used internally by glycoTraitR and also available to users.
data(glycanDatabase)data(glycanDatabase)
A data frame with N rows and M columns.
data(glycanDatabase) head(glycanDatabase)data(glycanDatabase) head(glycanDatabase)
A toy example metadata table used in vignettes, examples, and tests.
data(meta_toyexample)data(meta_toyexample)
A data frame with 34 rows and 2 variables:
Clinical diagnosis (factor)
Sample ID (numeric)
File Name (character)
data(meta_toyexample) head(meta_toyexample)data(meta_toyexample) head(meta_toyexample)
Parse a pGlyco3-style glycan expression (e.g. "N(H(H))")
and reconstruct the residue sequence and edge relationships
as a tree suitable for downstream structural analysis. This parser assumes
simple pGlyco3 monosaccharide symbols (e.g. "N", "H", "A", "F").
pGlyco3_to_tree(expr)pGlyco3_to_tree(expr)
expr |
A character string representing the pGlyco3 glycan structure. |
This function interprets parentheses as branch delimiters and assigns:
one residue per character (e.g. N, H, A)
parent–child edges based on bracket nesting
Each residue is assigned a synthetic node label
(a, b, c, …), ensuring compatibility with
graph-based trait extraction.
A list with:
node: character vector of residue types
edge: character vector of edges in "a-b" format
# Example: parse a pGlyco3-style glycan expression into a tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert to glycan tree structure tree <- pGlyco3_to_tree(pGlyco_expr) tree# Example: parse a pGlyco3-style glycan expression into a tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert to glycan tree structure tree <- pGlyco3_to_tree(pGlyco_expr) tree
Visualize a glycan structure encoded as an igraph object produced by
build_glycan_igraph. This plot is mainly intended for inspecting
glycan topology (branching, residue types, connectivity).
plot_glycan_tree(g)plot_glycan_tree(g)
g |
An igraph object representing a glycan tree from |
A glycan topology plot is drawn as a side effect.
# Example: parse a pGlyco3-style glycan expression into a tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert to glycan tree structure tree <- pGlyco3_to_tree(pGlyco_expr) # Explore parsed nodes and edges tree$node tree$edge # Build igraph representation g <- build_glycan_igraph(tree) plot_glycan_tree(g)# Example: parse a pGlyco3-style glycan expression into a tree pGlyco_expr <- "(N(N(H(H(H))(H(H)(H)(H(H))))))" # Convert to glycan tree structure tree <- pGlyco3_to_tree(pGlyco_expr) # Explore parsed nodes and edges tree$node tree$edge # Build igraph representation g <- build_glycan_igraph(tree) plot_glycan_tree(g)
Generate two diagnostic plots—a histogram and a boxplot—for a selected glycan trait at a specific site/protein feature. This function extracts GPSM-level values from a trait matrix stored in a 'SummarizedExperiment', subsets them by group, removes missing values, and visualizes the resulting distribution.
plot_trait_distribution(trait_se, group_col, group_levels, trait_name, feature)plot_trait_distribution(trait_se, group_col, group_levels, trait_name, feature)
trait_se |
A SummarizedExperiment object generated by build_trait_se, containing one assay matrix per glycan trait. |
group_col |
Column name in 'colData(trait_se)' that defines sample group membership. |
group_levels |
Character vector specifying the group values to include in the plot. Typically two values (e.g. 'c("Control","Treatment")'). |
trait_name |
Name of the glycan trait to plot. Must match an assay name in 'assays(trait_se)'. |
feature |
Row identifier within the trait matrix (peptide, or protein). Must match a row name in 'assays(trait_se)[[trait_name]]'. |
A named list of two 'ggplot2' objects: * 'p_hist' — histogram of trait intensities * 'p_box' — boxplot with jitter overlay
# Load the toy GPSM table exported by pGlyco3 (included in the package) path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm_toyexample <- read_pGlyco3_gpsm(path) # Load the accompanying toy metadata data("meta_toyexample") # Build a protein-level glycan trait SummarizedExperiment object trait_se <- build_trait_se( gpsm = gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) # Identify glycan traits significantly changed between two groups changed_traits <- analyze_trait_changes( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), min_psm = 20 ) # Extract one trait name and one protein/site feature to visualize trait_name <- changed_traits$trait[1] feature <- changed_traits$level[1] # Plot the distribution of this selected trait p <- plot_trait_distribution( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), trait_name = trait_name, feature = feature ) # Show histogram and boxplot p$p_hist p$p_box# Load the toy GPSM table exported by pGlyco3 (included in the package) path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm_toyexample <- read_pGlyco3_gpsm(path) # Load the accompanying toy metadata data("meta_toyexample") # Build a protein-level glycan trait SummarizedExperiment object trait_se <- build_trait_se( gpsm = gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) # Identify glycan traits significantly changed between two groups changed_traits <- analyze_trait_changes( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), min_psm = 20 ) # Extract one trait name and one protein/site feature to visualize trait_name <- changed_traits$trait[1] feature <- changed_traits$level[1] # Plot the distribution of this selected trait p <- plot_trait_distribution( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), trait_name = trait_name, feature = feature ) # Show histogram and boxplot p$p_hist p$p_box
Read multiple Glyco-Decipher GPSM files from a folder, merge them into a unified protein–peptide–glycan table, and attach glycan structures (WURCS 2.0).
read_decipher_gpsm(gpsm_folder_dir)read_decipher_gpsm(gpsm_folder_dir)
gpsm_folder_dir |
The path to a folder containing Glyco-Decipher GPSM files (e.g. files ending with
|
This function assumes that the input folder contains one or more
Glyco-Decipher GPSM files, typically named with the suffix
"_GPSM_DatabaseGlycan.txt". For each file,
GPSM records are read and collapsed by Protein, Peptide,
GlycanID, File, and Count.
All files are then combined into a single table, and glycan IDs are
mapped to WURCS structures via the global glycanDatabase object.
The final table uses a standardized glycan column name
GlycanStructure for compatibility with downstream functions.
A data frame in long format with one row per
Protein–Peptide–GlycanStructure–File
combination and the following columns:
Protein — protein identifier(s)
Peptide — peptide sequence
GlycanStructure — glycan structural annotation (WURCS 2.0)
File — raw file name
Count — spectral count (number of GPSMs) for this combination
The returned table is designed to be passed to
build_trait_se for glycan trait computation.
read_pGlyco3_gpsm,
build_trait_se
folder <- system.file("extdata", "decipher_toyexample", package = "glycoTraitR") gpsm <- read_decipher_gpsm(folder) head(gpsm)folder <- system.file("extdata", "decipher_toyexample", package = "glycoTraitR") gpsm <- read_decipher_gpsm(folder) head(gpsm)
Read a pGlyco3 GPSM result file and convert it into a long-format protein–peptide–glycan table with spectral counts per raw file. Each row corresponds to a unique combination of protein, peptide, glycan structure, and file.
read_pGlyco3_gpsm(gpsm_dir)read_pGlyco3_gpsm(gpsm_dir)
gpsm_dir |
The path to the pGlyco3 GPSM output file
(for example, |
This function takes a pGlyco3 GPSM file as input (typically named
pGlycoDB-GP-FDR-Pro-Quant-Site.txt). The following steps are performed:
Select relevant columns (RawName, Proteins,
Peptide, PlausibleStruct).
Rename them to a standardized schema:
File, Protein, Peptide, GlycanStructure.
Collapse multiple protein IDs per PSM into a single
pipe-separated string (e.g. "P00123|P00456").
Aggregate rows by Protein, Peptide,
GlycanStructure, and File as GPSM counts in each group.
The output of this function is typically used as the input for
build_trait_se.
A data frame with one row per
Protein–Peptide–GlycanStructure–File
combination and the following columns:
Protein — protein identifier(s)
Peptide — peptide sequence
GlycanStructure — glycan structural annotation from pGlyco3
File — raw file name
Count — spectral count (number of GPSMs) for this combination
read_decipher_gpsm,
build_trait_se
# Load toy example data included in glycoTraitR path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm <- read_pGlyco3_gpsm(path) head(gpsm)# Load toy example data included in glycoTraitR path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR" ) gpsm <- read_pGlyco3_gpsm(path) head(gpsm)
Parse a WURCS (WURCS 2.0) glycan annotation and extract:
residue sequence (as a character vector)
edge list (parent–child relationships)
wurcs_to_tree(w)wurcs_to_tree(w)
w |
A character string containing a WURCS 2.0 glycan annotation. |
This function is used internally to convert WURCS strings into a tree representation that can support structural trait computation and graph construction.
The function performs several parsing steps:
Remove the WURCS prefix and split the string into components.
Extract UniqueRES entries and map them to residue symbols
using the internal WURCS_RES_MAP table.
Follow the RES sequence index to reconstruct the residue vector.
Parse LIN entries and normalize them into simple "X-Y" edges.
A list with two elements:
node: character vector of residue types in order
edge: character vector of edges in "A-B" format
# Example WURCS glycan string w <- paste0( "WURCS=2.0/4,9,8/", "[u2122h_2*NCC/3=O]", "[a2122h-1b_1-5_2*NCC/3=O]", "[a1122h-1b_1-5]", "[a1122h-1a_1-5]", "/1-2-3-4-4-4-4-4-4/", "a4-b1_b4-c1_c3-d1_c6-f1_d2-e1_f3-g1_f6-h1_h2-i1" ) tree <- wurcs_to_tree(w) tree# Example WURCS glycan string w <- paste0( "WURCS=2.0/4,9,8/", "[u2122h_2*NCC/3=O]", "[a2122h-1b_1-5_2*NCC/3=O]", "[a1122h-1b_1-5]", "[a1122h-1a_1-5]", "/1-2-3-4-4-4-4-4-4/", "a4-b1_b4-c1_c3-d1_c6-f1_d2-e1_f3-g1_f6-h1_h2-i1" ) tree <- wurcs_to_tree(w) tree