| Title: | Post-transcriptional network modeling |
|---|---|
| Description: | A tool that enables in silico identification, integration, and modeling of mRNA features that influence post-transcriptional regulation of gene expression at a transcriptome-wide scale. |
| Authors: | Krzysztof Szkop [aut, cre], Kathleen Watt [aut], Ola Larsson [aut] |
| Maintainer: | Krzysztof Szkop <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-30 07:16:13 UTC |
| Source: | https://github.com/bioc/postNet |
Tools for the identification, integration, and modelling of mRNA features that influence post-transcriptional regulation of gene expression at a transcriptome-wide scale.
For an introduction to the package and example workflows, see the package vignette:
vignette("postNet")
See packageDescription("postNet").
After running codonUsage, the codonCalc function can be used to calculate the number or frequency of selected codons or amino acids (AA) for each gene. These values can be used in the downstream featureIntegration analysis. In addition, codon or amino acid content for gene sets of interest can be compared against background, or other gene sets. Several options are available for plotting the comparisons.
codonCalc(ptn, featsel, analysis = "codon", unit = "count", comparisons = NULL,
          plotOut = TRUE, plotType = "ecdf", pdfName = NULL)
| Argument | Description |
|---|---|
| ptn | A postNetData object. |
| featsel | A named list of vectors specifying one or more codons or amino acids to quantify and/or visualize. For use in the downstream featureIntegration, it is usually desirable to select those with high frequency, and the highest and lowest odds ratios identified using the codonUsage function. This list can be easily generated using the S4 method ptn_codonSelection. |
| analysis | A string specifying whether to assess |
| unit | A string specifying the unit to quantify. The options are |
| comparisons | A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
| plotOut | Logical indicating whether PDF files of plots are generated. If |
| plotType | If |
| pdfName | Name to be appended to output PDF files. The default is |
The number or frequency of selected codons or amino acids is calculated for each gene, and a two-sided Wilcoxon Rank Sum test is performed to identify significant differences between gene sets of interest, or against the background. If plotOut is TRUE, all indicated comparisons between different sets of genes will be performed and plotted.
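The comparison described above can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: the per-gene counts and the two gene sets (`set1`, `set2`) are simulated and hypothetical, and `exact = FALSE` is chosen here only to avoid warnings about ties.

```r
# Hypothetical per-gene counts of one selected codon (names are gene IDs).
set.seed(1)
counts <- c(rpois(30, lambda = 12), rpois(30, lambda = 8))
names(counts) <- paste0("gene", seq_along(counts))

# Hypothetical gene sets (stand-ins for two regulated gene sets).
set1 <- names(counts)[1:30]
set2 <- names(counts)[31:60]

# Two-sided Wilcoxon rank sum test comparing the two gene sets.
res <- wilcox.test(counts[set1], counts[set2],
                   alternative = "two.sided", exact = FALSE)
res$p.value

# Empirical CDFs evaluated at a common grid, the basis of an "ecdf" style plot.
grid <- sort(unique(counts))
ecdfSet1 <- ecdf(counts[set1])(grid)
ecdfSet2 <- ecdf(counts[set2])(grid)
```

A comparison against the background would follow the same pattern, with the second vector replaced by the counts for all background genes.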
A named list of numeric vectors for selected codons or amino acids with the calculated counts or frequencies for each gene. The elements in each vector are named by gene ID. This list can be used as input for the downstream featureIntegration analysis. The codonCalc function can also write PDF files of output plots with statistical comparisons between gene sets of interest.
codonUsage
ptn_codonSelection
ptn_codonAnalysis
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run codon usage analysis
ptn <- codonUsage(ptn = ptn, annotType = "ptnCDS", sourceSeq = "load",
                  analysis = "codon", codonN = 1, pAdj = 0.01, rem5 = TRUE,
                  plotHeatmap = FALSE, thresOddsUp = 0.4, thresFreqUp = 0.4,
                  thresOddsDown = 0.4, thresFreqDown = 0.4, subregion = NULL,
                  subregionSel = NULL, comparisons = list(c(1,2)),
                  plotType_index = "violin", pdfName = tmp)

# Select codons of interest with high frequency, and the highest and lowest odds ratios
codons <- ptn_codonSelection(ptn, comparison = 1)

# Calculate and plot eCDFs of codon counts, and compare between gene sets of interest
codonCounts <- codonCalc(ptn = ptn, analysis = "codon", featsel = codons,
                         unit = "count", comparisons = list(c(1,2)),
                         pdfName = tmp, plotType = "ecdf")
str(codonCounts)
The codonUsage function performs differential codon or amino acid (AA) usage analysis. The analysis enumerates single (or multiple, e.g. dicodons) codons or amino acids, and identifies those that are enriched or depleted between different gene sets of interest. Codon usage indexes can also be compared between gene sets of interest, and differences in average counts and frequencies are visualized as scatterplots and heatmaps.
codonUsage(ptn, annotType = "ptnCDS", sourceSeq = "load", analysis, codonN = 1,
           pAdj = 0.01, rem5 = TRUE, plotHeatmap = TRUE, thresOddsUp = 0.25,
           thresFreqUp = 0.25, thresOddsDown = 0.25, thresFreqDown = 0.25,
           subregion = NULL, subregionSel = NULL, comparisons,
           plotType_index = "boxplot", setSeed = NULL, pdfName = NULL)
| Argument | Description |
|---|---|
| ptn | A postNetData object. |
| annotType | A string specifying the reference coding sequences to be used for the analysis. The options are: |
| sourceSeq | A string specifying the source of |
| analysis | A string specifying the type of the analysis to perform. The options are |
| codonN | An integer specifying the number of codon combinations to count. For example, |
| pAdj | A numeric value specifying the adjusted p-value threshold for selecting significant Chi-square test results. The default is |
| rem5 | Logical specifying whether codons with fewer than five counts should be filtered out from the contingency table (counts of codons by gene set) before performing the Chi-square test. The default is |
| plotHeatmap | Logical specifying whether to plot a heatmap of the standardized residuals for each codon from the Chi-square test. The default is |
| thresOddsUp | A numeric value between 0 and 1 specifying the percentage threshold for the odds ratio in selecting codons or AAs in the enriched set. For example, |
| thresFreqUp | A numeric value between 0 and 1 specifying the percentage threshold for the frequency in selecting codons or AAs in the enriched set. For example, |
| thresOddsDown | A numeric value between 0 and 1 specifying the percentage threshold for the odds ratio in selecting codons or AAs in the depleted set. For example, |
| thresFreqDown | A numeric value between 0 and 1 specifying the percentage threshold for the frequency in selecting codons or AAs in the depleted set. For example, |
| subregion | Optionally, it is possible to specify a more specific subregion of the sequences to either select or exclude from the analysis. This is done by providing a numeric value indicating the number of nucleotides from either the start of the sequence region if positive, or the end if negative. For example, |
| subregionSel | If a value is provided for |
| comparisons | A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
| plotType_index | A string to indicate the type of output plot to generate for codon indexes. Options are |
| pdfName | Name to be appended to output PDF files. The default is |
| setSeed | If |
In the first step of the analysis, for each gene set comparison, a Chi-square test is applied to assess significant differences in codon or AA usage. Then, for codons or AAs passing significance thresholds, the odds ratio for each desired comparison pair is calculated and plotted against the frequency. The codons or AAs of interest with differential usage between gene sets are usually identified as those with a high/low odds ratio and high frequency. Codons or AAs of interest can then be further assessed using the codonCalc function (see example below).
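The two steps described above (Chi-square test, then odds ratio against frequency) can be sketched with base R. This is an illustration under stated assumptions, not the package's implementation: the codon counts are invented, and the odds ratio definition used here (odds of a codon in one set relative to the other) is one common choice that may differ from what codonUsage computes internally.

```r
# Hypothetical codon counts summed over the genes in two gene sets.
tab <- rbind(
  set1 = c(AAA = 120, AAG = 300, GAA = 150, GAG = 280),
  set2 = c(AAA = 260, AAG = 180, GAA = 240, GAG = 170)
)

# Chi-square test on the contingency table (gene sets by codons).
chi <- chisq.test(tab)
chi$p.value
round(chi$stdres, 2)  # standardized residuals, the kind of values a heatmap could display

# Odds of observing each codon within a set, then the set1 vs. set2 odds ratio.
odds <- function(x) x / (sum(x) - x)
oddsRatio <- odds(tab["set1", ]) / odds(tab["set2", ])

# Overall frequency of each codon across both sets.
freq <- colSums(tab) / sum(tab)

# Codons with an extreme odds ratio and a high frequency are candidates of interest.
data.frame(codon = colnames(tab), oddsRatio = oddsRatio, frequency = freq)
```

In this toy table, AAG and GAG are enriched in `set1` (odds ratio above 1), whereas AAA and GAA are depleted.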
Several codon indexes are also compared between gene sets of interest, including the CAI (codon adaptation index), CBI (codon bias index), FOP (frequency of optimal codons), L_aa (number of amino acids in the protein), and tAI (tRNA adaptation index).
The codonUsage function returns an updated postNetData object with results compiled and stored in the codons slot as an S4 object of class postNetCodons.
The codonAnalysis slot provides a summary of calculations for all codons for each gene, and stores an S4 object of class postNetCodonsAll with the following slots: geneID, codon, AA, count, frequency, AACountPerGene, and relative_frequency. Here, frequency corresponds to the number of codons relative to all codons in the gene, and relative_frequency corresponds to the number of codons relative to all synonymous codons in the gene. The contents of the "codonAnalysis" slot can be accessed using the ptn_codonAnalysis S4 method.
The codonSelection slot stores a named list of the selected codons or AAs that were significantly enriched or depleted (according to the selected thresholds) in the specified comparisons. This list can be used as input for the codonCalc function (see example below). The contents of the "codonSelection" slot can be accessed using the ptn_codonSelection S4 method.
The codonUsage function also generates plots comparing the codon indexes described above between gene sets, a clustered heatmap of the Chi-square residuals for each codon between the gene sets of interest, and plots comparing the average codon count and frequency between the gene sets of interest (codons encoding the same amino acid are coloured and connected by lines).
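The distinction between frequency and relative_frequency in the codonAnalysis slot can be made concrete with a small worked example in base R. The sequence and the partial codon table below are hypothetical, and whether the package includes start or stop codons in these calculations is not stated here, so treat this only as a sketch of the two definitions.

```r
# Hypothetical coding sequence for one gene (7 codons).
cds <- "ATGAAAAAGAAAGAAGAGTAA"

# Split into non-overlapping codons and count them.
starts <- seq(1, nchar(cds) - 2, by = 3)
codons <- substring(cds, starts, starts + 2)
count <- table(codons)

# Minimal codon-to-amino-acid map covering only the codons used here.
aaMap <- c(ATG = "M", AAA = "K", AAG = "K", GAA = "E", GAG = "E", TAA = "*")
aa <- aaMap[names(count)]

# frequency: codon count relative to all codons in the gene.
frequency <- as.numeric(count) / sum(count)

# relative_frequency: codon count relative to all synonymous codons in the gene.
aaCountPerGene <- tapply(as.numeric(count), aa, sum)
relative_frequency <- as.numeric(count) / as.numeric(aaCountPerGene[aa])

data.frame(codon = names(count), AA = unname(aa), count = as.numeric(count),
           frequency, relative_frequency)
```

Here AAA occurs twice among seven codons (frequency 2/7) and accounts for two of the three lysine codons (relative frequency 2/3).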
The NCBI Consensus CDS database: https://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi
CAI:
Sharp, P. M., and W. H. Li, (1987). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research 15: 1281-1295.
CBI:
Bennetzen, J. L., and B. D. Hall, (1982). Codon selection in yeast. Journal of Biological Chemistry 257: 3026-3031.
FOP:
Ikemura, T., (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli system. Journal of Molecular Biology 151: 389-409.
tAI:
dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004 Sep 24;32(17):5036-44.
codonCalc
ptn_codonAnalysis
ptn_codonSelection
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run codon usage analysis of single codons:
ptn <- codonUsage(ptn = ptn, annotType = "ptnCDS", sourceSeq = "load",
                  analysis = "codon", codonN = 1, pAdj = 0.01, rem5 = TRUE,
                  plotHeatmap = FALSE, thresOddsUp = 0.25, thresFreqUp = 0.25,
                  thresOddsDown = 0.25, thresFreqDown = 0.25, subregion = NULL,
                  subregionSel = NULL, comparisons = list(c(1,2)),
                  plotType_index = "violin", pdfName = tmp)

# Access the full results of the analysis for each gene:
codonResults <- ptn_codonAnalysis(ptn)
str(codonResults)

# Select codons of interest that were significantly enriched
# or depleted with high frequency, and the highest and lowest odds ratios:
codons <- ptn_codonSelection(ptn, comparison = 1)
str(codons)

# Count and plot eCDFs for sets of enriched/depleted codons,
# and compare between gene sets of interest:
codonCounts <- codonCalc(ptn = ptn, analysis = "codon", featsel = codons,
                         unit = "count", comparisons = list(c(1,2)),
                         pdfName = tmp, plotType = "ecdf")
str(codonCounts)
The contentAnalysis function calculates the nucleotide content (as a percentage) of mRNA sequences (5'UTR, 3'UTR, and CDS) for each gene. Nucleotide content of different sequence regions can be compared between gene sets of interest, or against background. Several options are available for plotting the comparisons.
contentAnalysis(ptn, contentIn, region, subregion = NULL, subregionSel = NULL,
                comparisons = NULL, plotOut = TRUE, plotType = "boxplot",
                pdfName = NULL)
| Argument | Description |
|---|---|
| ptn | A postNetData object. |
| contentIn | A character vector specifying one or more nucleotide(s) or nucleotide combinations to quantify. These can be any selection or combination of A, T, G, or C. For example |
| region | A character vector specifying the sequence region(s) to be analyzed. This can be |
| subregion | Optionally, specify a more specific subregion of the sequences to either select or exclude from the analysis. Provide an integer indicating the number of nucleotides to include or exclude from either the start of the sequence region if positive, or the end if negative. For example, |
| subregionSel | If a value is provided for |
| comparisons | A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
| plotOut | Logical indicating whether PDF files of plots are generated. If |
| plotType | If |
| pdfName | Name to be appended to output PDF files. The default is |
When plotOut = TRUE, a two-sided Wilcoxon Rank Sum test is performed to identify significant differences in mRNA sequence region nucleotide content between gene sets of interest, and/or against the background gene set.
A named list of vectors for selected mRNA sequence regions with the calculated nucleotide content for each gene. This list can be used as input for the downstream featureIntegration analysis. The elements in each vector are named with the gene ID. The contentAnalysis function can also return PDF files of output plots with statistical comparisons between gene sets of interest.
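To illustrate what a per-gene nucleotide content value represents, and how a positive or negative subregion could select the start or end of a region, here is a minimal base R sketch. The sequences and the helper functions `ntContent` and `selectSubregion` are hypothetical and are not part of postNet; the package's own handling of subregions may differ in detail.

```r
# Hypothetical 5'UTR sequences, named by gene ID.
utr5 <- c(geneA = "GCGCGGCTAGCTAGGCTTAACG",
          geneB = "ATATTTAGCATTTAAATCGATA")

# Percentage of a sequence made up of the given nucleotides (default: G and C).
ntContent <- function(seq, nts = c("G", "C")) {
  bases <- strsplit(toupper(seq), "")[[1]]
  100 * sum(bases %in% nts) / length(bases)
}

# Keep the first n nucleotides (n > 0) or the last |n| nucleotides (n < 0).
selectSubregion <- function(seq, n) {
  len <- nchar(seq)
  if (n > 0) substr(seq, 1, min(n, len)) else substr(seq, max(1, len + n + 1), len)
}

# GC content of the full region and of the first 10 nucleotides only.
gcAll <- vapply(utr5, ntContent, numeric(1))
gcFirst10 <- vapply(utr5, function(s) ntContent(selectSubregion(s, 10)), numeric(1))
gcFirst10
```

The result is a numeric vector named by gene ID, which is the same general shape as the feature vectors that featureIntegration expects.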
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Calculate the GC of the first 50 nucleotides of the 5'UTR for each gene,
# and compare between translationUp genes vs. background,
# translationDown genes vs. background,
# and translationUp vs. translationDown genes:
content_UTR5_first50 <- contentAnalysis(ptn = ptn, region = c("UTR5"),
                                        subregion = 50, subregionSel = "select",
                                        comparisons = list(c(0,1),c(0,2),c(1,2)),
                                        contentIn = c("GC"), plotOut = TRUE,
                                        plotType = "ecdf", pdfName = tmp)
str(content_UTR5_first50)
The contentMotifs function quantifies the number or position of motifs in sequence regions of interest and performs statistical comparisons between gene sets of interest, or against background. Motifs can be specified directly, or taken from the output of the motifAnalysis function, which detects de novo motifs using STREME (part of the MEME-Suite). The function can also use pqsfinder to identify G-quadruplexes.
contentMotifs(ptn, motifsIn, seqType = "dna", dist = 1, min_score = 47,
              unitOut = "number", resid = FALSE, region, subregion = NULL,
              subregionSel = NULL, comparisons = NULL, pdfName = NULL,
              plotOut = TRUE)
| Argument | Description |
|---|---|
| ptn | A postNetData object. |
| motifsIn | A character vector specifying the motif sequences to be detected and quantified. Ambiguities can be specified using IUPAC codes or [ ] (bracket) annotations. Motifs discovered with motifAnalysis can be retrieved using ptn_motifSelection and used. Optionally, G-quadruplexes can be specified as |
| seqType | A string specifying the type of sequence being provided to search for motifs. The options are |
| dist | A numeric value specifying the minimal distance between motifs (in nucleotides for DNA/RNA or amino acids for protein). The default is 1. |
| min_score | A numeric value specifying the threshold for the quality of the G-quadruplexes prediction. This is only required when |
| unitOut | A string to specify whether the output should be the |
| resid | Logical indicating if the quantification of motifs should be corrected for the length of the sequences. If |
| region | A character vector specifying the sequence region(s) to be analyzed. This can be |
| subregion | Optionally, it is possible to specify a more specific subregion of the sequences to either select or exclude from the analysis. This is done by providing a numeric value indicating the number of nucleotides from either the start of the sequence region if positive, or the end if negative. For example, |
| subregionSel | If a value is provided for |
| comparisons | A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
| pdfName | Name to be appended to output PDF files. The default is |
| plotOut | Logical indicating whether PDF files of plots are generated. If |
A two-sided Wilcoxon Rank Sum test is performed to identify significant differences in motif content between gene sets of interest, or against the background gene set.
Note that the contentMotifs function uses a stricter method of sequence matching to count motifs than the MEME-Suite. For this reason, some motifs identified with motifAnalysis, which implements STREME, may be quantified differently by the two methods. Using a more stringent threshold with the stremeThreshold parameter in motifAnalysis may remedy this. Alternatively, motifs identified with STREME can be counted using the runFimo tool provided with the MEME-Suite.
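As an illustration of what strict sequence matching with IUPAC ambiguity codes can look like, the sketch below expands a motif into a regular expression and counts non-overlapping exact matches in base R. This is an assumption about the general approach, not the matching code used by contentMotifs; the helpers `motifToRegex` and `countMotif` and the sequences are hypothetical, and bracket annotations are not handled.

```r
# IUPAC nucleotide codes expanded to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Convert a motif written with IUPAC codes into a regular expression.
motifToRegex <- function(motif) {
  codes <- strsplit(toupper(motif), "")[[1]]
  paste(iupac[codes], collapse = "")
}

# Count non-overlapping exact matches of the motif in a sequence.
countMotif <- function(seq, motif) {
  hits <- gregexpr(motifToRegex(motif), toupper(seq))[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

# Hypothetical 5'UTR sequences, named by gene ID.
utr5 <- c(geneA = "AAGGCGGCGTTTCGCCGCCAA", geneB = "ATATATATAT")
motifCounts <- vapply(utr5, countMotif, integer(1), motif = "SGCSGCS")
motifCounts
```

Exact matching of this kind counts only sequences that satisfy every position of the motif, which is one reason counts can be lower than those from probabilistic, score-based scanners.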
If the output motifs will be used in downstream analysis with featureIntegration, consider carefully whether to apply the resid correction. If the length of the sequence regions will be included in the featureIntegration model, it is not recommended to correct the motif content for the sequence length. However, if length will not be included in featureIntegration, the correction can be performed.
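One way to think about a length correction is to regress motif counts on sequence length and keep the residuals, so that the corrected values are no longer linearly related to length. The sketch below shows this idea with simulated data; it assumes a simple linear model on log2 length and is not necessarily the correction that the resid argument applies.

```r
# Simulated region lengths and motif counts that scale with length.
set.seed(42)
len <- round(runif(100, 200, 3000))
motifCount <- rpois(100, lambda = len / 500)
geneIDs <- paste0("gene", 1:100)

# Regress counts on (log2) length and keep the residuals as corrected values.
fit <- lm(motifCount ~ log2(len))
motifResid <- setNames(residuals(fit), geneIDs)

# The raw counts correlate with length; the residuals do not.
cor(motifCount, log2(len))
cor(motifResid, log2(len))
```

This also shows why including both a length-corrected feature and the length itself in one model needs care: the corrected feature has had its length component removed by construction.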
If unitOut = "number", the output will be a named list of vectors with the number of motifs detected for each gene in each sequence region specified. This list can be used as input for the downstream featureIntegration analysis. If unitOut = "position", the output will be a named list of lists with the start and end positions of each motif for each gene in each sequence region specified. These positions cannot be used in featureIntegration. The contentMotifs function can also return PDF files of output plots with statistical comparisons between gene sets of interest when unitOut = "number".
If you use the in-built "G4" option in your analysis, please cite pqsfinder:
Hon J, Martínek T, Zendulka J, Lexa M. pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics. 2017 Nov 1;33(21):3373-3379. doi: 10.1093/bioinformatics/btx413. PMID: 29077807.
pqsfinder
runFimo
motifAnalysis
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# ------------------------------------------------------------------
# Example 1: Detect and quantify a given motif within 5'UTRs
# ------------------------------------------------------------------
# Detect and quantify a given motif within 5'UTRs,
# and compare between translationUp vs. translationDown genes
UTR5_SGCSGCS_num <- contentMotifs(ptn = ptn, motifsIn = "SGCSGCS",
                                  region = c("UTR5"), comparisons = list(c(1,2)),
                                  dist = 1, unitOut = "number", pdfName = tmp,
                                  plotOut = TRUE)
str(UTR5_SGCSGCS_num)

# Now find the positions of the motif in the 5'UTR
UTR5_SGCSGCS_pos <- contentMotifs(ptn = ptn, motifsIn = "SGCSGCS",
                                  region = c("UTR5"), comparisons = list(c(1,2)),
                                  dist = 1, unitOut = "position")
str(UTR5_SGCSGCS_pos)

## Not run:
# ------------------------------------------------------------------
# Example 2: Detect and quantify motifs identified using motifAnalysis
# ------------------------------------------------------------------
# First run motifAnalysis to identify significantly enriched motifs in the 3'UTR
ptn <- motifAnalysis(ptn = ptn, stremeThreshold = 0.05, minwidth = 6,
                     memePath = "/meme/bin", region = c("UTR3"))

# Quantify the presence of these motifs in transcripts,
# and compare between translationUp vs. translationDown genes,
# adjusting for the length of the sequence.
UTR3_denovo_motifs <- contentMotifs(ptn = ptn,
                                    motifsIn = ptn_motifSelection(ptn, region = "UTR3"),
                                    region = c("UTR3"), comparisons = list(c(1,2)),
                                    dist = 1, resid = TRUE, unitOut = "number",
                                    pdfName = "example", plotOut = TRUE)
str(UTR3_denovo_motifs)

# ------------------------------------------------------------------
# Example 3: Detect and quantify potential G-quadruplex-forming sequences
# ------------------------------------------------------------------
# Quantify the presence of G4 motifs in 5'UTR and CDS,
# and compare between translationUp vs. translationDown genes.
G4_UT5_CDS <- contentMotifs(ptn = ptn, motifsIn = "G4",
                            region = c("UTR5","CDS"), comparisons = list(c(1,2)),
                            dist = 1, resid = FALSE, unitOut = "number",
                            pdfName = "example", plotOut = TRUE)
str(G4_UT5_CDS)
## End(Not run)
The featureIntegration function is used to model changes in post-transcriptional regulation and identify cis and/or trans factors (features of mRNA molecules, signatures of upstream regulators, etc.) that can explain the observed regulatory effects. The method integrates mRNA features and signatures obtained in previous steps of the postNet workflow (or from custom sources), and has several options for implementation, including:
Stepwise linear regression modelling and network analysis. This analysis uses hierarchical multiple regression to identify features explaining changes in the regulatory effect, rank them according to their importance, and reveal independent and combinatorial effects (for example, co-occurrence of regulatory features in the same mRNA molecules). The results of this analysis can be visualized using network plots showing the association of features to regulatory effects, and the relationships between features.
Random Forest feature selection and classification. This analysis implements Random Forest classification using randomForest. Feature selection is performed using Boruta to identify the set of features that best classify genes according to their regulation. The Random Forest model can then also be used to predict regulation for other gene lists or datasets using the rfPred function.
Both implementations of featureIntegration produce statistics and plots visualizing the relationship between the regulatory effect and the selected features. The relationships between selected features identified by featureIntegration and regulatory effects can be further explored using UMAP visualizations with the plotFeaturesMap function.
featureIntegration(ptn, features, lmfeatGroup = NULL, lmfeatGroupColour = NULL,
                   analysis_type, regOnly = TRUE, allFeat = FALSE, useCorel = TRUE,
                   covarFilt = 20, NetModelSel = "omnibus", comparisons = NULL,
                   fdrUni = 0.05, stepP = 0.05, pdfName = NULL)
| Argument | Description |
|---|---|
| ptn | A postNetData object. |
| features | A named list of numeric vectors corresponding to features that will be included in modelling of regulatory effects. Each vector element in the list must have names corresponding to gene IDs. These vectors quantifying features can be the outputs of previous steps of the analysis, including from |
| lmfeatGroup | If |
| lmfeatGroupColour | If |
| analysis_type | A string specifying the method for the analysis. The options are |
| regOnly | If |
| allFeat | If |
| useCorel | If |
| covarFilt | If |
| NetModelSel | If |
| comparisons | If |
| fdrUni | If |
| stepP | If |
| pdfName | Name to be appended to output PDF files. |
The featureIntegration analysis assesses whether a catalog of cis and/or trans regulatory features appear to modulate the observed post-transcriptional regulation. Details of the implementations and considerations for feature inputs are described below.
Stepwise linear regression modelling and network analysis: featureIntegration uses stepwise linear regressions to model changes in post-transcriptional regulation across subsets of mRNAs and assesses the contribution of each feature included in the modelling in a hierarchical manner. This allows features to be ranked according to their ability to explain changes in the observed regulatory effect. Due to the hierarchical approach, identification of both distinct (independent), and overlapping (covarying) associations between features and regulatory effects is possible. featureIntegration with stepwise linear regression is performed in three phases:
Phase 1: First, each candidate feature is evaluated separately in univariate linear models to identify significant associations between the regulatory effect and individual features.
Phase 2: Second, starting with the feature that best explained changes in the regulatory effect from univariate models, forward stepwise regression is performed by adding features to the model in an iterative fashion, keeping covariance assigned to the most influential feature (a rank-based greedy model). In each step, the best performing model (the feature with the strongest association with the regulatory effect) is retained and features that fail to explain additional variance are discarded. The resulting omnibus model represents the smallest set of features that can explain the greatest proportion of variance in the regulatory effect. This step ranks features according to their importance, and reveals dependence between them. For example, regulation may depend partially on the length of the 5'UTR, but also on the GC content, or the presence of a motif, etc.
Phase 3: Finally, the independent contribution of each feature identified in phase 2 is determined. This is done by removing covariance from the omnibus model (that was previously assigned to the most influential features in phase 2) to provide the adjusted contribution of each feature. This phase provides the percentage of variance in the regulatory effect explained by a given feature, independent of all other features.
The results of all three phases of the stepwise regression modelling are summarized in table outputs and visualized as network plots (described below).
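The three phases can be sketched in base R on simulated data. This is a simplified illustration of forward stepwise regression, not the featureIntegration implementation: the feature names, the simulated effect sizes, the use of nested-model F-tests, and the 0.05 cut-offs (mirroring the fdrUni and stepP defaults) are assumptions made for the example.

```r
# Simulated features and regulatory effect (all names hypothetical).
set.seed(7)
n <- 200
dat <- data.frame(utr5Length = rnorm(n), gcContent = rnorm(n),
                  motifCount = rnorm(n), noise = rnorm(n))
dat$gcContent <- 0.6 * dat$utr5Length + 0.8 * dat$gcContent  # covaries with length
featureNames <- names(dat)
dat$effect <- dat$utr5Length + 0.5 * dat$motifCount + rnorm(n)

# Phase 1: univariate models, one per feature, with FDR adjustment.
uniP <- sapply(featureNames, function(f) {
  fit <- lm(reformulate(f, response = "effect"), data = dat)
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})
candidates <- featureNames[p.adjust(uniP, method = "BH") < 0.05]

# Phase 2: greedy forward selection; at each step add the feature that
# best improves the current model, and stop when none passes the threshold.
selected <- character(0)
remaining <- candidates
while (length(remaining) > 0) {
  baseFit <- if (length(selected) > 0) {
    lm(reformulate(selected, response = "effect"), data = dat)
  } else {
    lm(effect ~ 1, data = dat)
  }
  stepP <- sapply(remaining, function(f) {
    fit <- lm(reformulate(c(selected, f), response = "effect"), data = dat)
    anova(baseFit, fit)[2, "Pr(>F)"]
  })
  if (min(stepP) >= 0.05) break
  best <- names(which.min(stepP))
  selected <- c(selected, best)
  remaining <- setdiff(remaining, best)
}

# Phase 3: adjusted contribution of each selected feature, i.e. the
# percentage of variance lost when that feature is dropped from the omnibus model.
omnibus <- lm(reformulate(selected, response = "effect"), data = dat)
totalSS <- sum((dat$effect - mean(dat$effect))^2)
adjusted <- sapply(selected, function(f) {
  others <- setdiff(selected, f)
  reduced <- if (length(others) > 0) {
    lm(reformulate(others, response = "effect"), data = dat)
  } else {
    lm(effect ~ 1, data = dat)
  }
  100 * (deviance(reduced) - deviance(omnibus)) / totalSS
})
adjusted
```

In this simulation, `gcContent` is associated with the effect in phase 1 only through its covariance with `utr5Length`, so its contribution largely disappears once `utr5Length` is in the model.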
Random Forest feature selection and classification: featureIntegration can also apply Breiman's random forest algorithm (implemented in the randomForest package) to classify genes according to their regulation, and identify features associated with regulatory classes (for example, translationally activated or suppressed genes). This implementation is carried out in two steps:
Pre-modelling and feature selection: Genes are first divided into training (70%) and validation (30%) sets. Modelling is performed and importance is calculated for all features included in the input. Feature selection is then performed using Boruta to identify all input features relevant to model performance.
Final modelling: Next, modelling using randomForest is repeated using the set of selected features. The performance of the model is assessed using Receiver Operating Characteristic (ROC) curves (implemented with performance), and feature importance from the final model is reported as Mean Decrease Accuracy (see importance for details), where higher values indicate more important features.
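The training/validation split and the ROC-based assessment can be illustrated without additional packages. In the sketch below, logistic regression (`glm`) is deliberately swapped in as a stand-in classifier so that the example runs with base R only; it is not Random Forest and does not involve Boruta. The data, feature names, and class labels are simulated.

```r
# Simulated genes with two features and a binary regulation class.
set.seed(123)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
regulation <- factor(ifelse(x1 + 0.5 * x2 + rnorm(n) > 0, "up", "down"))
dat <- data.frame(regulation, x1, x2)

# Divide genes into training (70%) and validation (30%) sets.
trainIdx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[trainIdx, ]
valid <- dat[-trainIdx, ]

# Stand-in classifier: logistic regression fitted on the training set.
fit <- glm(regulation ~ x1 + x2, data = train, family = binomial)
score <- predict(fit, newdata = valid, type = "response")  # P(regulation == "up")

# Area under the ROC curve on the validation set, via the rank-sum formula.
isUp <- valid$regulation == "up"
r <- rank(score)
auc <- (sum(r[isUp]) - sum(isUp) * (sum(isUp) + 1) / 2) / (sum(isUp) * sum(!isUp))
auc
```

Evaluating on held-out genes, as done here, is what makes the ROC curve an estimate of how well the model generalizes rather than how well it memorizes the training set.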
Feature inputs for modelling: Many different types of features can be included in modelling with featureIntegration. From the postNet workflow, the outputs of the lengthAnalysis, contentAnalysis, uorfAnalysis, foldingEnergyAnalysis, contentMotifs, and codonCalc functions can be supplied directly as features. Gene signatures, such as those included with the package (see humanSignatures or mouseSignatures) can also be converted to feature inputs using the signCalc function. In addition, custom feature inputs can be supplied depending on the application and research question.
In general, features included in modelling can be either numeric or categorical. Numeric features can be continuous (for example, sequence region nucleotide content or length) or discrete (for example, the number of uORFs in 5'UTRs). Categorical features can be included by converting them to binary variables. This is useful for evaluating gene signatures in relation to regulatory outcomes, but can also be applied in any scenario where mRNAs can be divided into two classes (for example, those with or without a certain property). It is also possible to supply categorical features with a directionality. For example, a feature that can be increased, decreased, or unchanged could be coded as 1, -1, or 0 for each gene to be included in modelling.
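As a minimal base R sketch of these encodings, the gene IDs, the signature, and the status labels below are made up for illustration (within the postNet workflow, signCalc is the function intended for converting gene signatures):

```r
genes <- c("geneA", "geneB", "geneC", "geneD")

# Binary feature: 1 if the gene belongs to a (hypothetical) signature, else 0.
signature <- c("geneB", "geneD")
in_signature <- setNames(as.integer(genes %in% signature), genes)

# Directional feature: increased = 1, unchanged = 0, decreased = -1.
status <- c(geneA = "increased", geneB = "unchanged",
            geneC = "decreased", geneD = "increased")
direction <- c(increased = 1, unchanged = 0, decreased = -1)[status]
names(direction) <- names(status)

print(in_signature)
print(direction)
```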
Careful consideration should be given when selecting the catalog of features to include in modelling. For example, highly or perfectly correlated variables cannot be supplied together in the same model. This can be relevant when examining the percentage of nucleotide content for each base, since the content of G and C is often highly correlated, and the percentages of A, T, G, and C sum to 100%. When features are too highly correlated, a warning message is displayed and one of the correlated features must be removed from the input. In addition, some features, such as codon composition or folding energy, can be corrected for the length of the sequence region when they are calculated. In that case, it is not advisable to also include the sequence region length as a separate feature in the modelling. Please see the postNet vignette for more in-depth examples and discussion on the selection of features and interpretation of results.
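The nucleotide content case can be checked before modelling with a few lines of base R. This sketch uses simulated counts and an arbitrary 0.9 correlation cut-off, both of which are assumptions; it does not reproduce the check performed inside featureIntegration.

```r
set.seed(7)
n <- 50
counts <- matrix(rpois(n * 4, lambda = 50) + 1, ncol = 4,
                 dimnames = list(NULL, c("A", "T", "G", "C")))
pct <- as.data.frame(100 * counts / rowSums(counts))

# Pairwise screen: flag feature pairs with |r| above the chosen cut-off.
cor_mat <- cor(pct)
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
print(high)

# The four percentages sum to 100 for every gene, so together with the
# intercept they are perfectly collinear: lm() cannot estimate all of them.
dat <- cbind(pct, y = rnorm(n))
fit <- lm(y ~ ., data = dat)
print(coef(fit))  # one coefficient is reported as NA
```

A pairwise correlation screen alone can miss this kind of dependency, since no single pair of bases needs to be strongly correlated for the four columns to be linearly dependent. Dropping one base resolves it.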
The featureIntegration function returns an updated postNetData object with results compiled and stored in the features and featureIntegration slots.
The features slot stores a data.frame with the features used as input for feature integration modelling, where rows correspond to genes, and columns correspond to features. The input feature data.frame can be retrieved using the ptn_features S4 method.
The stepwise regression implementation of featureIntegration will store results for each comparison in the lm slot. This includes the results of univariate and stepwise regression modelling, and the final omnibus and adjusted models. Modelling results can be retrieved using the ptn_model S4 method. The features selected as significant in the omnibus model are listed in the selectedFeatures slot, along with the proportion of variance in the regulatory effect explained by each feature (either in omnibus, or adjusted models depending on the selection in the NetModelSel parameter). Selected features can be retrieved using the ptn_selectedFeatures S4 method. Finally, the network analysis results are stored as an igraph object, and can be retrieved using the ptn_networkGraph S4 method.
The resulting network plot is output as a PDF. Modelling is summarized in a PDF file with a table displaying results for features significant in univariate models, a table of F-values from linear models in stepwise regression, and a table summarizing features in the final omnibus and adjusted models. Optionally, if useCorel = TRUE, a table of Pearson's correlation coefficients between features will also be displayed. For F-value tables, the feature explaining the greatest proportion of variance in the regulatory effect at each step is highlighted in green, while those removed from the model due to inability to significantly explain variance are marked in red. Features highlighted in orange indicate those where there was a substantial change in F-value (an increase or decrease of at least 50%) following the addition of another feature to the model. Larger changes in F-value are indicative of a stronger relationship between features. Note that this highlighting is independent of the threshold applied with the covarFilt parameter, which controls edge selection for network analyses. For more details please refer to the postNet vignette.
The random forest implementation of featureIntegration will store results for each comparison in the rf slot. The pre-model, feature selection, and final model are stored as objects of class randomForest and Boruta. The selected features are listed in the selectedFeatures slot, along with their importance (Mean Decrease Accuracy) from the final model. Feature importance calculated by Boruta is plotted and output as a PDF file. PDF files are also output with ROC curves for the final model, and the importance of selected features.
Both implementations of featureIntegration output PDF files with scatterplots for each individual feature identified in final stepwise regression or random forest models, with Pearson's correlation coefficient between the feature and regulatory effect (two-sided test). The linear trend line, and cubic smoothing spline (allowing assessment of linearity) are displayed.
If you use random forest analysis, please cite:
Sing T, Sander O, Beerenwinkel N, Lengauer T (2005). “ROCR: visualizing classifier performance in R.” Bioinformatics, 21(20), 3940-3941. http://rocr.bioinf.mpi-sb.mpg.de.
Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22. https://CRAN.R-project.org/doc/Rnews/.
Kursa MB, Rudnicki WR (2010). “Feature Selection with the Boruta Package.” Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11.
Boruta
randomForest
performance
rfPred
plotFeaturesMap
ptn_model
ptn_check_models
ptn_features
ptn_selectedFeatures
ptn_networkGraph
tmp <- tempfile(fileext = ".pdf")

# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Prepare the list of pre-calculated features to be used in feature integration modelling:
myFeatures <- postNetExample$features
str(myFeatures)

# Run feature integration modelling using stepwise regression:
ptn <- featureIntegration(ptn = ptn,
                          features = myFeatures,
                          pdfName = tmp,
                          regOnly = TRUE,
                          allFeat = FALSE,
                          analysis_type = "lm",
                          covarFilt = 20,
                          comparisons = list(c(1,2)),
                          NetModelSel = "omnibus")

# Run feature integration modelling using random forest classification:
ptn <- featureIntegration(ptn = ptn,
                          features = myFeatures,
                          pdfName = tmp,
                          analysis_type = "rf",
                          comparisons = list(c(1,2)))
The foldingEnergyAnalysis function compares folding energies for each mRNA sequence region. Pre-calculated folding energies for reference sequences are available for certain RefSeq releases (for human and mouse). These values were calculated using the mfold algorithm. Alternatively, user-supplied values can be used in analyses. In addition, folding energies for gene sets of interest can be compared against background, or other gene sets. Several options are available for plotting the comparisons.
foldingEnergyAnalysis(ptn, sourceFE = "load", customFileFE = NULL, residFE = FALSE, region, comparisons = NULL, plotOut = TRUE, plotType = "ecdf", pdfName = NULL)
ptn |
A postNetData object. |
sourceFE |
A string specifying the source of the folding energy values. The options are |
customFileFE |
If |
residFE |
Logical indicating if the folding energies should be corrected for the length of the sequences. If |
region |
A character vector specifying the sequence region(s) to be analyzed. This can be |
comparisons |
A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
plotOut |
Logical indicating whether PDF files of plots are generated. If |
plotType |
If |
pdfName |
Name to be appended to output PDF files. The default is |
Pre-calculated folding energies supplied with the package are currently available for human and mouse RefSeq releases "rel_109.20201120" and "rel_109.20200923". These values were calculated for all sequence regions using the mfold algorithm, available here: https://www.unafold.org/.
If the output folding energies will be used in downstream analysis with featureIntegration, consideration should be given to the residFE argument. If the length of the sequence regions will be included in the featureIntegration model, it is not recommended to correct the folding energies for the sequence length. However, if length will not be included in featureIntegration, the correction can be performed.
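One common way to remove a length dependence is to regress folding energy on sequence length and keep the residuals. The sketch below assumes that this is what a length correction amounts to, based only on the argument name residFE; the exact correction applied by the package is not described here, and the simulated values are illustrative.

```r
set.seed(3)
# Hypothetical 5'UTR lengths and folding energies (longer regions tend to
# have lower, i.e. more negative, folding energy).
len <- round(runif(100, min = 50, max = 2000))
fe  <- -0.3 * len + rnorm(100, sd = 20)

# Length-corrected values: residuals from a linear fit of energy on length.
fe_resid <- residuals(lm(fe ~ len))

print(cor(fe, len))        # strongly negative before correction
print(cor(fe_resid, len))  # effectively zero after correction
```

Because the residuals carry no linear length signal, supplying both corrected energies and length to the same model avoids counting the length effect twice, whereas uncorrected energies and length would overlap heavily.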
The output will be a named list of vectors corresponding to each sequence region, with the folding energy for each gene. This list can be used as input for the downstream featureIntegration analysis. The foldingEnergyAnalysis function can also return PDF files of output plots with statistical comparisons between gene sets of interest, or against background.
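The kind of comparison shown in an ECDF plot can be approximated in base R as follows. The energies are simulated, and the Wilcoxon rank-sum test is used here only as an assumed example of a two-group comparison; the test applied by foldingEnergyAnalysis is not stated in this section.

```r
set.seed(11)
# Hypothetical folding energies for background genes and a gene set of interest.
background <- rnorm(500, mean = -30, sd = 10)
up_genes   <- rnorm(80,  mean = -22, sd = 10)

# Empirical cumulative distribution functions for each group.
F_bg <- ecdf(background)
F_up <- ecdf(up_genes)

# Fraction of genes in each group with folding energy at or below -30.
print(c(background = F_bg(-30), up = F_up(-30)))

# Two-sided rank-based comparison of the two distributions.
wt <- wilcox.test(up_genes, background)
print(wt$p.value)
```

A curve shifted to the right for the gene set indicates less negative folding energies, that is, less stable predicted structures, relative to background.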
If you use the foldingEnergyAnalysis function in your analysis with sourceFE = "load", please cite:
M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31 (13), 3406-3415, 2003. https://doi.org/10.1093/nar/gkg595
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Compare the folding energy of the 5'UTR for each gene between translationUp genes vs. background,
# translationDown genes vs. background, and translationUp vs. translationDown genes.
FE <- foldingEnergyAnalysis(ptn = ptn,
                            region = c("UTR5"),
                            comparisons = list(c(0,1), c(0,2), c(1,2)),
                            residFE = FALSE,
                            plotType = "ecdf",
                            sourceFE = "load",
                            plotOut = TRUE,
                            pdfName = tmp)
str(FE)
postNetData object
The gageAnalysis function uses the gage package to perform Generally Applicable Gene-set Enrichment (GAGE) analysis on the regulatory effect measurement contained in a postNetData object. These values are determined by the regulationGen or effectMeasure parameters in the postNetStart function.
gageAnalysis(ptn, category, genesSlopeFiltOut = NULL, maxSize = 500, minSize = 10)
ptn |
A postNetData object. |
category |
A character vector specifying one or more gene ontology categories to be included in the analysis. The options are |
genesSlopeFiltOut |
If using an |
maxSize |
Integer specifying the maximal size of a gene set to test. Terms/pathways with more genes will be excluded. The default is |
minSize |
Integer specifying the minimal size of a gene set to test. Terms/pathways with fewer genes will be excluded. The default is |
After running GAGE analysis, the results can be retrieved from the postNetData object using the S4 method ptn_GAGE.
Note that if using results from an anota2seq analysis, it is strongly recommended to first apply filtering using the slopeFilt function to exclude genes with unrealistic regression slopes.
The results returned by the gageAnalysis function are compiled into an S4 object of class postNetGAGE and stored in the "GAGE" slot of the postNetData object. As described in the gage package, each gene ontology category has a slot in the results object that stores a named list with three elements ("greater", "less" and "stats"). The "stats" element contains the test statistics. The "greater" and "less" elements are data.frames with the results of a two-directional test, each having the following columns:
p.geomean: The geometric mean of the individual p-values from multiple single array-based gene set tests.
stat.mean: The mean of the individual statistics from multiple single array-based gene set tests. The value denotes the magnitude of the gene-set level changes, and the sign denotes the direction of the changes.
p.val: The global p-value or summary of the individual p-values from multiple single array-based gene set tests.
q.val: The Benjamini–Hochberg adjusted p-value, as implemented by the multtest package.
set.size: The number of genes included in the gene set.
Genes: The gene IDs in the input data belonging to the gene ontology gene set.
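To illustrate how a q.val column relates to p.val, the Benjamini-Hochberg adjustment can be reproduced with base R's p.adjust. The gene set names and p-values below are invented, and p.adjust is used as a stand-in for the multtest implementation mentioned above.

```r
# Hypothetical global p-values for five gene sets.
p_val <- c(setA = 0.001, setB = 0.01, setC = 0.03, setD = 0.2, setE = 0.8)

# Benjamini-Hochberg adjusted p-values.
q_val <- p.adjust(p_val, method = "BH")
print(q_val)

# Gene sets passing an example FDR threshold of 0.1.
print(names(q_val)[q_val < 0.1])
```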
If you use the gageAnalysis function in your analysis, please cite:
Luo, W., Friedman, M., Shedden K., Hankenson K., and Woolf P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161.
# ------------------------------------------------------------------
# Example 1: Running GAGE with gene lists
# ------------------------------------------------------------------
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GAGE:
ptn <- gageAnalysis(ptn, category = "CC")

# Extract the significant enrichment results from the postNetData object:
# (threshold = 1 will return the complete results; for only significant
# results the threshold should be lowered.)
gageOut <- ptn_GAGE(ptn = ptn,
                    category = "CC",
                    direction = "greater",
                    threshold = 1)
str(gageOut)

# ------------------------------------------------------------------
# Example 2: Running GAGE requiring anota2seq analysis slope filtering
# ------------------------------------------------------------------
# Initialize Anota2seqDataSet (see anota2seq vignette for details)
ads <- anota2seq::anota2seqDataSetFromMatrix(
  dataP = postNetExample$ads_data$dataP,
  dataT = postNetExample$ads_data$dataT,
  phenoVec = postNetExample$ads_data$phenoVec,
  batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
  dataType = "RNAseq",
  normalize = FALSE)

# Run an anota2seq analysis:
# Note that the quality control and residual outlier testing are not
# performed to limit the running time of this example. For full details
# on running an analysis please see the anota2seq vignette and help manual.
ads <- anota2seq::anota2seqRun(ads,
                               performQC = FALSE,
                               performROT = FALSE,
                               useProgBar = FALSE)

# Initialize the postNetData object, where "ads" is an Anota2seqDataSet object
# resulting from an anota2seqRun call:
ptn <- postNetStart(
  ads = ads,
  regulation = c("translationUp", "translationDown"),
  contrast = c(1, 1),
  regulationGen = "translation",
  contrastSel = 1,
  selection = "random",
  setSeed = 123,
  source = "load",
  species = "human"
)

# Run slope filtering to remove the genes with unrealistic slopes:
filtOutGenes <- slopeFilt(ads = ads,
                          regulationGen = "translation",
                          contrastSel = 1)

# Run GAGE:
ptn <- gageAnalysis(ptn,
                    genesSlopeFiltOut = filtOutGenes,
                    category = "CC")
The get_signatures function retrieves gene signatures of interest provided with the package for the indicated species.
get_signatures(species)
species |
A string to select the species of interest. Currently, |
The data retrieved contains a list of gene signatures relevant to understanding possible pathways and mRNA features involved in regulation of mRNA translation, with a specific focus on mTOR and integrated stress response (ISR)-sensitive translation. These signatures have been compiled either directly from published studies, or from analyses of published datasets (sources and references described in detail below). Note that if the original source of the gene signature was from human, these gene IDs have been converted to mouse, and vice versa.
These gene signatures can be used with the plotSignatures, plotSignatures_ads, and signaturesHeatmap functions to perform statistical analyses and visualizations of how each gene signature is regulated in a given dataset. Furthermore, gene signatures can also be provided as features in the featureIntegration function which performs forward stepwise regression modelling or Random Forest classification to identify features that are most important in explaining post-transcriptional regulation in a given dataset. The signCalc function can be used to convert gene signatures into a format compatible with featureIntegration.
The format is:
List of 5
$ Gandin_etal_2016_mTOR_transUp : chr [1:1487] "Zfy1" "Zfy2" "Isca2" "Rps14" ...
$ Gandin_etal_2016_mTOR_transDown : chr [1:1765] "Lpin1" "Galnt1" "Atrx" "Tbc1d31" ...
$ Cockman_etal_2020_classicTOP : chr [1:70] "Rpl37" "Rps14" "Hnrnpa1" "Rpl39" ...
$ Guan_etal_2017_Tg1_transUp : chr [1:1336] "Mcm7" "Bcl2" "Usp33" "Gm13212" ...
$ Guan_etal_2017_Tg1_transDown : chr [1:774] "Mapre3" "Mettl24" "Isg15" "Idh3g" ...
Genes translationally regulated downstream of mTOR:
Signature Information:
Signature Name(s): Gandin_etal_2016_mTOR_transUp, Gandin_etal_2016_mTOR_transDown
Source: MCF7 cells
Species: Human
Conditions: Insulin (4.2 nM) + torin1 (250 nM), Insulin (4.2 nM) + DMSO; 4 hrs
Replicates: 4 Reps/Condition
Comparison: Insulin + torin1 vs. Insulin
Publication Information:
Title: mTORC1 and CK2 coordinate ternary and eIF4F complex assembly
Method: Polysome fractionation followed by microarray
Data Source: Anota2seq analysis of microarray data deposited in GSE76766.
Analysis Method: Anota2seq algorithm (version 1.14.0). The following thresholds were applied within the anota2seqRun function: maxPAdj = 0.15; deltaP = log2(1.2); deltaT = log2(1.2); deltaPT = log2(1.2); deltaTP = log2(1.2); maxSlopeTranslation = 2; minSlopeTranslation = -1; minSlopeBuffering = -2; maxSlopeBuffering = 1. Replicate was included in the anota2seq model using the ‘batchVec’ parameter to account for batch effects.
First Author, Last Author, Year: Gandin, Topisirovic, 2016
PMID: 27040916
GEO Accession: GSE76766
DOI: 10.1038/ncomms11127
Dataset Summary: Genes classified as translationally activated (transUp) or suppressed (transDown) upon inhibition of mTOR via torin1 treatment in insulin-stimulated MCF7 cells.
mRNAs containing 5'terminal oligopyrimidine (5'TOP) motifs:
Signature Information:
Signature Name: Cockman_etal_2020_classicTOP
Source: N/A
Species: Human
Conditions: N/A
Replicates: N/A
Comparison: N/A
Publication Information:
Title: TOP mRNPs: Molecular Mechanisms and Principles of Regulation
Method: N/A
Data Source: Data obtained from supplementary materials (see: Appendix A Table 1A of publication)
Analysis Method:
First Author, Last Author, Year: Cockman, Ivanov, 2020
PMID: 32605040
GEO Accession: N/A
DOI: 10.3390/biom10070969
Dataset Summary: A curated list of well-validated 5'TOP motif-containing mRNAs.
Genes translationally regulated downstream of ISR activation:
Signature Information:
Signature Name(s): Guan_etal_2017_Tg1_transUp, Guan_etal_2017_Tg1_transDown
Source: Mouse embryonic fibroblasts (MEFs)
Species: Mouse
Conditions: Thapsigargin (400 nM), Vehicle control (DMSO); 1 hr
Replicates: 4 Reps/Condition
Comparison: Thapsigargin vs. DMSO
Publication Information:
Title: A Unique ISR Program Determines Cellular Responses to Chronic Stress
Method: Polysome fractionation followed by RNA sequencing
Data Source:
Analysis Method:
First Author, Last Author, Year: Guan, Hatzoglou, 2017
PMID: 29220654
GEO Accession: GSE90070
DOI: 10.1016/j.molcel.2017.11.007
Dataset Summary: Genes translationally activated (transUp) or suppressed (transDown) following activation of the ISR via thapsigargin treatment in MEFs.
If you use these gene signatures in your analysis, please cite:
Gandin V, Masvidal L, Cargnello M, Gyenis L, McLaughlan S, Cai Y, Tenkerian C, Morita M, Balanathan P, Jean-Jean O, Stambolic V, Trost M, Furic L, Larose L, Koromilas AE, Asano K, Litchfield D, Larsson O, Topisirovic I. mTORC1 and CK2 coordinate ternary and eIF4F complex assembly. Nat Commun. 2016 Apr 4;7:11127. doi: 10.1038/ncomms11127. PMID: 27040916; PMCID: PMC4822005.
Cockman E, Anderson P, Ivanov P. TOP mRNPs: Molecular Mechanisms and Principles of Regulation. Biomolecules. 2020 Jun 27;10(7):969. doi: 10.3390/biom10070969. PMID: 32605040; PMCID: PMC7407576.
Guan BJ, van Hoef V, Jobava R, Elroy-Stein O, Valasek LS, Cargnello M, Gao XH, Krokowski D, Merrick WC, Kimball SR, Komar AA, Koromilas AE, Wynshaw-Boris A, Topisirovic I, Larsson O, Hatzoglou M. A Unique ISR Program Determines Cellular Responses to Chronic Stress. Mol Cell. 2017 Dec 7;68(5):885-900.e6. doi: 10.1016/j.molcel.2017.11.007. PMID: 29220654; PMCID: PMC5730339.
plotSignatures
plotSignatures_ads
signaturesHeatmap
signCalc
# Load the human gene signatures:
humanSignatures <- get_signatures("human")
str(humanSignatures)
postNetData object
The goAnalysis function uses clusterProfiler to perform Gene Ontology enrichment analysis on the gene sets of interest associated with the regulatory effect measurement defined within the postNetData object.
goAnalysis(ptn, genesSlopeFiltOut = NULL, category, maxSize = 500, minSize = 10, counts = 10, FDR = 0.15, name = NULL)
ptn |
A postNetData object. |
genesSlopeFiltOut |
If using an |
category |
A character vector specifying one or more Gene Ontology categories to be included in the analysis. The options are |
maxSize |
Integer specifying the maximal size of a gene set to test. Terms/pathways with more genes will be excluded. The default is |
minSize |
Integer specifying the minimal size of a gene set to test. Terms/pathways with fewer genes will be excluded. The default is |
counts |
Integer specifying the minimal number of genes in the gene set of interest that must be included in the term. Terms with fewer genes overlapping with the gene set of interest than the threshold specified will be discarded, regardless of significance. The default is |
FDR |
A numeric value providing the FDR threshold to be used to select significant enrichments. The default is |
name |
Optionally, provide a name to be appended to the Excel output file storing the GO analysis results. Note that the "GO" slot of the |
After running GO term analysis, the results can be retrieved from the postNetData object using the S4 method ptn_GO. Results can also be visualized using the goDotplot function.
Note that if using results from an anota2seq analysis, it is strongly recommended to first apply filtering using the slopeFilt function to exclude genes with unrealistic regression slopes.
An S4 enrichResult-class object (DOSE package) is stored in the analysis slot of the postNetData object for each category of GO terms assessed. In addition, the "results" slot of each enrichResult object is written to an Excel file. Separate Excel files are created for each category.
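The effect of the counts and FDR arguments can be pictured as a filter on an enrichment result table. The data frame below is a hand-made stand-in with made-up terms; the column names p.adjust and Count follow the layout of an enrichResult result table, and whether the package applies strict or non-strict inequalities is an assumption.

```r
# Mock enrichment results (terms and values are invented for illustration).
res <- data.frame(
  ID       = c("term_1", "term_2", "term_3"),
  p.adjust = c(0.01, 0.10, 0.40),
  Count    = c(25, 6, 30)
)

counts <- 10    # minimal overlap with the gene set of interest
FDR    <- 0.15  # significance threshold

# Keep terms that are significant and have enough overlapping genes.
kept <- res[res$p.adjust < FDR & res$Count >= counts, ]
print(kept)
```

Here term_2 is significant but discarded for having too few overlapping genes, and term_3 has enough genes but fails the FDR threshold.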
See clusterProfiler for full details of enrichment analysis implementation in R.
If you use the goAnalysis function in your analysis, please cite:
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320
clusterProfiler
postNetStart
slopeFilt
ptn_GO
goDotplot
tmp <- tempfile(fileext = ".xlsx")
tmp2 <- tempfile(fileext = ".pdf")

# ------------------------------------------------------------------
# Example 1: Running GO term analysis
# ------------------------------------------------------------------
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GO term analysis:
ptn <- goAnalysis(ptn = ptn,
                  category = c("BP"),
                  name = tmp)

# Extract the significant enrichment results from the postNetData object:
goOut <- ptn_GO(ptn,
                category = "BP",
                geneList = "translationUp",
                threshold = 0.05)
str(goOut)

# Plot GO term analysis results:
goDotplot(ptn = ptn,
          category = "BP",
          nCategories = 50,
          pool = TRUE,
          size = "geneRatio",
          pdfName = tmp2)

# ------------------------------------------------------------------
# Example 2: Running GO term analysis with anota2seq results
# ------------------------------------------------------------------
# Initialize Anota2seqDataSet (see anota2seq vignette for details):
ads <- anota2seq::anota2seqDataSetFromMatrix(
  dataP = postNetExample$ads_data$dataP,
  dataT = postNetExample$ads_data$dataT,
  phenoVec = postNetExample$ads_data$phenoVec,
  batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
  dataType = "RNAseq",
  normalize = FALSE)

# Run an anota2seq analysis:
# Note that the quality control and residual outlier testing are not
# performed to limit the running time of this example. For full details
# on running an analysis please see the anota2seq vignette and help manual.
ads <- anota2seq::anota2seqRun(ads,
                               performQC = FALSE,
                               performROT = FALSE,
                               useProgBar = FALSE)

# Initialize the postNetData object, where "ads" is an Anota2seqDataSet object
# resulting from an anota2seqRun call:
ptn <- postNetStart(
  ads = ads,
  regulation = c("translationUp", "translationDown"),
  contrast = c(1, 1),
  regulationGen = "translation",
  contrastSel = 1,
  selection = "random",
  setSeed = 123,
  source = "load",
  species = "human"
)

# Run slope filtering to remove the genes with unrealistic slopes:
filtOutGenes <- slopeFilt(ads = ads,
                          regulationGen = "translation",
                          contrastSel = 1)

# Run GO term analysis:
ptn <- goAnalysis(ptn = ptn,
                  genesSlopeFiltOut = filtOutGenes,
                  category = c("BP", "KEGG"),
                  name = "example")
The goDotplot function can be used to produce dot plots visualizing the results of GO term enrichment analyses performed using goAnalysis. The input is a postNetData object, with several options for plotting.
goDotplot(ptn, category, pool = TRUE, termSel = NULL, nCategories = 10, size = "Count", pdfName = NULL)
ptn |
A postNetData object. |
category |
A character vector specifying one or more Gene Ontology categories to plot results for.
The options are |
pool |
Logical indicating if results for all gene sets of interest should be pooled into a single plot. For example,
if |
termSel |
A character vector can be supplied to specify specific terms to be plotted. For example, |
nCategories |
An integer specifying the maximum number of terms to include on each plot. Default is |
size |
A string specifying the metric used to determine the size of the dots. The options are |
pdfName |
Name to be appended to output PDF files. Default is |
To produce dot plots, the goAnalysis function must first be run on the
postNetData object.
No value is returned. Graphical outputs are generated as PDF files.
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GO term analysis:
ptn <- goAnalysis(ptn = ptn, category = c("BP"), name = "example")

# Plot GO term analysis results for both gene sets of interest on one plot:
goDotplot(ptn = ptn, category = "BP", nCategories = 20, pool = TRUE, size = "Count", pdfName = "example")
postNetData object
The gseaAnalysis function uses the fgsea package to perform Gene Set Enrichment Analysis using the gene sets associated with the regulatory effect measurement contained in a postNetData object. These gene sets are determined by the regulationGen or effectMeasure parameters in the postNetStart function.
gseaAnalysis(ptn, genesSlopeFiltOut = NULL, collection = NULL, subcollection = NULL, subsetNames = NULL, geneSet = NULL, maxSize = 500, minSize = 10, name = NULL)
ptn |
A postNetData object. |
genesSlopeFiltOut |
If using an |
collection |
A character vector selecting the MSigDB collection of interest, for example: |
subcollection |
Optionally, provide a character vector selecting the MSigDB subcollection of interest. For example: |
subsetNames |
Optionally, provide a character vector specifying specific terms to use for GSEA by providing the names of the pathways. For example: |
geneSet |
Optionally, instead of using a GSEA collection, provide named list of character vectors with custom gene signatures to be used in GSEA. Default is |
maxSize |
Integer specifying the maximal size of a gene set to test. Terms/pathways with more genes will be excluded. Default is |
minSize |
Integer specifying the minimal size of a gene set to test. Terms/pathways with fewer genes will be excluded. Default is |
name |
Optionally, provide a name for the tab-delimited output file storing the results of the GSEA. Note that the "GSEA" slot of the |
After running GSEA, the results of the analysis can be retrieved from the postNetData object using the ptn_GSEA S4 method. These results can then be provided as input to the gseaPlot function, which can be used to generate GSEA plots for all results, or for specific terms of interest.
Note that if the results of an anota2seq analysis are being used for GSEA, it is strongly recommended to first perform filtering using the slopeFilt function.
A data.frame of GSEA results that is stored in the GSEA slot of the postNetData object, and written to a tab-delimited text file. Each row of the data.frame corresponds to a term, with the following columns (also see fgseaMultilevel from the fgsea package):
Term: The name of the term or pathway tested.
ES: The enrichment score.
NES: The normalized enrichment score, adjusted to the mean enrichment of randomly sampled size-matched sets.
log2err: The expected error for the standard deviation of the P-value logarithm.
count: The number of genes in the set belonging to the term or pathway.
size: The size of the pathway after removing genes not present in the input.
pvalue: The enrichment p-value.
adjusted_pvalue: The BH-adjusted p-value.
Genes: The gene IDs in the input data belonging to the term/pathway.
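As an illustration of how a results table with the columns listed above might be handled, the following base R sketch builds a small hypothetical data.frame and keeps the terms that pass an adjusted p-value threshold, ordered by NES. The values are invented for demonstration and are not output from gseaAnalysis; only a subset of the documented columns is included.

```r
# Hypothetical GSEA-style results. Column names follow the list above,
# but every value here is made up for illustration.
gseaRes <- data.frame(
  Term = c("TERM_A", "TERM_B", "TERM_C", "TERM_D"),
  ES = c(0.62, -0.48, 0.21, -0.55),
  NES = c(2.10, -1.75, 0.80, -1.95),
  pvalue = c(0.001, 0.012, 0.400, 0.004),
  adjusted_pvalue = c(0.004, 0.016, 0.400, 0.008),
  stringsAsFactors = FALSE
)

# Keep terms below the chosen significance threshold
threshold <- 0.05
sigRes <- gseaRes[gseaRes$adjusted_pvalue < threshold, ]

# Order by NES so that the most positively enriched terms come first
sigRes <- sigRes[order(sigRes$NES, decreasing = TRUE), ]

# Split the significant terms by direction of enrichment
enrichedUp <- sigRes$Term[sigRes$NES > 0]
enrichedDown <- sigRes$Term[sigRes$NES < 0]
```

A positive NES indicates that the genes in a term tend to sit at the top of the ranking, and a negative NES that they tend to sit at the bottom, so the two vectors separate terms by the direction of regulation.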
See fgsea for full details of GSEA implementation in R.
See https://www.gsea-msigdb.org/gsea/index.jsp for GSEA documentation.
If you use the gseaAnalysis function in your analysis, please cite:
A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, & J.P. Mesirov. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U.S.A. 102 (43) 15545-15550, (2005).
Mootha, V., Lindgren, C., Eriksson, KF. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267–273 (2003).
Korotkevich G, Sukhov V, Sergushichev A (2019). “Fast gene set enrichment analysis.” bioRxiv. doi:10.1101/060012, http://biorxiv.org/content/early/2016/06/20/060012.
GSEABase
msigdb
fgsea
postNetStart
slopeFilt
ptn_GSEA
gseaPlot
tmp <- tempfile(fileext = ".txt")
tmp2 <- tempfile(fileext = ".pdf")

# ------------------------------------------------------------------
# Example 1: Running GSEA with a custom gene set
# ------------------------------------------------------------------
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Create example custom gene sets for GSEA:
inSet <- list(set = sample(unlist(postNetExample$geneList[[1]]), 10))

# Run GSEA on custom gene sets:
ptn <- gseaAnalysis(ptn = ptn, geneSet = inSet, name = tmp)

# Extract the significant enrichment results from the postNetData object:
gseaOut <- ptn_GSEA(ptn, threshold = 0.05)
str(gseaOut)

# Plot GSEA results:
gseaPlot(ptn = ptn, termNames = gseaOut$Term[1], pdfName = tmp2)

# ------------------------------------------------------------------
# Example 2: Running GSEA with MSigDB collections
# ------------------------------------------------------------------
# If using a GSEA collection from MSigDB, check the available versions:
version <- msigdb::getMsigdbVersions()
str(version)

# Retrieve the MSigDB data for "human":
msigdbOut <- msigdb::getMsigdb(org = "hs", id = "SYM", version = version)

# Check the available collections or subcollections:
msigdb::listCollections(msigdbOut)
msigdb::listSubCollections(msigdbOut)

# Initialize a postNetData object:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GSEA on the C5 collection with GO:BP and specific terms:
ptn <- gseaAnalysis(ptn = ptn, collection = "c5", subcollection = "GO:BP",
                    subsetNames = c("GOBP_CELL-CELL_SIGNALING_BY_WNT", "GOBP_ENDOCYTOSIS"),
                    name = "c5_Example")

# ------------------------------------------------------------------
# Example 3: Running GSEA requiring anota2seq analysis slope filtering
# ------------------------------------------------------------------
# Initialize Anota2seqDataSet (see anota2seq vignette for details)
ads <- anota2seq::anota2seqDataSetFromMatrix(
  dataP = postNetExample$ads_data$dataP,
  dataT = postNetExample$ads_data$dataT,
  phenoVec = postNetExample$ads_data$phenoVec,
  batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
  dataType = "RNAseq",
  normalize = FALSE)

# Run an anota2seq analysis:
# Note that the quality control and residual outlier testing are not
# performed to limit the running time of this example. For full details
# on running an analysis please see the anota2seq vignette and help manual.
ads <- anota2seq::anota2seqRun(ads, performQC = FALSE, performROT = FALSE, useProgBar = FALSE)

# Initialize the postNetData object, where "ads" is an Anota2seqDataSet object
# resulting from an anota2seqRun call:
ptn <- postNetStart(
  ads = ads,
  regulation = c("translationUp", "translationDown"),
  contrast = c(1, 1),
  regulationGen = "translation",
  contrastSel = 1,
  selection = "random",
  setSeed = 123,
  source = "load",
  species = "human"
)

# Run slope filtering to remove the genes with unrealistic slopes:
filtOutGenes <- slopeFilt(ads = ads, regulationGen = "translation", contrastSel = 1)

# Create example custom gene sets for GSEA:
set1 <- sample(row.names(postNetExample$ads_data$dataP), 10)
inSet <- list(Set1 = set1)

# Run GSEA on the custom gene set, excluding the slope-filtered genes:
ptn <- gseaAnalysis(ptn = ptn, genesSlopeFiltOut = filtOutGenes, geneSet = inSet, name = "ads_example")
The gseaPlot function generates visualizations of the results of GSEA. The rank is plotted on the x-axis against the enrichment score on the y-axis. The positions of genes belonging to the term or pathway of interest are indicated in the ranking by red bars.
gseaPlot(ptn, termNames, genesSlopeFiltOut = NULL, gseaParam = 1, ticksSize = 0.3, pdfName = NULL)
ptn |
A postNetData object. |
termNames |
Character vector specifying the names of the terms to be plotted. |
genesSlopeFiltOut |
If using an |
gseaParam |
The GSEA parameter used to adjust the displayed statistic values. Can be either 0 or 1.
Default is |
ticksSize |
Number to adjust the thickness of the lines indicating the position of genes in the ranking. Default
is |
pdfName |
Optionally, provide a name for the output PDF file. |
No value is returned. A graphical output is generated as a PDF file.
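To make the plotted quantity concrete, the following base R sketch computes a running enrichment score along a ranked gene list in the classic GSEA formulation (a weighted running sum that increases at genes in the set and decreases otherwise). It is not the fgsea or postNet implementation. The helper name runningES, the toy statistics, and the reading of gseaParam as the exponent applied to the ranking statistic are assumptions made for illustration.

```r
# Toy ranking statistics (e.g. log2 fold changes) for eight hypothetical genes
stats <- c(g1 = 2.0, g2 = 1.5, g3 = 1.0, g4 = 0.5,
           g5 = -0.5, g6 = -1.0, g7 = -1.5, g8 = -2.0)
geneSet <- c("g1", "g2", "g5")

# Hypothetical helper: running enrichment score along the ranked list
runningES <- function(stats, geneSet, gseaParam = 1) {
  stats <- sort(stats, decreasing = TRUE)
  inSet <- names(stats) %in% geneSet
  # Hits are weighted by |statistic|^gseaParam;
  # gseaParam = 0 gives every hit the same weight
  hitWeights <- ifelse(inSet, abs(stats)^gseaParam, 0)
  pHit <- cumsum(hitWeights) / sum(hitWeights)
  pMiss <- cumsum(!inSet) / sum(!inSet)
  pHit - pMiss
}

es1 <- runningES(stats, geneSet, gseaParam = 1)
es0 <- runningES(stats, geneSet, gseaParam = 0)

# The enrichment score is the running-sum value furthest from zero
ES <- es1[which.max(abs(es1))]
```

Plotting seq_along(es1) against es1 as a line gives the familiar curve with rank on the x-axis and enrichment score on the y-axis; the positions where names(stats) are in geneSet correspond to the tick marks.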
See fgsea for full details of GSEA implementation in R.
See https://www.gsea-msigdb.org/gsea/index.jsp for GSEA documentation.
A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, & J.P. Mesirov. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U.S.A. 102 (43) 15545-15550, (2005).
Mootha, V., Lindgren, C., Eriksson, KF. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267–273 (2003).
tmp <- tempfile(fileext = ".txt")
tmp2 <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Create example custom gene sets for GSEA:
inSet <- list(set = sample(unlist(postNetExample$geneList[[1]]), 10))

# Run GSEA on custom gene sets (results are written to the text file "tmp"):
ptn <- gseaAnalysis(ptn = ptn, geneSet = inSet, name = tmp)

# Extract the significant enrichment results from the postNetData object:
gseaOut <- ptn_GSEA(ptn, threshold = 0.05)

# Plot GSEA results (the plot is written to the PDF file "tmp2"):
gseaPlot(ptn = ptn, termNames = gseaOut$Term[1], pdfName = tmp2)
The humanSignatures dataset is a list of example gene signatures derived from published studies profiling mRNA
translation downstream of mTOR signalling and activation of the integrated stress response (ISR). In addition, an example of a signature comprised of mRNAs containing
5'TOP motifs is also included. The list is not intended to be exhaustive, but provides examples of how gene signatures can be used in a postNet analysis to gain
insights into post-transcriptional regulation of gene expression.
data("humanSignatures")
The format is:
List of 5
 $ Gandin_etal_2016_mTOR_transUp  : chr [1:1569] "ZNF467" "PHLDA2" "PYCARD" "FHOD1" ...
 $ Gandin_etal_2016_mTOR_transDown: chr [1:1721] "DST" "OTULINL" "MME" "BRD8" ...
 $ Cockman_etal_2020_classicTOP   : chr [1:92] "RPSA" "RPS2" "RPS3" "RPS3A" ...
 $ Guan_etal_2017_Tg1_transUp     : chr [1:1192] "AR" "ATP7A" "BRCA2" "LYST" ...
 $ Guan_etal_2017_Tg1_transDown   : chr [1:731] "AGA" "BCKDHB" "CST3" "CYBA" ...
The humanSignatures dataset contains a list of gene signatures relevant to understanding possible pathways and
mRNA features involved in regulation of mRNA translation, with a specific focus on mTOR and integrated stress response (ISR)-sensitive translation.
These signatures have been compiled either directly from published studies, or from analyses of published datasets (sources and references
described in detail below). Note that if the original source of the gene signature was from mouse, these gene IDs have been converted to human.
These gene signatures can be used with the plotSignatures, plotSignatures_ads, and signaturesHeatmap functions to perform statistical analyses and visualizations of how each gene signature is regulated in a given dataset. Furthermore, gene signatures can also be provided as features in the featureIntegration function which performs step-wise regression modelling or random forest classification to identify features that are most important in explaining post-transcriptional regulation in a given dataset.
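Because the signatures are stored as a named list of character vectors, they can be inspected with ordinary list operations. The base R sketch below uses a toy list in the same shape (the signature names and gene symbols are hypothetical) to count the overlap with a set of regulated genes and to build a gene-by-signature indicator matrix. The indicator encoding is only one plausible way to represent signatures as features and is not necessarily how featureIntegration handles them internally.

```r
# Toy signatures in the same shape as humanSignatures:
# a named list of character vectors of gene symbols (hypothetical genes)
toySignatures <- list(
  sigA_transUp = c("GENE1", "GENE2", "GENE3", "GENE4"),
  sigB_classic = c("GENE3", "GENE5")
)

# Hypothetical set of regulated genes from an analysis
regulated <- c("GENE2", "GENE3", "GENE5", "GENE9")

# Number and fraction of each signature found among the regulated genes
overlapCount <- vapply(toySignatures,
                       function(sig) sum(sig %in% regulated),
                       integer(1))
overlapFrac <- overlapCount / lengths(toySignatures)

# Gene-by-signature indicator matrix (1 = gene belongs to the signature)
allGenes <- sort(unique(c(unlist(toySignatures, use.names = FALSE), regulated)))
membership <- vapply(toySignatures,
                     function(sig) as.integer(allGenes %in% sig),
                     integer(length(allGenes)))
rownames(membership) <- allGenes
```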
Genes translationally regulated downstream of mTOR:
Signature Information:
Signature Name(s): Gandin_etal_2016_mTOR_transUp, Gandin_etal_2016_mTOR_transDown
Source: MCF7 cells
Species: Human
Conditions: Insulin (4.2 nM) + torin1 (250 nM), Insulin (4.2 nM) + DMSO; 4 hrs
Replicates: 4 Reps/Condition
Comparison: Insulin + torin1 vs. Insulin
Publication Information:
Title: mTORC1 and CK2 coordinate ternary and eIF4F complex assembly
Method: Polysome fractionation followed by microarray
Data Source: Anota2seq analysis of microarray data deposited in GSE76766.
Analysis Method: Anota2seq algorithm (version 1.14.0). The following thresholds were applied within the anota2seqRun function: maxPAdj = 0.15; deltaP = log2(1.2); deltaT = log2(1.2); deltaPT = log2(1.2); deltaTP = log2(1.2); maxSlopeTranslation = 2; minSlopeTranslation = -1; minSlopeBuffering = -2; maxSlopeBuffering = 1. Replicate was included in the anota2seq model using the ‘batchVec’ parameter to account for batch effects.
First Author, Last Author, Year: Gandin, Topisirovic, 2016
PMID: 27040916
GEO Accession: GSE76766
DOI: 10.1038/ncomms11127
Dataset Summary: Genes classified as translationally activated (transUp) or suppressed (transDown) upon inhibition of mTOR via torin1 treatment in insulin-stimulated MCF7 cells.
mRNAs containing 5' terminal oligopyrimidine (5'TOP) motifs:
Signature Information:
Signature Name: Cockman_etal_2020_classicTOP
Source: N/A
Species: Human
Conditions: N/A
Replicates: N/A
Comparison: N/A
Publication Information:
Title: TOP mRNPs: Molecular Mechanisms and Principles of Regulation
Method: N/A
Data Source: Data obtained from supplementary materials (see: Appendix A Table 1A of publication)
Analysis Method:
First Author, Last Author, Year: Cockman, Ivanov, 2020
PMID: 32605040
GEO Accession: N/A
DOI: 10.3390/biom10070969
Dataset Summary: A curated list of well-validated 5'TOP motif-containing mRNAs.
Genes translationally regulated downstream of ISR activation:
Signature Information:
Signature Name(s): Guan_etal_2017_Tg1_transUp, Guan_etal_2017_Tg1_transDown
Source: Mouse embryonic fibroblasts (MEFs)
Species: Mouse
Conditions: Thapsigargin (400 nM), Vehicle control (DMSO); 1 hr
Replicates: 4 Reps/Condition
Comparison: Thapsigargin vs. DMSO
Publication Information:
Title: A Unique ISR Program Determines Cellular Responses to Chronic Stress
Method: Polysome fractionation followed by RNA sequencing
Data Source:
Analysis Method:
First Author, Last Author, Year: Guan, Hatzoglou, 2017
PMID: 29220654
GEO Accession: GSE90070
DOI: 10.1016/j.molcel.2017.11.007
Dataset Summary: Genes translationally activated (transUp) or suppressed (transDown) following activation of the ISR via thapsigargin treatment in MEFs.
If you use these gene signatures in your analysis, please cite:
Gandin V, Masvidal L, Cargnello M, Gyenis L, McLaughlan S, Cai Y, Tenkerian C, Morita M, Balanathan P, Jean-Jean O, Stambolic V, Trost M, Furic L, Larose L, Koromilas AE, Asano K, Litchfield D, Larsson O, Topisirovic I. mTORC1 and CK2 coordinate ternary and eIF4F complex assembly. Nat Commun. 2016 Apr 4;7:11127. doi: 10.1038/ncomms11127. PMID: 27040916; PMCID: PMC4822005.
Cockman E, Anderson P, Ivanov P. TOP mRNPs: Molecular Mechanisms and Principles of Regulation. Biomolecules. 2020 Jun 27;10(7):969. doi: 10.3390/biom10070969. PMID: 32605040; PMCID: PMC7407576.
Guan BJ, van Hoef V, Jobava R, Elroy-Stein O, Valasek LS, Cargnello M, Gao XH, Krokowski D, Merrick WC, Kimball SR, Komar AA, Koromilas AE, Wynshaw-Boris A, Topisirovic I, Larsson O, Hatzoglou M. A Unique ISR Program Determines Cellular Responses to Chronic Stress. Mol Cell. 2017 Dec 7;68(5):885-900.e6. doi: 10.1016/j.molcel.2017.11.007. PMID: 29220654; PMCID: PMC5730339.
# Load the human signatures
humanSignatures <- get_signatures("human")
str(humanSignatures)
The lengthAnalysis function calculates the log2-transformed length of different regions of mRNA sequences (5'UTR, 3'UTR, and CDS) for each gene. In addition, the length of different sequence regions for gene sets of interest can be compared against background, or other gene sets. Several options are available for plotting the comparisons.
lengthAnalysis(ptn, region, comparisons = NULL, plotOut = TRUE, plotType = "boxplot", pdfName = NULL)
ptn |
A postNetData object. |
region |
A character vector specifying the sequence region(s) to be analyzed. This can be |
comparisons |
A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
plotOut |
Logical indicating whether PDF files of plots are generated. If |
plotType |
If |
pdfName |
Name to be appended to output PDF files. |
A two-sided Wilcoxon Rank Sum test is performed to identify significant differences in mRNA sequence region length between gene sets of interest, or against the background gene set.
A named list of vectors for selected mRNA sequence regions with the calculated log2-transformed lengths for each gene. The elements in each vector are named with the gene ID. The lengthAnalysis function can also return PDF files of output plots with statistical comparisons between gene sets of interest.
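The calculation and test described above can be sketched in base R as follows. This is a minimal illustration with invented 5'UTR lengths, not the lengthAnalysis implementation; in particular, defining the background as all genes outside the gene set is an assumption made here.

```r
# Hypothetical 5'UTR lengths (in nucleotides), named by gene ID
utr5Length <- c(geneA = 64, geneB = 128, geneC = 256, geneD = 512,
                geneE = 1024, geneF = 2048, geneG = 32, geneH = 90)

# Log2-transformed lengths, one value per gene
log2Len <- log2(utr5Length)

# A gene set of interest and a background of the remaining genes
geneSetUp <- c("geneD", "geneE", "geneF")
background <- setdiff(names(log2Len), geneSetUp)

# Two-sided Wilcoxon rank sum test comparing the two groups
wt <- wilcox.test(log2Len[geneSetUp], log2Len[background],
                  alternative = "two.sided")
wt$p.value
```

Because the rank sum test depends only on ranks, the log2 transformation does not change the p-value; it mainly makes the distributions easier to compare in box plots or eCDF plots.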
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Calculate the length of the 5'UTR, 3'UTR and coding sequence for each gene,
# and compare lengths between translationUp genes vs. background,
# translationDown genes vs. background,
# and translationUp vs. translationDown genes:
len <- lengthAnalysis(ptn = ptn,
                      region = c("UTR5", "CDS", "UTR3"),
                      comparisons = list(c(0, 1), c(0, 2), c(1, 2)),
                      plotOut = TRUE,
                      plotType = "boxplot",
                      pdfName = tmp)
str(len)
postNetData object
The miRNAanalysis function employs gage in combination with lists of mRNA targets of miRNA from the
TargetScan database to identify enrichments in miRNA targets within the gene sets of interest contained in a postNetData object.
This function does not identify or predict miRNA binding sites, but rather assesses whether gene sets of interest may be regulated
by particular miRNAs.
miRNAanalysis(ptn, miRNATargetScanFile, genesSlopeFiltOut = NULL, contextScore = -0.2, Pct = 0, maxSize = 500, minSize = 10)
ptn |
A postNetData object. |
miRNATargetScanFile |
The name and path of a tab-delimited text file with miRNA target predictions. The target prediction file must contain the headings 'Cumulative weighted context++ score', 'Aggregate PCT', 'Gene Symbol', and 'Representative miRNA', and can be downloaded from the TargetScan database https://www.targetscan.org. Importantly, the file must be subset to contain predictions for only the desired species prior to running the analysis. |
genesSlopeFiltOut |
If using an |
contextScore |
A negative numeric value specifying the threshold of the cumulative weighted context++ score on which to filter miRNA target predictions. Larger negative values of |
Pct |
A numeric value between 0 and 1 specifying the threshold of the aggregate Pct score on which to filter miRNA target predictions. Larger values of |
maxSize |
Integer specifying the maximal size of a gene set to test. miRNA with more predicted target genes will be excluded. Default is |
minSize |
Integer specifying the minimal size of a gene set to test. miRNAs with fewer predicted target genes will be excluded. Default is |
For each miRNA-mRNA targeting prediction, two metrics are obtained from the TargetScan database: the cumulative weighted context++ score (CWCS) (see https://www.targetscan.org/vert_71/docs/context_score_totals.html for details) and the Pct score (see https://www.targetscan.org/docs/pct.html for details). In order to limit the analysis to high-confidence targeting predictions, filtering can be performed using one or both of these metrics. Subsequently, gage is applied to identify enrichments in miRNAs predicted to target gene sets of interest based on ranked genes (ranked by the regulatory effect measurement).
Depending on the biological question, it is important to consider which thresholds to use to filter target predictions. Context scores provide a prediction of targeting efficiency, whereas Pct is an estimate of the probability that the miRNA-mRNA targeting interaction is evolutionarily conserved, suggesting an important biological function. See https://www.targetscan.org/faqs.html for further details regarding target prediction.
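The filtering step can be pictured with the base R sketch below, which applies both thresholds to a small invented table that uses the four required column headings and then groups the remaining target genes by miRNA. This is not the miRNAanalysis implementation. The gene and miRNA names are hypothetical, and the comparison directions (score at or below contextScore, Pct at or above the threshold) as well as the toy minSize value are assumptions.

```r
# Hypothetical TargetScan-style predictions; check.names = FALSE keeps
# the original column headings, including spaces and "++"
targets <- data.frame(
  "Gene Symbol" = c("GENE1", "GENE2", "GENE3", "GENE1", "GENE4", "GENE5"),
  "Representative miRNA" = c("miR-A", "miR-A", "miR-A", "miR-B", "miR-B", "miR-B"),
  "Cumulative weighted context++ score" = c(-0.45, -0.10, -0.30, -0.25, -0.60, -0.05),
  "Aggregate PCT" = c(0.80, 0.90, 0.20, 0.50, 0.95, 0.99),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

contextScore <- -0.2
Pct <- 0.4

# Assumed directions: more negative context scores and higher Pct values
# are treated as higher-confidence predictions
keep <- targets[["Cumulative weighted context++ score"]] <= contextScore &
  targets[["Aggregate PCT"]] >= Pct
filtered <- targets[keep, ]

# Group the remaining target genes by miRNA
miRNAtoGene <- split(filtered[["Gene Symbol"]], filtered[["Representative miRNA"]])

# Drop miRNAs whose target sets fall outside the size limits (toy limits)
minSize <- 2
maxSize <- 500
setSizes <- lengths(miRNAtoGene)
miRNAtoGeneSized <- miRNAtoGene[setSizes >= minSize & setSizes <= maxSize]
```

Tightening either threshold shrinks the target sets, so some miRNAs may drop below minSize and be excluded from the enrichment test.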
Note that enrichments in miRNA predicted to target genes that are upregulated (e.g., if the regulatory effect measurement is log2 fold change) that appear in the results table labelled "greater" can be interpreted as those miRNA that may be downregulated or otherwise not active in the experimental condition. Likewise, enrichments in miRNA predicted to target genes that are downregulated that appear in the results table labelled "less" can be interpreted as those miRNA that may be upregulated or active in the experimental condition.
After running GAGE miRNA target enrichment analysis, the results can be retrieved from the postNetData object using the ptn_miRNA_analysis S4 method. To include specific miRNAs in downstream featureIntegration analysis, lists of genes targeted by particular miRNAs can be retrieved using the ptn_miRNA_to_gene S4 method. These gene lists can then be included as signatures in the models.
Note that if the results of an anota2seq analysis are being used for GAGE miRNA target enrichment analysis, it is strongly recommended to first perform filtering using the slopeFilt function.
The results returned by the miRNAanalysis function are compiled into an S4 object of class postNetmiRNA and stored in the "miRNA" slot of the postNetData object. The "miRNA_analysis" slot stores a named list with three elements ("greater", "less" and "stats"). As described in the gage package, the "stats" element contains the test statistics. The "greater" and "less" elements are data.frames with the results of a two-directional test, each having the following columns:
p.geomean: The geometric mean of the individual p-values from multiple single array-based gene set tests.
stat.mean: The mean of the individual statistics from multiple single array-based gene set tests. The value denotes the magnitude of the gene-set level changes, and the sign denotes the direction of the changes.
p.val: The global p-value or summary of the individual p-values from multiple single array-based gene set tests.
q.val: The BH adjusted p-value, as implemented by the multtest package.
set.size: The number of genes included in the gene set.
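For readers unfamiliar with the BH adjustment behind the q.val column, the following sketch applies it to a few invented p-values. It uses p.adjust from base R's stats package rather than the multtest package mentioned above, and the set names are hypothetical.

```r
# Hypothetical gene-set p-values
pvals <- c(setA = 0.001, setB = 0.012, setC = 0.400, setD = 0.004)

# Benjamini-Hochberg adjustment: each p-value is scaled by
# (number of tests / rank), with monotonicity enforced
qvals <- p.adjust(pvals, method = "BH")

# Gene sets passing a 5% false discovery rate threshold
names(qvals)[qvals < 0.05]
```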
The contents of the "miRNA_analysis" slot can be accessed using the ptn_miRNA_analysis S4 method.
The "miRNA_to_gene" slot stores a list where each element is a vector of genes that are predicted to be targeted by the miRNA that passed the filtering thresholds for contextScore and/or Pct.
The contents of the "miRNA_to_gene" slot can be accessed using the ptn_miRNA_to_gene S4 method.
If you use the miRNAanalysis function in your analysis, please cite:
McGeary SE, Lin KS, Shi CY, Pham T, Bisaria N, Kelley GM, Bartel DP. The biochemical basis of microRNA targeting efficacy. Science Dec 5, (2019).
Agarwal V, Bell GW, Nam J, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4:e05005, (2015).
Luo, W., Friedman, M., Shedden K., Hankenson, K. and Woolf, P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161.
gage
ptn_miRNA_to_gene
ptn_miRNA_analysis
## Not run:
# ------------------------------------------------------------------
# Example 1: Running GAGE miRNA target enrichment analysis with gene lists
# ------------------------------------------------------------------
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GAGE miRNA target enrichment analysis:
ptn <- miRNAanalysis(ptn = ptn,
                     miRNATargetScanFile = "miRNA_predictions_hsa.txt",
                     contextScore = -0.2,
                     Pct = 0.4)

# Extract the enrichment results from the postNetData object:
miRNAout <- ptn_miRNA_analysis(ptn = ptn, direction = "less", threshold = 1)
str(miRNAout)

# ------------------------------------------------------------------
# Example 2: Running analysis requiring anota2seq analysis slope filtering
# ------------------------------------------------------------------
# Initialize Anota2seqDataSet (see anota2seq vignette for details)
ads <- anota2seq::anota2seqDataSetFromMatrix(
  dataP = postNetExample$ads_data$dataP,
  dataT = postNetExample$ads_data$dataT,
  phenoVec = postNetExample$ads_data$phenoVec,
  batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
  dataType = "RNAseq",
  normalize = FALSE)

# Run an anota2seq analysis:
# Note that the quality control and residual outlier testing are not
# performed to limit the running time of this example. For full details
# on running an analysis please see the anota2seq vignette and help manual.
ads <- anota2seq::anota2seqRun(ads, performQC = FALSE, performROT = FALSE, useProgBar = FALSE)

# Initialize the postNetData object, where "ads" is an Anota2seqDataSet object
# resulting from an anota2seqRun call:
ptn <- postNetStart(
  ads = ads,
  regulation = c("translationUp", "translationDown"),
  contrast = c(1, 1),
  regulationGen = "translation",
  contrastSel = 1,
  selection = "random",
  setSeed = 123,
  source = "load",
  species = "human"
)

# Run slope filtering to remove the genes with unrealistic slopes:
filtOutGenes <- slopeFilt(ads = ads, regulationGen = "translation", contrastSel = 1)

# Run GAGE miRNA target enrichment analysis:
ptn <- miRNAanalysis(ptn,
                     genesSlopeFiltOut = filtOutGenes,
                     miRNATargetScanFile = "miRNA_predictions_hsa.txt",
                     Pct = 0.7)
## End(Not run)
postNetData object
The motifAnalysis function applies the STREME method from the MEME-Suite to discover de novo ungapped motifs (recurring, fixed-length patterns) that are enriched in the selected sequence region(s) of gene sets of interest relative to control sequences (background).
motifAnalysis(ptn, stremeThreshold = 0.05, minwidth = 6, memePath = NULL, seqType = "dna", region, subregion = NULL, subregionSel = NULL)
ptn |
A postNetData object. |
stremeThreshold |
A numeric value specifying the enrichment p-value threshold for motif selection. The default is 0.05. |
minwidth |
A numeric value specifying the minimal width (in nucleotides) of the motif. The default is 6. |
memePath |
The path to the STREME executables in "meme/bin". Note that the MEME-Suite must be installed. |
seqType |
A string specifying the type of sequence being provided to search for motifs. The options are,
|
region |
A character vector specifying the sequence region(s) to be analyzed. This can be |
subregion |
Optionally, it is possible to specify a more specific subregion of the sequences to either
select or exclude from the analysis. This is done by providing a numeric value indicating the number of
nucleotides from either the start of the sequence region if positive, or the end if negative. For example,
|
subregionSel |
If a value is provided for |
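The positive/negative convention described for subregion can be illustrated with a small base R sketch. The helper name subregion_sketch and its mode argument are hypothetical and are not part of postNet; the sketch only assumes that a positive value counts nucleotides from the start of the region, a negative value counts from the end, and that the resulting window is either kept or removed.

```r
# Hypothetical helper, NOT a postNet function: illustrates the
# "positive = from the start, negative = from the end" convention.
subregion_sketch <- function(seqs, subregion, mode = c("select", "exclude")) {
  mode <- match.arg(mode)
  n <- nchar(seqs)
  # Never ask for more nucleotides than a sequence contains
  k <- pmin(abs(subregion), n)
  if (subregion > 0) {
    # Window is the first k nucleotides
    if (mode == "select") substr(seqs, 1, k) else substr(seqs, k + 1, n)
  } else {
    # Window is the last k nucleotides
    if (mode == "select") substr(seqs, n - k + 1, n) else substr(seqs, 1, n - k)
  }
}

# Toy sequences (made up for illustration)
seqs <- c(a = "ACGTACGTAC", b = "GGCC")
first4  <- subregion_sketch(seqs, 4, "select")
last3   <- subregion_sketch(seqs, -3, "select")
noLast3 <- subregion_sketch(seqs, -3, "exclude")
```

Sequences shorter than the requested window are returned whole when selecting and become empty strings when excluding; how motifAnalysis itself treats such cases is not stated here.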
The motifAnalysis function applies the runStreme function from the memes package. However, the MEME Suite of software tools must also be installed locally, and the path to the executables must be provided. The MEME Suite can be downloaded here: https://meme-suite.org/meme/doc/download.html. Installation instructions can be found here: https://meme-suite.org/meme/doc/install.html?man_type=web.
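Because a wrong memePath is an easy mistake to make, it can help to check the directory before starting an analysis. The following is a minimal base R sketch; has_streme is a hypothetical helper (not part of postNet or memes), and it assumes the STREME executable in "meme/bin" is named "streme" (or "streme.exe").

```r
# Hypothetical helper, NOT part of postNet or memes.
# Assumption: the STREME binary is called "streme" (or "streme.exe").
has_streme <- function(memePath) {
  candidates <- file.path(memePath, c("streme", "streme.exe"))
  any(file.exists(candidates))
}

memePath <- "/meme/bin"  # replace with the local installation path
if (!has_streme(memePath)) {
  message("No STREME executable found in: ", memePath)
}
```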
Motifs identified using motifAnalysis can be quantified and included in downstream featureIntegration using the contentMotifs function. Significantly enriched motifs can be extracted using the S4 method ptn_motifSelection, and universalmotif results for each gene set included in geneList can be retrieved using the ptn_motifgeneList function.
If you use the motifAnalysis function in your analysis, please appropriately cite the memes R package, the MEME Suite, and the STREME tool publications provided below.
Licensing: The MEME Suite is free for non-profit use. However, for-profit users should contact [email protected] and purchase a license. See http://meme-suite.org/doc/copyright.html for details.
The motifAnalysis function returns an updated postNetData object with results compiled into an S4 object of class postNetMotifs and stored in the "motifs" slot. Within the "motifs" slot, results for each sequence region are provided as lists. The "motifSelection" list element corresponds to a character vector of identified motifs in the gene sets of interest that have enrichment p-values below the threshold set with the stremeThreshold parameter. Additional list elements are provided for each gene set of interest containing detailed information about the selected motifs in universalmotif format.
Timothy L. Bailey, James Johnson, Charles E. Grant, William S. Noble. "The MEME Suite", Nucleic Acids Research, 43(W1):W39-W49, 2015.
Timothy L. Bailey, "STREME: Accurate and versatile sequence motif discovery", Bioinformatics, 2021. https://doi.org/10.1093/bioinformatics/btab203
Nystrom SL, McKay DJ (2021). "Memes: A motif analysis environment in R using tools from the MEME Suite." PLOS Computational Biology, 17, 1-14. doi:10.1371/journal.pcbi.1008991, https://doi.org/10.1371/journal.pcbi.1008991.
memes
runStreme
universalmotif
contentMotifs
ptn_motifSelection
ptn_motifgeneList
# Note that as users must provide the path to the MEME Suite executables, it is not possible to
# create a fully executable example as this path will vary depending on many factors. An example of
# how to run this function is provided below, and can be updated with the correct memePath argument.
## Not run:
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn
ptn <- motifAnalysis(ptn = ptn, stremeThreshold = 0.05, minwidth = 6, memePath = "/meme/bin", region = c('UTR5'), subregion = NULL, subregionSel = NULL)
# Extract the significantly enriched motifs:
myMotifs <- ptn_motifSelection(ptn = ptn, region = "UTR5")
str(myMotifs)
# Extract full motif results from one gene set of interest:
motifsTransUp <- ptn_motifgeneList(ptn = ptn, region = "UTR5", geneList = "translationUp")
str(motifsTransUp)
## End(Not run)
The mouseSignatures dataset is a list of example gene signatures derived from published studies profiling mRNA
translation downstream of mTOR signalling and activation of the integrated stress response (ISR). In addition, an example of a signature comprised of mRNAs containing
5'TOP motifs is also included. The list is not intended to be exhaustive, but provides examples of how gene signatures can be used in a postNet analysis to gain
insights into post-transcriptional regulation of gene expression.
data("mouseSignatures")
The format is:
List of 5
 $ Gandin_etal_2016_mTOR_transUp  : chr [1:1487] "Zfy1" "Zfy2" "Isca2" "Rps14" ...
 $ Gandin_etal_2016_mTOR_transDown: chr [1:1765] "Lpin1" "Galnt1" "Atrx" "Tbc1d31" ...
 $ Cockman_etal_2020_classicTOP   : chr [1:70] "Rpl37" "Rps14" "Hnrnpa1" "Rpl39" ...
 $ Guan_etal_2017_Tg1_transUp     : chr [1:1336] "Mcm7" "Bcl2" "Usp33" "Gm13212" ...
 $ Guan_etal_2017_Tg1_transDown   : chr [1:774] "Mapre3" "Mettl24" "Isg15" "Idh3g" ...
The mouseSignatures dataset contains a list of gene signatures relevant to understanding possible pathways and
mRNA features involved in regulation of mRNA translation, with a specific focus on mTOR and integrated stress response (ISR)-sensitive translation.
These signatures have been compiled either directly from published studies, or from analyses of published datasets (sources and references
described in detail below). Note that if the original source of the gene signature was from human, these gene IDs have been converted to mouse.
These gene signatures can be used with the plotSignatures, plotSignatures_ads, and signaturesHeatmap functions to perform statistical analyses and visualizations of how each gene signature is regulated in a given dataset. Furthermore, gene signatures can also be provided as features in the featureIntegration function which performs step-wise regression modelling or random forest classification to identify features that are most important in explaining post-transcriptional regulation in a given dataset.
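Since a signature list is simply a named list of character vectors of gene IDs, it can be inspected with base R before being passed to the plotting or modelling functions. The sketch below uses the first few gene IDs shown in the format listing above; the background vector is a hypothetical stand-in for the genes present in a user's dataset.

```r
# Truncated signatures using gene IDs from the format listing above
signatureList <- list(
  Gandin_etal_2016_mTOR_transUp = c("Zfy1", "Zfy2", "Isca2", "Rps14"),
  Cockman_etal_2020_classicTOP  = c("Rpl37", "Rps14", "Hnrnpa1", "Rpl39")
)

# Hypothetical background: genes measured in the dataset of interest
background <- c("Rps14", "Rpl37", "Isca2", "Actb", "Gapdh")

# Which signature genes are present in the dataset?
in_data <- lapply(signatureList, intersect, y = background)

# Fraction of each signature that is represented in the dataset
coverage <- lengths(in_data) / lengths(signatureList)
```

A signature with low coverage in a given dataset will yield few data points, so checking this first can help when interpreting downstream plots.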
Genes translationally regulated downstream of mTOR:
Signature Information:
Signature Name(s): Gandin_etal_2016_mTOR_transUp, Gandin_etal_2016_mTOR_transDown
Source: MCF7 cells
Species: Human
Conditions: Insulin (4.2 nM) + torin1 (250 nM), Insulin (4.2 nM) + DMSO; 4 hrs
Replicates: 4 Reps/Condition
Comparison: Insulin + torin1 vs. Insulin
Publication Information:
Title: mTORC1 and CK2 coordinate ternary and eIF4F complex assembly
Method: Polysome fractionation followed by microarray
Data Source: Anota2seq analysis of microarray data deposited in GSE76766.
Analysis Method: Anota2seq algorithm (version 1.14.0). The following thresholds were applied within the anota2seqRun function: maxPAdj = 0.15; deltaP = log2(1.2); deltaT = log2(1.2); deltaPT = log2(1.2); deltaTP = log2(1.2); maxSlopeTranslation = 2; minSlopeTranslation = -1; minSlopeBuffering = -2; maxSlopeBuffering = 1. Replicate was included in the anota2seq model using the ‘batchVec’ parameter to account for batch effects.
First Author, Last Author, Year: Gandin, Topisirovic, 2016
PMID: 27040916
GEO Accession: GSE76766
DOI: 10.1038/ncomms11127
Dataset Summary: Genes classified as translationally activated (transUp) or suppressed (transDown) upon inhibition of mTOR via torin1 treatment in insulin-stimulated MCF7 cells.
mRNAs containing 5' terminal oligopyrimidine (5'TOP) motifs:
Signature Information:
Signature Name: Cockman_etal_2020_classicTOP
Source: N/A
Species: Human
Conditions: N/A
Replicates: N/A
Comparison: N/A
Publication Information:
Title: TOP mRNPs: Molecular Mechanisms and Principles of Regulation
Method: N/A
Data Source: Data obtained from supplementary materials (see: Appendix A Table 1A of publication)
Analysis Method:
First Author, Last Author, Year: Cockman, Ivanov, 2020
PMID: 32605040
GEO Accession: N/A
DOI: 10.3390/biom10070969
Dataset Summary: A curated list of well-validated 5'TOP motif-containing mRNAs.
Genes translationally regulated downstream of ISR activation:
Signature Information:
Signature Name(s): Guan_etal_2017_Tg1_transUp, Guan_etal_2017_Tg1_transDown
Source: Mouse embryonic fibroblasts (MEFs)
Species: Mouse
Conditions: Thapsigargin (400 nM), Vehicle control (DMSO); 1 hr
Replicates: 4 Reps/Condition
Comparison: Thapsigargin vs. DMSO
Publication Information:
Title: A Unique ISR Program Determines Cellular Responses to Chronic Stress
Method: Polysome fractionation followed by RNA sequencing
Data Source:
Analysis Method:
First Author, Last Author, Year: Guan, Hatzoglou, 2017
PMID: 29220654
GEO Accession: GSE90070
DOI: 10.1016/j.molcel.2017.11.007
Dataset Summary: Genes translationally activated (transUp) or suppressed (transDown) following activation of the ISR via thapsigargin treatment in MEFs.
If you use these gene signatures in your analysis, please cite:
Gandin V, Masvidal L, Cargnello M, Gyenis L, McLaughlan S, Cai Y, Tenkerian C, Morita M, Balanathan P, Jean-Jean O, Stambolic V, Trost M, Furic L, Larose L, Koromilas AE, Asano K, Litchfield D, Larsson O, Topisirovic I. mTORC1 and CK2 coordinate ternary and eIF4F complex assembly. Nat Commun. 2016 Apr 4;7:11127. doi: 10.1038/ncomms11127. PMID: 27040916; PMCID: PMC4822005.
Cockman E, Anderson P, Ivanov P. TOP mRNPs: Molecular Mechanisms and Principles of Regulation. Biomolecules. 2020 Jun 27;10(7):969. doi: 10.3390/biom10070969. PMID: 32605040; PMCID: PMC7407576.
Guan BJ, van Hoef V, Jobava R, Elroy-Stein O, Valasek LS, Cargnello M, Gao XH, Krokowski D, Merrick WC, Kimball SR, Komar AA, Koromilas AE, Wynshaw-Boris A, Topisirovic I, Larsson O, Hatzoglou M. A Unique ISR Program Determines Cellular Responses to Chronic Stress. Mol Cell. 2017 Dec 7;68(5):885-900.e6. doi: 10.1016/j.molcel.2017.11.007. PMID: 29220654; PMCID: PMC5730339.
# Load the mouse signatures
mouseSignatures <- get_signatures("mouse")
str(mouseSignatures)
The plotFeaturesMap function implements umap to visualize regulatory effects and features across genes.
plotFeaturesMap(ptn, regOnly = TRUE, comparisons = NULL, featSel, remBinary = TRUE, featCol = NULL, scaled = FALSE, centered = TRUE, remExtreme = NULL, pdfName = NULL)
ptn |
A postNetData object. |
regOnly |
Logical specifying whether UMAPs should include only regulated genes (i.e., those included in |
comparisons |
If |
featSel |
A character vector specifying the features used in |
remBinary |
A logical specifying if binary features should be removed when generating UMAPs. If |
featCol |
A character vector specifying which features should be coloured on the UMAPs, visualizing the relationship between the regulatory effect and the specified features. Default is |
scaled |
A logical specifying whether the input data should be scaled prior to generating UMAPs. Default is |
centered |
A logical specifying whether the input data should be centered prior to generating UMAPs. Default is |
remExtreme |
A numeric value between |
pdfName |
Name to be appended to output PDF files. |
UMAPs are highly sensitive to input parameters. Therefore, the default settings are a suggested starting point, but may not be appropriate for all inputs. These visualizations are intended to be used as a tool for data exploration and understanding relationships between different features and regulatory effects, particularly in cases where multiple features are co-occurring in the same mRNA.
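To make the effect of the remBinary, scaled, and centered options more concrete, the following base R sketch shows the kind of preprocessing these names suggest. It is an assumption about what such preprocessing typically looks like, not the internal implementation of plotFeaturesMap; the feature table is simulated and the UMAP step itself is omitted.

```r
set.seed(1)
# Simulated feature table (hypothetical values): two continuous features
# and one binary feature for six genes
feat <- data.frame(
  UTR5_length = rnorm(6, mean = 7, sd = 1),
  CDS_GC      = rnorm(6, mean = 50, sd = 5),
  hasMotif    = c(0, 1, 0, 1, 1, 0)
)

# Analogous to remBinary = TRUE: drop columns that only contain 0/1
is_binary <- vapply(feat, function(x) all(x %in% c(0, 1)), logical(1))
feat_num  <- feat[, !is_binary, drop = FALSE]

# Analogous to centered = TRUE and scaled = TRUE:
# subtract column means and divide by column standard deviations
prepped <- scale(as.matrix(feat_num), center = TRUE, scale = TRUE)
```

Centering and scaling put features measured in different units on a comparable footing, which matters because UMAP works on distances between rows.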
A PDF file with a UMAP visualization will be output for each feature specified with the featCol parameter. In each plot, the panel on the left is coloured according to the regulatory effect superimposed on the UMAP generated using the features specified in the featSel parameter, while the panel on the right is coloured according to the individual feature superimposed on the same UMAP. If featCol = NULL, a PDF will be output for each feature in featSel.
The plotFeaturesMap function also returns an updated postNetData object with a data.frame of UMAP coordinates stored in the featuresMap slot.
tmp <- tempfile(fileext = ".pdf")
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn
# Prepare the list of pre-calculated features to be used in feature integration modelling
myFeatures <- postNetExample$features
str(myFeatures)
# Run feature integration modelling using stepwise regression
ptn <- featureIntegration(ptn = ptn, features = myFeatures, pdfName = tmp, regOnly = TRUE, allFeat = FALSE, analysis_type = "lm", covarFilt = 20, comparisons = list(c(1,2)), NetModelSel = "omnibus")
# Plot UMAP visualizations using the features identified in stepwise regression modelling
ptn <- plotFeaturesMap(ptn = ptn, regOnly = TRUE, comparisons = list(c(1,2)), featSel = c(names(ptn_selectedFeatures(ptn, analysis_type = "lm", comparison = 1)),"CDS_C"), remBinary = TRUE, scaled = TRUE, centered = TRUE, remExtreme = 0.1, pdfName = tmp)
postNetData object
The plotSignatures function assesses changes in the regulation of known gene signatures in the data set of interest contained in a postNetData object. Outputs include statistical analyses and visualizations of signature regulation.
plotSignatures(ptn, signatureList, signature_colours = NULL, dataName, generalName, xlim = c(-2, 2), tableCex = 0.7, pdfName = NULL)
ptn |
A postNetData object. |
signatureList |
A named list of vectors containing gene IDs for the signatures of interest to be examined. Note that several signatures of translational regulation are provided with the package for both human and mouse. These can be retrieved using the get_signatures function. |
signature_colours |
An optional character vector specifying colours for plotting each signature. Note that |
dataName |
A character string specifying a name for the data set in which the gene signature(s) will be examined. This name will be appended to the name of the PDF file generated. |
generalName |
A character string specifying a name for the signatures that will be examined. This name will be appended to the name of the PDF file generated. |
xlim |
A numeric vector specifying the x-axis limits for the eCDF plots. Default is |
tableCex |
A numeric value specifying the size of the text in the table of statistical results that appears at the top of the eCDF plots. Default is |
pdfName |
Name to be appended to output PDF files. |
Gene signatures of interest are visualized using empirical cumulative distribution functions (eCDFs) of the regulatory effect measurement in a postNetData object (often log2 fold changes). ECDFs for genes belonging to the signatures of interest are compared to those for all other genes (background). Differences in the regulatory effect measurement distributions are calculated at the quantiles. Significant directional shifts in the eCDFs for gene signature versus background are identified using a Wilcoxon rank-sum test.
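The comparison described above can be sketched with base R on simulated data. This is an illustration of the general technique (eCDFs, quantile differences, and a Wilcoxon rank-sum test), not the code used inside plotSignatures; the gene names, the simulated shift, the chosen quantiles, and the one-sided alternative are all assumptions made for the example.

```r
set.seed(42)
# Simulated regulatory effects (e.g., log2 fold changes) for 200 genes
effect <- setNames(rnorm(200), paste0("gene", 1:200))

# Hypothetical signature of 30 genes, shifted upwards for illustration
signature <- paste0("gene", 1:30)
effect[signature] <- effect[signature] + 1

in_sig   <- names(effect) %in% signature
sig_vals <- effect[in_sig]
bg_vals  <- effect[!in_sig]

# Empirical cumulative distribution functions for signature and background
ecdf_sig <- ecdf(sig_vals)
ecdf_bg  <- ecdf(bg_vals)

# Differences between the distributions at selected quantiles (assumed here)
probs  <- c(0.25, 0.5, 0.75)
q_diff <- quantile(sig_vals, probs) - quantile(bg_vals, probs)

# Wilcoxon rank-sum test for an upward shift of the signature
wt <- wilcox.test(sig_vals, bg_vals, alternative = "greater")
```

Calling plot(ecdf_bg) followed by lines(ecdf_sig, col = "red") would draw the two curves; a signature curve lying to the right of the background indicates higher regulatory effect values.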
No value is returned. Graphical outputs are generated as PDF files.
signaturesHeatmap
plotSignatures_ads
get_signatures
tmp <- tempfile(fileext = ".pdf")
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn
# load signature data:
signatures <- get_signatures("human")
# Examine the regulation of mTOR-sensitive transcripts
# in the regulatory effects stored in a postNetData object:
plotSignatures(ptn = ptn, signatureList = signatures[c("Gandin_etal_2016_mTOR_transUp", "Gandin_etal_2016_mTOR_transDown")], signature_colours = c("red", "blue"), dataName = "Example_data", generalName = "mTOR_sensitive_translation", xlim = c(-3, 3), tableCex = 1, pdfName = tmp)
Anota2seqDataSet object
The plotSignatures_ads function assesses changes in the regulation of known gene signatures of interest in an
Anota2seqDataSet object from an anota2seq analysis. Outputs include statistical analyses and visualizations of signature
regulation.
plotSignatures_ads(ads, contrast, dataName, effects_names = c("Total mRNA Log2FC", "Polysome-associated mRNA Log2FC", "Buffering Log2FC", "Translation Log2FC"), signatureList, generalName, signature_colours, xlim = c(-2, 2), scatterXY = NULL, tableCex = 1, pdfName = NULL)
ads |
An S4 object of class |
contrast |
An integer selecting the anota2seq contrast to retrieve regulatory effect measurements from. For more details on contrasts, please see the anota2seq vignette. |
dataName |
A string specifying the name that will be given to the selected contrast from the |
effects_names |
A character vector specifying axis names for the fold change scatterplot and ecdf plots. The default is, |
signatureList |
A named list of vectors containing gene IDs for the signatures of interest to be examined. Note that several signatures of translational regulation are provided with the package for both human and mouse. These can be retrieved using the get_signatures function. |
generalName |
A character vector specifying the names of the signatures to be plotted. |
signature_colours |
A character vector specifying colours for plotting each signature. Note that |
xlim |
A numeric vector specifying the x-axis limits for the ecdf plots. Default is |
scatterXY |
A numeric value specifying the scale of the x and y axes for the fold change scatterplot. For example, |
tableCex |
A numeric value specifying the size of the text in the table of statistical results that appears at the top of the ecdf plots. Default is |
pdfName |
Name to be appended to output PDF files. |
Gene signatures of interest are visualized in a scatterplot of log2 fold changes in polysome-associated and total mRNA. The empirical cumulative distribution functions (eCDFs) of log2 fold changes in polysome-associated mRNA, total mRNA, translation, and buffering (offsetting) are also plotted independently. Fold change eCDFs for genes belonging to the signatures of interest are compared to those for all other genes (background). Differences in the fold change distributions are calculated at the quantiles. Significant directional shifts in the eCDFs for gene signature versus background are identified using a Wilcoxon rank-sum test.
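As a rough illustration of the data behind such a scatterplot, the base R sketch below computes naive per-gene log2 fold changes for polysome-associated and total mRNA and flags signature membership. All data are simulated, the helper group_lfc is hypothetical, and the difference of group means is only a simplification: the effects plotted by plotSignatures_ads come from the anota2seq analysis, not from this calculation.

```r
set.seed(7)
genes    <- paste0("gene", 1:50)
phenoVec <- rep(c("Control", "Treated"), each = 4)

# Simulated log2-scale expression matrices (genes x samples)
dataP <- matrix(rnorm(50 * 8, mean = 6), nrow = 50, dimnames = list(genes, NULL))
dataT <- matrix(rnorm(50 * 8, mean = 6), nrow = 50, dimnames = list(genes, NULL))

# Hypothetical helper: naive log2 fold change as a difference of group means
group_lfc <- function(m, pheno) {
  rowMeans(m[, pheno == "Treated", drop = FALSE]) -
    rowMeans(m[, pheno == "Control", drop = FALSE])
}

lfcP <- group_lfc(dataP, phenoVec)  # polysome-associated mRNA
lfcT <- group_lfc(dataT, phenoVec)  # total mRNA

# Flag genes belonging to a hypothetical signature
signature  <- genes[1:10]
scatter_df <- data.frame(
  gene           = genes,
  totalLog2FC    = lfcT,
  polysomeLog2FC = lfcP,
  inSignature    = genes %in% signature
)
```

Plotting the two fold change columns against each other and colouring points by inSignature gives a view comparable in spirit to the scatterplot described above.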
No value is returned. Graphical outputs are generated as PDF files.
anota2seqRun
signaturesHeatmap
plotSignatures
local({
  oldwd <- getwd()
  on.exit(setwd(oldwd), add = TRUE)
  setwd(tempdir())
  # load example data:
  data("postNetExample", package = "postNet")
  # Initialize Anota2seqDataSet (see anota2seq vignette for details)
  ads <- anota2seq::anota2seqDataSetFromMatrix(
    dataP = postNetExample$ads_data$dataP,
    dataT = postNetExample$ads_data$dataT,
    phenoVec = postNetExample$ads_data$phenoVec,
    batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
    dataType = "RNAseq",
    normalize = FALSE)
  # Run an anota2seq analysis:
  # Note that the quality control and residual outlier testing are not
  # performed to limit the running time of this example. For full details
  # on running an analysis please see the anota2seq vignette and help manual.
  ads <- anota2seq::anota2seqRun(ads, performQC = FALSE, performROT = FALSE, useProgBar = FALSE)
  # load signature data:
  humanSignatures <- get_signatures("human")
  # Examine the regulation of mTOR-sensitive transcripts in the results of the anota2seq analysis
  plotSignatures_ads(ads = ads, contrast = 1, dataName = "Osmosis_1h", signatureList = humanSignatures[c("Gandin_etal_2016_mTOR_transUp", "Gandin_etal_2016_mTOR_transDown")], generalName = "mTOR-sensitive translation", signature_colours = c("red", "blue"), xlim = c(-3, 3), scatterXY = 4, tableCex = 0.65, pdfName = "mTORsignatures")
})
postNetData: Core data container for postNet analyses
The postNetData class stores the core data structure used throughout the postNet analysis. It contains input gene lists, sequence annotations and source information, and results from codon, motif, GSEA, GAGE, GO term, and feature integration analyses. This object is created by the postNetStart function and serves as the input/output container for most downstream functions.
Objects of this class are typically created using the constructor function postNetStart. Alternatively, they can be created manually using new("postNetData", ...).
species: Object of class "characterOrNULL" specifying the species associated with the sequence annotations and input gene lists, e.g. "human", "mouse". Used to load species-specific reference sequences.
version: Object of class "characterOrNULL" storing the version identifier for the RefSeq sequence annotations loaded.
selection: Object of class "character" indicating the method used to select representative transcript isoforms, e.g. "random", "shortest", or "longest".
seed: Object of class "ANY" storing the integer used to set the seed for random sampling if selection = "random".
annot: Object of class "postNetAnnot" storing the annotated sequences for each mRNA transcript region, "UTR5", "CDS", "CCDS", and "UTR3".
dataIn: Object of class "postNetDataIn" containing the user-defined input gene lists of interest, background gene set, regulatory effect measurement, and selected colours for plotting.
features: Object of class "dataframeOrNULL" with a data.frame containing the features for each gene (e.g., UTR lengths, GC content, codon or motif counts) that were used as input for stepwise regression and/or random forest models during featureIntegration analysis.
analysis: Object of class "postNetAnalysis" storing the results from analysis functions including codonUsage, motifAnalysis, gseaAnalysis, gageAnalysis, goAnalysis, miRNAanalysis and featureIntegration. Structured into nested S4 objects.
The following S4 accessor and utility methods are available for postNetData objects:
ptn_background: signature(x = "postNetData"): Retrieve the background gene set.
ptn_codonAnalysis: signature(x = "postNetData"): Access results from the codon or amino acid usage analysis.
ptn_codonSelection: signature(x = "postNetData"): Retrieve the selected significantly enriched/depleted codons or amino acids.
ptn_colours: signature(x = "postNetData"): Return the colours that will be used for visualizations.
ptn_dataIn: signature(x = "postNetData"): Access the user-defined background gene set, gene lists of interest, regulatory effect measurements, and colours for visualizations.
ptn_effect: signature(x = "postNetData"): Retrieve the user-supplied regulatory effect measurement (e.g., log2 fold changes).
ptn_features: signature(x = "postNetData"): Extract the features data.frame used as input for stepwise regression and random forest analyses.
ptn_geneID: signature(x = "postNetData"): Return a character vector of gene IDs for the genes included in the gene sets of interest and background gene sets (all genes used in the analysis).
ptn_geneList: signature(x = "postNetData"): Access the list of user-supplied gene sets of interest used in analyses.
ptn_id: signature(x = "postNetData"): Return a character vector of transcript IDs for the transcripts included in the gene sets of interest and background gene sets (all transcripts used in the analysis).
ptn_motifgeneList: signature(x = "postNetData"): Access the results and additional information on the de novo identified motifs for the specified gene list in universalmotif format.
ptn_motifSelection: signature(x = "postNetData"): Return a character vector of de novo identified motifs in the gene sets of interest that have enrichment p-values below a user-specified threshold.
ptn_networkGraph: signature(x = "postNetData"): Retrieve the network structure in igraph format generated during linear stepwise regression modelling with the featureIntegration function.
ptn_selection: signature(x = "postNetData"): Return the user-defined transcript isoform selection strategy (e.g., random, shortest, longest).
ptn_sequences: signature(x = "postNetData"): Retrieve a character vector containing the reference annotation sequences corresponding to the specified transcript region.
ptn_species: signature(x = "postNetData"): Return the species associated with the analysis.
ptn_version: signature(x = "postNetData"): Return the release version of the RefSeq annotations used to generate the reference sequences.
ptn_GAGE: signature(x = "postNetData"): Access the results of GAGE analysis.
ptn_GSEA: signature(x = "postNetData"): Access the results of GSEA analysis.
ptn_GO: signature(x = "postNetData"): Access the results of GO term analysis.
ptn_miRNA_analysis: signature(x = "postNetData"): Access the results of GAGE-based miRNA enrichment analysis.
ptn_miRNA_to_gene: signature(x = "postNetData"): Return a list of genes that are predicted to be targeted by the miRNA that passed the filtering thresholds in miRNA analysis.
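For readers less familiar with S4, the accessor pattern used by these methods can be illustrated with a toy class. Everything in this sketch (the class toyData and the generics toy_species and toy_effect) is hypothetical and only mimics the general design; it does not reproduce the postNetData definition.

```r
library(methods)

# Toy S4 class with two slots (hypothetical, for illustration only)
setClass("toyData", representation(species = "character", effect = "numeric"))

# Accessor generics and methods with signature(x = "toyData")
setGeneric("toy_species", function(x) standardGeneric("toy_species"))
setMethod("toy_species", signature(x = "toyData"), function(x) x@species)

setGeneric("toy_effect", function(x) standardGeneric("toy_effect"))
setMethod("toy_effect", signature(x = "toyData"), function(x) x@effect)

# Create an object and read its slots through the accessors
obj <- new("toyData", species = "human", effect = c(GENE1 = 0.5, GENE2 = -1.2))
sp  <- toy_species(obj)
eff <- toy_effect(obj)
```

Accessors of this kind let downstream code read results without depending on the internal slot layout, which is presumably why they are recommended over direct slot access with "@".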
postNetStart
motifAnalysis
codonUsage
miRNAanalysis
gseaAnalysis
gageAnalysis
goAnalysis
featureIntegration
plotFeaturesMap
ptn_background
ptn_check_models
ptn_codonAnalysis
ptn_codonSelection
ptn_colours
ptn_dataIn
ptn_effect
ptn_features
ptn_GAGE
ptn_geneID
ptn_geneList
ptn_GO
ptn_GSEA
ptn_id
ptn_miRNA_analysis
ptn_miRNA_to_gene
ptn_model
ptn_motifgeneList
ptn_motifSelection
ptn_networkGraph
ptn_selectedFeatures
ptn_selection
ptn_sequences
ptn_species
ptn_version
showClass("postNetData")
This object contains example data for testing postNet functions, including lists of post-transcriptionally regulated genes, background gene sets, and log2 fold changes in translation efficiency between two treatment conditions.
data(postNetExample, package='postNet')
The format is:
List of 6
 $ geneList :List of 2
  ..$ translationUp : chr [1:10] "OCIAD1" "MSH2" "SCFD1" "INO80C" ...
  ..$ translationDown: chr [1:10] "XKR8" "ANPEP" "AGPAT2" "UBN2" ...
 $ background: chr [1:100] "SMARCD2" "ZNF239" "MTIF3" "ADGRE2" ...
 $ effect : Named num [1:100] -0.849 -2.506 -2.615 1.676 0.315 ...
  ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
 $ ptn :Formal class 'postNetData' [package "postNet"] with 8 slots
  .. ..@ species : chr "human"
  .. ..@ version : chr "ver_40.202408"
  .. ..@ selection: chr "random"
  .. ..@ seed : num 123
  .. ..@ annot :Formal class 'postNetAnnot' [package "postNet"] with 4 slots
  .. .. .. ..@ UTR5:Formal class 'postNetRegion' [package "postNet"] with 3 slots
  .. .. .. .. .. ..@ id : chr [1:100] "NM_152916" "NM_001135189" "NM_001012727" "NM_020133" ...
  .. .. .. .. .. ..@ geneID : chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  .. .. .. .. .. ..@ sequences: chr [1:100] "ACAGCTGCCCAGCCTGCGGAGACGGGACAGCCCTGTCCCACTCACTCTTTCCCCTGCTGCTCCTGCCGGCAGCTCAGCTGGAACC" "CCCCTCAGGAGAAGTCGGGAAGGTGGCGGCGGCGGCGGCGGTTGTCCCGGCTGTGCCGGTTGGTGTGGCCCGTCAGCCCGCGTACCACAGCGCCCGGGCCGCGTCGAGCCC"| __truncated__ "AGCCCCGCCGCCCTCGCAATAAGGGGCCTGAGCGCGCGGGGGAGAAGCGGGAGCGGGAGCGGGAGCGAGCTGGCGGCGCCGTCGGGCGCCGGGCCGGGCC" "GCTTTTTCTTTCCAGTGTTGGCTGACTTACAGCTCTTATAAACTAGTGGCAATTTCTGAACCCAGCCGGCTCCATCTCAGCTTCTGGTTTCTAAGTCCATGTGCCAAAGGC"| __truncated__ ...
  .. .. .. ..@ CDS :Formal class 'postNetRegion' [package "postNet"] with 3 slots
  .. .. .. .. .. ..@ id : chr [1:100] "NM_152916" "NM_001135189" "NM_001012727" "NM_020133" ...
  .. .. .. .. .. ..@ geneID : chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  .. .. .. .. .. ..@ sequences: chr [1:100] "ATGGGAGGCCGCGTCTTTCTCGTCTTTCTCGCATTCTGTGTCTGGCTGACTCTGCCGGGAGCTGAAACCCAGGACTCCAGGGGCTGTGCCCGGTGGTGCCCTCAGGACTCC"| __truncated__ "ATGGCGGCCAGCGCGAAGCGGAAGCAGGAGGAGAAGCACCTGAAGATGCTGCGGGACATGACCGGCCTCCCGCACAACCGAAAGTGCTTCGACTGCGACCAGCGCGGCCCC"| __truncated__ "ATGGAGCTGTGGCCGTGTCTGGCCGCGGCGCTGCTGTTGCTGCTGCTGCTGGTGCAGCTGAGCCGCGCGGCCGAGTTCTACGCCAAGGTCGCCCTGTACTGCGCGCTGTGC"| __truncated__ "ATGGACCTCGCGGGACTGCTGAAGTCTCAGTTCCTGTGCCACCTGGTCTTCTGCTACGTCTTTATTGCCTCAGGGCTAATCATCAACACCATTCAGCTCTTCACTCTCCTC"| __truncated__ ...
  .. .. .. ..@ UTR3:Formal class 'postNetRegion' [package "postNet"] with 3 slots
  .. .. .. .. .. ..@ id : chr [1:100] "NM_152916" "NM_001135189" "NM_001012727" "NM_020133" ...
  .. .. .. .. .. ..@ geneID : chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  .. .. .. .. .. ..@ sequences: chr [1:100] "AAAAATCTTCTGAATAAGATCTTCCCTCTTTGCCCGTGGAAAATCTGAACAATCTTTGAGCCATCTAGAGGGGAAAGAAAAGACTTTGTTCTGTGTGTTTCAAGAAATTCA"| __truncated__ "CCTTATATAGACAATTTACTGGAACGAACTTTTATGTGGTCACATTACATCTCTCCACCTCTTGCACTGTTGTCTTGTTTCACTGATCTTAGCTTTAAACACAAGAGAAGT"| __truncated__ "CCCAGACCACGGCAGGGCATGACCTGGGGAGGGCAGGTGGAAGCCGATGGCTGGAGGATGGGCAGAGGGGACTCCTCCCGGCTTCCAAATACCACTCTGTCCGGCTCCCCC"| __truncated__ "CTCAGGGAGGTGTCACCATCCGAAGGGAACCTTGGGGAACTGGTGGCCTCTGCATATCCTCCTTAGTGGGACACGGTGACAAAGGCTGGGTGAGCCCCTGCTGGGCACGGC"| __truncated__ ...
  .. .. .. ..@ CCDS: NULL
  .. ..@ dataIn :Formal class 'postNetDataIn' [package "postNet"] with 4 slots
  .. .. .. ..@ background: chr [1:100] "SMARCD2" "ZNF239" "MTIF3" "ADGRE2" ...
  .. .. .. ..@ geneList :List of 2
  .. .. .. .. ..$ translationUp : chr [1:10] "OCIAD1" "MSH2" "SCFD1" "INO80C" ...
  .. .. .. .. ..$ translationDown: chr [1:10] "XKR8" "ANPEP" "AGPAT2" "UBN2" ...
  .. .. .. ..@ effect : Named num [1:100] -0.849 -2.506 -2.615 1.676 0.315 ...
  .. .. .. .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  .. .. .. ..@ colours : chr [1:2] "#7FB7BE" "#DB7F67"
  .. ..@ features : NULL
  .. ..@ analysis :Formal class 'postNetAnalysis' [package "postNet"] with 7 slots
  .. .. .. ..@ featureIntegration: NULL
  .. .. .. ..@ motifs : NULL
  .. .. .. ..@ codons : NULL
  .. .. .. ..@ GO : NULL
  .. .. .. ..@ GSEA : NULL
  .. .. .. ..@ GAGE : NULL
  .. .. .. ..@ miRNA : NULL
 $ ads_data :List of 3
  ..$ phenoVec: chr [1:8] "Control" "Control" "Control" "Control" ...
  ..$ dataP : num [1:100, 1:8] 3.3 6.54 4.69 3.62 5.78 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  .. .. ..$ : chr [1:8] "riboSeq_Control_rep1" "riboSeq_Control_rep2" "riboSeq_Control_rep3" "riboSeq_Control_rep4" ...
  ..$ dataT : num [1:100, 1:8] 3.53 6.78 5.64 3.05 4.85 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  .. .. ..$ : chr [1:8] "rnaSeq_Control_rep1" "rnaSeq_Control_rep2" "rnaSeq_Control_rep3" "rnaSeq_Control_rep4" ...
 $ features :List of 17
  ..$ UTR5_length : Named num [1:100] 6.41 8.06 6.64 7.73 5.25 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ CDS_length : Named num [1:100] 11.18 10.62 9.53 10.15 10.61 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR3_length : Named num [1:100] 11.91 12.71 9.25 12.68 4.17 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR5_G : Named num [1:100] 24.7 42.1 49 21.6 44.7 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR5_C : Named num [1:100] 42.4 39.8 33 24.9 18.4 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR5_A : Named num [1:100] 15.29 8.27 13 20.19 18.42 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR5_T : Named num [1:100] 17.65 9.77 5 33.33 18.42 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ CDS_G : Named num [1:100] 25.6 21.7 31.2 27.1 19.2 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ CDS_C : Named num [1:100] 30.5 26.1 33.6 27.2 23.1 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ CDS_A : Named num [1:100] 21.8 27.3 15.5 22.2 23.6 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ CDS_T : Named num [1:100] 22.1 24.9 19.7 23.6 34.1 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR3_G : Named num [1:100] 20.7 17.5 34 23.9 22.2 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR3_C : Named num [1:100] 22.3 15.7 32.5 22.7 11.1 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR3_A : Named num [1:100] 28.7 30.8 17.7 27.1 44.4 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR3_T : Named num [1:100] 28.3 36 15.8 26.2 22.2 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ uORFs_ATG_strong: Named num [1:100] 0 0 0 0 0 0 0 0 0 1 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
  ..$ UTR5_SGCSGCS : Named num [1:100] 0 6 2 0 0 2 0 0 0 1 ...
  .. ..- attr(*, "names")= chr [1:100] "ADGRE2" "AGFG1" "AGPAT2" "AGPAT4" ...
The example data provided is a small subset of the data taken from a study of the effect of osmotic stress on mRNA translation. Immortalized human corneal epithelial cells (10.014O pRSV-T) were cultured under osmotic stress (500 mOsm, NaCl) for 1 h, along with controls. The provided gene lists are derived from an anota2seq analysis of ribosome profiling and RNA-seq data to identify genes that were translationally activated or suppressed under osmotic stress.
A set of representative mRNA features was also enumerated in these translationally regulated genes, along with signatures of genes for which translation is known to be regulated downstream of mTOR and the integrated stress response (ISR). This set of features is used to illustrate the feature integration modelling and UMAP visualizations.
Krokowski D, Jobava R, Szkop KJ, Chen CW, Fu X, Venus S, Guan BJ, Wu J, Gao Z, Banaszuk W, Tchorzewski M, Mu T, Ropelewski P, Merrick WC, Mao Y, Sevval AI, Miranda H, Qian SB, Manifava M, Ktistakis NT, Vourekas A, Jankowsky E, Topisirovic I, Larsson O, Hatzoglou M. Stress-induced perturbations in intracellular amino acids reprogram mRNA translation in osmoadaptation independently of the ISR. Cell Rep. 2022 Jul 19;40(3):111092. doi: 10.1016/j.celrep.2022.111092. PMID: 35858571; PMCID: PMC9491157.
Data available at GEO: GSE200097.
# data() loads the object into the workspace and returns only its name,
# so its return value should not be assigned to postNetExample.
data(postNetExample, package = 'postNet')
str(postNetExample)
The postNetStart function is the first step in a postNet analysis workflow. It retrieves and compiles all necessary inputs and annotations into a postNetData S4 object, including:
Reference sequence annotations (5'UTR, CDS, 3'UTR), which can be in-built, custom, or retrieved directly from the NCBI RefSeq database. By default, 'source = "load"', meaning 'postNetStart()' will load one of the in-built reference sequence annotations provided with the package when it is first needed. The data are downloaded from the postNet GitHub releases and stored in a user-specific cache directory managed by BiocFileCache. Once cached, the reference data are reused in subsequent sessions and do not require re-downloading. This option is available when 'species = "human"' or '"mouse"'. In-built annotations are based on NCBI RefSeq GRCh38 (human) and GRCm39 (mouse) genome assemblies and corresponding transcript annotations. The following reference annotation versions are currently available:
Human (RefSeq GRCh38): ver_40.202408 Mouse (RefSeq GRCm39): ver_27.202402
Optionally, UTR sequences can also be adjusted if more precise sequences are available (e.g., from CAGE, QuantSeq, etc.).
Gene sets of interest (either from an anota2seq analysis by providing the Anota2seqDataSet
object, or using custom gene lists of interest).
A numeric regulation effect measurement (e.g., fold changes in gene expression or translation efficiency, etc.) either from an anota2seq analysis, or from a custom source. These values will be used in the featureIntegration analysis, where mRNA features, gene signatures, and/or other variables can be used to explain changes in the regulation effect measurement using forward stepwise regression and/or random forest approaches.
The resulting postNetData object is then passed to downstream functions in the postNet
workflow, and stores analysis results.
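The forward stepwise regression idea mentioned above can be sketched with base R alone. This is a minimal illustration on simulated data, not postNet's implementation: the feature names, the simulated effect values, and the use of `stats::step()` with AIC-based selection are all assumptions made for the example.

```r
# Minimal sketch of forward stepwise regression (base R only).
# All data below are simulated; this is NOT the featureIntegration algorithm.
set.seed(1)
n <- 200

# Hypothetical per-gene features
features <- data.frame(
  UTR5_length = rnorm(n),
  UTR5_GC     = rnorm(n),
  noise       = rnorm(n)   # unrelated to the effect by construction
)

# Simulated regulation effect driven by two of the three features
effect <- 1.5 * features$UTR5_length - 0.8 * features$UTR5_GC + rnorm(n, sd = 0.5)
dat <- cbind(effect = effect, features)

# Start from an intercept-only model and add features one at a time
null_model <- lm(effect ~ 1, data = dat)
fwd <- step(null_model,
            scope = ~ UTR5_length + UTR5_GC + noise,
            direction = "forward",
            trace = 0)

# Names of the features retained in the final model (intercept dropped)
selected <- names(coef(fwd))[-1]
print(selected)
```

In a real analysis the features and effect measurement would come from the postNetData object, and the model selection criteria used by featureIntegration may differ from the AIC criterion used here.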
postNetStart(
  ## ----------------------------
  ## 1. Input Data (Anota2seq or Custom)
  ## ----------------------------
  ads = NULL,
  regulation = NULL,
  contrast = NULL,
  regulationGen = NULL,
  contrastSel = NULL,
  geneList = NULL,
  geneListcolours = NULL,
  customBg = NULL,
  effectMeasure = NULL,
  ## ----------------------------
  ## 2. Reference Sequence Annotations
  ## ----------------------------
  source,
  species = NULL,
  customFile = NULL,
  fastaFile = NULL,
  posFile = NULL,
  rna_gbff_file = NULL,
  rna_fa_file = NULL,
  genomic_gff_file = NULL,
  selection = "random",
  setSeed = NULL,
  ## ----------------------------
  ## 3. Adjusting Sequences (Custom UTRs)
  ## ----------------------------
  adjObj = NULL,
  region_adj = NULL,
  excl = FALSE,
  keepAll = FALSE
)
ads |
An optional S4 object of class Anota2seqDataSet-class (resulting from the |
regulation |
If using |
contrast |
If using |
regulationGen |
If using |
contrastSel |
If using |
geneList |
If anota2seq was not used, instead of providing |
geneListcolours |
If using |
customBg |
If using |
effectMeasure |
If using |
selection |
Specifies which mRNA isoform will be selected from the reference sequence annotation for use in analyses in cases when multiple isoforms are present for a given gene. This parameter can be one of:
|
setSeed |
If |
source |
Select the source of the reference sequence annotations. Note that the gene/transcript annotations used in the
|
species |
Select the species of interest. Currently, |
customFile |
If |
fastaFile |
If |
posFile |
If |
rna_gbff_file |
If |
rna_fa_file |
If |
genomic_gff_file |
If |
adjObj |
A named list of custom UTR sequences that will replace the existing UTR sequences in the annotation file. List elements must be named |
region_adj |
Character vector specifying which UTR regions in your annotation should be adjusted by the provided
|
excl |
When providing custom UTR sequences with |
keepAll |
When using custom UTR sequences with |
By default, postNetStart will load or create reference sequence
annotations from RefSeq for species, assemble or load gene lists of
interest, define a suitable background gene set, and store a numeric regulation
effect measurement (e.g., fold change) for downstream modeling.
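Before calling postNetStart with custom inputs, it can help to confirm that the gene lists, background, effect measurement, and colours are mutually consistent. The sketch below uses base R only; the helper `check_inputs()` and the toy gene IDs are hypothetical and are not part of postNet. The checks reflect the requirements stated in this manual: gene IDs in the list should be present in the background, the effect measurement should be named with the background gene IDs, and one colour should be supplied per gene set.

```r
# Hypothetical toy inputs (placeholders, not real data)
geneList <- list(
  translationUp   = c("GENE1", "GENE2"),
  translationDown = c("GENE3")
)
background <- c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")
effect <- c(GENE1 = 1.2, GENE2 = 0.8, GENE3 = -1.5, GENE4 = 0.1, GENE5 = -0.2)
colours <- c("#7FB7BE", "#DB7F67")

# Hypothetical helper: reports inconsistencies instead of stopping at the first one
check_inputs <- function(geneList, background, effect, colours) {
  stopifnot(is.list(geneList), !is.null(names(geneList)))
  list(
    # genes of interest that are absent from the background
    genes_not_in_background = setdiff(unlist(geneList, use.names = FALSE), background),
    # background genes without a named effect value
    background_without_effect = setdiff(background, names(effect)),
    # one colour per gene set
    colours_match = length(colours) == length(geneList)
  )
}

chk <- check_inputs(geneList, background, effect, colours)
str(chk)
```

Empty character vectors and `colours_match = TRUE` indicate that the inputs agree with each other; postNetStart may still apply further checks of its own.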
Reproducibility Note: If selection = "random", repeated calls
without setSeed can produce different isoform selections. To ensure
reproducible results between different analyses of the same dataset, set setSeed
to a fixed integer.
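The effect of fixing the seed can be illustrated with a small base R sketch. The transcript table, the placeholder IDs, and the helper `pick_random_isoform()` are hypothetical; the code only demonstrates why a fixed seed makes random selection repeatable and does not reproduce postNet's internal selection procedure.

```r
# Hypothetical transcript table: several isoforms per gene (placeholder IDs)
transcripts <- data.frame(
  geneID = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2", "GENE3"),
  id     = c("TX_A1", "TX_A2", "TX_A3", "TX_B1", "TX_B2", "TX_C1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick one isoform per gene at random,
# optionally after fixing the random number generator seed
pick_random_isoform <- function(tx, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  vapply(
    split(tx$id, tx$geneID),
    function(ids) ids[sample.int(length(ids), 1)],
    character(1)
  )
}

# Two calls with the same seed return the same isoforms
run1 <- pick_random_isoform(transcripts, seed = 123)
run2 <- pick_random_isoform(transcripts, seed = 123)
identical(run1, run2)
```

Calls made without a seed may pick different isoforms for genes with more than one isoform, which is why a fixed setSeed is recommended when results need to be compared across analyses.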
When adjusting annotations via adjObj, ensure that transcript IDs
match those in your reference annotation. You can choose to exclude or keep
genes that lack updated sequences via excl and keepAll.
The outputs of each step of postNet analysis will also be stored in the postNetData object, and can be retrieved through the use of a series of helper functions (see Note and examples below).
See the postNet vignette for a more extensive tutorial and examples.
An S4 object of class postNetData containing:
Reference Sequence Annotations: 5'UTR, CDS, and 3'UTR sequences for each
gene and/or transcript isoform, with the possibility to customize UTRs via adjObj.
Gene Lists: One or more sets of gene IDs of interest, as well as a set of
gene IDs to be used as background for statistical comparisons (from ads, or
geneList and customBg).
Regulation Effect Measurement: A numeric vector of an effect of interest, (e.g., translation efficiency fold changes) that will be modelled using mRNA features and/or other signatures of interest.
Metadata: The chosen species, sequence annotation version, and
isoform selection method.
Analysis Results: The results of subsequent steps of the
postNet workflow, such as featureIntegration, are stored in the
postNetData object.
The helper functions available to retrieve inputs and results from the postNetData object are:
https://www.ncbi.nlm.nih.gov/refseq/ for RefSeq documentation. https://bioconductor.org/packages/release/bioc/vignettes/anota2seq/inst/doc/anota2seq.pdf for anota2seq details.
anota2seqRun for obtaining the anota2seq ads object. featureIntegration for requirements of postNet feature integration analysis.
tmp <- tempfile(fileext = ".pdf")

# -----------------------------------------------------------------------
# Example 1: Using custom gene lists and background
# -----------------------------------------------------------------------
# load example data:
data("postNetExample", package = "postNet")

# Genes of interest should be provided in a named list.
myGenes <- postNetExample$geneList
str(myGenes)

# All gene IDs in the list should be present in the background.
myBg <- postNetExample$background
str(myBg)

# The regulation effect measurement must be named with the same gene IDs
# present in the background (or the gene list if no custom background is provided).
myEffect <- postNetExample$effect
str(myEffect)

# Initialize the postNetData object with custom gene lists:
ptn <- postNetStart(
  geneList = myGenes,
  geneListcolours = c("#7FB7BE","#DB7F67"),
  customBg = myBg,
  effectMeasure = myEffect,
  selection = "random",
  setSeed = 123,
  source = "load",
  species = "human"
)

## Not run:
# -----------------------------------------------------------------------
# Example 2: Using an anota2seq object with built-in RefSeq annotations
# -----------------------------------------------------------------------
# Instead of using gene sets of interest, the results of an anota2seq analysis
# can also be provided.

# Initialize Anota2seqDataSet (see anota2seq vignette for details)
ads <- anota2seq::anota2seqDataSetFromMatrix(
  dataP = postNetExample$ads_data$dataP,
  dataT = postNetExample$ads_data$dataT,
  phenoVec = postNetExample$ads_data$phenoVec,
  batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
  dataType = "RNAseq",
  normalize = FALSE)

# Run an anota2seq analysis:
# Note that the quality control and residual outlier testing are not
# performed to limit the running time of this example. For full details
# on running an analysis please see the anota2seq vignette and help manual.
ads <- anota2seq::anota2seqRun(ads,
  performQC = FALSE,
  performROT = FALSE,
  useProgBar = FALSE)

# This postNet analysis will allow identification of mRNA features,
# and comparison between translationally activated and suppressed genes.
# Feature integration analysis will model changes in translation
# efficiency.

# Initialize the postNetData object, where "ads" is an Anota2seqDataSet object
# resulting from an anota2seqRun call:
ptn <- postNetStart(
  ads = ads,
  regulation = c("translationUp","translationDown"),
  contrast = c(1,1),
  regulationGen = "translation",
  contrastSel = 1,
  selection = "random",
  setSeed = 123, # ensures reproducibility of random isoform selection
  source = "load",
  species = "human"
)

# -----------------------------------------------------------------------
# Example 3: Creating annotation from local RefSeq files
# -----------------------------------------------------------------------
# Example RefSeq annotation files for mouse downloaded from the NCBI database:
# - "myMouse_rna.gbff.gz"
# - "myMouse_rna.fa.gz"
# - "myMouse_genomic.gff.gz"

# Initialize the postNetData object with custom gene lists and RefSeq annotations,
# and considering the longest mRNA isoform for genes with multiple isoforms:
ptn <- postNetStart(
  geneList = myGenes,
  geneListcolours = c("#F2A104","#00743F"),
  customBg = myBg,
  effectMeasure = myEffect,
  selection = "longest",
  source = "createFromSourceFiles",
  species = "mouse",
  rna_gbff_file = "myMouse_rna.gbff.gz",
  rna_fa_file = "myMouse_rna.fa.gz",
  genomic_gff_file = "myMouse_genomic.gff.gz"
)

## End(Not run)

# -----------------------------------------------------------------------
# Example 4: Retrieving data from the postNetData object
# -----------------------------------------------------------------------
# Inputs and results stored in the slots of the postNetData object can be
# easily retrieved with the use of ptn helper functions.

# To extract reference annotation sequences for 5'UTRs from a postNetData object ptn:
myUTR5seqs <- ptn_sequences(ptn,region = "UTR5")
str(myUTR5seqs)

# To extract the RefSeq annotation release version
# from postNetData object ptn (if using source = "load"):
myVersion <- ptn_version(ptn)
myVersion
This object contains example data to run a postNet analysis, including lists of post-transcriptionally regulated and background genes, and log2 fold changes in translation efficiency between two treatment conditions.
data(postNetVignette, package='postNet')
The format is: List of 5 $ geneList :List of 2 ..$ translationUp : chr [1:1445] "DDX6" "CLK1" "DUSP1" "TXNIP" ... ..$ translationDown: chr [1:1378] "PPDPF" "NCKAP1" "AP2M1" "ABHD12" ... $ background: chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... $ effect : Named num [1:9555] -2.095 0.275 -0.57 -0.608 0.576 ... ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... $ features :List of 18 ..$ UTR5_length : Named num [1:9555] 8.34 7.18 7.23 7.71 4.91 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR3_length : Named num [1:9555] 9.59 4.86 11.96 9.46 11.07 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR5_C : Named num [1:9555] 28.1 32.4 54.7 30.1 36.7 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR5_A : Named num [1:9555] 18.5 14.5 6 21.1 16.7 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR5_T : Named num [1:9555] 19.4 22.1 14 14.8 10 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR3_G : Named num [1:9555] 36.8 10.3 26.6 16.6 23.4 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR3_C : Named num [1:9555] 26.3 13.8 25.9 17 17.9 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR3_A : Named num [1:9555] 20.4 27.6 20 34.1 28.1 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR3_T : Named num [1:9555] 16.5 48.3 27.4 32.3 30.5 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Up : Named num [1:9555] 0.00567 0.05068 0.04141 0.11059 0.13968 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Down : Named num [1:9555] 0.2125 0.1326 0.1871 0.0541 0.0762 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ uORFs_ATG_strong : Named num [1:9555] 2 0 0 0 0 0 0 0 0 0 ... .. 
..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ UTR5_SCSCGS : Named num [1:9555] 0 1 5 1 0 3 0 0 0 1 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Gandin_etal_2016_mTOR_transUp : Named num [1:9555] 1 0 0 0 0 0 0 0 0 0 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Gandin_etal_2016_mTOR_transDown: Named num [1:9555] 0 0 0 0 0 0 0 0 0 0 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Cockman_etal_2020_classicTOP : Named num [1:9555] 0 0 0 0 0 0 0 0 0 0 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Guan_etal_2017_Tg1_transUp : Named num [1:9555] 0 0 0 0 0 0 0 0 0 0 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... ..$ Guan_etal_2017_Tg1_transDown : Named num [1:9555] 0 0 0 0 0 0 0 0 0 0 ... .. ..- attr(*, "names")= chr [1:9555] "A4GALT" "AAAS" "AACS" "AADAT" ... $ ads_data :List of 3 ..$ phenoVec: chr [1:8] "Control" "Control" "Control" "Control" ... ..$ dataP : num [1:100, 1:8] 6.15 7.85 8.62 6.15 5.8 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:100] "ACOX1" "ACTR2" "AGRN" "APOBEC3C" ... .. .. ..$ : chr [1:8] "riboSeq_Control_rep1" "riboSeq_Control_rep2" "riboSeq_Control_rep3" "riboSeq_Control_rep4" ... ..$ dataT : num [1:100, 1:8] 5.77 8.23 9.39 5.34 6.4 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:100] "ACOX1" "ACTR2" "AGRN" "APOBEC3C" ... .. .. ..$ : chr [1:8] "rnaSeq_Control_rep1" "rnaSeq_Control_rep2" "rnaSeq_Control_rep3" "rnaSeq_Control_rep4" ...
The example data provided with the postNet package is taken from a study of the effect of osmotic stress on mRNA translation. Immortalized human corneal epithelial cells (10.014O pRSV-T) were cultured under osmotic stress (500 mOsm, NaCl) for 1 h, along with controls. The provided gene lists are derived from an anota2seq analysis of ribosome profiling and RNA-seq data to identify genes that were translationally activated or suppressed under osmotic stress.
A set of representative mRNA features was also enumerated in these translationally regulated genes, along with signatures of genes for which translation is known to be regulated downstream of mTOR and the integrated stress response (ISR). This set of features is used to illustrate the feature integration modelling and UMAP visualizations.
Krokowski D, Jobava R, Szkop KJ, Chen CW, Fu X, Venus S, Guan BJ, Wu J, Gao Z, Banaszuk W, Tchorzewski M, Mu T, Ropelewski P, Merrick WC, Mao Y, Sevval AI, Miranda H, Qian SB, Manifava M, Ktistakis NT, Vourekas A, Jankowsky E, Topisirovic I, Larsson O, Hatzoglou M. Stress-induced perturbations in intracellular amino acids reprogram mRNA translation in osmoadaptation independently of the ISR. Cell Rep. 2022 Jul 19;40(3):111092. doi: 10.1016/j.celrep.2022.111092. PMID: 35858571; PMCID: PMC9491157.
Data available at GEO: GSE200097.
# data() loads the object into the workspace and returns only its name,
# so its return value should not be assigned to postNetVignette.
data(postNetVignette, package = 'postNet')
str(postNetVignette)
postNetData object
The background slot holds a character vector of gene IDs used as the background in statistical comparisons performed in postNet analyses.
## S4 method for signature 'postNetData'
ptn_background(ptn)
ptn |
A postNetData object. |
A character vector containing the gene IDs for the set of genes used as background.
postNetStart for details on background gene sets.
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the set of background genes used in postNet analyses
myBg <- ptn_background(ptn)
str(myBg)
postNetData object
Retrieve the comparisons used for the models resulting from both the stepwise linear regression and random forest implementations of featureIntegration from a postNetData object.
ptn_check_models(ptn, analysis_type)
ptn |
A postNetData object. |
analysis_type |
A string specifying the method of analysis used for featureIntegration to retrieve the selected features from. The options are |
Returns a character string specifying the comparisons that were used in modelling. For example "translationUp_c1_translationDown_c1" indicates the gene sets that were compared, and which anota2seq contrast they were selected from (e.g. c1 = contrast 1).
tmp <- tempfile(fileext = ".pdf")

# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Prepare the list of pre-calculated features to be used in feature integration modelling
myFeatures <- postNetExample$features
str(myFeatures)

# Run feature integration modelling using stepwise regression
ptn <- featureIntegration(ptn = ptn,
  features = myFeatures,
  pdfName = tmp,
  regOnly = TRUE,
  allFeat = FALSE,
  analysis_type = "lm",
  covarFilt = 20,
  comparisons = list(c(1,2)),
  NetModelSel = "omnibus")

# Retrieve the contrasts that were used in modelling
ptn_check_models(ptn, analysis_type = "lm")
postNetData object
Retrieve the "codonAnalysis" slot from a postNetData object, which holds the results of the codon usage analysis produced using the codonUsage function.
## S4 method for signature 'postNetData'
ptn_codonAnalysis(ptn)
ptn |
A postNetData object. |
The value returned by the ptn_codonAnalysis function is a data.frame with the following columns: geneID, codon, AA, count, frequency, AACountPerGene, and relative_frequency. Here, frequency corresponds to the number of codons relative to all codons in the gene, and relative_frequency corresponds to the number of codons relative to all synonymous codons in the gene.
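The difference between frequency and relative_frequency can be made concrete with a short base R sketch for a single toy sequence. The sequence and the partial codon-to-amino-acid map below are illustrative assumptions (only the codons present in the toy sequence are mapped), and the code is not the codonUsage implementation.

```r
# Toy coding sequence (hypothetical): ATG GCT GCC GCT AAA TAA
cds <- "ATGGCTGCCGCTAAATAA"

# Split the sequence into consecutive codons
starts <- seq(1, nchar(cds) - 2, by = 3)
codons <- substring(cds, starts, starts + 2)

# Partial genetic code covering only the codons used above
aa_map <- c(ATG = "M", GCT = "A", GCC = "A", AAA = "K", TAA = "*")

counts <- table(codons)
res <- data.frame(
  codon = names(counts),
  AA    = unname(aa_map[names(counts)]),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)

# frequency: codon count relative to all codons in the gene
res$frequency <- res$count / sum(res$count)

# AACountPerGene: total count of all codons encoding the same amino acid
res$AACountPerGene <- ave(res$count, res$AA, FUN = sum)

# relative_frequency: codon count relative to its synonymous codons
res$relative_frequency <- res$count / res$AACountPerGene

print(res)
```

Here GCT occurs twice among six codons (frequency 2/6), while alanine is encoded three times in total, so the relative_frequency of GCT is 2/3.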
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run codon usage analysis of single codons
ptn <- codonUsage(ptn = ptn,
  annotType = 'ptnCDS',
  sourceSeq = "load",
  analysis = 'codon',
  codonN = 1,
  pAdj = 0.01,
  rem5 = TRUE,
  plotHeatmap = FALSE,
  thresOddsUp = 0.4,
  thresFreqUp = 0.4,
  thresOddsDown = 0.4,
  thresFreqDown = 0.4,
  subregion = NULL,
  subregionSel = NULL,
  comparisons = list(c(1,2)),
  plotType_index = "violin",
  pdfName = tmp)

# Get the results of the codon usage analysis
codonResults <- ptn_codonAnalysis(ptn)
str(codonResults)
postNetData object
Retrieve the codons or amino acids that were selected as significantly enriched or depleted in the specified comparison from a postNetData object.
## S4 method for signature 'postNetData'
ptn_codonSelection(ptn, comparison)
ptn |
A postNetData object. |
comparison |
An integer specifying which comparison to select results from. For example, |
The value returned is a named list of the selected codons or AAs that were significantly enriched or depleted (according to the thresholds selected in the codonUsage function) in the specified comparisons. This list can be used as input for the codonCalc function.
tmp <- tempfile(fileext = ".pdf")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run codon usage analysis of single codons
ptn <- codonUsage(ptn = ptn,
  annotType = 'ptnCDS',
  sourceSeq = "load",
  analysis = 'codon',
  codonN = 1,
  pAdj = 0.01,
  rem5 = TRUE,
  plotHeatmap = FALSE,
  thresOddsUp = 0.4,
  thresFreqUp = 0.4,
  thresOddsDown = 0.4,
  thresFreqDown = 0.4,
  subregion = NULL,
  subregionSel = NULL,
  comparisons = list(c(1,2)),
  plotType_index = "violin",
  pdfName = tmp)

# Select codons of interest that were significantly enriched or depleted
# with high frequency, and the highest and lowest odds ratios
codons <- ptn_codonSelection(ptn, comparison = 1)
postNetData object
Retrieve the "colours" slot of a postNetData object, which holds a character vector with the user-provided colours used for plotting. The vector must be the same length as the number of elements in geneList (e.g., one colour for each gene set of interest).
## S4 method for signature 'postNetData'
ptn_colours(ptn)
ptn |
A postNetData object. |
A character vector specifying the colours to be used for plotting.
postNetStart for details on selecting colours. postNetData-class
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the colours used for plotting stored in the postNetData object:
myCols <- ptn_colours(ptn)
str(myCols)
postNetData object
Retrieve the "background", "geneList", "effect", and "colours" slots from a postNetData object. If ads has been provided, these are retrieved from the Anota2seqDataSet based on the regulation, contrast, regulationGen, and contrastSel arguments of the postNetStart function. If not using ads, these inputs are compiled from the geneList, geneListcolours, effectMeasure, and optionally the customBg arguments.
## S4 method for signature 'postNetData'
ptn_dataIn(ptn)
ptn |
A postNetData object. |
Returns an S4 object of class postNetDataIn with the following slots:
background: a character vector of gene/transcript IDs corresponding to the background used for
statistical comparisons against the gene sets of interest in geneList. This background is either taken
automatically from the Anota2seqDataSet if using ads, or if not using ads it is created from
a user-provided geneList or provided directly with customBg.
geneList: a list of one or more character vectors corresponding to the gene sets of interest to be
used in the postNet analysis. If using ads, gene sets corresponding to anota2seq regulatory modes
are selected using the regulation argument. If not using ads, the list of gene sets is provided using
geneList.
effect: a numeric vector corresponding to the regulation effect measurement. The elements of the vector
are named with the gene/transcript IDs present in either ads, geneList, or customBg. If using
ads, the regulatory effect measurement is selected by regulationGen. If using geneList, it is
provided with effectMeasure.
colours: a character vector of the same length as geneList specifying the colours to be
used for plotting. If using geneList, colours are user-provided. If using ads, the default colours for each
regulatory mode from anota2seq will be used automatically.
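The relationships between the four slots described above can be illustrated with plain R objects. The following is a minimal sketch using invented gene IDs, effect values, and colours; it does not construct a postNetDataIn object and the checks are illustrative sanity checks, not requirements confirmed by the package.

```r
# Hypothetical toy inputs; all IDs, values, and colours are invented for illustration.
geneList <- list(
  translationUp   = c("GENE1", "GENE2"),
  translationDown = c("GENE3")
)
colours <- c("firebrick", "steelblue")
background <- c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5")
effect <- c(GENE1 = 1.2, GENE2 = 0.8, GENE3 = -1.5, GENE4 = 0.1, GENE5 = -0.2)

# One colour per gene set of interest, as stated for the colours slot
stopifnot(length(colours) == length(geneList))

# Illustrative checks: are the genes of interest found in the background,
# and does each one have a named entry in the effect vector?
genesOfInterest <- unique(unlist(geneList, use.names = FALSE))
allInBackground <- all(genesOfInterest %in% background)
allHaveEffect <- all(genesOfInterest %in% names(effect))
```

Keeping the effect vector named by gene/transcript ID, as in this sketch, allows values to be looked up by ID regardless of the order of the gene sets.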
postNetStart for more information on required inputs for postNet analysis. postNetData-class
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the input data (background, geneList, effect, colours) stored in the postNetData object:
myDataIn <- ptn_dataIn(ptn)
str(myDataIn)
postNetData object
Retrieve the regulatory effect measurement from a postNetData object. If using geneList, the "effect" slot holds a user-provided named numeric vector corresponding to a custom regulation effect measurement. If using ads, the values in the "effect" slot will correspond to the selected regulationGen from ads.
## S4 method for signature 'postNetData'
ptn_effect(ptn)
ptn |
A postNetData object. |
A numeric vector with names corresponding to the gene/transcript IDs in geneList or ads.
postNetStart for details on selecting or providing the regulatory effect measurement. postNetData-class
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the regulatory effect measurement stored in the postNetData object:
myEffect_out <- ptn_effect(ptn)
str(myEffect_out)
features slot of a postNetData object
Retrieve the input features used in stepwise regression and random forest modelling from the postNetData object.
## S4 method for signature 'postNetData'
ptn_features(ptn)
ptn |
A postNetData object. |
Returns a data.frame of features (or gene signatures, see signCalc) quantified during a postNet analysis and used as input for featureIntegration. Rows correspond to genes, and columns correspond to features.
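The genes-by-features layout of the returned data.frame can be sketched with base R. The example below assumes, purely for illustration, that each feature is a named numeric vector keyed by gene ID; the feature names and values are invented, and this is not the package's internal implementation.

```r
# Hypothetical pre-calculated features; each is a named numeric vector keyed by gene ID.
featureList <- list(
  UTR5_length = c(GENE1 = 120, GENE2 = 85, GENE3 = 240),
  CDS_GC      = c(GENE2 = 0.52, GENE3 = 0.61, GENE1 = 0.48)
)

# Keep only genes quantified for every feature, in a consistent order
genes <- Reduce(intersect, lapply(featureList, names))

# Rows correspond to genes, columns correspond to features
featureDF <- as.data.frame(lapply(featureList, function(x) x[genes]))
rownames(featureDF) <- genes
```

Aligning by name rather than by position avoids mismatching values when features were quantified in a different gene order.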
tmp <- tempfile(fileext = ".pdf")

# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Prepare the list of pre-calculated features to be used in feature integration modelling:
myFeatures <- postNetExample$features
str(myFeatures)

# Run feature integration modelling using forward stepwise regression:
ptn <- featureIntegration(ptn = ptn, features = myFeatures, pdfName = tmp,
  regOnly = TRUE, allFeat = FALSE, analysis_type = "lm", covarFilt = 20,
  comparisons = list(c(1,2)), NetModelSel = "omnibus")

# Retrieve a data.frame summarizing the features for each gene used in modelling:
featureInput <- ptn_features(ptn)
str(featureInput)
postNetData object
Retrieve the "GAGE" slot from a postNetData object, which holds the results of GAGE analysis created by the gageAnalysis function.
ptn_GAGE(ptn, category, direction, threshold)
ptn |
A postNetData object. |
category |
A character vector specifying one or more gene ontology categories to be included in the analysis.
The options are |
direction |
A character indicating the directionality of regulation of the gene sets. Can be either
|
threshold |
A numeric value providing the FDR threshold to be used to select results from the GAGE analysis. |
The value returned by the ptn_GAGE function is a data.frame with the following columns (see the gage package for full details):
p.geomean: The geometric mean of the individual p-values from multiple single array-based gene set tests.
stat.mean: The mean of the individual statistics from multiple single array-based gene set tests. The value denotes the magnitude of the gene-set level changes, and the sign denotes the direction of the changes.
p.val: The global p-value or summary of the individual p-values from multiple single array-based gene set tests.
q.val: The BH adjusted p-value, as implemented by the multtest package.
set.size: The number of genes included in the gene set.
Genes: The gene IDs in the input data belonging to the gene ontology gene set.
See the gage package for details.
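Selecting results by an FDR threshold, as the threshold argument does, can be sketched on a mock table with the columns listed above. This example uses base R `p.adjust` for the BH adjustment rather than the multtest package, the values are invented, and whether ptn_GAGE applies a strict or inclusive cut-off is an assumption here.

```r
# Hypothetical results table mimicking the columns described above; values are invented.
gageLike <- data.frame(
  p.geomean = c(0.001, 0.020, 0.300),
  stat.mean = c(3.1, 2.2, 0.4),
  p.val     = c(0.0005, 0.0100, 0.2500),
  set.size  = c(40L, 25L, 60L),
  row.names = c("setA", "setB", "setC")
)

# Benjamini-Hochberg adjustment using base R (a stand-in for multtest)
gageLike$q.val <- p.adjust(gageLike$p.val, method = "BH")

# Keep gene sets with an FDR below the chosen threshold (strict "<" is an assumption)
threshold <- 0.05
significant <- gageLike[gageLike$q.val < threshold, , drop = FALSE]
```

With a threshold of 1, as used in the package example, no filtering would take place under this scheme and all tested gene sets would be returned.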
Luo W., Friedman M., Shedden K., Hankenson K., and Woolf P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161.
gage
gageAnalysis
postNetStart
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GAGE:
ptn <- gageAnalysis(ptn, category = "CC")

# Extract the significant enrichment results from the postNetData object:
gageOut <- ptn_GAGE(ptn = ptn, category = "CC", direction = "greater",
  threshold = 1)
str(gageOut)
postNetData object
Retrieve the gene IDs for the reference annotation sequences from a postNetData object. The annotations for the different sequence regions (5'UTR, CDS, 3'UTR, and optionally CCDS) are lists of three character vectors ("id", "geneID", and "sequence"). The ptn_geneID function retrieves the gene IDs for sequences from the specified mRNA region.
## S4 method for signature 'postNetData'
ptn_geneID(ptn, region)
ptn |
A postNetData object. |
region |
The sequence region from which to access the gene IDs. Can be either: |
A character vector containing the gene IDs corresponding to the reference sequence annotations for the specified mRNA region.
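The annotation layout described above (three character vectors named "id", "geneID", and "sequence") can be sketched as a plain list. The transcript IDs, gene IDs, and sequences below are invented, and the sketch does not reflect how the postNetData object stores annotations internally.

```r
# Hypothetical CDS annotation with three parallel character vectors; all values invented.
cdsAnnot <- list(
  id       = c("NM_000001", "NM_000002", "NM_000003"),
  geneID   = c("GENE1", "GENE2", "GENE3"),
  sequence = c("ATGGCCTAA", "ATGGCATAA", "ATGTTTTGA")
)

# The three vectors are parallel: element i of each describes the same sequence
stopifnot(length(unique(lengths(cdsAnnot))) == 1)

# Gene IDs for the region, analogous to what ptn_geneID returns
geneIDs <- cdsAnnot$geneID

# Lookup from transcript ID to gene ID
txToGene <- setNames(cdsAnnot$geneID, cdsAnnot$id)
```

A named lookup vector such as `txToGene` makes it straightforward to translate between the transcript IDs returned by ptn_id and the gene IDs returned by ptn_geneID.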
postNetStart for more information on reference sequence annotations. postNetData-class
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the gene IDs corresponding to the
# reference coding sequences (CDS) from the postNetData object:
myGeneIDs <- ptn_geneID(ptn, "CDS")
str(myGeneIDs)
postNetData object
Retrieve the geneList slot from a postNetData object, which holds a user-provided named list of one or more character vectors corresponding to gene sets of interest to be examined in the postNet analysis.
## S4 method for signature 'postNetData'
ptn_geneList(ptn)
ptn |
A postNetData object. |
A named list of one or more character vectors corresponding to gene sets of interest.
postNetStart for details on providing gene sets of interest. postNetData-class
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the gene sets of interest stored in the postNetData object:
myGeneList <- ptn_geneList(ptn)
str(myGeneList)
postNetData object
Retrieve the "GO" slot from a postNetData object, which holds the results of GO term enrichment analysis created by the goAnalysis function.
ptn_GO(ptn, category, geneList, threshold)
ptn |
A postNetData object. |
category |
A character vector specifying one or more Gene Ontology categories to extract results for.
The options are |
geneList |
A string specifying the name of the gene set of interest to extract results for. This must match one of the names
of the elements of |
threshold |
A numeric value providing the FDR threshold to be used to select results from the GO term analysis. |
A data.frame of GO term enrichment analysis results that is stored in the GO slot of the postNetData object, and written to an Excel file. Each row of the data.frame corresponds to a term, with the following columns:
ID: The ID of the term or pathway tested.
Description: The name of the term. Note that if the category used in the analysis was "KEGG", there are two additional
columns with descriptive information called "category" and "subcategory".
Count: The number of genes in the set belonging to the term or pathway.
Size: The size of the pathway after removing genes not present in the input.
pvalue: The enrichment p-value.
adjusted_pvalue: The BH-adjusted p-value.
geneID: The gene IDs in the input data belonging to the term/pathway.
See clusterProfiler for full details of enrichment analysis implementation in R.
If you use the goAnalysis function in your analysis, please cite:
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, 19(11):3292-3320
clusterProfiler
goAnalysis
postNetStart
slopeFilt
goDotplot
tmp <- tempfile(fileext = ".xlsx")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GO term analysis:
ptn <- goAnalysis(ptn = ptn, category = c("BP"), name = tmp)

# Extract the significant enrichment results from the postNetData object:
goOut <- ptn_GO(ptn, category = "BP", geneList = "translationUp",
  threshold = 0.05)
str(goOut)
postNetData object
Retrieve the "GSEA" slot from a postNetData object, which holds the results of GSEA created by the gseaAnalysis function.
ptn_GSEA(ptn, threshold)
ptn |
A postNetData object. |
threshold |
A numeric value providing the FDR threshold to be used to select results from the GSEA. |
A data.frame of GSEA results that is stored in the GSEA slot of the postNetData object, and written to a tab-delimited text file. Each row of the data.frame corresponds to a term, with the following columns (also see fgseaMultilevel from the fgsea package):
Term: The name of the term or pathway tested.
ES: The enrichment score.
NES: The normalized enrichment score, adjusted to the mean enrichment of randomly sampled size-matched sets.
log2err: The expected error for the standard deviation of the P-value logarithm.
count: The number of genes in the set belonging to the term or pathway.
size: The size of the pathway after removing genes not present in the input.
pvalue: The enrichment p-value.
adjusted_pvalue: The BH-adjusted p-value.
Genes: The gene IDs in the input data belonging to the term/pathway.
See fgsea for full details of GSEA implementation in R.
See https://www.gsea-msigdb.org/gsea/index.jsp for GSEA documentation.
A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, & J.P. Mesirov. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. U.S.A. 102 (43) 15545-15550, (2005).
Mootha, V., Lindgren, C., Eriksson, KF. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267–273 (2003).
tmp <- tempfile(fileext = ".txt")

# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Create example custom gene sets for GSEA:
inSet <- list(set = sample(unlist(postNetExample$geneList[[1]]), 10))

# Run GSEA on custom gene sets:
ptn <- gseaAnalysis(ptn = ptn, geneSet = inSet, name = tmp)

# Extract enrichment results from the postNetData object with an FDR < 0.05:
gseaOut <- ptn_GSEA(ptn, threshold = 0.05)
postNetData object
Retrieve the transcript IDs for the reference annotation sequences from a postNetData object. The annotations for the different sequence regions (5'UTR, CDS, 3'UTR, and optionally CCDS) are lists of three character vectors ("id", "geneID", and "sequence"). The ptn_id function retrieves the transcript IDs for sequences from the specified mRNA region.
## S4 method for signature 'postNetData'
ptn_id(ptn, region)
ptn |
A postNetData object. |
region |
The sequence region from which to access the transcript IDs. Can be either: |
A character vector containing the transcript IDs corresponding to the reference sequence annotations for the specified mRNA region.
postNetStart for more information on reference sequence annotations. postNetData-class
# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Access the transcript IDs corresponding
# to the reference 5'UTR sequences from the postNetData object
myTransIDs <- ptn_id(ptn, "UTR5")
str(myTransIDs)
postNetData object
Retrieve the "miRNA_analysis" slot from a postNetData object, which holds the results of GAGE analysis identifying miRNA target enrichments in gene sets of interest created by the miRNAanalysis function.
ptn_miRNA_analysis(ptn, direction, threshold)
ptn |
A postNetData object. |
direction |
A character indicating the directionality of regulation of the gene sets. Can be either
|
threshold |
A numeric value providing the FDR threshold to be used to select results from the GAGE miRNA enrichment analysis. |
Note that enrichments in miRNA predicted to target genes that are upregulated (e.g., if the regulatory effect measurement is log2 fold change) that appear in the results table labelled "greater" can be interpreted as those miRNA that may be downregulated or otherwise not active in the experimental condition. Likewise, enrichments in miRNA predicted to target genes that are downregulated that appear in the results table labelled "less" can be interpreted as those miRNA that may be upregulated or active in the experimental condition.
The value returned by the ptn_miRNA_analysis function is a data.frame with the following columns (see the gage package for full details):
p.geomean: The geometric mean of the individual p-values from multiple single array-based gene set tests.
stat.mean: The mean of the individual statistics from multiple single array-based gene set tests. The value denotes the magnitude of the gene-set level changes, and the sign denotes the direction of the changes.
p.val: The global p-value or summary of the individual p-values from multiple single array-based gene set tests.
q.val: The BH adjusted p-value, as implemented by the multtest package.
set.size: The number of genes included in the gene set.
If you use the miRNAanalysis function in your analysis, please cite:
McGeary SE, Lin KS, Shi CY, Pham T, Bisaria N, Kelley GM, Bartel DP. The biochemical basis of microRNA targeting efficacy. Science Dec 5, (2019).
Agarwal V, Bell GW, Nam J, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4:e05005, (2015).
Luo W., Friedman M., Shedden K., Hankenson K., and Woolf P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161.
gage
miRNAanalysis
ptn_miRNA_to_gene
postNetStart
## Not run:
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GAGE miRNA target enrichment analysis:
ptn <- miRNAanalysis(ptn = ptn,
  miRNATargetScanFile = "miRNA_predictions_hsa.txt",
  contextScore = -0.2, Pct = 0.4)

# Extract the significant enrichment results from the postNetData object:
miRNAout <- ptn_miRNA_analysis(ptn = ptn, direction = "less", threshold = 1)
str(miRNAout)
## End(Not run)
postNetData object
Retrieve the "miRNA_to_gene" slot from a postNetData object, which stores a list of genes that are predicted to be targeted by the miRNA that passed the filtering thresholds for contextScore and/or Pct.
ptn_miRNA_to_gene(ptn, miRNAs)
ptn |
A postNetData object. |
miRNAs |
A character vector specifying one or more miRNAs for which to extract the list of target genes. |
A list where each element is a vector of genes that are predicted to be targeted by the miRNA that passed the filtering thresholds for contextScore and/or Pct.
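The shape of the returned list, and a few common operations on it, can be sketched with a mock mapping. The target gene IDs below are invented and the third miRNA name is included only to show subsetting; the sketch assumes the list is named by miRNA, which the description above does not state explicitly.

```r
# Hypothetical miRNA-to-target mapping; gene IDs are invented for illustration.
miRNAtoGene <- list(
  "hsa-miR-138-5p" = c("GENE1", "GENE4"),
  "hsa-miR-182-5p" = c("GENE2", "GENE4", "GENE7"),
  "hsa-miR-21-5p"  = c("GENE3")
)

# Select the miRNAs of interest by name
selected <- miRNAtoGene[c("hsa-miR-138-5p", "hsa-miR-182-5p")]

# Number of predicted targets per selected miRNA
nTargets <- lengths(selected)

# Genes predicted to be targeted by all selected miRNAs
shared <- Reduce(intersect, selected)
```

Intersecting the target vectors in this way is one simple route to identifying genes that may be co-regulated by several miRNAs.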
If you use the miRNAanalysis function in your analysis, please cite:
McGeary SE, Lin KS, Shi CY, Pham T, Bisaria N, Kelley GM, Bartel DP. The biochemical basis of microRNA targeting efficacy. Science Dec 5, (2019).
Agarwal V, Bell GW, Nam J, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4:e05005, (2015).
Luo W., Friedman M., Shedden K., Hankenson K., and Woolf P. GAGE: Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161.
gage
miRNAanalysis
ptn_miRNA_analysis
postNetStart
## Not run:
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Run GAGE miRNA target enrichment analysis:
ptn <- miRNAanalysis(ptn = ptn,
  miRNATargetScanFile = "miRNA_predictions_hsa.txt",
  contextScore = -0.2, Pct = 0.4)

# Extract the predicted target genes of selected miRNAs from the postNetData object:
miRNAtargets <- ptn_miRNA_to_gene(ptn = ptn,
  miRNAs = c("hsa-miR-138-5p", "hsa-miR-182-5p"))
str(miRNAtargets)
## End(Not run)
postNetData object
Retrieve the models resulting from both the stepwise linear regression and random forest implementations of featureIntegration from a postNetData object.
ptn_model(ptn, analysis_type, model, comparison)
ptn |
A postNetData object. |
analysis_type |
A string specifying the method of analysis used for featureIntegration to retrieve the selected features from. The options are |
model |
A string specifying the model to retrieve. If |
comparison |
An integer specifying which comparison to select the model from. For example, |
If analysis_type = "lm" and model = "univariateModel", an object of class postNetUnivariate is returned with three slots containing the p-values, FDRs, and proportion of variance explained for each feature passing the significance threshold in univariate models.
If analysis_type = "lm" and model = "stepwiseModel", an object of class postNetStepWise is returned with two slots containing the linear models from each step of forward stepwise regression, and the F-value table storing the F-values for each feature at each step of forward stepwise regression.
If analysis_type = "lm" and model = "finalModel", an object of class postNetFinalModel is returned with three slots containing the total variance explained by the final omnibus model, the final omnibus linear model, and a table summarizing the p-values and variance explained by each feature in the omnibus and adjusted models.
If analysis_type = "rf" and model = "preModel", an object of class randomForest is returned for the pre-model using all features. See randomForest for full description of the output object.
If analysis_type = "rf" and model = "borutaModel", an object of class Boruta is returned with the results of feature selection. See Boruta for full description of the output object.
If analysis_type = "rf" and model = "finalModel", an object of class randomForest is returned for the final model using the selected features. See randomForest for full description of the output object.
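The idea of a per-feature proportion of variance explained, together with the total variance explained by a linear model, can be illustrated with base R. This is a sketch on simulated data using sequential (type I) sums of squares from `anova`; the feature names are hypothetical, and it is not a reproduction of how featureIntegration computes its omnibus or adjusted models.

```r
set.seed(1)
n <- 200

# Simulated, hypothetical features and regulatory effect
feats <- data.frame(
  utr5Length = rnorm(n),
  gcContent  = rnorm(n),
  noise      = rnorm(n)
)
effect <- 1.0 * feats$utr5Length + 0.5 * feats$gcContent + rnorm(n)

# Linear model including all features
fit <- lm(effect ~ utr5Length + gcContent + noise, data = feats)

# Sequential sums of squares: each term's share of the total variation
aovTab <- anova(fit)
varExplained <- aovTab[["Sum Sq"]] / sum(aovTab[["Sum Sq"]])
names(varExplained) <- rownames(aovTab)

# Total variance explained by the model (R-squared)
totalExplained <- summary(fit)$r.squared
```

Because sequential sums of squares depend on the order in which terms enter the model, the per-feature shares in this sketch would change if the features were reordered, while their sum stays equal to the model R-squared.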
Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22. https://CRAN.R-project.org/doc/Rnews/.
Kursa MB, Rudnicki WR (2010). “Feature Selection with the Boruta Package.” Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11.
featureIntegration
Boruta
randomForest
tmp <- tempfile(fileext = ".pdf")

# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Prepare the list of pre-calculated features to be used in feature integration modelling
myFeatures <- postNetExample$features
str(myFeatures)

# Run feature integration modelling using forward stepwise regression:
ptn <- featureIntegration(ptn = ptn, features = myFeatures, pdfName = tmp,
  regOnly = TRUE, allFeat = FALSE, analysis_type = "lm", covarFilt = 20,
  comparisons = list(c(1,2)), NetModelSel = "omnibus")

# Retrieve the omnibus model from forward stepwise regression analysis:
finalModel <- ptn_model(ptn, analysis_type = "lm", model = "finalModel",
  comparison = 1)
str(finalModel)
postNetData object
Retrieve the results of de novo motif identification for each gene set of interest in a postNetData object. Data for all motifs identified for each gene set of interest in the input geneList with enrichment p-values below the given threshold are stored in universalmotif format.
## S4 method for signature 'postNetData'
ptn_motifGeneList(ptn, region, geneList)
ptn |
A postNetData object. |
region |
A string specifying the sequence region of interest. This can be one of |
geneList |
A string providing the name of the gene set of interest from |
The ptn_motifGeneList function returns additional information on the selected de novo identified motifs for the specified gene list in universalmotif format.
Timothy L. Bailey, James Johnson, Charles E. Grant, William S. Noble, "The MEME Suite", Nucleic Acids Research, 43(W1):W39-W49, 2015.
Timothy L. Bailey, "STREME: Accurate and versatile sequence motif discovery", Bioinformatics, 2021. https://doi.org/10.1093/bioinformatics/btab203
memes
runStreme
universalmotif
motifAnalysis
# Note that as users must provide the path to the MEME Suite executables, it is not possible to
# create a fully executable example as this path will vary depending on many factors. An example of
# how to run this function is provided below, and can be updated with the correct memePath argument.

## Not run:
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

ptn <- motifAnalysis(ptn = ptn, stremeThreshold = 0.05, minwidth = 6,
  memePath = "/meme/bin", region = c('UTR5'), subregion = NULL,
  subregionSel = NULL)

# Extract full motif results from one gene set of interest:
motifsTransUp <- ptn_motifGeneList(ptn = ptn, region = 'UTR5',
  geneList = 'translationUp')
str(motifsTransUp)
## End(Not run)
postNetData object
Retrieve the "motifSelection" slot from a postNetData object, which holds de novo identified motifs in the gene sets of interest that have enrichment p-values below the given threshold.
## S4 method for signature 'postNetData'
ptn_motifSelection(ptn, region)
ptn |
A postNetData object. |
region |
A string specifying the sequence region of interest. This can be one of |
The ptn_motifSelection function returns a character vector of de novo identified motifs in the gene sets of interest that have enrichment p-values below the threshold set with the stremeThreshold parameter of the motifAnalysis function.
Timothy L. Bailey, James Johnson, Charles E. Grant, William S. Noble, "The MEME Suite", Nucleic Acids Research, 43(W1):W39-W49, 2015.
Timothy L. Bailey, "STREME: Accurate and versatile sequence motif discovery", Bioinformatics, 2021. https://doi.org/10.1093/bioinformatics/btab203
# Note that as users must provide the path to the MEME Suite executables, it is not possible to
# create a fully executable example as this path will vary depending on many factors. An example of
# how to run this function is provided below, and can be updated with the correct memePath argument.

## Not run:
# load example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

ptn <- motifAnalysis(ptn = ptn, stremeThreshold = 0.05, minwidth = 6,
  memePath = "/meme/bin", region = c('UTR5'), subregion = NULL,
  subregionSel = NULL)

# Extract the significantly enriched motifs:
myMotifs <- ptn_motifSelection(ptn = ptn, region = 'UTR5')
str(myMotifs)
## End(Not run)
networkGraph slot of a postNetData object
Retrieve the network (an S3 igraph object) resulting from stepwise regression modelling from the postNetData object.
## S4 method for signature 'postNetData'
ptn_networkGraph(ptn, comparison)
ptn |
A postNetData object. |
comparison |
An integer specifying which comparison to select results from. For example, |
An S3 igraph object based on the results of stepwise regression modelling with featureIntegration.
tmp <- tempfile(fileext = ".pdf")

# load and create example data:
data("postNetExample", package = "postNet")
ptn <- postNetExample$ptn

# Prepare the list of pre-calculated features to be used in feature integration modelling:
myFeatures <- postNetExample$features
str(myFeatures)

# Run feature integration modelling using forward stepwise regression:
ptn <- featureIntegration(ptn = ptn, features = myFeatures, pdfName = tmp,
  regOnly = TRUE, allFeat = FALSE, analysis_type = "lm", covarFilt = 20,
  comparisons = list(c(1,2)), NetModelSel = "omnibus")

# Retrieve the network graph of the results of the forward stepwise regression omnibus model:
networkGraph <- ptn_networkGraph(ptn, comparison = 1)
str(networkGraph)
selectedFeatures slot of a postNetData object
Retrieve from the postNetData object the features that stepwise regression or random forest models identified as explaining changes in the regulatory effect.
    ptn_selectedFeatures(ptn, analysis_type, comparison)
ptn |
A postNetData object. |
analysis_type |
A string specifying the method of analysis used for featureIntegration to retrieve the selected features from. The options are |
comparison |
An integer specifying which comparison to select results from. For example, |
The relationships between selected features and changes in the regulatory effect can be further explored using the plotFeaturesMap function, which uses UMAPs to visualize regulatory effects and features across genes.
The selected features can also be quantified in other gene lists or datasets and used to predict regulation using random forest classification, implemented with the rfPred function.
If analysis_type = "lm", a named numeric vector of the features selected by stepwise regression modelling will be returned. Selected features are those passing the significance threshold defined by the stepP parameter of the featureIntegration function. Numeric values correspond to the proportion of variance in the regulatory effect explained by each feature (from either the omnibus or the adjusted model).
If analysis_type = "rf", the selected features identified by Boruta and used in random forest classification will be returned, with values corresponding to feature importance calculated by randomForest.
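Because the "lm" result is described as a named numeric vector, ordinary vector operations are enough to rank the selected features. The sketch below uses a made-up stand-in vector; the feature names and values are hypothetical and are not output of the package.

```r
# Hypothetical stand-in for the result of
# ptn_selectedFeatures(ptn, analysis_type = "lm", comparison = 1):
# names are features, values are proportions of variance explained.
selectedFeatures <- c(UTR5_length = 0.12, UTR3_GC = 0.05, uORF_number = 0.08)

# Rank the features from most to least variance explained
ranked <- sort(selectedFeatures, decreasing = TRUE)

# Name of the feature explaining the largest share of variance
topFeature <- names(ranked)[1]
print(ranked)
```

The same pattern should apply to the "rf" result if it is also returned as a named numeric vector, with importance values in place of variance explained.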
Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22. https://CRAN.R-project.org/doc/Rnews/.
Kursa MB, Rudnicki WR (2010). “Feature Selection with the Boruta Package.” Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11.
featureIntegration
plotFeaturesMap
rfPred
Boruta
randomForest
    tmp <- tempfile(fileext = ".pdf")

    # load and create example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Prepare the list of pre-calculated features to be used in feature integration modelling:
    myFeatures <- postNetExample$features
    str(myFeatures)

    # Run feature integration modelling using forward stepwise regression:
    ptn <- featureIntegration(ptn = ptn,
                              features = myFeatures,
                              pdfName = tmp,
                              regOnly = TRUE,
                              allFeat = FALSE,
                              analysis_type = "lm",
                              covarFilt = 20,
                              comparisons = list(c(1, 2)),
                              NetModelSel = "omnibus")

    # Retrieve the features selected by the model that explain changes in translation efficiency:
    selectedFeatures <- ptn_selectedFeatures(ptn,
                                             analysis_type = "lm",
                                             comparison = 1)
    str(selectedFeatures)
postNetData object
Retrieve the "selection" slot from a postNetData object, which holds the method chosen to select which mRNA isoforms are used in analyses. This can be one of: "longest", "shortest", or "random".
    ## S4 method for signature 'postNetData'
    ptn_selection(ptn)
ptn |
A postNetData object. |
A character string specifying the method used to select mRNA isoforms. Can be one of "longest", "shortest", or "random".
postNetStart for details on the different mRNA isoform selection options.
postNetData-class
    # load and create example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Access mRNA isoform selection method:
    ptn_selection(ptn)
postNetData object
The "postNetAnnot" slot holds the reference sequence annotations as postNetRegion lists. The annotations for the different sequence regions (5'UTR, CDS, 3'UTR, and optionally CCDS) are lists of three character vectors ("id", "geneID", and "sequence"). The ptn_sequences function retrieves sequences from the specified mRNA region.
    ## S4 method for signature 'postNetData'
    ptn_sequences(ptn, region)
ptn |
A postNetData object. |
region |
The sequence region to be accessed. Can be either: |
A character vector containing the reference annotation sequences corresponding to the specified region.
postNetStart for more information on obtaining reference sequence annotations from different sources.
postNetData-class
    # load and create example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Access the 5'UTR sequences from the postNetData object
    UTR5seqs <- ptn_sequences(ptn, "UTR5")
    str(UTR5seqs)
postNetData object
Retrieve the "species" slot from a postNetData object, which holds the source species of the RefSeq sequence annotations. Currently, "human" and "mouse" are available if source is "load", "create", "createFromSourceFiles", or "createFromFasta".
    ## S4 method for signature 'postNetData'
    ptn_species(ptn)
ptn |
A postNetData object. |
A character specifying the species of origin for the RefSeq annotations used to generate the reference sequences.
postNetStart for details on available species, and providing custom annotations.
postNetData-class
    # load and create example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Access species of origin for the RefSeq annotation used to create the reference sequences
    ptn_species(ptn)
postNetData object
Retrieve the "version" slot from a postNetData object, which holds the release version number of the RefSeq annotations used to generate the reference sequences. Currently, this only applies when source = 'load' in the postNetStart function.
    ## S4 method for signature 'postNetData'
    ptn_version(ptn)
ptn |
A postNetData object. |
A character specifying the release version of the RefSeq annotations used to generate the reference sequences.
postNetStart for details on selecting the appropriate reference sequence annotation versions.
postNetData-class
    # load and create example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Access version of the RefSeq annotation used to create the reference sequences
    ptn_version(ptn)
After identifying features associated with changes in post-transcriptional regulation and training a random forest classification model using the featureIntegration function, the rfPred function can then be used to apply this model to predict post-transcriptional regulation in new gene lists or datasets based on the same set of features. In this way, it is possible to evaluate how well a set of features identified as being associated with post-transcriptional regulation in one context can explain regulation observed in another context.
    rfPred(ptn, comparison, predGeneList, predFeatures, pdfName = NULL)
ptn |
A postNetData object after running the random forest implementation of |
comparison |
An integer specifying which comparison to select the model from. For example, |
predGeneList |
A named list of character vectors corresponding to the new gene lists for which regulation will be predicted. The list must have two gene lists corresponding to the same classes in the predictive model trained on the original data. For example, if the random forest model from |
predFeatures |
A named list of numeric vectors corresponding to features that will be used with the model trained on the original data to predict regulatory effects in the new data. Each vector element in the list must have names corresponding to gene IDs. These features must be enumerated in the new gene lists or dataset that is being predicted on, and must include the same features used in the final random forest model trained on the original dataset. |
pdfName |
Name to be appended to output PDF files. |
The rfPred function applies a trained Random Forest classification model generated by featureIntegration to an independent set of genes. Prediction is performed using the same feature definitions (enumerated in the new data set) and classes as the original model, enabling assessment of how well features explaining post-transcriptional regulation in one dataset generalize to another context. Model performance is summarized using ROC analysis, providing a quantitative measure of cross-dataset predictive power.
The output of rfPred is a PDF file with the Receiver Operating Characteristic (ROC) curve illustrating the performance of the trained model in predicting regulatory classes in the new dataset.
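To make the ROC summary concrete, the following base R sketch computes an ROC curve and its area under the curve (AUC) from predicted class probabilities. The scores and class labels are hypothetical, and the code illustrates the general technique only; it is not the implementation used by rfPred.

```r
# Hypothetical predicted probabilities of the positive class and the
# true classes (1 = positive, 0 = negative) for eight genes.
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)
truth <- c(1, 1, 0, 1, 0, 1, 0, 0)

# ROC curve: sweep the decision threshold from the highest to the lowest score
ord <- order(score, decreasing = TRUE)
tpr <- c(0, cumsum(truth[ord] == 1) / sum(truth == 1))  # true positive rate
fpr <- c(0, cumsum(truth[ord] == 0) / sum(truth == 0))  # false positive rate

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

An AUC near 0.5 indicates that the features carry little predictive information in the new dataset, while values approaching 1 indicate that the regulatory classes are well separated.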
Sing T, Sander O, Beerenwinkel N, Lengauer T (2005). “ROCR: visualizing classifier performance in R.” Bioinformatics, 21(20), 3940-3941. http://rocr.bioinf.mpi-sb.mpg.de.
Liaw A, Wiener M (2002). “Classification and Regression by randomForest.” R News, 2(3), 18-22. https://CRAN.R-project.org/doc/Rnews/.
featureIntegration
randomForest
performance
    tmp <- tempfile(fileext = ".pdf")

    # load and create example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Prepare the list of pre-calculated features to be used in feature integration modelling
    myFeatures <- postNetExample$features
    str(myFeatures)

    # Run feature integration modelling using random forest classification to select features and
    # generate the final model that will be used to predict regulation in a new gene list.
    ptn <- featureIntegration(ptn = ptn,
                              features = myFeatures,
                              pdfName = tmp,
                              analysis_type = "rf",
                              comparisons = list(c(1, 2)))

    # Simulate a new gene list by randomly sampling genes from the background.
    newGenes <- sample(postNetExample$background, size = 20)
    newGeneList <- list(translationUp = newGenes[1:10],
                        translationDown = newGenes[11:20])

    # Select the features that were used to train the final random forest model in the original dataset.
    predFeatureNames <- names(ptn_selectedFeatures(ptn,
                                                   analysis_type = "rf",
                                                   comparison = 1))

    # Prepare the predictive features. In this case the same list of input features
    # will be used to predict as the new gene lists are taken from
    # the same dataset the model was trained on. However, usually the input
    # for the rfPred function would be taken from a postNetData object
    # from a separate analysis on a distinct dataset.
    newFeatures <- myFeatures[predFeatureNames]

    ptn <- rfPred(ptn = ptn,
                  comparison = 1,
                  predGeneList = newGeneList,
                  predFeatures = newFeatures,
                  pdfName = tmp)
postNetData object
The signaturesHeatmap function produces a heatmap of the regulation of gene signatures of interest in a
postNetData object. Regulation of a gene signature of interest in the data set can be evaluated using the FDR of a Wilcoxon
Rank Sum test comparing the regulatory effect measurement for the gene signature against the background. Alternatively, the difference
in the regulatory effect measurement for the genes in the signature compared to background can also be visualized at percentiles.
    signaturesHeatmap(ptn, signatureList, unit = "FDR", pdfName = NULL)
ptn |
A postNetData object. |
signatureList |
A named list of vectors containing gene IDs for the signatures of interest to be examined. Note that several signatures of translational regulation are provided with the package for both human and mouse. These can be retrieved using the get_signatures function. |
unit |
A string specifying the unit to be plotted in the heatmap. The options are |
pdfName |
Name to be appended to output PDF files. |
When unit = 'FDR', the values displayed in the heatmap are obtained from a two-sided Wilcoxon Rank Sum test comparing
the regulatory effect measurement values for the gene signature of interest against the background. See postNetStart
for details on background gene sets. P-values are then corrected for multiple testing using the Benjamini & Hochberg method, and the
-log10 FDR value is multiplied by either 1 or -1 corresponding to up- or down-regulation of the gene signature compared to background,
respectively.
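The signed -log10 FDR described above can be sketched in base R as follows. The data are simulated, and using the difference in medians to decide the direction of regulation is an assumption made for this illustration; the package may determine direction differently.

```r
set.seed(1)

# Hypothetical regulatory effect values (e.g. log2 fold changes)
background <- rnorm(500, mean = 0)
signatures <- list(sigUp = rnorm(40, mean = 1),
                   sigDown = rnorm(40, mean = -1))

# Two-sided Wilcoxon rank sum test of each signature against the background
pvals <- sapply(signatures, function(x) wilcox.test(x, background)$p.value)

# Benjamini-Hochberg correction across signatures
fdr <- p.adjust(pvals, method = "BH")

# Direction relative to background (assumption: sign of the median difference)
direction <- sapply(signatures,
                    function(x) sign(median(x) - median(background)))

# Signed score: positive for up-regulated, negative for down-regulated signatures
signedScore <- -log10(fdr) * direction
print(signedScore)
```

Signing the score lets a single colour scale in the heatmap convey both the strength of the evidence and the direction of regulation.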
When unit = 'p75' (or any other percentile value), the values displayed in the heatmap are obtained by taking the difference between
the Empirical Cumulative Distribution Function (ecdf) of the regulatory effect measurements for the gene signature and the background,
at the percentile specified.
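One reading of the percentile option is the horizontal shift between the two ECDFs at a given percentile, that is, the difference between the corresponding quantiles. The sketch below follows that reading with simulated data; it is an interpretation for illustration, not a statement of how the package computes the value.

```r
set.seed(1)

# Hypothetical regulatory effect values
background <- rnorm(500)
signature <- rnorm(40, mean = 1)

# Regulatory effect at the 75th percentile of each distribution
p <- 0.75
qSignature <- quantile(signature, probs = p, names = FALSE)
qBackground <- quantile(background, probs = p, names = FALSE)

# Shift of the signature relative to the background at that percentile
shiftP75 <- qSignature - qBackground
print(shiftP75)
```

A positive shift indicates that the signature's distribution lies to the right of the background at that percentile, and a negative shift indicates the opposite.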
No value is returned. Graphical outputs are generated as PDF files.
plotSignatures
plotSignatures_ads
    tmp <- tempfile(fileext = ".pdf")

    # load example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # load signature data:
    signatures <- get_signatures("human")

    # Examine the regulation of various transcripts
    # sensitive to translational regulation by various pathways and factors
    signaturesHeatmap(ptn,
                      signatureList = signatures,
                      unit = 'FDR',
                      pdfName = tmp)
The signCalc function converts lists of gene IDs into signatures that are compatible as inputs for downstream
featureIntegration models.
    signCalc(ptn, signatures)
ptn |
A postNetData object. |
signatures |
A named list of character vectors containing gene IDs to be converted into signatures that can be used as input with the featureIntegration function. |
A named list of vectors indicating signature membership for all genes in the postNetData object.
Genes in the postNetData object that belong to the input signature are encoded as 1, and those that do not are
encoded as 0.
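The encoding described above amounts to a membership test of every gene in the dataset against each signature. A minimal base R sketch, using a hypothetical vector of gene IDs in place of those stored in a postNetData object:

```r
# Hypothetical gene IDs standing in for the genes in the postNetData object
allGenes <- c("RPL4", "RPL5", "EEF1A1", "VEGFA", "ACTB")

signatures <- list(TOP_mRNAs = c("RPL4", "RPL5", "EEF1A1", "EIF3D"),
                   Hypoxia_Induced = c("VEGFA", "LOX"))

# For each signature, flag every gene in allGenes as a member (1) or not (0)
encoded <- lapply(signatures, function(sig) {
  setNames(as.integer(allGenes %in% sig), allGenes)
})
str(encoded)
```

Signature genes that are not present in the dataset (here EIF3D and LOX) simply do not appear in the output, because each vector is indexed by the genes in the dataset.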
    # load example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Create signatures of interest to be used in featureIntegration models
    addSign <- list()
    addSign[["TOP_mRNAs"]] <- c("RPL4", "RPL5", "EEF1A1", "EIF3D")
    addSign[["Hypoxia_Induced"]] <- c("VEGFA", "LOX", "ENO1", "PDK1", "PGK1", "HIF1")

    mySign <- signCalc(ptn = ptn, signatures = addSign)
    str(mySign)
The slopeFilt function identifies the genes with unrealistic models of changes in translation efficiency according to the slopes fitted in the per-gene anota2seq analysis of partial variance (APV) models. These genes are then filtered out of the analysis by providing the output of slopeFilt to the genesSlopeFiltOut argument of the gseaAnalysis, gseaPlot, gageAnalysis, and goAnalysis functions. See the anota2seq vignette for additional details on slope filtering.
    slopeFilt(ads, regulationGen, contrastSel, minSlope = NULL, maxSlope = NULL)
ads |
An S4 object of class |
regulationGen |
The regulation effect measurement that will be taken from |
contrastSel |
The contrast in |
minSlope |
The minimum threshold to filter genes whose calculated slopes in anota2seq APV models are
too small to be realistic. If |
maxSlope |
The maximum threshold to filter genes whose calculated slopes in anota2seq APV models are
too large to be realistic. If |
When performing GSEA, GAGE, or GO term analysis using the output of an anota2seq analysis, it is often necessary to filter the input genes and log2 fold changes for the "translation" and "buffering" regulatory modes prior to performing enrichment analyses. This is because the slopes fitted by the anota2seq APV models can sometimes have unrealistic values, or suggest unlikely translational regulation, impacting the analysis of changes in translation or translational buffering (or offsetting). Filtering out genes with these unrealistic slopes is especially important for GSEA and GAGE analyses, which rely on rankings. For analyses relying on hypergeometric tests, such as GO term enrichment, the impact of filtering is likely to be smaller, but slope filtering is still recommended. Note that in high-quality data sets, few genes usually require slope filtering.
A character vector of the gene identifiers that will be excluded from downstream enrichment analyses based on the selected slope filtering thresholds.
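Conceptually, slope filtering is a range check on the per-gene slopes. The base R sketch below shows the idea with hypothetical slopes and illustrative thresholds; the numbers are assumptions chosen for the example and are not necessarily the defaults applied by slopeFilt.

```r
# Hypothetical per-gene slopes from APV models
slopes <- c(geneA = 0.9, geneB = -1.5, geneC = 2.4, geneD = 0.2)

# Illustrative thresholds only (assumed for this sketch)
minSlope <- -1
maxSlope <- 2

# Genes whose slope falls outside [minSlope, maxSlope] are flagged for exclusion
filtOutGenes <- names(slopes)[slopes < minSlope | slopes > maxSlope]
print(filtOutGenes)
```

The resulting character vector has the same form as the value described above and could be passed to an argument such as genesSlopeFiltOut in the downstream enrichment functions.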
anota2seqSelSigGenes
anota2seqRun
gseaAnalysis
gseaPlot
gageAnalysis
goAnalysis
miRNAanalysis
    local({
      oldwd <- getwd()
      on.exit(setwd(oldwd), add = TRUE)
      setwd(tempdir())

      # load example data:
      data("postNetExample", package = "postNet")

      # Initialize Anota2seqDataSet (see anota2seq vignette for details)
      ads <- anota2seq::anota2seqDataSetFromMatrix(
        dataP = postNetExample$ads_data$dataP,
        dataT = postNetExample$ads_data$dataT,
        phenoVec = postNetExample$ads_data$phenoVec,
        batchVec = c(1, 2, 3, 4, 1, 2, 3, 4),
        dataType = "RNAseq",
        normalize = FALSE)

      # Run an anota2seq analysis:
      # Note that the quality control and residual outlier testing are not
      # performed to limit the running time of this example. For full details
      # on running an analysis please see the anota2seq vignette and help manual.
      ads <- anota2seq::anota2seqRun(ads,
                                     performQC = FALSE,
                                     performROT = FALSE,
                                     useProgBar = FALSE)

      # Get the genes to be filtered out of downstream enrichment analyses
      # using the buffering regulatory mode:
      filtOutGenes <- slopeFilt(ads,
                                regulationGen = "buffering",
                                contrastSel = 1)
      str(filtOutGenes)
    })
The uorfAnalysis function detects the presence and position of uORFs in 5'UTRs. Options are available to examine uORFs with both canonical and non-canonical start codons, as well as different Kozak contexts. In addition, the proportions of mRNA transcripts with uORFs can be plotted, and statistical comparisons can be made between gene sets of interest and the background and/or other gene sets.
    uorfAnalysis(ptn, startCodon = "ATG", KozakContext = "strong", onlyUTR5 = FALSE,
                 unitOut = "number", comparisons = NULL, plotOut = TRUE, pdfName = NULL)
ptn |
A postNetData object. |
startCodon |
A string specifying the start codon. Only uORFs with the specified start codon will be detected.
The default is the canonical start codon, |
KozakContext |
A string to select the Kozak context. The options are: |
onlyUTR5 |
A logical to select whether the uORFs detected should be completely contained in the 5'UTR,
(i.e., the stop codons of the uORFs occur before the start of the CDS). If |
unitOut |
A string to specify whether the output should be the |
comparisons |
A list of numeric vectors specifying pairwise comparisons between gene sets defined in the |
plotOut |
Logical indicating whether PDF files of plots are generated. If |
pdfName |
Name to be appended to output PDF files. |
A two-sided Wilcoxon Rank Sum test is performed to identify significant differences in uORFs detected between gene sets of interest, or against the background gene set.
If unitOut = "number", the output will be a named list of vectors with the number of uORFs
detected in a particular Kozak context for each gene. This list can be used with the downstream
featureIntegration function. If unitOut = "position", the output will be a named
list of lists with the start and end positions of each uORF for each gene. These positions cannot be used
in featureIntegration. The uorfAnalysis function can also return PDF files of output
plots with statistical comparisons between gene sets of interest when unitOut = "number".
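As a rough illustration of what counting uORFs involves, the base R sketch below counts start codons in a 5'UTR sequence that are followed by an in-frame stop codon within the same sequence. The function countUorfs is hypothetical, ignores Kozak context, and only loosely mirrors the case where uORFs must be fully contained in the 5'UTR; it is not the package's implementation.

```r
# Count uORFs in one 5'UTR: a start codon followed, in the same reading
# frame, by a stop codon located inside the 5'UTR. (Hypothetical helper.)
countUorfs <- function(utr5, startCodon = "ATG",
                       stopCodons = c("TAA", "TAG", "TGA")) {
  utr5 <- toupper(utr5)
  n <- nchar(utr5)
  # A start codon plus a stop codon needs at least six nucleotides
  if (n < 6) return(0L)

  # All positions where the start codon occurs (overlaps allowed)
  pos <- seq_len(n - 2)
  starts <- pos[substring(utr5, pos, pos + 2) == startCodon]

  hasStop <- vapply(starts, function(s) {
    # No room for a complete downstream codon
    if (s + 3 > n - 2) return(FALSE)
    # In-frame codons downstream of the start codon
    codonPos <- seq(s + 3, n - 2, by = 3)
    any(substring(utr5, codonPos, codonPos + 2) %in% stopCodons)
  }, logical(1))

  sum(hasStop)
}

# Hypothetical 5'UTR sequences named by gene ID
utrs <- c(geneA = "GGATGAAATGACC", geneB = "GGATGAAACC", geneC = "ccatgtaacc")
uorfCounts <- vapply(utrs, countUorfs, integer(1))
print(uorfCounts)
```

A per-gene count vector of this kind has the same shape as the "number" output described above, which is why that form can serve as a feature for downstream modelling while start and end positions cannot.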
    tmp <- tempfile(fileext = ".pdf")

    # load example data:
    data("postNetExample", package = "postNet")
    ptn <- postNetExample$ptn

    # Identify uORFs with canonical start codons in a strong Kozak context,
    # fully contained within 5'UTRs, and compare between
    # translationUp vs. translationDown genes:
    uORFs <- uorfAnalysis(ptn = ptn,
                          comparisons = list(c(1, 2)),
                          startCodon = "ATG",
                          KozakContext = c("strong"),
                          onlyUTR5 = TRUE,
                          unitOut = "number",
                          plotOut = TRUE,
                          pdfName = tmp)
    str(uORFs)