| Title: | Protein motifs analysis and visualisation |
|---|---|
| Description: | Provides tools for large-scale protein motif analysis and visualization in R. PMScanR facilitates the identification of motifs using external tools like PROSITE's ps_scan (handling necessary file downloads and execution) and enables downstream analysis of results. Key features include parsing scan outputs, converting formats (e.g., to GFF-like structures), generating motif occurrence matrices, and creating informative visualizations such as heatmaps, sequence logos (via seqLogo/ggseqlogo). The package also offers an optional Shiny-based graphical user interface for interactive analysis, aiming to streamline the process of exploring motif patterns across multiple protein sequences. |
| Authors: | Jan Pawel Jastrzebski [aut, cre] (ORCID: <https://orcid.org/0000-0001-8699-7742>), Monika Gawronska [ctb] (ORCID: <https://orcid.org/0009-0001-2677-6371>), Wiktor Babis [ctb] (ORCID: <https://orcid.org/0009-0006-3648-3413>), Miriana Quaranta [ctb] (ORCID: <https://orcid.org/0009-0003-0855-485X>), Damian Czopek [ctb, aut] (ORCID: <https://orcid.org/0009-0005-3471-4866>) |
| Maintainer: | Jan Pawel Jastrzebski <[email protected]> |
| License: | GPL-3 |
| Version: | 1.3.0 |
| Built: | 2026-05-30 07:11:03 UTC |
| Source: | https://github.com/bioc/PMScanR |
This function reads a file in PSA format containing protein sequences and extracts motifs based on specific patterns.
extractProteinMotifs(file_path)extractProteinMotifs(file_path)
file_path |
A PSA file which specifying the path to the input file. The file should contain protein motifs. |
A list with a motif identifier (e.g. PSXXXXX) and the coresponding to it motif sequence which is associated wuth that identifier.
file_path <- system.file("extdata/out_Hb_psa.txt", package = "PMScanR") if (file_path != "") { protein_motifs <- extractProteinMotifs(file_path) }file_path <- system.file("extdata/out_Hb_psa.txt", package = "PMScanR") if (file_path != "") { protein_motifs <- extractProteinMotifs(file_path) }
This function iterates over a list of sequences and extracts a sub-sequence from each based on a specified start and end position.
extractSegments(sequences, from, to)extractSegments(sequences, from, to)
sequences |
A list of sequences, where each element is a vector of single characters. This is typically the output of 'seqinr::read.fasta'. |
from |
An integer specifying the starting position for the extraction. |
to |
An integer specifying the ending position for the extraction. |
A list representing the extracted sub-sequences. Sequences that were too short to have a fragment extracted are omitted from the list.
# Get the path to the example FASTA file fasta_file <- system.file("extdata", "hemoglobins.fasta", package = "PMScanR") if (nzchar(fasta_file)) { sequences <- seqinr::read.fasta(fasta_file, seqtype = "AA") segments <- extractSegments(sequences, from = 10, to = 20) }# Get the path to the example FASTA file fasta_file <- system.file("extdata", "hemoglobins.fasta", package = "PMScanR") if (nzchar(fasta_file)) { sequences <- seqinr::read.fasta(fasta_file, seqtype = "AA") segments <- extractSegments(sequences, from = 10, to = 20) }
This function calculates the occurrences and percentages for each protein motif in the 'Name' column of a GFF-like data frame. It then creates a pie chart using 'ggplot2' to visualize the distribution.
freqPie(data)freqPie(data)
data |
A data frame in GFF format containing a column named 'Name' with the names of each protein motif. |
A ggplot object representing the pie chart.
# Create sample data frame similar to parsed GFF output sample_data <- data.frame( seqid = rep(c("Seq1", "Seq2"), each = 5), source = rep("PROSITE", 10), type = rep("MOTIF", 10), start = sample(1:100, 10), end = sample(101:200, 10), score = runif(10), strand = sample(c("+", "-"), 10, replace = TRUE), phase = sample(0:2, 10, replace = TRUE), Name = sample(c("Zinc_finger", "EGF_domain", "Kinase_domain"), 10, replace = TRUE) ) # Generate the pie chart motif_pie_chart <- freqPie(sample_data) # print(motif_pie_chart)# Create sample data frame similar to parsed GFF output sample_data <- data.frame( seqid = rep(c("Seq1", "Seq2"), each = 5), source = rep("PROSITE", 10), type = rep("MOTIF", 10), start = sample(1:100, 10), end = sample(101:200, 10), score = runif(10), strand = sample(c("+", "-"), 10, replace = TRUE), phase = sample(0:2, 10, replace = TRUE), Name = sample(c("Zinc_finger", "EGF_domain", "Kinase_domain"), 10, replace = TRUE) ) # Generate the pie chart motif_pie_chart <- freqPie(sample_data) # print(motif_pie_chart)
This function takes a GFF data frame and converts it into a binary matrix, indicating the presence (1) or absence (0) of a feature in a sequence.
gff2matrix(input)gff2matrix(input)
input |
A data frame containing GFF data, typically generated by 'rtracklayer::import.gff' and converted to a data frame. It must have 'type', 'start', 'end', and 'seqnames' columns. |
A matrix, where values are binary: '1' indicates the presence of a feature, and '0' indicates its absence.
gff_file <- system.file("extdata/out_Hb_gff.txt", package = "PMScanR") if (nzchar(gff_file)) { gff_data <- as.data.frame(rtracklayer::import.gff(gff_file)) motif_matrix <- gff2matrix(gff_data) # print(head(motif_matrix)) }gff_file <- system.file("extdata/out_Hb_gff.txt", package = "PMScanR") if (nzchar(gff_file)) { gff_data <- as.data.frame(rtracklayer::import.gff(gff_file)) motif_matrix <- gff2matrix(gff_data) # print(head(motif_matrix)) }
This function generates a heatmap using the 'plotly' package. The heatmap highlights specific rows and columns provided by the user, while the rest of the matrix is dimmed. The function also adds grid lines to the heatmap for better readability.
matrix2hm(input, x = NULL, y = NULL)matrix2hm(input, x = NULL, y = NULL)
input |
A matrix containing the data to be visualized in the heatmap |
x |
A character vector specifying the columns to highlight in the heatmap |
y |
A character vector specifying the rows to highlight in the heatmap |
A heatmap with highlighted specified rows and columns
# Create a sample matrix with row and column names mat <- matrix(c(1, 0, 1, 0), 2, 2) colnames(mat) <- c("Col1", "Col2") rownames(mat) <- c("Row1", "Row2") heatmap <- matrix2hm(input = mat, x = "Col1", y = "Row1") heatmap# Create a sample matrix with row and column names mat <- matrix(c(1, 0, 1, 0), 2, 2) colnames(mat) <- c("Col1", "Col2") rownames(mat) <- c("Row1", "Row2") heatmap <- matrix2hm(input = mat, x = "Col1", y = "Row1") heatmap
This function generates a heatmap using 'plotly', ensuring the plot has a square aspect ratio. It highlights user-specified rows and columns.
matrixToSquareHeatmap(input, x = NULL, y = NULL)matrixToSquareHeatmap(input, x = NULL, y = NULL)
input |
A matrix containing the data to be visualized. |
x |
A character vector specifying the columns to highlight. |
y |
A character vector specifying the rows to highlight. |
A plotly heatmap object with a square layout.
# Create a sample matrix mat <- matrix(c(1, 0, 1, 0), 2, 2) colnames(mat) <- c("Col1", "Col2") rownames(mat) <- c("Row1", "Row2") sq_heatmap <- matrixToSquareHeatmap(input = mat, x = "Col1", y = "Row1") # To display in an interactive session: # sq_heatmap# Create a sample matrix mat <- matrix(c(1, 0, 1, 0), 2, 2) colnames(mat) <- c("Col1", "Col2") rownames(mat) <- c("Row1", "Row2") sq_heatmap <- matrixToSquareHeatmap(input = mat, x = "Col1", y = "Row1") # To display in an interactive session: # sq_heatmap
This function parses a file from a PROSITE scan into a data frame, extracting information about motif occurrences into a GFF-like structure.
readProsite(prosite_input)readProsite(prosite_input)
prosite_input |
Path to the PROSITE scan output file. |
A data frame with columns approximating GFF fields plus additional PROSITE-specific information.
prosite_file <- system.file("extdata", "PROSITEoutput.txt", package = "PMScanR") # Check that the example file exists before running if (nzchar(prosite_file)) { gff_like_data <- readProsite(prosite_file) # You can view the output with: # head(gff_like_data) }prosite_file <- system.file("extdata", "PROSITEoutput.txt", package = "PMScanR") # Check that the example file exists before running if (nzchar(prosite_file)) { gff_like_data <- readProsite(prosite_file) # You can view the output with: # head(gff_like_data) }
This function reads a file in PSA format and converts it into a standardized, GFF-like data frame for downstream analysis. It is robust to files that may contain extraneous, non-PSA formatted data at the end.
readPsa(psa_file)readPsa(psa_file)
psa_file |
A character string specifying the path to the input PSA file. |
A data frame with a GFF-like structure, including all original placeholder columns.
# Get the path to the example PSA file included with the package psa_file_path <- system.file("extdata", "out_Hb_psa.txt", package = "PMScanR") # Check that the file exists before running the example if (nzchar(psa_file_path)) { gff_like_data <- readPsa(psa_file_path) # You can view the output with: # head(gff_like_data) }# Get the path to the example PSA file included with the package psa_file_path <- system.file("extdata", "out_Hb_psa.txt", package = "PMScanR") # Check that the file exists before running the example if (nzchar(psa_file_path)) { gff_like_data <- readPsa(psa_file_path) # You can view the output with: # head(gff_like_data) }
Calling this function will launch the interactive graphical user interface for the PMScanR package.
runPMScanRShiny()runPMScanRShiny()
This function sets a higher file upload size limit for Shiny and then launches the application, which is built using an internal UI function ('buildUi') and server function ('buildServer').
This function is called for its side effect of launching the Shiny application and does not return a value.
if (interactive()) { # To run the app, simply call the function runPMScanRShiny() }if (interactive()) { # To run the app, simply call the function runPMScanRShiny() }
This function runs the PROSITE ps_scan tool. It handles the downloading and caching of required executables and databases using BiocFileCache, detects the operating system, and executes the scan in a robust manner.
runPsScan(in_file, out_file, out_format, os = NULL)runPsScan(in_file, out_file, out_format, os = NULL)
in_file |
Path to the input file containing protein sequences. |
out_file |
Path for the output file where results will be saved. |
out_format |
The output format for ps_scan (e.g., 'gff', 'psa'). |
os |
The operating system ('WIN', 'LINUX', 'MAC'). If NULL, it is detected automatically. |
Invisibly returns the exit status of the ps_scan command. The primary output is the result file created at 'out_file'.
# This example is resource-intensive and requires an internet connection # on first run to cache necessary files. if (interactive()) { # Create a dummy input file for the example fasta_content <- c(">sp|P02025|HEMA_MESAU Hemoglobin subunit alpha", "MVLSAADKGNVKAAWGKVGGHAAEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG", "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFT", "PAVHASLDKFLASVSTVLTSKYR") in_file <- tempfile(fileext = ".fasta") writeLines(fasta_content, in_file) out_file <- tempfile(fileext = ".gff") # The first run will download and cache ~100MB of data. # Subsequent runs will use the cached files. runPsScan(in_file = in_file, out_format = 'gff', out_file = out_file) # Clean up temporary files unlink(in_file) unlink(out_file) }# This example is resource-intensive and requires an internet connection # on first run to cache necessary files. if (interactive()) { # Create a dummy input file for the example fasta_content <- c(">sp|P02025|HEMA_MESAU Hemoglobin subunit alpha", "MVLSAADKGNVKAAWGKVGGHAAEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG", "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFT", "PAVHASLDKFLASVSTVLTSKYR") in_file <- tempfile(fileext = ".fasta") writeLines(fasta_content, in_file) out_file <- tempfile(fileext = ".gff") # The first run will download and cache ~100MB of data. # Subsequent runs will use the cached files. runPsScan(in_file = in_file, out_format = 'gff', out_file = out_file) # Clean up temporary files unlink(in_file) unlink(out_file) }