| Title: | A Comprehensive Workflow for Long Non-coding RNA Identification and Functional Analysis |
|---|---|
| Description: | Provides a complete workflow for the identification, analysis, and functional annotation of long non-coding RNAs (lncRNAs) from RNA-Seq data. The package includes functions for filtering transcripts from GTF files, evaluating the performance of multiple coding potential prediction tools (e.g., CPC2, PLEK, CPAT), and summarizing their agreement. It enables systematic performance analysis of individual tools, "at least N" tool consensus, and all possible tool combinations. Functional analysis is supported through the identification of potential cis- and trans-acting interactions with protein-coding genes, followed by enrichment analysis. Results can be visualized using a variety of plots, including radar plots, clock plots, and interactive Sankey diagrams. |
| Authors: | Jan Pawel Jastrzebski [aut, cre] (ORCID: <https://orcid.org/0000-0001-8699-7742>), Damian Czopek [ctb, aut] (ORCID: <https://orcid.org/0009-0005-3471-4866>), Mariusz Jankowski [ctb] (ORCID: <https://orcid.org/0009-0000-7872-4023>), Monika Gawronska [ctb] (ORCID: <https://orcid.org/0009-0001-2677-6371>), Wiktor Babis [ctb] (ORCID: <https://orcid.org/0009-0006-3648-3413>), Stefano Pascarella [ctb] (ORCID: <https://orcid.org/0000-0002-6822-4022>), Hugo Gruson [ctb] (ORCID: <https://orcid.org/0000-0002-4094-1476>) |
| Maintainer: | Jan Pawel Jastrzebski <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-06-05 06:31:44 UTC |
| Source: | https://github.com/bioc/lncRna |
Reads output files from various coding potential prediction tools and aggregates their results into a structured list. Non-coding predictions are encoded as 1, and coding predictions as 0.
aggregateCodPot( CPC2_outfile = NULL, PLEK_outfile = NULL, FEELnc_outfile = NULL, CPAT_outfile = NULL, CPAT_cutoff = 0.364, CNCI_outfile = NULL, LncFinder_outfile = NULL, lncRNA_Mdeep_outfile = NULL )
CPC2_outfile |
Path to the CPC2 output file. |
PLEK_outfile |
Path to the PLEK output file. |
FEELnc_outfile |
Path to the FEELnc output file. |
CPAT_outfile |
Path to the CPAT output file. |
CPAT_cutoff |
Cutoff value for CPAT probability (default: 0.364). |
CNCI_outfile |
Path to the CNCI output file. |
LncFinder_outfile |
Path to the LncFinder output file. |
lncRNA_Mdeep_outfile |
Path to the lncRNA_Mdeep output file. |
A list containing seqIDs (a character vector of all unique
sequence IDs found across all provided files) and tools (a list where
each element is a binary vector of predictions for a given tool).
    # --- 1. Create temporary output files for a reproducible example ---
    cpc2File <- tempfile()
    plekFile <- tempfile()

    # CPC2 format: ID ... classification
    write.table(
      data.frame(V1="ID1", V8="noncoding"), cpc2File,
      col.names=FALSE, row.names=FALSE, sep="\t"
    )
    write.table(
      data.frame(V1="ID2", V8="coding"), cpc2File,
      col.names=FALSE, row.names=FALSE, sep="\t", append=TRUE
    )

    # PLEK format: Classification Score >ID
    write.table(
      data.frame(V1="Non-coding", V2="0.9", V3=">ID2"), plekFile,
      col.names=FALSE, row.names=FALSE, sep=" "
    )

    # --- 2. Run the function with the temporary files ---
    codpotResults <- aggregateCodPot(CPC2_outfile = cpc2File, PLEK_outfile = plekFile)
    print("Sequence IDs:")
    print(codpotResults$seqIDs)
    print("Tool predictions:")
    print(codpotResults$tools)

    # --- 3. Clean up the temporary files ---
    unlink(c(cpc2File, plekFile))
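The 1/0 encoding and the role of `CPAT_cutoff` can be illustrated without the package. The following is a minimal base-R sketch, not the package's implementation: the score vector is invented, and treating a score exactly at the cutoff as coding is an assumption.

```r
# Hypothetical CPAT-style coding probabilities, one per transcript.
cpatScores <- c(tx1 = 0.02, tx2 = 0.91, tx3 = 0.364, tx4 = 0.10)
cpatCutoff <- 0.364

# Assumption: scores below the cutoff count as non-coding (1),
# scores at or above it count as coding (0).
isNonCoding <- as.integer(cpatScores < cpatCutoff)
names(isNonCoding) <- names(cpatScores)  # as.integer() drops names
print(isNonCoding)
```

Raising the cutoff labels more transcripts as non-coding, so the value should match the species model used when running CPAT.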
Merges a table of genomic interactions (e.g., lncRNA-mRNA) with functional enrichment results (e.g., from g:Profiler), annotating each interaction with relevant functional terms.
annotateInteractions( gostResult, interactionTable, type, lncRnaCol = "lncRNAId", targetCol = "targetRNAId" )
gostResult |
A |
interactionTable |
A |
type |
A character string specifying the interaction type to be added to the output (e.g., "cis", "trans", "LncTar"). |
lncRnaCol |
An optional character string specifying the column name in
|
targetCol |
An optional character string specifying the column name in
|
A data.frame where enrichment terms are merged with the interactions
they are associated with.
    # --- 1. Create mock input data ---
    # a) Mock g:Profiler result
    mockGostResult <- data.frame(
      term_name = c("response to stress", "cell activation"),
      intersection = c("TARGET_G1,TARGET_G2", "TARGET_G3"),
      source = c("GO:BP", "GO:BP"),
      stringsAsFactors = FALSE
    )

    # b) Mock interaction table with standard column names
    mockInteractionTable1 <- data.frame(
      lncRNAId = c("LNC_G1", "LNC_G2", "LNC_G3"),
      targetRNAId = c("TARGET_G1", "TARGET_G2", "TARGET_G3")
    )

    # --- 2. Run with standard column names ---
    annotated_trans <- annotateInteractions(
      gostResult = mockGostResult,
      interactionTable = mockInteractionTable1,
      type = "trans"
    )
    print(annotated_trans)

    # --- 3. Run with custom column names ---
    # a) Mock interaction table with custom column names
    mockInteractionTable2 <- data.frame(
      Query = "LNC_G1",
      Target = "TARGET_G1"
    )
    annotated_lnctar <- annotateInteractions(
      gostResult = mockGostResult,
      interactionTable = mockInteractionTable2,
      type = "LncTar",
      lncRnaCol = "Query",
      targetCol = "Target"
    )
    print(annotated_lnctar)
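The merge of enrichment terms with interactions can be pictured as a split-then-join. The sketch below uses only base R and assumes, as in the mock data above, that `intersection` holds comma-separated target IDs; the helper objects `genesPerTerm` and `longTerms` are hypothetical, and the package may implement this step differently.

```r
gost <- data.frame(
  term_name = c("response to stress", "cell activation"),
  intersection = c("TARGET_G1,TARGET_G2", "TARGET_G3"),
  stringsAsFactors = FALSE
)
interactions <- data.frame(
  lncRNAId = c("LNC_G1", "LNC_G2", "LNC_G3"),
  targetRNAId = c("TARGET_G1", "TARGET_G2", "TARGET_G3"),
  stringsAsFactors = FALSE
)

# Expand each term into one row per gene listed in its intersection.
genesPerTerm <- strsplit(gost$intersection, ",", fixed = TRUE)
longTerms <- data.frame(
  term_name = rep(gost$term_name, lengths(genesPerTerm)),
  targetRNAId = unlist(genesPerTerm),
  stringsAsFactors = FALSE
)

# Join terms to interactions on the target ID and record the interaction type.
annotated <- merge(interactions, longTerms, by = "targetRNAId")
annotated$type <- "trans"
print(annotated)
```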
Computes a comprehensive set of confusion matrix statistics (e.g., accuracy,
sensitivity, specificity) for individual coding potential prediction tools.
The function operates on summary data prepared by summarizeTestSet.
bestTool(summaryList, tools = NULL, digits = 4)
summaryList |
A list generated by |
tools |
An optional character vector specifying the names of the tools
(from |
digits |
The number of decimal places to round the final statistics to (default: 4). |
A data frame where rows correspond to performance metrics and
columns correspond to the selected tools. Returns NULL if the input is
invalid or no statistics can be calculated.
    # --- 1. Create a mock object mimicking the output of prepareEvaluationSets ---
    # This object contains filtered data ready for evaluation.
    set.seed(123)
    n_seq <- 100
    exampleEvaluationSummary <- list(
      tools = list(
        CPC2 = sample(c(0, 1), n_seq, replace = TRUE),
        CPAT = sample(c(0, 1), n_seq, replace = TRUE),
        PLEK = sample(c(0, 1), n_seq, replace = TRUE)
      ),
      isNC = sample(c(0, 1), n_seq, replace = TRUE) # True labels
    )

    # --- 2. Run bestTool ---
    # Example 1: Analyze a specific subset of tools
    performance_stats <- bestTool(
      summaryList = exampleEvaluationSummary,
      tools = c("CPC2", "CPAT")
    )
    print(performance_stats)

    # Example 2: Analyze all available tools (non-interactively)
    all_tools_stats <- bestTool(summaryList = exampleEvaluationSummary)
    print(all_tools_stats)

    # --- 3. Interactive example (do not run in scripts) ---
    if (interactive()) {
      # The following would prompt you to select tools from the console:
      # interactive_stats <- bestTool(summaryList = exampleEvaluationSummary)
    }
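For readers unfamiliar with confusion matrix statistics, the sketch below computes a few of them by hand from a binary prediction vector and true labels, with non-coding (1) as the positive class. `confusionStats` is a hypothetical helper written for this illustration; the exact set of metrics returned by `bestTool` may differ.

```r
# Hypothetical helper: confusion-matrix metrics with 1 as the positive class.
confusionStats <- function(predicted, truth, digits = 4) {
  tp <- sum(predicted == 1 & truth == 1)  # non-coding, correctly predicted
  tn <- sum(predicted == 0 & truth == 0)  # coding, correctly predicted
  fp <- sum(predicted == 1 & truth == 0)  # coding called non-coding
  fn <- sum(predicted == 0 & truth == 1)  # non-coding called coding
  round(c(
    Accuracy    = (tp + tn) / length(truth),
    Sensitivity = tp / (tp + fn),
    Specificity = tn / (tn + fp),
    Precision   = tp / (tp + fp)
  ), digits)
}

truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred  <- c(1, 1, 0, 0, 0, 1, 1, 0)
stats <- confusionStats(pred, truth)
print(stats)
```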
Computes confusion matrix statistics for each "at least n" agreement
threshold generated by evaluateToolsThresholds.
bestToolAtleast(agreementSummary, digits = 4)
agreementSummary |
A list generated by |
digits |
The number of decimal places to round the final statistics to (default: 4). |
A data frame where rows are performance metrics and columns
correspond to the "at least n" (atl1, atl2, ...) thresholds. Returns
NULL if the input is invalid or no statistics can be calculated.
    # --- 1. Create a mock object mimicking the output of prepareEvaluationSets ---
    # This object serves as input for evaluating thresholds.
    set.seed(789)
    n_seq <- 100

    # First, create the summary object (as if from prepareEvaluationSets)
    evaluationSummary <- list(
      seqIDs = paste0("S", 1:n_seq),
      isNC = sample(c(0, 1), n_seq, replace = TRUE),
      type = sample(c("nc", "cds"), n_seq, replace = TRUE),
      tools = list(
        ToolA = sample(c(0, 1), n_seq, replace = TRUE),
        ToolB = sample(c(0, 1), n_seq, replace = TRUE),
        ToolC = sample(c(0, 1), n_seq, replace = TRUE)
      )
    )

    # --- 2. Create the agreement summary using evaluateToolsThresholds ---
    # This is the object that bestToolAtleast actually takes as input
    agreementSummary <- evaluateToolsThresholds(summaryList = evaluationSummary)

    # --- 3. Run bestToolAtleast on the result ---
    if (!is.null(agreementSummary)) {
      performance_thresholds <- bestToolAtleast(
        agreementSummary = agreementSummary
      )
      print(performance_thresholds)
    }
Computes confusion matrix statistics for predictions generated by various
tool combinations, using the output from evaluateToolCombinations.
bestToolCombination(combinationSummaryList, combinations = NULL, digits = 4)
combinationSummaryList |
A list generated by |
combinations |
An optional character vector specifying the names of the
tool combinations (e.g., "ToolA+ToolB") to analyze. If |
digits |
The number of decimal places to round final statistics to (default: 4). |
A data frame where rows are performance metrics and columns correspond
to the analyzed tool combinations. Returns NULL on error or if no
statistics can be calculated.
    # --- 1. Create a mock object mimicking evaluateToolCombinations output ---
    set.seed(202)
    n_seq <- 80
    exampleCombinationSummary <- list(
      isNC = sample(c(0, 1), n_seq, replace = TRUE),
      toolCombinations = list(
        `ToolA+ToolB` = sample(c(0, 1), n_seq, replace = TRUE, prob = c(0.7, 0.3)),
        `ToolA+ToolC` = sample(c(0, 1), n_seq, replace = TRUE, prob = c(0.6, 0.4)),
        `ToolB+ToolC` = sample(c(0, 1), n_seq, replace = TRUE, prob = c(0.5, 0.5))
      )
      # Other elements like seqIDs, type would also be present
    )

    # --- 2. Run the function ---
    # Example 1: Analyze specific combinations
    perf_specific <- bestToolCombination(
      combinationSummaryList = exampleCombinationSummary,
      combinations = c("ToolA+ToolB", "ToolB+ToolC")
    )
    print("Performance for specific combinations:")
    print(perf_specific)

    # Example 2: Analyze all combinations (non-interactively)
    perf_all <- bestToolCombination(
      combinationSummaryList = exampleCombinationSummary
    )
    print("Performance for all combinations:")
    print(perf_all)
This function calculates confusion matrices for various tool combinations based
on predictions from evaluateToolCombinations. It can also filter these
matrices based on pre-calculated performance metrics and a given threshold.
calculateCM( combinationSummaryList, metricsData = NULL, printMetricThresholds = FALSE, threshold = 0.8, returnOnlyHigh = FALSE, metricToExtract = "Accuracy" )
combinationSummaryList |
A list generated by |
metricsData |
An optional data frame where rows are metric names and
columns are tool combination names (typically the output of
|
printMetricThresholds |
Logical. If |
threshold |
A numeric value (0 to 1) used to evaluate |
returnOnlyHigh |
Logical. If |
metricToExtract |
The performance metric to use for filtering and printing (e.g., "Accuracy", "Sensitivity"). Defaults to "Accuracy". |
A named list where each element represents a confusion matrix for a tool combination. Each element is itself a list containing:
table |
A 2x2 numeric matrix representing the confusion table. |
positive |
A character string indicating the positive class level ("1"). |
metrics |
A named numeric vector of all calculated performance metrics. |
The returned list is filtered if returnOnlyHigh = TRUE.
    # --- 1. Create mock data mimicking the outputs of previous functions ---
    set.seed(202)
    n_seq <- 100

    # a) Mock output from evaluateToolCombinations()
    mockCombinationSummary <- list(
      isNC = sample(c(0, 1), n_seq, replace = TRUE),
      toolCombinations = list(
        `ToolA+ToolB` = sample(c(0,1), n_seq, replace=TRUE, prob=c(0.2,0.8)),
        `ToolA+ToolC` = sample(c(0,1), n_seq, replace=TRUE, prob=c(0.6,0.4)),
        `ToolB+ToolC` = sample(c(0,1), n_seq, replace=TRUE, prob=c(0.5,0.5))
      )
    )

    # b) Mock output from bestToolCombination()
    mockMetricsData <- bestToolCombination(
      combinationSummaryList = mockCombinationSummary
    )

    # --- 2. Run calculateCM in different modes ---
    # Example 1: Calculate all CMs without filtering
    all_cms <- calculateCM(combinationSummaryList = mockCombinationSummary)
    print(names(all_cms))

    # Inspect the structure of a single element
    str(all_cms[[1]])

    # Example 2: Print and filter based on a Sensitivity threshold >= 0.5
    filtered_cms <- calculateCM(
      combinationSummaryList = mockCombinationSummary,
      metricsData = mockMetricsData,
      printMetricThresholds = TRUE,
      returnOnlyHigh = TRUE,
      metricToExtract = "Sensitivity",
      threshold = 0.5
    )
    print("Filtered CMs (Sensitivity >= 0.5):")
    print(names(filtered_cms))
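The threshold filter can be sketched on a metrics table laid out as described for `metricsData` (metric names as rows, combination names as columns). The numbers below are invented, and the use of `>=` follows the wording of the example above rather than the package source.

```r
# Hypothetical metrics table: rows are metrics, columns are tool combinations.
metrics <- data.frame(
  `ToolA+ToolB` = c(0.82, 0.90),
  `ToolA+ToolC` = c(0.71, 0.55),
  row.names = c("Accuracy", "Sensitivity"),
  check.names = FALSE  # keep the "+" in column names
)

metricToExtract <- "Accuracy"
threshold <- 0.8

# Pull one metric row as a named vector and keep combinations at or above the threshold.
values <- unlist(metrics[metricToExtract, ])
highCombos <- names(values)[values >= threshold]
print(highCombos)
```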
This function partitions a vector of sequence names into training and testing
subsets based on a specified percentage. It uses random sampling, so for
reproducible splits, set a seed with set.seed() before calling it.
This single function replaces the previous, separate test.train.cds and
test.train.nc functions.
createTrainTestSets(sequences, percentTrain = 0.6, prefix = "set")
sequences |
A character vector of sequence names, or an object with a
|
percentTrain |
A numeric value between 0 and 1 indicating the proportion of sequences to allocate to the training set (default: 0.6). |
prefix |
A character string used as a prefix for the names of the
elements in the returned list (default: "set"). For example, |
A list containing two character vectors. The names of the list
elements are constructed using the prefix argument (e.g., cds.train
and cds.test).
    # --- Example 1: Splitting CDS sequences (replaces test.train.cds) ---
    all_cds_names <- paste0("cds_seq_", 1:200)
    set.seed(123) # for a reproducible split
    cds_split <- createTrainTestSets(
      sequences = all_cds_names,
      percentTrain = 0.7,
      prefix = "cds"
    )
    names(cds_split)
    length(cds_split$cds.train)
    length(cds_split$cds.test)

    # --- Example 2: Splitting non-coding sequences (replaces test.train.nc) ---
    # Input can also be a list with names
    nc_fasta_like <- as.list(paste0("nc_seq_", 1:100))
    names(nc_fasta_like) <- paste0("nc_seq_", 1:100)
    set.seed(456)
    nc_split <- createTrainTestSets(
      sequences = nc_fasta_like,
      percentTrain = 0.5,
      prefix = "nc"
    )
    names(nc_split)
    length(nc_split$nc.train)
    length(nc_split$nc.test)
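A random split of this kind takes only a few lines of base R. `splitNames` below is a hypothetical stand-in written for illustration; in particular, rounding the training size down with `floor()` is an assumption and may not match how `createTrainTestSets` rounds.

```r
# Hypothetical helper: random train/test partition of a character vector.
splitNames <- function(sequenceNames, percentTrain = 0.6, prefix = "set") {
  nTrain <- floor(length(sequenceNames) * percentTrain)  # assumed rounding rule
  trainIdx <- sample(seq_along(sequenceNames), nTrain)
  testIdx <- setdiff(seq_along(sequenceNames), trainIdx)
  out <- list(sequenceNames[trainIdx], sequenceNames[testIdx])
  names(out) <- paste0(prefix, c(".train", ".test"))
  out
}

set.seed(42)  # fix the seed so the split is reproducible
ids <- paste0("seq_", 1:10)
parts <- splitNames(ids, percentTrain = 0.7, prefix = "cds")
print(lengths(parts))
```

Every input name lands in exactly one of the two subsets, which is the property to check when adapting the split.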
Identifies sequences predicted as non-coding by specific combinations (intersections) of tools. It generates prediction vectors for all combinations of 2 or more selected tools.
evaluateToolCombinations(summaryList, tools = NULL)
summaryList |
A list generated by |
tools |
An optional character vector specifying which tools to use for
creating combinations. If |
A list containing:
seqIDs |
Original sequence identifiers from the input. |
isNC |
Original true labels (1=nc, 0=cds) from the input. |
type |
Original type annotation ('nc', 'cds') from the input. |
selectedToolsPredictions |
A list of prediction vectors for the tools selected for this analysis. |
toolCombinations |
A list where each element is named after a tool combination (e.g., "ToolA+ToolB") and contains a binary vector (0/1) indicating if a sequence was predicted as non-coding by ALL tools in that combination. |
    # --- 1. Create a mock object mimicking the output of prepareEvaluationSets ---
    set.seed(101)
    n_seq <- 50
    evaluationSummary <- list(
      seqIDs = paste0("Seq", 1:n_seq),
      tools = list(
        ToolX = sample(c(0, 1), n_seq, replace = TRUE),
        ToolY = sample(c(0, 1), n_seq, replace = TRUE),
        ToolZ = sample(c(0, 1), n_seq, replace = TRUE)
      ),
      type = sample(c("nc", "cds"), n_seq, replace = TRUE),
      isNC = sample(c(0, 1), n_seq, replace = TRUE)
    )

    # --- 2. Run the function ---
    # Example 1: Analyze combinations of specific tools
    results_comb <- evaluateToolCombinations(
      summaryList = evaluationSummary,
      tools = c("ToolX", "ToolY", "ToolZ")
    )
    if (!is.null(results_comb)) {
      print("Names of generated combinations:")
      print(names(results_comb$toolCombinations))
      print("Head of 'ToolX+ToolY' combination results:")
      print(head(results_comb$toolCombinations[["ToolX+ToolY"]]))
    }

    # Example 2: Non-interactively analyze all tools
    results_all_comb <- evaluateToolCombinations(summaryList = evaluationSummary)
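The intersection logic described for `toolCombinations` can be reproduced with `combn()`. This is a sketch of the idea under the stated definition (a sequence scores 1 only if every tool in the combination predicts non-coding); the order in which the package enumerates combinations is not assumed.

```r
tools <- list(
  ToolX = c(1, 1, 0, 1),
  ToolY = c(1, 0, 0, 1),
  ToolZ = c(0, 1, 0, 1)
)
toolNames <- names(tools)

combos <- list()
for (k in 2:length(toolNames)) {
  # Every subset of k tools.
  for (members in combn(toolNames, k, simplify = FALSE)) {
    label <- paste(members, collapse = "+")
    # 1 only where all member tools predict non-coding.
    allAgree <- Reduce(`&`, lapply(tools[members], function(v) v == 1))
    combos[[label]] <- as.integer(allAgree)
  }
}
print(combos)
```

With three tools this yields three pairs and one triple; the number of combinations grows as 2^n - n - 1, so restricting `tools` keeps the output manageable.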
Calculates agreement statistics for a selection of coding potential tools. For each sequence, it determines if it was predicted as non-coding by at least 'n' of the selected tools, for all possible values of 'n'.
evaluateToolsThresholds(summaryList, tools = NULL)
summaryList |
A list generated by |
tools |
An optional character vector specifying which tools to include
in the agreement analysis. If |
A list containing:
seqIDs |
Original sequence identifiers from the input. |
isNC |
Original true labels (1=nc, 0=cds) from the input. |
type |
Original type annotation ('nc', 'cds') from the input. |
selectedToolsPredictions |
A list containing only the prediction vectors for the tools that were selected for analysis. |
sumsSelectedTools |
An integer vector with the count of selected tools that predicted "non-coding" (1) for each sequence. |
atLeastN |
A list where each element |
    # --- 1. Create Example Data (mimicking summarizeTestSet output) ---
    set.seed(456)
    n_seq <- 50
    exampleSummaryList <- list(
      seqIDs = paste0("ID", 1:n_seq),
      tools = list(
        ToolA = sample(c(0, 1), n_seq, replace = TRUE),
        ToolB = sample(c(0, 1), n_seq, replace = TRUE),
        ToolC = sample(c(0, 1), n_seq, replace = TRUE)
      ),
      type = sample(c("nc", "cds"), n_seq, replace = TRUE),
      isNC = sample(c(0, 1), n_seq, replace = TRUE)
    )

    # --- 2. Run the function ---
    results <- evaluateToolsThresholds(
      summaryList = exampleSummaryList,
      tools = c("ToolA", "ToolB")
    )
    if (!is.null(results)) {
      # Accessing the nested list of thresholds
      print("Accessing the 'at-least-2' threshold vector:")
      print(head(results$atLeastN$atl2))
    }
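The "at least n" consensus amounts to counting non-coding votes per sequence and comparing the count with each n. A minimal base-R sketch follows; the `atl1`, `atl2`, ... names mirror those used above, while the rest is illustrative rather than the package's code.

```r
tools <- list(
  ToolA = c(1, 0, 1, 0),
  ToolB = c(1, 1, 0, 0),
  ToolC = c(1, 1, 0, 0)
)

# Number of tools voting "non-coding" (1) for each sequence.
votes <- rowSums(do.call(cbind, tools))

# One binary vector per threshold n = 1, ..., number of tools.
atLeastN <- lapply(seq_along(tools), function(n) as.integer(votes >= n))
names(atLeastN) <- paste0("atl", seq_along(tools))
print(atLeastN)
```

`atl1` is the union of all tools' non-coding calls and the highest threshold is their intersection, so sensitivity typically falls and specificity rises as n increases.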
This function reads and filters the output file from FEELnc (.classes file)
to identify potential cis-interactions between lncRNAs and mRNAs based on
user-specified criteria such as genomic distance and specific gene/transcript lists.
findCisInteractions( FEELncClassesFile, lncRnas = NULL, mRnas = NULL, filterIsBest = TRUE, lncRnaLevel = c("transcript", "gene"), mRnaLevel = c("gene", "transcript"), maxDist = 1e+05 )
FEELncClassesFile |
A character string specifying the path to the
FEELnc |
lncRnas |
An optional character vector of lncRNA IDs (gene or transcript)
to filter the results. If |
mRnas |
An optional character vector of mRNA IDs (gene or transcript)
to filter the results. If |
filterIsBest |
Logical. If |
lncRnaLevel |
A character string specifying the level for lncRNA filtering:
|
mRnaLevel |
A character string specifying the level for mRNA filtering:
|
maxDist |
A numeric value for the maximum distance (in base pairs) to consider for a cis-interaction (default: 100,000). |
A data.frame containing the filtered cis-interaction data with
renamed columns (lncRNAId, targetRNAId).
    # --- 1. Create a temporary FEELnc output file for a reproducible example ---
    feelncFile <- tempfile()
    mock_data <- data.frame(
      isBest = c(1, 1, 0, 1, 1),
      lncRNA_gene = c("LNC_G1", "LNC_G1", "LNC_G2", "LNC_G3", "LNC_G4"),
      lncRNA_transcript = c("LNC_T1", "LNC_T2", "LNC_T3", "LNC_T4", "LNC_T5"),
      partnerRNA_gene = c("TARGET_G1", "TARGET_G2", "TARGET_G1", "TARGET_G3", "TARGET_G4"),
      partnerRNA_transcript = c("T_T1", "T_T2", "T_T3", "T_T4", "T_T5"),
      distance = c(5000, 80000, 1000, 120000, 9000)
    )
    utils::write.table(mock_data, feelncFile, sep = "\t", row.names = FALSE, col.names = TRUE)

    # --- 2. Define lncRNA and mRNA lists for filtering ---
    lncRnaList <- c("LNC_T1", "LNC_T4") # Filter by transcript ID
    mRnaList <- c("TARGET_G1") # Filter by gene ID

    # --- 3. Run the function ---
    cis_interactions <- findCisInteractions(
      FEELncClassesFile = feelncFile,
      lncRnas = lncRnaList,
      mRnas = mRnaList,
      lncRnaLevel = "transcript",
      mRnaLevel = "gene",
      maxDist = 100000
    )
    print(cis_interactions)

    # --- 4. Clean up the temporary file ---
    unlink(feelncFile)
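The core of the cis filter (best hit plus a distance cap) can be shown on a plain data frame with the same column names as the mock file above. This is an illustrative sketch: whether the package treats `maxDist` as inclusive, and which columns it keeps beyond `lncRNAId` and `targetRNAId`, are assumptions.

```r
classes <- data.frame(
  isBest = c(1, 1, 0, 1),
  lncRNA_transcript = c("LNC_T1", "LNC_T2", "LNC_T3", "LNC_T4"),
  partnerRNA_gene = c("TARGET_G1", "TARGET_G2", "TARGET_G1", "TARGET_G3"),
  distance = c(5000, 80000, 1000, 120000),
  stringsAsFactors = FALSE
)
maxDist <- 1e5

# Keep only best hits within the distance cap (inclusive bound assumed).
keep <- classes$isBest == 1 & classes$distance <= maxDist
cis <- classes[keep, c("lncRNA_transcript", "partnerRNA_gene", "distance")]

# Rename to the output column names described above.
names(cis)[1:2] <- c("lncRNAId", "targetRNAId")
print(cis)
```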
Estimates trans-interactions between lncRNAs and target RNAs based on
expression correlation. This function requires the Hmisc and tidyr
packages.
findTransInteractions( exprMatrix, corMethod = "pearson", rval = 0.7, pval = 0.05, lncRnaList = NULL, tarRnaList = NULL, fullCorMatrixFile = NULL )
exprMatrix |
A numeric matrix or data.frame of expression values. Rownames should contain gene/transcript IDs and columns should be samples. |
corMethod |
Correlation method: |
rval |
The cutoff for the correlation coefficient (default: 0.7). |
pval |
The cutoff for the p-value (default: 0.05). |
lncRnaList |
A list of lncRNA gene/transcript IDs. Must be present in
|
tarRnaList |
A list of target gene/transcript IDs. Must be present in
|
fullCorMatrixFile |
An optional file path to save the full correlation matrix. |
A data.frame of significant trans-interactions with columns for
lncRNA ID, target RNA ID, r-value, and p-value.
    # --- 1. Create a mock expression matrix ---
    set.seed(123)
    lnc_genes <- paste0("LNC", 1:5)
    target_genes <- paste0("TARGET", 1:20)
    all_genes <- c(lnc_genes, target_genes)
    mockExprMatrix <- matrix(rnorm(25 * 10), nrow = 25, ncol = 10,
                             dimnames = list(all_genes, paste0("Sample", 1:10)))
    mockExprMatrix["LNC1", ] <- mockExprMatrix["TARGET1", ] * 2 + rnorm(10, 0, 0.2)

    # --- 2. Run the function ---
    trans_interactions <- findTransInteractions(
      exprMatrix = mockExprMatrix,
      lncRnaList = lnc_genes,
      tarRnaList = target_genes,
      rval = 0.9,
      pval = 0.05
    )
    print(trans_interactions)
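Correlation-based filtering can be sketched with base R alone. The example below deliberately swaps in `stats::cor.test()` for the Hmisc-based computation the function relies on, so it runs without extra packages; applying the `rval` cutoff to the absolute correlation is an assumption, and the variable names are hypothetical.

```r
set.seed(1)
nSamples <- 12
target <- rnorm(nSamples)
expr <- rbind(
  LNC1 = 2 * target + rnorm(nSamples, sd = 0.1),  # built to track TARGET1
  LNC2 = rnorm(nSamples),                         # unrelated noise
  TARGET1 = target
)

# All lncRNA-target pairs to test.
candidates <- expand.grid(
  lncRNAId = c("LNC1", "LNC2"),
  targetRNAId = "TARGET1",
  stringsAsFactors = FALSE
)

# Pearson correlation and p-value for each pair.
results <- lapply(seq_len(nrow(candidates)), function(i) {
  cor.test(expr[candidates$lncRNAId[i], ], expr[candidates$targetRNAId[i], ],
           method = "pearson")
})
candidates$r <- vapply(results, function(res) unname(res$estimate), numeric(1))
candidates$p <- vapply(results, function(res) res$p.value, numeric(1))

# Keep pairs passing both cutoffs (absolute r assumed).
significant <- candidates[abs(candidates$r) >= 0.7 & candidates$p <= 0.05, ]
print(significant)
```

Testing pairs one by one scales poorly; for full transcriptomes a matrix-based routine is preferable, which is presumably why the function depends on Hmisc.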
This function extracts unique biotypes for genes or transcripts from a
GRanges object, typically imported from a reference GTF/GFF file.
    getBiotypes(refGtf, level = "transcript")
| refGtf | A |
|---|---|
| level | A character string specifying whether to extract information for |
A DataFrame (from the S4Vectors package) with two columns:
the identifier (gene_id or transcript_id) and the corresponding
biotype (gene_biotype or transcript_biotype).
    # --- 1. Create a sample GRanges object mimicking a reference GTF ---
    sampleRefGtf <- GenomicRanges::GRanges(
      seqnames = "chr1",
      ranges = IRanges::IRanges(start = 1:6, width = 100),
      gene_id = c("G1", "G1", "G2", "G2", "G3", "G3"),
      gene_biotype = c("protein_coding", "protein_coding", "lncRNA",
                       "lncRNA", "pseudogene", "pseudogene"),
      transcript_id = c("T1.1", "T1.2", "T2.1", "T2.1", "T3.1", "T3.1"),
      transcript_biotype = c("protein_coding", "protein_coding_variant", "lncRNA",
                             "lncRNA", "pseudogene", "pseudogene")
    )

    # --- 2. Extract biotypes at the transcript level ---
    transcript_biotypes <- getBiotypes(refGtf = sampleRefGtf, level = "transcript")
    print(transcript_biotypes)

    # --- 3. Extract biotypes at the gene level ---
    gene_biotypes <- getBiotypes(refGtf = sampleRefGtf, level = "gene")
    print(gene_biotypes)
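Extracting unique biotypes amounts to keeping each identifier/biotype pair once. The base R sketch below illustrates that idea under assumptions and is not the package's code: it uses a plain data.frame (hypothetical name ann) in place of the GRanges metadata columns and returns a data.frame rather than an S4Vectors DataFrame.

```r
# Hypothetical stand-in for the metadata columns of a GRanges object.
ann <- data.frame(
  transcript_id = c("T1.1", "T1.2", "T2.1", "T2.1", "T3.1", "T3.1"),
  transcript_biotype = c("protein_coding", "protein_coding_variant",
                         "lncRNA", "lncRNA", "pseudogene", "pseudogene"),
  stringsAsFactors = FALSE
)

# Keep each (identifier, biotype) pair once, in order of first appearance.
biotypes <- unique(ann[, c("transcript_id", "transcript_biotype")])
rownames(biotypes) <- NULL
print(biotypes)
```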
This function takes a GRanges object (imported from a GTF file) and calculates the number of exons and total transcript length for each transcript.
    getGtfStats(gtfObject)
| gtfObject | A |
|---|---|
A data.frame with columns: "transcript_id", "exons", and
"trans_length".
    # Create a sample GRanges object to mimic a GTF import
    sample_gtf <- GenomicRanges::GRanges(
      seqnames = "chr1",
      ranges = IRanges::IRanges(
        start = c(100, 300, 800, 950),
        end = c(200, 400, 900, 1050)
      ),
      strand = "+",
      type = "exon",
      transcript_id = c("T1", "T1", "T2", "T2"),
      exon_number = c("1", "2", "1", "20") # Example with exon_number > 9
    )

    # Calculate statistics
    transcript_stats <- getGtfStats(sample_gtf)
    print(transcript_stats)
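The two statistics can also be worked out by hand in base R. The sketch below is an illustration under assumptions rather than the package's implementation: it stores the exon records in a plain data.frame (hypothetical name exons), counts exons as the number of exon rows per transcript instead of reading exon_number, and uses the inclusive GTF convention width = end - start + 1.

```r
# Exon records as a plain data.frame; coordinates match the example above.
exons <- data.frame(
  transcript_id = c("T1", "T1", "T2", "T2"),
  start = c(100, 300, 800, 950),
  end   = c(200, 400, 900, 1050),
  stringsAsFactors = FALSE
)

# GTF coordinates are 1-based and inclusive.
exons$width <- exons$end - exons$start + 1

# Per-transcript exon count and summed exon length.
nExons   <- tapply(exons$width, exons$transcript_id, length)
totalLen <- tapply(exons$width, exons$transcript_id, sum)
stats <- data.frame(
  transcript_id = names(totalLen),
  exons = as.integer(nExons),
  trans_length = as.numeric(totalLen),
  stringsAsFactors = FALSE
)
print(stats)
```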
Generates circular bar plots (clock plots) to visualize performance metrics. Supports both single and multiple plot layouts for comparing methods.
    plotClockMetrics(
      cmList,
      methods = NULL,
      metrics = c("Accuracy", "Sensitivity", "Specificity", "Precision", "Recall"),
      plotTitle = "Clock Plot of Metrics",
      colors = NULL,
      layout = c("single", "multiple"),
      ...
    )
| cmList | A named list where each element represents a confusion matrix, typically the output from |
|---|---|
| methods | An optional character vector of method names to include. If |
| metrics | A character vector of metrics to display. Defaults to |
| plotTitle | The title for the plot(s). |
| colors | An optional vector of colors for each method. |
| layout | The layout of the plots: |
| ... | Additional arguments (not currently used). |
A ggplot object (for layout = "single") or a patchwork
object (for layout = "multiple"), which can be printed to display.
    # --- 1. Create a mock cmList object (as from calculateCM) ---
    set.seed(123)
    mockCmList <- list(
      `MethodA` = list(metrics = c(Accuracy=0.9, Sensitivity=0.8, Specificity=0.95,
                                   Precision=0.85, Recall=0.8)),
      `MethodB` = list(metrics = c(Accuracy=0.8, Sensitivity=0.9, Specificity=0.7,
                                   Precision=0.75, Recall=0.9))
    )

    # --- 2. Run the plot function ---
    # Example 1: Single plot comparing selected methods
    plotClockMetrics(
      cmList = mockCmList,
      methods = c("MethodA", "MethodB"),
      plotTitle = "Comparison Clock Plot"
    )

    # Example 2: Multiple plots, one for each method
    plotClockMetrics(cmList = mockCmList, layout = "multiple")
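For readers building a mock cmList by hand, the five default metrics follow from the four cells of a confusion matrix. The sketch below uses the standard definitions. The helper metricsFromCounts is hypothetical, and treating non-coding (encoded as 1) as the positive class is an assumption, not something taken from the package's calculateCM code.

```r
# Hypothetical helper: standard metrics from confusion-matrix counts.
# tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.
metricsFromCounts <- function(tp, fp, tn, fn) {
  c(
    Accuracy    = (tp + tn) / (tp + fp + tn + fn),
    Sensitivity = tp / (tp + fn),
    Specificity = tn / (tn + fp),
    Precision   = tp / (tp + fp),
    Recall      = tp / (tp + fn)  # same quantity as Sensitivity
  )
}

# Wrap the vector the same way the mock cmList in the example does.
handMadeCmList <- list(
  MethodX = list(metrics = metricsFromCounts(tp = 40, fp = 5, tn = 45, fn = 10))
)
print(round(handMadeCmList$MethodX$metrics, 3))
```

Because Sensitivity and Recall are the same quantity, they will always coincide on the plot when both are requested.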
Generates radar plots to visualize and compare performance metrics for different methods or tool combinations. Supports a single plot for comparing multiple methods or a grid of plots for individual evaluation.
    plotRadarMetrics(
      cmList,
      methods = NULL,
      metrics = c("Accuracy", "Sensitivity", "Specificity", "Precision", "Recall"),
      plotTitle = "Radar Plot of Metrics",
      colors = NULL,
      layout = c("single", "multiple"),
      displayArea = FALSE,
      displayFill = TRUE,
      saveData = FALSE,
      fileName = NULL,
      ...
    )
| cmList | A named list where each element represents a confusion matrix, typically the output from |
|---|---|
| methods | An optional character vector of method names (from |
| metrics | A character vector of metrics to visualize on the radar plot axes. Defaults to |
| plotTitle | A character string for the plot title. |
| colors | An optional character vector of colors for each method. If |
| layout | Specifies the plot layout: |
| displayArea | Logical. If |
| displayFill | Logical. If |
| saveData | Logical. If |
| fileName | The name of the file for saving data (required if |
| ... | Additional arguments passed to |
This function is called for its side effect of generating plots and
returns invisible(NULL).
    # --- 1. Create a mock cmList object (as from calculateCM) ---
    set.seed(123)
    mockCmList <- list(
      `MethodA` = list(metrics = c(Accuracy=0.9, Sensitivity=0.8, Specificity=0.95,
                                   Precision=0.85, Recall=0.8)),
      `MethodB` = list(metrics = c(Accuracy=0.8, Sensitivity=0.9, Specificity=0.7,
                                   Precision=0.75, Recall=0.9)),
      `MethodC` = list(metrics = c(Accuracy=0.85, Sensitivity=0.85, Specificity=0.85,
                                   Precision=0.85, Recall=0.85))
    )

    # --- 2. Run the plot function ---
    # To prevent plots from showing up during automated checks, we wrap in a device
    temp_png <- tempfile(fileext = ".png")
    png(temp_png)

    # Example 1: Single plot comparing selected methods
    plotRadarMetrics(
      cmList = mockCmList,
      methods = c("MethodA", "MethodC"),
      plotTitle = "Comparison Plot"
    )

    # Example 2: Multiple plots, one for each method
    plotRadarMetrics(cmList = mockCmList, layout = "multiple")

    dev.off()
    unlink(temp_png)
Visualizes genomic interaction data (lncRNAId -> Target -> Functional Term)
as an interactive Sankey diagram using Plotly. Replaces plot_by_terms,
plot_by_lnc, etc., providing a unified interface for filtering and plotting.
    plotSankeyInteractions(
      interactionData,
      groupBy = NULL,
      selectIds = NULL,
      showLabels = FALSE,
      highlightSelected = FALSE,
      color = NULL,
      title = NULL
    )
| interactionData | A |
|---|---|
| groupBy | A character string specifying the filtering criteria. Valid options are: |
| selectIds | Optional character vector of IDs/terms to filter or highlight. If |
| showLabels | Logical. If |
| highlightSelected | Logical. If |
| color | Optional character string. A single color for all nodes/links. If |
| title | Optional character string for the plot title. |
A plotly object containing the Sankey diagram.
    # --- 1. Create mock interaction data ---
    mockData <- data.frame(
      lncRNAId = c(rep("LNC1", 3), rep("LNC2", 2)),
      intersection = c("T1", "T2", "T1", "T3", "T2"),
      term_name = c("Stress", "Growth", "Stress", "Immunity", "Growth"),
      type = c("cis", "cis", "trans", "trans", "cis"),
      stringsAsFactors = FALSE
    )

    # --- 2. Run in non-interactive mode ---
    # Example 1: Filter by specific Term
    fig1 <- plotSankeyInteractions(
      interactionData = mockData,
      groupBy = "term",
      selectIds = "Stress",
      title = "Stress Interactions"
    )
    # fig1

    # Example 2: Highlight specific lncRNAId
    fig2 <- plotSankeyInteractions(
      interactionData = mockData,
      groupBy = "lncRNAId",
      selectIds = "LNC1",
      highlightSelected = TRUE,
      title = "Highlighting LNC1"
    )
    # fig2
Generates a Venn diagram of noncoding predictions from multiple tools, based
on the output of aggregateCodPot. This function requires the 'venn' package,
which should be declared in the Suggests field of the DESCRIPTION file.
    plotVennCodPot(
      codPot,
      selection = NULL,
      vennColors = grDevices::palette.colors(n = 9, palette = "Okabe-Ito")
    )
| codPot | A list object generated by |
|---|---|
| selection | An optional logical vector indicating which tools to include (TRUE for include, FALSE for exclude). If |
| vennColors | A vector of colors for the Venn diagram segments. |
Invisibly returns the result of venn::venn. The primary effect is
plotting the diagram to the active graphics device.
    # Example with all tools
    mockCodPot <- list(
      seqIDs = c("tx1", "tx2", "tx3", "tx4"),
      tools = list(
        CPC2 = c(0, 1, 1, 0),
        PLEK = c(0, 1, 0, 1)
      )
    )

    if (requireNamespace("venn", quietly = TRUE)) {
      plotVennCodPot(codPot = mockCodPot)
    }
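The regions of such a diagram can be cross-checked without any plotting package. The base R sketch below is only an illustration and does not reproduce what venn::venn computes internally: it labels each sequence with the set of tools that called it non-coding (1), then tabulates the labels. The object names calls, region and regionCounts are hypothetical.

```r
# Mock predictions in the 1 = non-coding, 0 = coding encoding.
calls <- data.frame(
  CPC2 = c(0, 1, 1, 0),
  PLEK = c(0, 1, 0, 1),
  row.names = c("tx1", "tx2", "tx3", "tx4")
)

# Label each sequence by the tools that predicted it as non-coding.
region <- apply(calls == 1, 1, function(hit) {
  if (any(hit)) paste(names(calls)[hit], collapse = "&") else "none"
})

# Number of sequences falling into each region of the diagram.
regionCounts <- table(region)
print(regionCounts)
```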
Filters coding potential results to include only sequences present in
provided test sets. It annotates each sequence as non-coding (nc) or
protein-coding (cds), creating a summary object ready for various evaluation
functions (e.g., bestTool, bestToolAtleast).
    prepareEvaluationSets(codPotList, ncTest, cdsTest)
| codPotList | A list object, typically from |
|---|---|
| ncTest | A character vector of sequence IDs known to be non-coding. |
| cdsTest | A character vector of sequence IDs known to be protein-coding. |
A list containing elements filtered to include only sequences from
ncTest or cdsTest:
| seqIDs | Filtered character vector of sequence IDs. |
|---|---|
| tools | Filtered list of tool prediction vectors. |
| type | Character vector with annotation ("nc" or "cds"). |
| isNC | Integer vector: 1 if type is "nc", 0 if "cds". |
| sums | Integer vector with the sum of predictions across all tools. |
    # --- 1. Create mock data mimicking package outputs ---
    # a) Output from aggregateCodPot()
    mockCodPotList <- list(
      seqIDs = c("nc_seq1", "cds_seq1", "nc_seq2", "other_seq"),
      tools = list(
        CPC2 = c(1, 0, 1, 1),
        PLEK = c(1, 1, 0, 0)
      )
    )

    # b) Outputs from createTrainTestSets()
    mockNcSets <- list(
      nc.train = c("nc_seq3", "nc_seq4"),
      nc.test = c("nc_seq1", "nc_seq2")
    )
    mockCdsSets <- list(
      cds.train = c("cds_seq2"),
      cds.test = c("cds_seq1")
    )

    # --- 2. Run the function to prepare the evaluation set ---
    evaluationSummary <- prepareEvaluationSets(
      codPotList = mockCodPotList,
      ncTest = mockNcSets$nc.test,
      cdsTest = mockCdsSets$cds.test
    )

    # --- 3. Inspect the prepared data ---
    # Note: "other_seq" was filtered out as it was not in the test sets.
    print(evaluationSummary)
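The shape of the returned list can be reproduced in a few lines of base R. The sketch below is a guess at the logic based only on the documented return elements, not the package's implementation. sketchPrepare is a hypothetical helper, and it assumes that an ID found in ncTest is labelled "nc" and every other retained ID is labelled "cds".

```r
# Hypothetical helper (not part of lncRna) mirroring the documented output.
sketchPrepare <- function(codPotList, ncTest, cdsTest) {
  # Keep only sequences that belong to one of the two test sets.
  keep  <- codPotList$seqIDs %in% c(ncTest, cdsTest)
  ids   <- codPotList$seqIDs[keep]
  tools <- lapply(codPotList$tools, function(pred) pred[keep])
  type  <- ifelse(ids %in% ncTest, "nc", "cds")
  list(
    seqIDs = ids,
    tools  = tools,
    type   = type,
    isNC   = as.integer(type == "nc"),
    # Number of tools that called each sequence non-coding (1).
    sums   = as.integer(Reduce(`+`, tools))
  )
}

codPot <- list(
  seqIDs = c("nc_seq1", "cds_seq1", "nc_seq2", "other_seq"),
  tools  = list(CPC2 = c(1, 0, 1, 1), PLEK = c(1, 1, 0, 0))
)
evalSet <- sketchPrepare(codPot,
                         ncTest  = c("nc_seq1", "nc_seq2"),
                         cdsTest = "cds_seq1")
str(evalSet)
```

Comparing sums against isNC is what makes "at least N tools" style evaluation possible: a sequence counts as predicted non-coding when its sum reaches the chosen N.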