Title: | circRNAprofiler: An R-Based Computational Framework for the Downstream Analysis of Circular RNAs |
---|---|
Description: | R-based computational framework for a comprehensive in silico analysis of circRNAs. This computational framework allows to combine and analyze circRNAs previously detected by multiple publicly available annotation-based circRNA detection tools. It covers different aspects of circRNAs analysis from differential expression analysis, evolutionary conservation, biogenesis to functional analysis. |
Authors: | Simona Aufiero |
Maintainer: | Simona Aufiero <[email protected]> |
License: | GPL-3 |
Version: | 1.21.0 |
Built: | 2024-12-19 06:00:13 UTC |
Source: | https://github.com/bioc/circRNAprofiler |
A data frame containing the AnnotationHub id related to the *.chain.gz file, specific for each species and genome. The file name reflects the assembly conversion data contained within in the format <db1>To<Db2>.over.chain.gz. For example, a file named mm10ToHg19.over.chain.gz file contains the liftOver data needed to convert mm10 (Mouse GRCm38) coordinates to hg19 (Human GRCh37).
data(ahChainFiles)
data(ahChainFiles)
A data frame with 1113 rows and 2 columns
is the AnnotationHub id
is the *.chain.gz file
data(ahChainFiles)
data(ahChainFiles)
A data frame containing the AnnotationHub id related to the repeatMasker db, specific for each species and genome.
data(ahRepeatMasker)
data(ahRepeatMasker)
A data frame with 84 rows and 3 columns
is the AnnotationHub id
is the species
is the genome assembly
data(ahRepeatMasker)
data(ahRepeatMasker)
The function annotateBSJs() annotates the circRNA structure and the introns flanking the corresponding back-spliced junctions. The genomic features are extracted from the user provided gene annotation.
annotateBSJs( backSplicedJunctions, gtf, isRandom = FALSE, pathToTranscripts = NULL )
annotateBSJs( backSplicedJunctions, gtf, isRandom = FALSE, pathToTranscripts = NULL )
backSplicedJunctions |
A data frame containing the back-spliced junction
coordinates (e.g. detected or randomly selected).
For detected back-spliced junctions see |
gtf |
A data frame containing genome annotation information. It can be
generated with |
isRandom |
A logical indicating whether the back-spliced junctions have
been randomly generated with |
pathToTranscripts |
A string containing the path to the transcripts.txt file. The file transcripts.txt contains the transcript ids of the circRNA host gene to analyze. It must have one column with header id. By default pathToTranscripts is set to NULL and the file it is searched in the working directory. If transcripts.txt is located in a different directory then the path needs to be specified. If this file is empty or absent the longest transcript of the circRNA host gene containing the back-spliced junctions are considered in the annotation analysis. |
A data frame.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf)
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf)
The function annotateRepeats() annotates repetitive elements
located in the region flanking the back-spliced junctions of each circRNA.
Repetitive elements are provided by AnnotationHub storage which
collected repeats from RepeatMasker database. See AnnotationHub
and http://www.repeatmasker.org for more details.
An empty list is returned if none overlapping repeats are found.
annotateRepeats(targets, annotationHubID = "AH5122", complementary = TRUE)
annotateRepeats(targets, annotationHubID = "AH5122", complementary = TRUE)
targets |
A list containing the target regions to analyze.
It can be generated with |
annotationHubID |
A string specifying the AnnotationHub id to use. Type data(ahRepeatMasker) to see all possible options. E.g. if AH5122 is specified, repetitive elements from Homo sapiens, genome hg19 will be downloaded and annotated. Default value is "AH5122". |
complementary |
A logical specifying whether to filter and report only back-spliced junctions of circRNAs which flanking introns contain complementary repeats, that is, repeats belonging to a same family but located on opposite strands. |
A list.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve targets targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Annotate repeats # repeats <- annotateRepeats(targets, annotationHubID = "AH5122", complementary = TRUE) }
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve targets targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Annotate repeats # repeats <- annotateRepeats(targets, annotationHubID = "AH5122", complementary = TRUE) }
The function annotateSNPsGWAS() annotates GWAS SNPs located in the region flanking the back-spliced junctions of each circRNA. SNPs information including the corresponding genomic coordinates are retrieved from the GWAs catalog database. The user can restric the analysis to specific traits/diseases. These must go in the file traits.txt. If this file is absent or empty, all traits in the GWAS catalog are considered in the analysis. An empty list is returned if none overlapping SNPs are found.
annotateSNPsGWAS( targets, assembly = "hg19", makeCurrent = TRUE, pathToTraits = NULL )
annotateSNPsGWAS( targets, assembly = "hg19", makeCurrent = TRUE, pathToTraits = NULL )
targets |
A list containing the target regions to analyze.
It can be generated with |
assembly |
A string specifying the human genome assembly. Possible options are hg38 or hg19. Current image for GWAS SNPS coordinates is hg38. If hg19 is specified SNPs coordinates are realtime liftOver to hg19 coordinates. |
makeCurrent |
A logical specifying whether to download the
current image of the GWAS catalog. If TRUE is specified, the function
|
pathToTraits |
A string containing the path to the traits.txt file. The file traits.txt contains diseases/traits specified by the user. It must have one column with header id. By default pathToTraits is set to NULL and the file it is searched in the working directory. If traits.txt is located in a different directory then the path needs to be specified. If this file is absent or empty SNPs associated with all diseases/traits in the GWAS catalog are considered in the SNPs analysis. |
A list.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve targets targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Annotate GWAS SNPs - slow #snpsGWAS <- annotateSNPsGWAS(targets, makeCurrent = TRUE, assembly = "hg19") }
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve targets targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Annotate GWAS SNPs - slow #snpsGWAS <- annotateSNPsGWAS(targets, makeCurrent = TRUE, assembly = "hg19") }
A data frame containing the species for which RBP motifs are available in ATtRACT db. See also http://attract.cnic.es.
data(attractSpecies)
data(attractSpecies)
A data frame with 37 rows and 1 columns
is the species
data(attractSpecies)
data(attractSpecies)
A data frame containing genomic coordinates (genome assembly hg19) of
circRNAs detected by three detection tools (CircMarker(cm), MapSplice2 (m)
and NCLscan (n)) in the human left ventricle tissues of 3 controls, 3
patients with dilated cardiomyopathies (DCM) and 3 patients with hypertrophic
cardiomyopathies (HCM). This data frame was generated with
getBackSplicedJunctions
.
data(backSplicedJunctions)
data(backSplicedJunctions)
A data frame with 63521 rows and 16 columns
Unique identifier
is the gene name whose exon coordinates overlap that of the given back-spliced junctions
is the strand where the gene is transcribed
the chromosome from which the circRNA is derived
is the 5' coordinate of the upstream back-spliced exon in the transcript. This corresponds to the back-spliced junction / acceptor site
is the 3' coordinate of the downstream back-spliced exon in the transcript. This corresponds to the back-spliced junction / donor site
are the tools that identified the back-spliced junctions
Number of occurences of each circRNA in control 1
Number of occurences of each circRNA in control 2
Number of occurences of each circRNA in control 3
Number of occurences of each circRNA in DCM 1
Number of occurences of each circRNA in DCM 2
Number of occurences of each circRNA in DCM 3
Number of occurences of each circRNA in HCM 1
Number of occurences of each circRNA in HCM 2
Number of occurences of each circRNA in HCM 3
data(backSplicedJunctions)
data(backSplicedJunctions)
The function checkProjectFolder() verifies that the
project folder is set up correctly. It checks that the mandatory files
(.gtf file, the folders with the circRNAs_X.txt files and experiemnt.txt)
are present in the working directory.The function
initCircRNAprofiler
can be used to initialize the project folder.
checkProjectFolder( pathToExperiment = NULL, pathToGTF = NULL, pathToMotifs = NULL, pathToMiRs = NULL, pathToTranscripts = NULL, pathToTraits = NULL )
checkProjectFolder( pathToExperiment = NULL, pathToGTF = NULL, pathToMotifs = NULL, pathToMiRs = NULL, pathToTranscripts = NULL, pathToTraits = NULL )
pathToExperiment |
A string containing the path to the experiment.txt file. The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers: - label (1st column): unique names of the samples (short but informative). - fileName (2nd column): name of the input files - e.g. circRNAs_X.txt, where x can be can be 001, 002 etc. - group (3rd column): biological conditions - e.g. A or B; healthy or diseased if you have only 2 conditions. By default pathToExperiment is set to NULL and the file it is searched in the working directory. If experiment.txt is located in a different directory then the path needs to be specified. |
pathToGTF |
A string containing the path to the the GTF file. Use the same annotation file used during the RNA-seq mapping procedure. By default pathToGTF is set to NULL and the file it is searched in the working directory. If .gtf is located in a different directory then the path needs to be specified. |
pathToMotifs |
A string containing the path to the motifs.txt file. The file motifs.txt contains motifs/regular expressions specified by the user. It must have 3 columns with headers: - id (1st column): name of the motif. - e.g. RBM20 or motif1. - motif (2nd column): motif/pattern to search. - length (3rd column): length of the motif. By default pathToMotifs is set to NULL and the file it is searched in the working directory. If motifs.txt is located in a different directory then the path needs to be specified. If this file is absent or empty only the motifs of RNA Binding Proteins in the ATtRACT or MEME database are considered in the motifs analysis. |
pathToMiRs |
A string containing the path to the miRs.txt file. The file miRs.txt contains the microRNA ids from miRBase specified by the user. It must have one column with header id. The first row must contain the miR name starting with the ">", e.g >hsa-miR-1-3p. The sequences of the miRs will be automatically retrieved from the mirBase latest release or from the given mature.fa file, that should be present in the working directory. By default pathToMiRs is set to NULL and the file it is searched in the working directory. If miRs.txt is located in a different directory then the path needs to be specified. If this file is absent or empty, all miRs of the species specified in input are considered in the miRNA analysis. |
pathToTranscripts |
A string containing the path to the transcripts.txt file. The file transcripts.txt contains the transcript ids of the circRNA host gene to analyze. It must have one column with header id. By default pathToTranscripts is set to NULL and the file it is searched in the working directory. If transcripts.txt is located in a different directory then the path needs to be specified. If this file is empty or absent the longest transcript of the circRNA host gene containing the back-spliced junctions are considered in the annotation analysis. |
pathToTraits |
A string containing the path to the traits.txt file. contains diseases/traits specified by the user. It must have one column with header id. By default pathToTraits is set to NULL and the file it is searched in the working directory. If traits.txt is located in a different directory then the path needs to be specified. If this file is absent or empty SNPs associated with all diseases/traits in the GWAS catalog are considered in the SNPs analysis. |
An integer. If equals to 0 the project folder is correctly set up.
checkProjectFolder()
checkProjectFolder()
The functions filterCirc() filters circRNAs on different criteria: condition and read counts. The info reported in experiment.txt file are needed for filtering step.
filterCirc( backSplicedJunctions, allSamples = FALSE, min = 3, pathToExperiment = NULL )
filterCirc( backSplicedJunctions, allSamples = FALSE, min = 3, pathToExperiment = NULL )
backSplicedJunctions |
A data frame containing back-spliced junction
coordinates and counts. See |
allSamples |
A string specifying whether to apply the filter to all samples. Default valu is FALSE. |
min |
An integer specifying the read counts cut-off. If allSamples = TRUE and min = 0 all circRNAs are kept. If allSamples = TRUE and min = 3, a circRNA is kept if all samples have at least 3 counts. If allSamples = FALSE and min = 2 the filter is applied to the samples of each condition separately meaning that a circRNA is kept if at least 2 counts are present in all sample of 1 of the conditions. Default value is 3. |
pathToExperiment |
A string containing the path to the experiment.txt file. The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers:
By default pathToExperiment is set to NULL and the file it is searched in the working directory. If experiment.txt is located in a different directory then the path needs to be specified. |
A data frame.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filteredCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment)
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filteredCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment)
The function formatGTF() formats the given annotation file.
formatGTF(pathToGTF = NULL)
formatGTF(pathToGTF = NULL)
pathToGTF |
A string containing the path to the GTF file. Use the same annotation file used during the RNA-seq mapping procedure. If .gtf file is not present in the current working directory the full path should be specified. |
A data frame.
gtf <- formatGTF()
gtf <- formatGTF()
The function getBackSplicedJunctions() reads the circRNAs_X.txt with the detected circRNAs, adapts the content and generates a unique data frame with all circRNAs identified by each circRNA detection tool and the occurrences found in each sample (named as reported in the column label in experiment.txt).
getBackSplicedJunctions(gtf, pathToExperiment = NULL)
getBackSplicedJunctions(gtf, pathToExperiment = NULL)
gtf |
A data frame containing the annotation information. It can be
generated with |
pathToExperiment |
A string containing the path to the experiment.txt file. The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers:
By default pathToExperiment is set to NULL and the file it is searched in the working directory. If experiment.txt is located in a different directory then the path needs to be specified. |
A data frame.
backSplicedJunctions
for a description of the data frame
containing back-spliced junctions coordinates.
check <- checkProjectFolder() if(check == 0){ # Create gtf object gtf <- formatGTF(pathToGTF) # Read and adapt detected circRNAs backSplicedJunctions<- getBackSplicedJunctions(gtf)}
check <- checkProjectFolder() if(check == 0){ # Create gtf object gtf <- formatGTF(pathToGTF) # Read and adapt detected circRNAs backSplicedJunctions<- getBackSplicedJunctions(gtf)}
The function getCircSeqs() retrieves the circRNA sequences. The circRNA sequence is given by the sequences of the exons in between the back-spliced-junctions.The exon sequences are retrieved and then concatenated together to recreate the circRNA sequence. To recreate the back-spliced junction sequence 50 nucleotides are taken from the 5' head and attached at the 3' tail of each circRNA sequence.
getCircSeqs(annotatedBSJs, gtf, genome)
getCircSeqs(annotatedBSJs, gtf, genome)
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
It can be generated with |
gtf |
A data frame containing genome annotation information. It can be
generated with |
genome |
A BSgenome object containing the genome sequences.
It can be generated with |
A list.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getCircSeqs( annotatedBSJs, gtf, genome) }
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getCircSeqs( annotatedBSJs, gtf, genome) }
The helper functions getDeseqRes() identifies differentially expressed circRNAs. The latter uses respectively the R Bioconductor packages DESeq2 which implements a beta-binomial model to model changes in circRNA expression.
getDeseqRes( backSplicedJunctions, condition, pAdjustMethod = "BH", pathToExperiment = NULL, ... )
getDeseqRes( backSplicedJunctions, condition, pAdjustMethod = "BH", pathToExperiment = NULL, ... )
backSplicedJunctions |
A data frame containing the back-spliced junction
coordinates and counts in each analyzed sample.
See |
condition |
A string specifying which conditions to compare. Only 2 conditions at the time can be analyzed. Separate the 2 conditions with a dash, e.g. A-B. Use the same name used in column condition in experiment.txt. log2FC calculation is perfomed by comparing the condition positioned forward against the condition positioned backward in the alphabet. E.g. if there are 2 conditions A and B then a negative log2FC means that in condition B there is a downregulation of the corresponding circRNA. If a positive log2FC is found means that there is an upregulation in condition B of that circRNA. |
pAdjustMethod |
A character string stating the method used to adjust
p-values for multiple testing. See |
pathToExperiment |
A string containing the path to the experiment.txt file. The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers:
By default pathToExperiment is set to NULL and the file it is searched in the working directory. If experiment.txt is located in a different directory then the path needs to be specified. |
... |
Arguments to be passed to the DESeq function used internally from
DESeq2 package. If nothing is specified the default
values of the function |
A data frame.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filteredCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment) # Find differentially expressed circRNAs deseqResBvsA <- getDeseqRes( filteredCirc, condition = "A-B", pAdjustMethod = "BH", pathToExperiment)
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filteredCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment) # Find differentially expressed circRNAs deseqResBvsA <- getDeseqRes( filteredCirc, condition = "A-B", pAdjustMethod = "BH", pathToExperiment)
The function getDetectionTools() creates a data frame containing the codes corresponding to each circRNA detection tool for which a specific import function has been developed.
getDetectionTools()
getDetectionTools()
A data frame
getDetectionTools()
getDetectionTools()
The helper functions edgerRes() identifies differentially expressed circRNAs. The latter uses respectively the R Bioconductor packages EdgeR which implements a beta-binomial model to model changes in circRNA expression. The info reported in experiment.txt file are needed for differential expression analysis.
getEdgerRes( backSplicedJunctions, condition, normMethod = "TMM", pAdjustMethod = "BH", pathToExperiment = NULL )
getEdgerRes( backSplicedJunctions, condition, normMethod = "TMM", pAdjustMethod = "BH", pathToExperiment = NULL )
backSplicedJunctions |
A data frame containing the back-spliced junction
coordinates and counts in each analyzed sample.
See |
condition |
A string specifying which conditions to compare. Only 2 conditions at the time can be analyzed. Separate the 2 conditions with a dash, e.g. A-B. Use the same name used in column condition in experiment.txt. log2FC calculation is perfomed by comparing the condition positioned forward against the condition positioned backward in the alphabet. E.g. if there are 2 conditions A and B then a negative log2FC means that in condition B there is a downregulation of the corresponding circRNA. If a positive log2FC is found means that there is an upregulation in condition B of that circRNA. |
normMethod |
A character string specifying the normalization method to
be used. It can be "TMM","RLE","upperquartile" or"none". The value given
to the method argument is given to the |
pAdjustMethod |
A character string stating the method used to adjust
p-values for multiple testing. See |
pathToExperiment |
A string containing the path to the experiment.txt file. The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers:
By default pathToExperiment is set to NULL and the file it is searched in the working directory. If experiment.txt is located in a different directory then the path needs to be specified. |
A data frame.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filteredCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment) # Find differentially expressed circRNAs deseqResBvsA <- getEdgerRes( filteredCirc, condition = "A-B", normMethod = "TMM", pAdjustMethod = "BH", pathToExperiment)
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filteredCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment) # Find differentially expressed circRNAs deseqResBvsA <- getEdgerRes( filteredCirc, condition = "A-B", normMethod = "TMM", pAdjustMethod = "BH", pathToExperiment)
The function getMirSites() searches miRNA binding sites within the circRNA sequences. The user can restrict the analisis only to a subset of miRs. In this case, miR ids must go in miR.txt file. If the latter is absent or empty, all miRs of the specified miRspeciesCode are considered in the analysis.
getMiRsites( targets, miRspeciesCode = "hsa", miRBaseLatestRelease = TRUE, totalMatches = 7, maxNonCanonicalMatches = 1, pathToMiRs = NULL )
getMiRsites( targets, miRspeciesCode = "hsa", miRBaseLatestRelease = TRUE, totalMatches = 7, maxNonCanonicalMatches = 1, pathToMiRs = NULL )
targets |
A list containing the target sequences to analyze.
This data frame can be generated with |
miRspeciesCode |
A string specifying the species code (3 letters) as reported in miRBase db. E.g. to analyze the mouse microRNAs specify "mmu", to analyze the human microRNAs specify "hsa". Type data(miRspeciesCode) to see the available codes and the corresponding species reported in miRBase 22 release. Default value is "hsa". |
miRBaseLatestRelease |
A logical specifying whether to download the latest release of the mature sequences of the microRNAs from miRBase (http://www.mirbase.org/ftp.shtml). If TRUE is specified then the latest release is automatically downloaded. If FALSE is specified then a file named mature.fa containing fasta format sequences of all mature miRNA sequences previously downloaded by the user from mirBase must be present in the working directory. Default value is TRUE. |
totalMatches |
An integer specifying the total number of matches that have to be found between the seed region of the miR and the seed site of the target sequence. If the total number of matches found is less than the cut-off, the seed site is discarded. The maximun number of possible matches is 7. Default value is 7. |
maxNonCanonicalMatches |
An integer specifying the max number of non-canonical matches (G:U) allowed between the seed region of the miR and the seed site of the target sequence. If the max non-canonical matches found is greater than the cut-off, the seed site is discarded. Default value is 1. |
pathToMiRs |
A string containing the path to the miRs.txt file. The file miRs.txt contains the microRNA ids from miRBase specified by the user. It must have one column with header id. The first row must contain the miR name starting with the ">", e.g >hsa-miR-1-3p. The sequences of the miRs will be automatically retrieved from the mirBase latest release or from the given mature.fa file, that should be present in the working directory. By default pathToMiRs is set to NULL and the file it is searched in the working directory. If miRs.txt is located in a different directory then the path needs to be specified. If this file is absent or empty, all miRs of the specified species are considered in the miRNA analysis. |
A list.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences. targets <- getCircSeqs( annotatedBSJs, gtf, genome) # Screen target sequence for miR binding sites. pathToMiRs <- system.file("extdata", "miRs.txt", package="circRNAprofiler") #miRsites <- getMiRsites( # targets, # miRspeciesCode = "hsa", # miRBaseLatestRelease = TRUE, # totalMatches = 6, # maxNonCanonicalMatches = 1, # pathToMiRs ) }
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences. targets <- getCircSeqs( annotatedBSJs, gtf, genome) # Screen target sequence for miR binding sites. pathToMiRs <- system.file("extdata", "miRs.txt", package="circRNAprofiler") #miRsites <- getMiRsites( # targets, # miRspeciesCode = "hsa", # miRBaseLatestRelease = TRUE, # totalMatches = 6, # maxNonCanonicalMatches = 1, # pathToMiRs ) }
The function getMotifs() scans the target sequences for the presence of recurrent motifs of a specific length defined in input. By setting rbp equals to TRUE, the identified motifs are matched with motifs of known RNA Binding Proteins (RBPs) deposited in the ATtRACT (http://attract.cnic.es) or MEME database (http://meme-suite.org/) and with motifs specified by the user. The user motifs must go in the file motifs.txt. If this file is absent or empty, only motifs from the ATtRACT or MEME database are considered in the analysis. By setting rbp equals to FALSE, only motifs that do not match with any motifs deposited in the databases or user motifs are reported in the final output. Location of the selected motifs is also reported. This corresponds to the start position of the motif within the sequence (1-index based).
getMotifs( targets, width = 6, database = "ATtRACT", species = "Hsapiens", memeIndexFilePath = 18, rbp = TRUE, reverse = FALSE, pathToMotifs = NULL )
getMotifs( targets, width = 6, database = "ATtRACT", species = "Hsapiens", memeIndexFilePath = 18, rbp = TRUE, reverse = FALSE, pathToMotifs = NULL )
targets |
A list containing the target sequences to analyze.
It can be generated with |
width |
An integer specifying the length of all possible motifs to extract from the target sequences. Default value is 6. |
database |
A string specifying the RBP database to use. Possible options are ATtRACT or MEME. Default database is "ATtRACT". |
species |
A string specifying the species of the ATtRACT RBP motifs to use. Type data(attractSpecies) to see the possible options. Default value is "Hsapiens". |
memeIndexFilePath |
An integer specifying the index of the file path of the meme file to use.Type data(memeDB) to see the possible options. Default value is 18 corresponding to the following file: motif_databases/RNA/Ray2013_rbp_Homo_sapiens.meme |
rbp |
A logical specifying whether to report only motifs matching with known RBP motifs from ATtRACT database or user motifs specified in motifs.txt. If FALSE is specified only motifs that do not match with any of these motifs are reported. Default values is TRUE. |
reverse |
A logical specifying whether to reverse the motifs collected from ATtRACT database and from motifs.txt. If TRUE is specified all the motifs are reversed and analyzed together with the direct motifs as they are reported in the ATtRACT db and motifs.txt. Default value is FALSE. |
pathToMotifs |
A string containing the path to the motifs.txt file. The file motifs.txt contains motifs/regular expressions specified by the user. It must have 3 columns with headers:
By default pathToMotifs is set to NULL and the file it is searched in the working directory. If motifs.txt is located in a different directory then the path needs to be specified. If this file is absent or empty only the motifs of RNA Binding Proteins in the ATtRACT database are considered in the motifs analysis. |
A list.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Example with the first back-spliced junction # Multiple back-spliced junctions can also be analyzed at the same time # Annotate the first back-spliced junction annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Get motifs motifs <- getMotifs( targets, width = 6, database = 'ATtRACT', species = "Hsapiens", rbp = TRUE, reverse = FALSE) }
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Example with the first back-spliced junction # Multiple back-spliced junctions can also be analyzed at the same time # Annotate the first back-spliced junction annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Get motifs motifs <- getMotifs( targets, width = 6, database = 'ATtRACT', species = "Hsapiens", rbp = TRUE, reverse = FALSE) }
The function getRandomBSJunctions() retrieves random back-spliced junctions from the user genome annotation.
getRandomBSJunctions(gtf, n = 100, f = 10, setSeed = NULL)
getRandomBSJunctions(gtf, n = 100, f = 10, setSeed = NULL)
gtf |
A dataframe containing genome annotation information This can be
generated with |
n |
Integer specifying the number of randomly selected transcripts from which random back-spliced junctions are extracted. Default value = 100. |
f |
An integer specifying the fraction of single exon circRNAs that have to be present in the output data frame. Default value is 10. |
setSeed |
An integer which is used for selecting random back-spliced junctions. Default values is set to NULL. |
A data frame.
# Load short version of the gencode v19 annotation file data("gtf") # Get 10 random back-spliced junctions randomBSJunctions <- getRandomBSJunctions(gtf, n = 10, f = 10)
# Load short version of the gencode v19 annotation file data("gtf") # Get 10 random back-spliced junctions randomBSJunctions <- getRandomBSJunctions(gtf, n = 10, f = 10)
The function getRegexPattern() converts a nucleotide sequence with IUPAC codes to an regular expression.
getRegexPattern(iupacSeq, isDNA = FALSE)
getRegexPattern(iupacSeq, isDNA = FALSE)
iupacSeq |
A character string with IUPAC codes. |
isDNA |
A logical specifying whether the iupacSeq is DNA or RNA. Deafult value is FALSE. |
A character string.
regextPattern <- getRegexPattern("CGUKMBVNN", isDNA = FALSE)
regextPattern <- getRegexPattern("CGUKMBVNN", isDNA = FALSE)
The function getSeqsAcrossBSJs() retrieves the sequences across the back-spliced junctions. A total of 11 nucleotides from each side of the back-spliced junction are taken and concatenated together.
getSeqsAcrossBSJs(annotatedBSJs, gtf, genome)
getSeqsAcrossBSJs(annotatedBSJs, gtf, genome)
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
It can be generated with |
gtf |
A dataframe containing genome annotation information. It can be
generated with |
genome |
A BSgenome object containing the genome sequences.
It can be generated with |
A list.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsAcrossBSJs( annotatedBSJs, gtf, genome) }
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsAcrossBSJs( annotatedBSJs, gtf, genome) }
The function getSeqsFromGRs() includes 3 modules to retrieve 3 types of sequences. Sequences of the introns flanking back-spliced junctions, sequences from a defined genomic window surrounding the back-spliced junctions and sequences of the back-spliced exons.
getSeqsFromGRs(annotatedBSJs, genome, lIntron = 100, lExon = 10, type = "ie")
getSeqsFromGRs(annotatedBSJs, genome, lIntron = 100, lExon = 10, type = "ie")
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
This data frame can be generated with |
genome |
A BSgenome object containing the genome sequences.
It can be generated with in |
lIntron |
An integer indicating how many nucleotides are taken from the introns flanking the back-spliced junctions. This number must be positive. Default value is 100. |
lExon |
An integer indicating how many nucleotides are taken from the back-spliced exons starting from the back-spliced junctions. This number must be positive. Default value is 10. |
type |
A character string specifying the sequences to retrieve. If type = "ie" the sequences are retrieved from the the genomic ranges defined by using the lIntron and lExon given in input. If type = "bse" the sequences of the back-spliced exons are retrieved. If type = "fi" the sequences of the introns flanking the back-spliced exons are retrieved. Default value is "ie". |
A list.
# Load data frame containing predicted back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) }
# Load data frame containing predicted back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)){ genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) }
A data frame containing the traits/disease extracted on the 31th October 2018 from the GWAS catalog. See also https://www.ebi.ac.uk/gwas/ for more detail.
data(gwasTraits)
data(gwasTraits)
A data frame with 653 rows and 1 columns
is the trait/disease as reported in the GWAS catalog
data(gwasTraits)
data(gwasTraits)
The function initCircRNAprofiler() initializes the project forlder.
initCircRNAprofiler(projectFolderName, detectionTools)
initCircRNAprofiler(projectFolderName, detectionTools)
projectFolderName |
A character string specifying the name of the project folder. |
detectionTools |
A character vector specifying the tools used to predict circRNAs. The following options are allowed: mapsplice, nclscan, knife, circexplorer2, circmarker and uroborus. If the tool is not mapsplice, nclscan, knife, circexplorer2, uroborus or circmarker then use the option other. The user can choose 1 or multiple tools. Subfolders named as the specified tools will be generated under the working directory. |
A NULL object
## Not run: initCircRNAprofiler(projectFolderName = "circProject", detectionTools = "mapsplice") ## End(Not run)
## Not run: initCircRNAprofiler(projectFolderName = "circProject", detectionTools = "mapsplice") ## End(Not run)
A data frame containing IUPAC codes and the correspoding regular expression.
data(iupac)
data(iupac)
A data frame with 18 rows and 4 columns
is the IUPAC nucleotide code
is the base
is the regular expression of the corresponding IUPAC code
is the regular expression of the corresponding IUPAC code
data(iupac)
data(iupac)
The function liftBSJcoords() maps back-spliced junction coordinates between species ad genome assemblies by using the liftOver utility from UCSC. Only back-spliced junction coordinates where the mapping was successful are reported.
liftBSJcoords( backSplicedJunctions, map = "hg19ToMm9", annotationHubID = "AH14155" )
liftBSJcoords( backSplicedJunctions, map = "hg19ToMm9", annotationHubID = "AH14155" )
backSplicedJunctions |
A data frame containing the back-spliced junction
coordinates (predicted or randomly selected).
See |
map |
A character string in the format <db1>To<Db2> (e.g."hg19ToMm9") specifying the reference genome mapping logic associated with a valid .chain file. Default value is "hg19ToMm9". |
annotationHubID |
A string specifying the AnnotationHub id associated with a valid *.chain file. Type data(ahChainFiles) to see all possibile options. E.g. if AH14155 is specified, the hg19ToMm9.over.chain.gz will be used to convert the hg19 (Human GRCh37) coordinates to mm10 (Mouse GRCm38). Default value is "AH14155". |
a data frame.
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # LiftOver the first 10 back-spliced junction coordinates liftedBSJcoords <- liftBSJcoords(mergedBSJunctions[1:10,], map = "hg19ToMm9")
# Load a data frame containing detected back-spliced junctions data("mergedBSJunctions") # LiftOver the first 10 back-spliced junction coordinates liftedBSJcoords <- liftBSJcoords(mergedBSJunctions[1:10,], map = "hg19ToMm9")
A dataframe containing the file paths of the files containing the RBP motifs in meme format available in MEME db. See also http://meme-suite.org/doc/download.html. File paths extracted from RNA folder of motif_databases.12.19.tgz (Motif Databases updated 28 Oct 2019)
data(memeDB)
data(memeDB)
A character vector with 25 rows ans 2 columns
is the file path
index of the file path
data(memeDB)
data(memeDB)
The function mergeBSJunctions() shrinks the data frame by
grouping back-spliced junctions commonly identified by multiple
detection tools. The read counts of the samples reported in the final
data frame will be the ones of the tool that detected the highest total mean
across all samples. All the tools that detected the back-spliced junctions
are then listed in the column "tool" of the final data frame.
See getDetectionTools
for more detail about the code
corresponding to each circRNA detection tool.
NOTE: Since different detection tools can report sligtly different coordinates before grouping the back-spliced junctions, it is possible to fix the latter using the gtf file. In this way the back-spliced junctions coordinates will correspond to the exon coordinates reported in the gtf file. A difference of maximum 2 nucleodites is allowed between the bsj and exon coordinates. See param fixBSJsWithGTF.
mergeBSJunctions( backSplicedJunctions, gtf, pathToExperiment = NULL, exportAntisense = FALSE, fixBSJsWithGTF = FALSE )
mergeBSJunctions( backSplicedJunctions, gtf, pathToExperiment = NULL, exportAntisense = FALSE, fixBSJsWithGTF = FALSE )
backSplicedJunctions |
A data frame containing back-spliced junction
coordinates and counts generated with |
gtf |
A data frame containing genome annotation information,
generated with |
pathToExperiment |
A string containing the path to the experiment.txt file. The file experiment.txt contains the experiment design information. It must have at least 3 columns with headers: - label (1st column): unique names of the samples (short but informative). - fileName (2nd column): name of the input files - e.g. circRNAs_X.txt, where x can be can be 001, 002 etc. - group (3rd column): biological conditions - e.g. A or B; healthy or diseased if you have only 2 conditions. By default pathToExperiment i set to NULL and the file it is searched in the working directory. If experiment.txt is located in a different directory then the path needs to be specified. |
exportAntisense |
A logical specifying whether to export the identified antisense circRNAs in a file named antisenseCircRNAs.txt. Default value is FALSE. A circRNA is defined antisense if the strand reported in the prediction results is different from the strand reported in the genome annotation file. The antisense circRNAs are removed from the returned data frame. |
fixBSJsWithGTF |
A logical specifying whether to fix the back-spliced junctions coordinates using the GTF file. Default value is FALSE. |
A data frame.
# Load detected back-soliced junctions data("backSplicedJunctions") # Load short version of the gencode v19 annotation file data("gtf") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Merge commonly identified circRNAs mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf, pathToExperiment)
# Load detected back-soliced junctions data("backSplicedJunctions") # Load short version of the gencode v19 annotation file data("gtf") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Merge commonly identified circRNAs mergedBSJunctions <- mergeBSJunctions(backSplicedJunctions, gtf, pathToExperiment)
A data frame containing genomic coordinates (genome assembly hg19) detected
in human LV tissues of controls and diseased hearts. This data frame was
generated with mergedBSJunctions
which grouped circRNAs
commonly identified by the three tools (CircMarker(cm), MapSplice2 (m) and
NCLscan (n)) used for circRNAs detection.
data(mergedBSJunctions)
data(mergedBSJunctions)
A data frame with 41558 rows and 16 columns
Unique identifier
is the gene name whose exon coordinates overlap that of the given back-spliced junctions
is the strand where the gene is transcribed
the chromosome from which the circRNA is derived
is the 5' coordinate of the upstream back-spliced exon in the transcript. This corresponds to the back-spliced junction / acceptor site
is the 3' coordinate of the downstream back-spliced exon in the transcript. This corresponds to the back-spliced junction / donor site
are the tools that identified the back-spliced junctions
Number of occurences of each circRNA in control 1
Number of occurences of each circRNA in control 2
Number of occurences of each circRNA in control 3
Number of occurences of each circRNA in DCM 1
Number of occurences of each circRNA in DCM 2
Number of occurences of each circRNA in DCM 3
Number of occurences of each circRNA in HCM 1
Number of occurences of each circRNA in HCM 2
Number of occurences of each circRNA in HCM 3
data(mergedBSJunctions)
data(mergedBSJunctions)
A same RBP can recognize multiple motifs, the function mergeMotifs() groups all the motifs found for each RBP and report the total counts.
mergeMotifs(motifs)
mergeMotifs(motifs)
motifs |
A data frame generated with |
A data frame.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Example with the first back-spliced junctions. # Multiple back-spliced junctions can also be analyzed at the same time. # Annotate detected back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Get motifs motifs <- getMotifs( targets, width = 6, species = "Hsapiens", rbp = TRUE, reverse = FALSE) # Group motifs mergedMotifs <- mergeMotifs(motifs)
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Example with the first back-spliced junctions. # Multiple back-spliced junctions can also be analyzed at the same time. # Annotate detected back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences targets <- getSeqsFromGRs( annotatedBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Get motifs motifs <- getMotifs( targets, width = 6, species = "Hsapiens", rbp = TRUE, reverse = FALSE) # Group motifs mergedMotifs <- mergeMotifs(motifs)
A data frame containing the code of the species as reported in miRBase 22 release. See also http://www.mirbase.org
data(miRspeciesCodes)
data(miRspeciesCodes)
A data frame with 271 rows and 2 columns
is the unique identifier of the species as reported in miRBase db
is the species
data(miRspeciesCodes)
data(miRspeciesCodes)
The function plotExBetweenBSEs() generates a bar chart showing the no. of exons in between the back-spliced junctions.
plotExBetweenBSEs(annotatedBSJs, title = "")
plotExBetweenBSEs(annotatedBSJs, title = "")
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
This data frame can be generated with |
title |
A character string specifying the title of the plot. |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotExBetweenBSEs(annotatedBSJs, title = "") p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotExBetweenBSEs(annotatedBSJs, title = "") p
The function plotExPosition() generates a bar chart showing the position of the back-spliced exons within the transcript.
plotExPosition(annotatedBSJs, title = "", n = 0, flip = FALSE)
plotExPosition(annotatedBSJs, title = "", n = 0, flip = FALSE)
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
This data frame can be generated with |
title |
A character string specifying the title of the plot. |
n |
An integer specyfing the position counts cut-off. If 0 is specified all position are plotted. Deafult value is 0. |
flip |
A logical specifying whether to flip the transcripts. If TRUE all transcripts are flipped and the last exons will correspond to the first ones, the second last exons to the second ones etc. Default value is FALSE. |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotExPosition(annotatedBSJs, title = "", n = 0, flip = FALSE) p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotExPosition(annotatedBSJs, title = "", n = 0, flip = FALSE) p
The function plotHostGenes() generates a bar chart showing the no. of circRNAs produced from each the circRNA host gene.
plotHostGenes(annotatedBSJs, title = "")
plotHostGenes(annotatedBSJs, title = "")
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
This data frame can be generated with |
title |
A character string specifying the title of the plot. |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotHostGenes(annotatedBSJs, title = "") p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotHostGenes(annotatedBSJs, title = "") p
The function plotLenBSEs() generates vertical boxplots for comparison of length of back-spliced exons (e.g. detected Vs randomly selected).
plotLenBSEs( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "", setyLim = FALSE, ylim = c(0, 8) )
plotLenBSEs( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "", setyLim = FALSE, ylim = c(0, 8) )
annotatedFBSJs |
A data frame with the annotated back-spliced junctions
(e.g. detected). It can be generated with |
annotatedBBSJs |
A data frame with the annotated back-spliced junctions
(e.g. randomly selected). It can generated with |
df1Name |
A string specifying the name of the first data frame. This will be displayed in the legend of the plot. |
df2Name |
A string specifying the name of the first data frame. This will be displayed in the legend of the plot. |
title |
A character string specifying the title of the plot |
setyLim |
A logical specifying whether to set y scale limits. If TRUE the value in ylim will be used. Deafult value is FALSE. |
ylim |
An integer specifying the lower and upper y axis limits Deafult values are c(0, 8). |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedFBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Get random back-spliced junctions randomBSJunctions <- getRandomBSJunctions(n = 10, f = 10, gtf) # Annotate random back-spliced junctions annotatedBBSJs <- annotateBSJs(randomBSJunctions, gtf, isRandom = TRUE) # Plot p <- plotLenBSEs( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "") p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedFBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Get random back-spliced junctions randomBSJunctions <- getRandomBSJunctions(n = 10, f = 10, gtf) # Annotate random back-spliced junctions annotatedBBSJs <- annotateBSJs(randomBSJunctions, gtf, isRandom = TRUE) # Plot p <- plotLenBSEs( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "") p
The function plotLenIntrons() generates vertical boxplots for comparison of length of introns flanking the back-spliced junctions (e.g. detected Vs randomly selected).
plotLenIntrons( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "", setyLim = FALSE, ylim = c(0, 8) )
plotLenIntrons( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "", setyLim = FALSE, ylim = c(0, 8) )
annotatedFBSJs |
A data frame with the annotated back-spliced junctions
(e.g. detected). It can be generated with |
annotatedBBSJs |
A data frame with the annotated back-spliced junctions
(e.g. randomly selected). It can generated with |
df1Name |
A string specifying the name of the first data frame. This will be displayed in the legend of the plot. Deafult value is "foreground". |
df2Name |
A string specifying the name of the first data frame. This will be displayed in the legend of the plot. Deafult value is "background". |
title |
A character string specifying the title of the plot. |
setyLim |
A logical specifying whether to set y scale limits. If TRUE the value in ylim will be used. Deafult value is FALSE. |
ylim |
An integer specifying the lower and upper y axis limits Deafult values are c(0, 8). |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedFBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Get random back-spliced junctions randomBSJunctions <- getRandomBSJunctions( gtf, n = 10, f = 10) # Annotate random back-spliced junctions annotatedBBSJs <- annotateBSJs(randomBSJunctions, gtf, isRandom = TRUE) # Plot p <- plotLenIntrons( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "") p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedFBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Get random back-spliced junctions randomBSJunctions <- getRandomBSJunctions( gtf, n = 10, f = 10) # Annotate random back-spliced junctions annotatedBBSJs <- annotateBSJs(randomBSJunctions, gtf, isRandom = TRUE) # Plot p <- plotLenIntrons( annotatedFBSJs, annotatedBBSJs, df1Name = "foreground", df2Name = "background", title = "") p
The function plotMiR() generates a scatter plot showing the number of miRNA binding sites for each miR found in the target sequence.
plotMiR(rearragedMiRres, n = 40, color = "blue", miRid = FALSE, id = 1)
plotMiR(rearragedMiRres, n = 40, color = "blue", miRid = FALSE, id = 1)
rearragedMiRres |
A list containing containing rearranged
miRNA analysis results. See |
n |
An integer specifying the miRNA binding sites cut-off. The miRNA with a number of binding sites equals or higher to the cut-off will be colored. Deafaut value is 40. |
color |
A string specifying the color of the top n miRs. Default value is "blue". |
miRid |
A logical specifying whether or not to show the miR ids in the plot. default value is FALSE. |
id |
An integer specifying which element of the list rearragedMiRres to plot. Each element of the list contains the miR resutls relative to one circRNA. Deafult value is 1. |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences. targets <- getCircSeqs( annotatedBSJs, gtf, genome) # Screen target sequence for miR binding sites. pathToMiRs <- system.file("extdata", "miRs.txt", package="circRNAprofiler") # miRsites <- getMiRsites( # targets, # miRspeciesCode = "hsa", # miRBaseLatestRelease = TRUE, # totalMatches = 6, # maxNonCanonicalMatches = 1, # pathToMiRs) # Rearrange miR results # rearragedMiRres <- rearrangeMiRres(miRsites) # Plot # p <- plotMiR( # rearragedMiRres, # n = 20, # color = "blue", # miRid = TRUE, # id = 3) # p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences. targets <- getCircSeqs( annotatedBSJs, gtf, genome) # Screen target sequence for miR binding sites. pathToMiRs <- system.file("extdata", "miRs.txt", package="circRNAprofiler") # miRsites <- getMiRsites( # targets, # miRspeciesCode = "hsa", # miRBaseLatestRelease = TRUE, # totalMatches = 6, # maxNonCanonicalMatches = 1, # pathToMiRs) # Rearrange miR results # rearragedMiRres <- rearrangeMiRres(miRsites) # Plot # p <- plotMiR( # rearragedMiRres, # n = 20, # color = "blue", # miRid = TRUE, # id = 3) # p
The function plotMotifs() generates 2 bar charts showing the log2FC and the number of occurences of each motif found in the target sequences (e.g detected Vs randomly selected).
plotMotifs( mergedMotifsFTS, mergedMotifsBTS, log2FC = 1, n = 0, removeNegLog2FC = FALSE, nf1 = 1, nf2 = 1, df1Name = "foreground", df2Name = "background", angle = 0 )
plotMotifs( mergedMotifsFTS, mergedMotifsBTS, log2FC = 1, n = 0, removeNegLog2FC = FALSE, nf1 = 1, nf2 = 1, df1Name = "foreground", df2Name = "background", angle = 0 )
mergedMotifsFTS |
A data frame containing the number of occurences
of each motif found in foreground target sequences (e.g from detected
back-spliced junctions). It can be generated with the
|
mergedMotifsBTS |
A data frame containing the number of occurences
of each motif found in the background target sequences (e.g. from
random back-spliced junctions). It can be generated with the
|
log2FC |
An integer specifying the log2FC cut-off. Default value is 1. NOTE: log2FC is calculated as follow: normalized number of occurences of each motif found in the foreground target sequences / normalized number of occurences of each motif found in the background target sequences. To avoid infinity as a value for fold change, 1 was added to number of occurences of each motif found in the foreground and background target sequences before the normalization. |
n |
An integer specifying the number of motifs cutoff. E.g. if 3 is specifiyed only motifs that are found at least 3 times in the foreground or background target sequences are retained. Deafaut value is 0. |
removeNegLog2FC |
A logical specifying whether to remove the RBPs having a negative log2FC. If TRUE then only positive log2FC will be visualized. Default value is FALSE. |
nf1 |
An integer specifying the normalization factor for the first data frame mergedMotifsFTS. The occurrences of each motif plus 1 are divided by nf1. The normalized values are then used for fold-change calculation. Set this to the number or length of target sequences (e.g from detected back-spliced junctions) where the motifs were extracted from. Default value is 1. |
nf2 |
An integer specifying the normalization factor for the second data frame mergedMotifsBTS. The occurrences of each motif plus 1 are divided by nf2. The The normalized values are then used for fold-change calculation. Set this to the number of target sequences (e.g from random back-spliced junctions) where the motifs were extracted from. Default value is 1. NOTE: If nf1 and nf2 is set equals to 1, the number or length of target sequences (e.g detected Vs randomly selected) where the motifs were extrated from, is supposed to be the same. |
df1Name |
A string specifying the name of the first data frame. This will be displayed in the legend of the plot. Deafult value is "foreground". |
df2Name |
A string specifying the name of the first data frame. This will be displayed in the legend of the plot. Deafult value is "background". |
angle |
An integer specifying the rotation angle of the axis labels. Default value is 0. |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedFBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get random back-spliced junctions randomBSJunctions <- getRandomBSJunctions(gtf, n = 1, f = 10) # Annotate random back-spliced junctions annotatedBBSJs <- annotateBSJs(randomBSJunctions, gtf, isRandom = TRUE) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences from detected back-spliced junctions targetsFTS <- getSeqsFromGRs( annotatedFBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Retrieve target sequences from random back-spliced junctions targetsBTS <- getSeqsFromGRs( annotatedBBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Get motifs motifsFTS <- getMotifs( targetsFTS, width = 6, database = 'ATtRACT', species = "Hsapiens", rbp = TRUE, reverse = FALSE) motifsBTS <- getMotifs( targetsBTS, width = 6, database = 'ATtRACT', species = "Hsapiens", rbp = TRUE, reverse = FALSE) # Merge motifs mergedMotifsFTS <- mergeMotifs(motifsFTS) mergedMotifsBTS <- mergeMotifs(motifsBTS) # Plot p <- plotMotifs( mergedMotifsFTS, mergedMotifsBTS, log2FC = 2, nf1 = nrow(annotatedFBSJs), nf2 = nrow(annotatedBBSJs), df1Name = "foreground", df2Name = "background")
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedFBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get random back-spliced junctions randomBSJunctions <- getRandomBSJunctions(gtf, n = 1, f = 10) # Annotate random back-spliced junctions annotatedBBSJs <- annotateBSJs(randomBSJunctions, gtf, isRandom = TRUE) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences from detected back-spliced junctions targetsFTS <- getSeqsFromGRs( annotatedFBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Retrieve target sequences from random back-spliced junctions targetsBTS <- getSeqsFromGRs( annotatedBBSJs, genome, lIntron = 200, lExon = 10, type = "ie" ) # Get motifs motifsFTS <- getMotifs( targetsFTS, width = 6, database = 'ATtRACT', species = "Hsapiens", rbp = TRUE, reverse = FALSE) motifsBTS <- getMotifs( targetsBTS, width = 6, database = 'ATtRACT', species = "Hsapiens", rbp = TRUE, reverse = FALSE) # Merge motifs mergedMotifsFTS <- mergeMotifs(motifsFTS) mergedMotifsBTS <- mergeMotifs(motifsBTS) # Plot p <- plotMotifs( mergedMotifsFTS, mergedMotifsBTS, log2FC = 2, nf1 = nrow(annotatedFBSJs), nf2 = nrow(annotatedBBSJs), df1Name = "foreground", df2Name = "background")
The function plotTotExons() generates a bar chart showing the total number of exons (totExon column) in the transcripts selected for the downstream analysis.
plotTotExons(annotatedBSJs, title = "")
plotTotExons(annotatedBSJs, title = "")
annotatedBSJs |
A data frame with the annotated back-spliced junctions.
This data frame can be generated with |
title |
A character string specifying the title of the plot |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotTotExons(annotatedBSJs, title = "") p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first 10 back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1:10, ], gtf) # Plot p <- plotTotExons(annotatedBSJs, title = "") p
The function rearrangeMiRres() rearranges the results of the getMiRsites() function. Each element of the list contains the miR results relative to one circRNA. For each circRNA only miRNAs for which at least 1 miRNA binding site is found are reported.
rearrangeMiRres(miRsites)
rearrangeMiRres(miRsites)
miRsites |
A list containing the miR sites found in the RNA
target sequence. it can be generated with |
A list.
# Load data frame containing predicted back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences. targets <- getCircSeqs( annotatedBSJs, gtf, genome) # Screen target sequence for miR binding sites. pathToMiRs <- system.file("extdata", "miRs.txt",package="circRNAprofiler") # miRsites <- getMiRsites( # targets, # miRspeciesCode = "hsa", # miRBaseLatestRelease = TRUE, # totalMatches = 6, # maxNonCanonicalMatches = 1, # pathToMiRs) # Rearrange miR results # rearragedMiRres <- rearrangeMiRres(miRsites)
# Load data frame containing predicted back-spliced junctions data("mergedBSJunctions") # Load short version of the gencode v19 annotation file data("gtf") # Annotate the first back-spliced junctions annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf) # Get genome genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19") # Retrieve target sequences. targets <- getCircSeqs( annotatedBSJs, gtf, genome) # Screen target sequence for miR binding sites. pathToMiRs <- system.file("extdata", "miRs.txt",package="circRNAprofiler") # miRsites <- getMiRsites( # targets, # miRspeciesCode = "hsa", # miRBaseLatestRelease = TRUE, # totalMatches = 6, # maxNonCanonicalMatches = 1, # pathToMiRs) # Rearrange miR results # rearragedMiRres <- rearrangeMiRres(miRsites)
The function volcanoPlot() generates a volcano plot with the results of the differential expression analysis.
volcanoPlot( res, log2FC = 1, padj = 0.05, title = "", gene = FALSE, geneSet = c(""), setxLim = FALSE, xlim = c(-8, 8), setyLim = FALSE, ylim = c(0, 5), color = "blue" )
volcanoPlot( res, log2FC = 1, padj = 0.05, title = "", gene = FALSE, geneSet = c(""), setxLim = FALSE, xlim = c(-8, 8), setyLim = FALSE, ylim = c(0, 5), color = "blue" )
res |
A data frame containing the the differential expression resuls.
It can be generated with |
log2FC |
An integer specifying the log2FC cut-off. Deafult value is 1. |
padj |
An integer specifying the adjusted P value cut-off. Deafult value is 0.05. |
title |
A character string specifying the title of the plot. |
gene |
A logical specifying whether to show all the host gene names of the differentially expressed circRNAs to the plot. Deafult value is FALSE. |
geneSet |
A character vector specifying which host gene name of the differentially expressed circRNAs to show in the plot. Multiple host gene names can be specofied. E.g. c('TTN, RyR2') |
setxLim |
A logical specifying whether to set x scale limits. If TRUE the value in xlim will be used. Deafult value is FALSE. |
xlim |
A numeric vector specifying the lower and upper x axis limits. Deafult values are c(-8 , 8). |
setyLim |
A logical specifying whether to set y scale limits. If TRUE the value in ylim will be used. Deafult value is FALSE. |
ylim |
An integer specifying the lower and upper y axis limits Deafult values are c(0, 5). |
color |
A string specifying the color of the differentially expressed circRNAs. Default value is "blue". |
A ggplot object.
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filterdCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment) # Find differentially expressed circRNAs deseqResBvsA <- getDeseqRes( filterdCirc, condition = "A-B", pAdjustMethod = "BH", pathToExperiment) # Plot p <- volcanoPlot( deseqResBvsA, log2FC = 1, padj = 0.05, title = "", setxLim = TRUE, xlim = c(-8 , 7.5), setyLim = FALSE, ylim = c(0 , 4), gene = FALSE) p
# Load data frame containing detected back-spliced junctions data("mergedBSJunctions") pathToExperiment <- system.file("extdata", "experiment.txt", package ="circRNAprofiler") # Filter circRNAs filterdCirc <- filterCirc( mergedBSJunctions, allSamples = FALSE, min = 5, pathToExperiment) # Find differentially expressed circRNAs deseqResBvsA <- getDeseqRes( filterdCirc, condition = "A-B", pAdjustMethod = "BH", pathToExperiment) # Plot p <- volcanoPlot( deseqResBvsA, log2FC = 1, padj = 0.05, title = "", setxLim = TRUE, xlim = c(-8 , 7.5), setyLim = FALSE, ylim = c(0 , 4), gene = FALSE) p