Title: | Prediction of pri-miRNA Transcription Start Site |
---|---|
Description: | A fast, convenient tool to identify the TSSs of miRNAs by integrating the data of H3K4me3 and Pol II as well as combining the conservation level and sequence feature, provided within both command-line and graphical interfaces, which achieves a better performance than the previous non-cell-specific methods on miRNA TSSs. |
Authors: | Pumin Li [aut, cre], Qi Xu [aut], Jie Li [aut], Jin Wang [aut] |
Maintainer: | Pumin Li <[email protected]> |
License: | GPL-2 |
Version: | 1.25.0 |
Built: | 2024-10-31 03:45:30 UTC |
Source: | https://github.com/bioc/primirTSS |
Search for putative TSSs of miRNA, together with integrating available data such as H3K4me3 data, Pol II data, miRNA expression data, and protein-coding gene data, as well as provide the transcriptional regulation relationship between TF and miRNA.
find_tss( bed_merged, expressed_mir = "all", flanking_num = 1000, threshold = 0.7, ignore_DHS_check = TRUE, DHS, allmirdhs_byforce = TRUE, expressed_gene = "all", allmirgene_byforce = TRUE, seek_tf = FALSE, tf_n = 1000, min.score = 0.8 )
find_tss( bed_merged, expressed_mir = "all", flanking_num = 1000, threshold = 0.7, ignore_DHS_check = TRUE, DHS, allmirdhs_byforce = TRUE, expressed_gene = "all", allmirgene_byforce = TRUE, seek_tf = FALSE, tf_n = 1000, min.score = 0.8 )
bed_merged |
Peaks from ChIP-seq data to be provided for analysis can be
H3K4me3 peaks, Pol II peaks or both. Notice that peaks are supposed to be
merged(see also |
expressed_mir |
This parameter allows users to specify certain miRNAs,
the TSSs of which they want to search for by providing a list of
miRNAs(e.g., expressed miRNAs in a certain cell-line). If
|
flanking_num |
A parameter in Eponine model to detect TSSs. It is concluded that a peak signal with flanking regions of C-G enrichment are important to mark TSSs. The default value is 1000. |
threshold |
The threshold for candidate TSSs scored with Eponine method. The default value is 0.7. |
ignore_DHS_check |
The process of DHS_check further assists to filter putative TSSs. When there is a DHS peak that locates within 1 kb upstream of a putative TSS, this predicted TSS will be retained for its character is consistent with that of an authentic TSS. Or the TSSs with no DHSs locating within 1 kb upstream of them would be discarded. |
DHS |
ChIP-seq data of DNase I hypersensitive sites(DHSs). |
allmirdhs_byforce |
When we use DHS data to check the validity of TSSs,
there is a possibility where no DHSs locates within 1 kb upstream of all
putative TSSs and all these putative TSSs might be filtered out by our
method resulting no outputs. While " |
expressed_gene |
Users can specify genes expressed in certain
cell-lines that are analyzed. Or the default value is " |
allmirgene_byforce |
While integrating expressed_gene data to improve
prediction, there might be a circumstance where all the putative TSS are
discarded. To prevent this condition, users are allowed to use
" |
seek_tf |
With the result of predicted TSSs, seek_tf provides users with
an option to predict related TFs for miRNA. The data of transcription
factors refer to |
tf_n |
TFBS locates on the upstream of the TSS of a certain TF, which is
considered as the promoter region. |
min.score |
The threshold for scoring transcription factor binding sites. A single absolute value between 0 and 1. |
The first part of the result returns details of predicted TSSs,
composed of seven columns: mir_name, chrom, stem_loop_p1,
stem_loop_p2, strand mir_context, tss_type gene and predicted_tss
:
mir_name
: Name of miRNA.
chrom
: Chromosome.
stem_loop_p1
: The start site of a stem-loop.
stem_loop_p2
: The end site of a stem-loop.
strand
: Polynucleotide strands. (+/-
)
mir_context
: The relative positon relationship between stem-loop and
protein-coding gene. (intra/inter
)
tss_type
: Four types of predicted TSSs. See the section below
TSS types
for details.
(host_TSS/intra_TSS/overlap_inter_TSS/inter_TSS
)
gene
: Ensembl gene ID
predicted_tss
: Predicted transcription start sites(TSSs).
pri_tss_distance
: The distance between a predicted TSS and the start
site of the stem-loop.
TSSs are catalogued into 4 types as below.
host_TSS
The TSSs of miRNA that are close to the TSS of
protein-coding gene implying they may share the same TSS, on the condition
where mir_context
is "intra
". (See above:
Value-mir_context
)
intra_TSS
The TSSs of miRNA that are NOT close to the TSS of
the protein-coding gene, on the condition where mir_context
is
"intra
".
overlap_inter_TSS
The TSSs of miRNA are catalogued as
"overlap_inter_TSS
" when the pri-miRNA gene overlaps with Ensembl
gene, on the condition where "mir_context
" is "inter
".
inter_inter_TSS
The TSSs of miRNA are catalogued as
"inter_inter_TSS
" when the miRNA gene does NOT overlap with Ensembl
gene, on the condition where "mir_context
" is "inter
".
(See Xu HUA et al 2016 for more details)
The second part of the result returns logs during the process of
prediction: find_nearest_peak_log
If no peaks locate in the
upstream of a stem-loop to help determine putative TSSs of miRNA, we will
fail to find the nearest peak and this miRNA will be logged in
find_nearest_peak_log
.
eponine_score_log
For a certain miRNA, if none of the candidate
TSSs scored with Eponine method
meet the threshold we set, we will
fail to get an eponine score and this miRNA will be logged in
eponine_score_log
.
DHS_check_log
For a certain miRNA, if no DHS signals locate
within 1 kb upstream of each putative TSSs, these putative TSSs will be
filtered out and this miRNA will be logged in DHS_check_log
.
gene_filter_log
For a certain miRNA, when integrating
expressed_gene data to improve prediction, if no putative TSSs are
confirmed after considering the relative position relationship among TSSs,
stem-loops and expressed genes, this miRNA will be filtered out and logged
in gene_filter_log
.
Xu Hua, Luxiao Chen, Jin Wang*, Jie Li* and Edgar Wingender*, Identifying cell-specific microRNA transcriptional start sites. Bioinformatics 2016, 32(16), 2403-10.
bed_merged <- data.frame( chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"), start = c(9910686, 9942202, 9996940, 10032962, 9830615), end = c(9911113, 9944469, 9998065, 10035458, 9917994), stringsAsFactors = FALSE) bed_merged <- as(bed_merged, "GRanges") ## Not run: ownmiRNA <- find_tss(bed_merged, expressed_mir = "hsa-mir-5697", ignore_DHS_check = TRUE, expressed_gene = "all", allmirgene_byforce = TRUE) ## End(Not run)
bed_merged <- data.frame( chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"), start = c(9910686, 9942202, 9996940, 10032962, 9830615), end = c(9911113, 9944469, 9998065, 10035458, 9917994), stringsAsFactors = FALSE) bed_merged <- as(bed_merged, "GRanges") ## Not run: ownmiRNA <- find_tss(bed_merged, expressed_mir = "hsa-mir-5697", ignore_DHS_check = TRUE, expressed_gene = "all", allmirgene_byforce = TRUE) ## End(Not run)
Integrate peaks from H3K4me3 and Pol II data. To conduct the overlappd ranges for the further analysis by imposing H3K4me3 peaks on Pol II peaks, if both of these two different kinds of ChIP-seq data are available.
peak_join(peak1, peak2)
peak_join(peak1, peak2)
peak1 |
H3K4me3 peaks. Merged peak data as |
peak2 |
Pol II peaks. Merged peak data as |
A GRanges object. The joined peaks for the following analysis to search for TSSs.
Peak1 and peak2 are signals seperately from the ChIP-seq
data of H3K4me3 and Pol II data that to be integrated. The data is
GRange
object containing three columns Chrom
, Ranges
,
Strand
. And the order of these two kinds of data when input as
peak1
and peak2
can be swapped.
peak_df1 <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"), start = c(100, 460, 600, 70), end = c(200, 500, 630, 100), stringsAsFactors = FALSE) peak1 <- as(peak_df1, "GRanges") peak_df2 <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"), start = c(160, 470, 640, 71), end = c(210, 480, 700, 90), stringsAsFactors = FALSE) peak2 <- as(peak_df2, "GRanges") peak_join(peak1, peak2)
peak_df1 <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"), start = c(100, 460, 600, 70), end = c(200, 500, 630, 100), stringsAsFactors = FALSE) peak1 <- as(peak_df1, "GRanges") peak_df2 <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr2"), start = c(160, 470, 640, 71), end = c(210, 480, 700, 90), stringsAsFactors = FALSE) peak2 <- as(peak_df2, "GRanges") peak_join(peak1, peak2)
Merge the adjacent segments provided as GRange
object from original
data. This function will merge adjacent peaks the distance between which is
less than n base pairs apart and then return the merged segments.
peak_merge(peak, n = 250)
peak_merge(peak, n = 250)
peak |
A |
n |
A number. |
A GRanges object. The merged peaks for the following analysis to search for TSSs.
peak_df <- data.frame(chrom = c("chr1", "chr2", "chr1"), chromStart = c(450, 460, 680), chromEnd = c(470, 480, 710), stringsAsFactors = FALSE) peak <- as(peak_df, "GRanges") peak_merge(peak, n =250)
peak_df <- data.frame(chrom = c("chr1", "chr2", "chr1"), chromStart = c(450, 460, 680), chromEnd = c(470, 480, 710), stringsAsFactors = FALSE) peak <- as(peak_df, "GRanges") peak_merge(peak, n =250)
For each miRNA, plot the position of TSS, pri-miRNA, related Ensemble gene, eponine socre and conservation score according to the result of prediction using primirTSS.
plot_primiRNA( expressed_mir, bed_merged, flanking_num = 1000, threshold = 0.7, ignore_DHS_check = TRUE, DHS, allmirdhs_byforce = TRUE, expressed_gene = "all", allmirgene_byforce = TRUE )
plot_primiRNA( expressed_mir, bed_merged, flanking_num = 1000, threshold = 0.7, ignore_DHS_check = TRUE, DHS, allmirdhs_byforce = TRUE, expressed_gene = "all", allmirgene_byforce = TRUE )
expressed_mir |
This parameter allows users to specify certain miRNAs,
the TSSs of which they want to search for by providing a list of
miRNAs(e.g. expressed miRNAs in a certain cell-line). If
|
bed_merged |
Peaks from ChIP-seq data to be provided for analysis can be
H3K4me3 peaks, Pol II peaks or both. Notice that peaks are supposed to be
merged(see also |
flanking_num |
A parameter in Eponine model to detect TSSs. It is concluded that a peak signal with flanking regions of C-G enrichment are important to mark TSSs. The default value is 1000. |
threshold |
Threshold for candidate TSSs scored with Eponine method. The default value is 0.7. |
ignore_DHS_check |
The process of DHS_check further assist to filter putative TSSs. When there are a DHS peak that locates within 1 kb upstream of a putative TSS, this predicted TSS will be retain for it character is consistent with that of an authentic TSS. Or the TSSs with no DHSs locating within 1 kb upstream of them would be discard. |
DHS |
ChIP-seq data of DNase I hypersensitive sites(DHSs). |
allmirdhs_byforce |
When we use DHS data to check the validity of TSSs,
there is possibility where no DHSs locates within 1 kb upstream of all
putative TSSs and all these putative TSSs might be filtered out by our
method resulting no outputs. While " |
expressed_gene |
Users can speicify genes expressed in certain
cell-lines that is analyzed. Or the default value is " |
allmirgene_byforce |
While integrating expressed_gene data to improve
prediction, there might be a circumstance where all the putative TSS are
discarded. To prevent this condition, users are allowed to use
" |
NOTICE that this function is used for visualizing the predicted result of ONLY ONE specific miRNA every single time.
There will be six tracks plotted as return:
Chrom
: Position of miRNA on the chromosome.
hg38
: Reference genome coordinate in hg38.
pri-miRNA
: Position of pri-miRNA.
Ensemble genes
: Position of related protein-coding gene.
eponine score
: Score of best putative TSS conducted by eponine method.
conservation score
: Conservation score should be integrated with
eponine score to find out putative TSSs.
expressed_mir <- "hsa-mir-5697" bed_merged <- data.frame( chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"), start = c(9180799, 9201483, 9234339, 9942202, 9830615), end = c(9183889, 9202580, 9235853, 9944469, 9917994), stringsAsFactors = FALSE ) bed_merged <- as(bed_merged, "GRanges") ## Not run: plot_primiRNA(expressed_mir, bed_merged) ## End(Not run)
expressed_mir <- "hsa-mir-5697" bed_merged <- data.frame( chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"), start = c(9180799, 9201483, 9234339, 9942202, 9830615), end = c(9183889, 9202580, 9235853, 9944469, 9917994), stringsAsFactors = FALSE ) bed_merged <- as(bed_merged, "GRanges") ## Not run: plot_primiRNA(expressed_mir, bed_merged) ## End(Not run)
A fast, convenient tool to identify the TSSs of miRNAs by integrating the data of H3K4me3 and Pol II as well as combining the conservation level and sequence feature, provided within both command-line and graphical interfaces, which achieves a better performance than the previous non-cell-specific methods on miRNA TSSs.
See find_tss
for deailted instruction
to predict TSSs of miRNA;
See run_primirTSSapp
to predict TSSs of miRNA using graphical web interface.
Maintainer: Pumin Li [email protected]
Other contributors: Qi Xu [email protected]
Useful links: Xu HUA et al
A graphical web interface is provided for users to achieve the functions of
find_tss
and plot_primiRNA
to intuitively and
conveniently predict putative TSSs of miRNA.
run_primirTSSapp()
run_primirTSSapp()
Users can refer documents of the two functions mentioned ABOVE for details.
A graphical interface.
## Not run: run_primirTSSapp() ## End(Not run)
## Not run: run_primirTSSapp() ## End(Not run)
Convert coordinates between different genomes when necessary.
trans_cor(peak, hg_from, hg_to)
trans_cor(peak, hg_from, hg_to)
peak |
A |
hg_from |
The genome are coverting from. This parameter can be "hg18", "hg19" or "hg38", etc. |
hg_to |
Which type the genome is converting to. This parameter can be "hg18",
"hg19" or "hg38", etc. NOTICE |
A GRanges object.
peak_df <- data.frame(chrom = c("chr7", "chr7", "chr7"), chromStart = c(128043908, 128045075, 128046242), chromEnd = c(128045074, 128046241, 128047408), stringsAsFactors = FALSE) peak <- as(peak_df, "GRanges") trans_cor(peak, "hg19", "hg38")
peak_df <- data.frame(chrom = c("chr7", "chr7", "chr7"), chromStart = c(128043908, 128045075, 128046242), chromEnd = c(128045074, 128046241, 128047408), stringsAsFactors = FALSE) peak <- as(peak_df, "GRanges") trans_cor(peak, "hg19", "hg38")