Title: | Differential pattern analysis for Ribo-seq data |
---|---|
Description: | This package performs differential pattern analysis for Ribo-seq data. It identifies genes with significantly different patterns in the ribosome footprint between two conditions. RiboDiPA contains five major components including bam file processing, P-site mapping, data binning, differential pattern analysis and footprint visualization. |
Authors: | Keren Li [aut], Matt Hope [aut], Xiaozhong Wang [aut], Ji-Ping Wang [aut, cre] |
Maintainer: | Ji-Ping Wang <[email protected]> |
License: | LGPL (>= 3) |
Version: | 1.15.1 |
Built: | 2024-12-06 03:50:07 UTC |
Source: | https://github.com/bioc/RiboDiPA |
igvR
(per bin)
This function outputs a list of GRanges
format objects of binned
ribosome P-site footprints, as well as test results. It can be used for
igvR
visualization.
binTrack(data, exon.anno)
binTrack(data, exon.anno)
data |
Data object from |
exon.anno |
|
R package igvR
is not implemented in RiboDiPA
. Users need install
igvR
through Bioconductor or relevant source package. A simple
illustration example of how to use it for igvR
visualization is given
below.
A list of GRanges
format objects. Each element is a GRanges
object of the binned P-site footprint tracks with differential pattern test
results.
data(result.pst) data(data.psite) tracks.bin <- binTrack(data = result.pst, exon.anno = data.psite$exons) # library(igvR) # thred <- 0.05 # igv <- igvR() # setBrowserWindowTitle(igv, "ribosome binned track example") # setGenome(igv, "saccer3") # for(track.name in names(tracks.bin)){ # track.rep <- tracks.bin[[track.name]] # resize(track.rep, width(track.rep) + 1) # track <- GRangesQuantitativeTrack(trackName = paste(track.name,"binned"), # track.rep[,1], color = "black") # displayTrack(igv, track) # } # track.rep2 <- tracks.bin[[1]] # sig.bin <- (values(track.rep2)[,5] <= thred) # log10.padj <- - log10(values(track.rep2)[,5]) # mcols(track.rep2) <- data.frame(log10padj = log10.padj) # track.rep2 <- track.rep2[which(sig.bin),] # track <- GRangesQuantitativeTrack(trackName = "- log 10 of padj", # track.rep2, color = "red", trackHeight = 40) # displayTrack(igv, track)
data(result.pst) data(data.psite) tracks.bin <- binTrack(data = result.pst, exon.anno = data.psite$exons) # library(igvR) # thred <- 0.05 # igv <- igvR() # setBrowserWindowTitle(igv, "ribosome binned track example") # setGenome(igv, "saccer3") # for(track.name in names(tracks.bin)){ # track.rep <- tracks.bin[[track.name]] # resize(track.rep, width(track.rep) + 1) # track <- GRangesQuantitativeTrack(trackName = paste(track.name,"binned"), # track.rep[,1], color = "black") # displayTrack(igv, track) # } # track.rep2 <- tracks.bin[[1]] # sig.bin <- (values(track.rep2)[,5] <= thred) # log10.padj <- - log10(values(track.rep2)[,5]) # mcols(track.rep2) <- data.frame(log10padj = log10.padj) # track.rep2 <- track.rep2[which(sig.bin),] # track <- GRangesQuantitativeTrack(trackName = "- log 10 of padj", # track.rep2, color = "red", trackHeight = 40) # displayTrack(igv, track)
igvR
(per bp)
This function outputs a list of GRanges
format objects of
ribosome P-site footprints per bp. It can be used for igvR
visualization.
bpTrack(data, names.rep = NULL, genes.list)
bpTrack(data, names.rep = NULL, genes.list)
data |
Data object from |
names.rep |
Customized names of the replicates. Default value uses the column names
of |
genes.list |
A list of genes for visualization. Default value uses all genes listed in
|
R package igvR
is not implemented in RiboDiPA
. Users need install
igvR
through Bioconductor or relevant source package. A simple
illustration example of how to use it for igvR
visualization is given
below.
A list of GRanges
format objects. Each element is a GRanges
object of the P-site footprint tracks of selected genes.
data(data.psite) names.rep <- c("mutant1", "mutant2", "wildtype1", "wildtype2") tracks.bp <- bpTrack(data = data.psite, names.rep = names.rep, genes.list = c("YDR050C", "YDR062W", "YDR064W")) # library(igvR) # igv <- igvR() # setBrowserWindowTitle(igv, "ribosome footprint track example") # setGenome(igv, "saccer3") # for(track.name in names.rep){ # track.rep <- tracks.bp[[track.name]] # track <- GRangesQuantitativeTrack(trackName = paste(track.name, "bp"), # track.rep[,1], color = "green") # displayTrack(igv, track) # }
data(data.psite) names.rep <- c("mutant1", "mutant2", "wildtype1", "wildtype2") tracks.bp <- bpTrack(data = data.psite, names.rep = names.rep, genes.list = c("YDR050C", "YDR062W", "YDR064W")) # library(igvR) # igv <- igvR() # setBrowserWindowTitle(igv, "ribosome footprint track example") # setGenome(igv, "saccer3") # for(track.name in names.rep){ # track.rep <- tracks.bp[[track.name]] # track <- GRangesQuantitativeTrack(trackName = paste(track.name, "bp"), # track.rep[,1], color = "green") # displayTrack(igv, track) # }
A example data containing binned ribosome P-site tracks of 4 replicates on 885
genes, two biological replicates each for wild type cells and
New1 mutant cells, respectively. It is the output of the data binning function
dataBinning
on P-site coverage data, and input for diffPatternTest
function for differential pattern analysis.
data("data.binned")
data("data.binned")
A list of 885 matrices corresponding to 885 genes: in each matrix, rows correspond to replicates, columns correspond to bins. Bin names are set to "start-end" genomic coordinates.
The raw data was adapted from Kasari et al 2019.
data(data.binned) classlabel <- data.frame(condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1)) rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2") result.pst <- diffPatternTest(data = data.binned, classlabel = classlabel, method = c('gtxr', 'qvalue'))
data(data.binned) classlabel <- data.frame(condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1)) rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2") result.pst <- diffPatternTest(data = data.binned, classlabel = classlabel, method = c('gtxr', 'qvalue'))
An example data set containing 4 ribo-seq replicates of 885
genes, two biological replicates each for wild type cells and
New1 mutant cells, respectively. It is the output of P-site mapping funcion
psiteMapping
. It contains the
p-site count at each location of the total transcript within each
replicate.
data("data.psite")
data("data.psite")
A list of size 4
ribosome P-site coverage tracks
ribosome P-site total count, one count per gene
P-site mapping offset rule
relative start and end positions of each exon in the total transcript if a given gene, as well as genomic start and end coordinates
Raw data was adapted from Kasari et al 2019.
data(data.psite) data.binned <- dataBinning(data = data.psite$coverage, bin.width = 0, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)
data(data.psite) data.binned <- dataBinning(data = data.psite$coverage, bin.width = 0, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)
This function bins a mapped P-site data matrix for a given gene into a binned matrix, for statistical testing downstream. Data can be adaptively binned, where each gene has a different number of bins and bin widths, but the bin positions for a given gene are the same across different conditions and replicates. Alternatively, data can also be binned into bins of fixed width, down to the single-codon level.
dataBinning(data, bin.width = 0, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = NULL)
dataBinning(data, bin.width = 0, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = NULL)
data |
A list of mapped P-site position matrices from the |
bin.width |
Binning width per bin. If specified, it is the number of codons merged per bin; if not specified, an adaptive binning width method is used. |
zero.omit |
If the |
bin.from.5UTR |
When the coding region length is not any integer multiple of binning
width, and if value of |
cores |
The number of cores to use for parallel execution. If not specified,
the number of cores is set to the value of |
We recommend to use an adaptive bin width following the
Freedman-Diaconis rule,
. To see
certain regions of transcripts in greater detail (e.g. near the start
and stop codons), a specified bin.width
per bin can be used to
check the local differential pattern, though it may lead to low power
at small fold change positions and potentially high computational time.
A list of binned P-site footprint matrices: in each matrix, rows correspond to replicates, columns correspond to bins. Bin names are set to "start-end" genomic coordinates.
data(data.psite) data.binned <- dataBinning(data = data.psite$coverage, bin.width = 0, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2) data.codon <- dataBinning(data = data.psite$coverage, bin.width = 1, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)
data(data.psite) data.binned <- dataBinning(data = data.psite$coverage, bin.width = 0, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2) data.codon <- dataBinning(data = data.psite$coverage, bin.width = 1, zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)
The normalized gene data are pooled into a large matrix, where parameter estimations and tests are performed. Within each gene, multiplicity correction are then performed for codon/bin-level p-values. The minimum of adjusted codon/bin-level p-value is defined to be the gene-level p-value.
diffPatternTest(data, classlabel, method = c('gtxr', 'qvalue'))
diffPatternTest(data, classlabel, method = c('gtxr', 'qvalue'))
data |
A list of named matrices input from the |
classlabel |
For matrix input: a DataFrame or data.frame with at least a column
|
method |
For a 2-component character vector input: the first argument is the
multiplicity correction method for codon/bin-level p-value adjustment.
The second argument is the multiplicity correction method for
gene-level p-value adjustment. Methods include: "qvalue" for q-value
from |
Using binned data, this function first estimates normalizing constant by exclusing outlier bins which may represent the true differential pattern. An outlier bin is defined as that whose log2-fold change value is more than 1.5 interquartile ranges below the first quartile or above the third quartile. For a given gene, the normalizing constant is defined based on the total read counts from each replicate.
It then performs differential pattern testing on P-site counts bin by bin for each gene. Briefly, counts are modeled by a negative binomial distribution to call bins with statistically significant differences across conditions, bin level p-values are adjusted for multiple hypothesis testing for a given gene, and then the smallest p-value for a gene is adjusted to control for multiple hypothesis testing across all genes.
Additionally, the T-value is a supplementary statistic that quantifies the magnitude of difference between conditions, with larger numbers indicating a greater difference. The $T$-value is defined to be 1-cosine of the angle between the first right singular vectors of the footprint matrices of the two conditions under comparison. It ranges from 0-1, with larger values representing larger differences between conditions, and practically speaking, can be used to identify genes with larger magnitude of pattern difference beyond statistical significance. This might be helpful to investigators to prioritize certain genes for investigation among many that may pass the significance test for differential pattern.
bin |
A List object of codon/bin-level results. Each element
of list is of a gene, containing codon/bin results columns: |
gene |
A DataFrame object of gene-level results. It contains columns:
|
method |
The same as input |
small |
Names of genes without sufficient reads, not reported in
|
data |
Subset of input |
classlabel |
The same as input |
data(data.binned) classlabel <- data.frame(condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1)) rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2") result.pst <- diffPatternTest(data = data.binned, classlabel = classlabel, method = c('gtxr', 'qvalue'))
data(data.binned) classlabel <- data.frame(condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1)) rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2") result.pst <- diffPatternTest(data = data.binned, classlabel = classlabel, method = c('gtxr', 'qvalue'))
An alternative version of diffPatternTest
for exon level binning. Both
data binning and differential pattern analysis are implemented. Instead of a
fixed width or adaptive method, the positions of exons in the genome are
used as bins. Therefore the number of exons per gene and their relative
sizes determines the bins used for differential pattern testing.
diffPatternTestExon(psitemap, classlabel, method = c("gtxr", "qvalue"))
diffPatternTestExon(psitemap, classlabel, method = c("gtxr", "qvalue"))
psitemap |
A list object from value of |
classlabel |
For matrix input: a DataFrame or data.frame with at least a column
|
method |
For a 2-component character vector input: the first argument is the
multiplicity correction method for exon-level p-value adjustment. The
second argument is the multiplicity correction method for gene-level
p-value adjustment. Methods include: "qvalue" for q-value from
|
For mammalian species, when the reads are sparse, it’s more meaningful
to perform a exon level pattern analysis. diffPatternTestExon
() provides
the option of exon level pattern differentiation analysis by treating
each exon as one bin. But for organisms such as yeast, as most genes
only contain one exon, the exon-level analysis is not meaningful
since the analysis will simply result in the RNA-seq type of analysis,
i.e. differential abundance test instead of the pattern analysis.
Using diffPatternTestExon
() on yeast data is not for organisms with
minimal alternative splicing or multiple exons. For a given gene,
the normalizing constant is estimated at codon level.
bin |
A List object of exon-level results. Each element of list
is of a gene, containing exon results columns: |
gene |
A DataFrame object of gene-level results. It contains
columns: |
small |
Names of genes without sufficient reads |
classlabel |
The same as input |
data |
A list of exon-binned P-site footprint matrices: in each matrix,
rows corrspond to replicates, columns corrspond to exons. All genes
reported in |
method |
The same as input |
diffPatternTest
data(data.psite) classlabel <- data.frame(condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison=c(2, 2, 1, 1)) rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2") result.exon <- diffPatternTestExon(psitemap = data.psite, classlabel = classlabel, method = c('gtxr', 'qvalue'))
data(data.psite) classlabel <- data.frame(condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison=c(2, 2, 1, 1)) rownames(classlabel) <- c("mutant1", "mutant2", "wildtype1", "wildtype2") result.exon <- diffPatternTestExon(psitemap = data.psite, classlabel = classlabel, method = c('gtxr', 'qvalue'))
igvR
(per exon)
This function outputs a list of GRanges
format objects of
ribosome P-site footprints per exon, as well as corresponding test results.
It can be used for igvR
visualization.
exonTrack(data, gene)
exonTrack(data, gene)
data |
Data object from |
gene |
One gene for visualization at a time, since different genes have different number of transcripts. |
R package igvR
is not implemented in RiboDiPA
. Users need install
igvR
through Bioconductor or relevant source package. A simple
illustration example of how to use it for igvR
visualization is given
below.
A list of lists. Each element is a list of GRanges
objects representing
replicates. Each second level list element is P-site footprint count per exon
with differential pattern test results in a transcript.
data(result.exon) tracks.exon <- exonTrack(data = result.exon, gene = "tY(GUA)D") # library(igvR) # igv <- igvR() # setBrowserWindowTitle(igv, "ribosome footprint per exon example") # setGenome(igv, "saccer3") # for(track.name in names(tracks.exon)){ # track.rep <- tracks.exon[[track.name]] # for(tx.name in names(track.rep)){ # track.tx <- tracks.exon[[track.name]][[tx.name]] # track <- GRangesQuantitativeTrack(trackName = # paste(track.name, tx.name), track.tx[,1], color = track.name) # displayTrack(igv, track) # } # }
data(result.exon) tracks.exon <- exonTrack(data = result.exon, gene = "tY(GUA)D") # library(igvR) # igv <- igvR() # setBrowserWindowTitle(igv, "ribosome footprint per exon example") # setGenome(igv, "saccer3") # for(track.name in names(tracks.exon)){ # track.rep <- tracks.exon[[track.name]] # for(tx.name in names(track.rep)){ # track.tx <- tracks.exon[[track.name]][[tx.name]] # track <- GRangesQuantitativeTrack(trackName = # paste(track.name, tx.name), track.tx[,1], color = track.name) # displayTrack(igv, track) # } # }
This function calculate the relative abundance of samples, in essence accounting for different sequencing depths across different Ribo-seq experiments.
normFactor(x, condition)
normFactor(x, condition)
x |
A matrix of mapped P-site positions. |
condition |
A vector of indicators. |
A vector of relative abundances.
data(data.binned) x <- data.binned$YDR050C condition <- c(2, 2, 1, 1) normFactor(x, condition)
data(data.binned) x <- data.binned$YDR050C condition <- c(2, 2, 1, 1) normFactor(x, condition)
This function visualizes the ribosome footprint in the form of mapped P-site frequency at the bin level along the total transcript. Bins that test positive for statistically significant differences are marked in black. Plotting is implmented with the ggplot2 package.
plotTest(result, genes.list = NULL, threshold = 0.05)
plotTest(result, genes.list = NULL, threshold = 0.05)
result |
Data object resulting from |
genes.list |
|
threshold |
The q-value threshold for genes whose footprint to be visualized.
This argument is ignored if |
Bin-level tracks of genes and test results.
data(result.pst) plotTest(result = result.pst, genes.list = NULL, threshold = 0.05)
data(result.pst) plotTest(result = result.pst, genes.list = NULL, threshold = 0.05)
This function visualizes the ribosome footprint in the form of P-site frequency at the per nucleotide level along the total transcript. Plotting is implemented with the ggplot2 package.
plotTrack(data, genes.list, replicates = NULL,exons = FALSE)
plotTrack(data, genes.list, replicates = NULL,exons = FALSE)
data |
Data object from |
genes.list |
A list of genes for visualization. |
replicates |
Names of the replicates for which the footprint to visualize. The default is for all. |
exons |
If value is |
Visualizes the Ribo-seq per nucleotide footprint on merged exons
of the genes and replicates specified. If exons
is TRUE
,
Ribo-seq footprint per exon of specified genes is also output.
data(data.psite) plotTrack(data = data.psite, genes.list = c("YDR050C", "YDR064W"), replicates = NULL, exons = FALSE)
data(data.psite) plotTrack(data = data.psite, genes.list = c("YDR050C", "YDR064W"), replicates = NULL, exons = FALSE)
This function outputs the P-site position, provided the CIGAR string of the alignment and the start position of read. It is implemented as a C++ function using the Rcpp package.
psiteCal(cigar, start, psitemap)
psiteCal(cigar, start, psitemap)
cigar |
A vector of CIGAR strings. |
start |
A vector of read start positions. |
psitemap |
A vector of relative P-site positions, which describe the offset from the 5 prime most nucleotide of the read to the P-site. |
A vector of P-site positions for the reads.
ex.cigar <- c("21M74731N7M", "2S11M57302N12M3S", "28M", "27M1S") ex.start <- c(177640, 249163, 249286, 249290) ex.psitemap <- c(9, 18, 9, 9) psiteCal(ex.cigar, ex.start, ex.psitemap)
ex.cigar <- c("21M74731N7M", "2S11M57302N12M3S", "28M", "27M1S") ex.start <- c(177640, 249163, 249286, 249290) ex.psitemap <- c(9, 18, 9, 9) psiteCal(ex.cigar, ex.start, ex.psitemap)
This function computes the corresponding P-site locations of all RPFs and the total RPF read counts for all genes and samples.
psiteMapping(bam_file_list, gtf_file, psite.mapping = "auto", cores = NULL)
psiteMapping(bam_file_list, gtf_file, psite.mapping = "auto", cores = NULL)
bam_file_list |
A vector of bam file names to be tested. Users should include path names if not located in the current working directory. Index files (.bai) will be generated if not already present. |
gtf_file |
Annotation file used to generate the BAM alignments. Note that a GTF file sourced from one organization (e.g. Ensembl) cannot be used with BAM files aligned with a GTF file sourced from another organization (e.g. UCSC). |
psite.mapping |
Rules for P-site offsets, to map a given read length of RPF to a P-site. Input for this parameter is a string input, or a user defined matrix assigning the P-site mapping rules. Strings include: "center" for taking center of the read as the P-site, and "auto" for an optimal P-site offset, which is the default. A user defined matrix should include two columns: "qwidth" and "psite", where "qwidth" is the range of possible read lengths and "psite" is the corresponding offset from the 5' end to map to the P-site. |
cores |
The number of cores to use for parallel execution. If not specified, the
number of cores is set to the value of |
All exons from the same gene are concatenated into a total transcript in order to get a merged picture of translation, using the reduce() function from the GRanges package to accomplish the concatenation. Then, RPFs are mapped with respect to the total transcript and the P-site positions are inferred accordingly.
If 'psite.mapping' is unspecified, a two-step algorithm on start codons of CDS regions is used to compute optimal P-site offsets, following Lauria et al (2018). First, for a given read length, the offset is calculated by taking the distance between the first nucleotide of the start codon and the 5' most nucleotide of the read, and then defining the offset as the 5' position with the most reads mapped to it. This process is repeated for all read lengths and then the temporary global offset is defined to be the offset of the read length with the maximum count. Lastly, for each read length, the adjusted offset is defined to be the one corresponding to the local maximum found in the profiles of the start codons closest to the temporary global offset.
The function will return a list of matrices that can then be used for data binning and downstream analysis, among other data objects.
coverage |
A list object of matrices. Each element is a matrix representing the P-site footprints of a gene. Rows correspond to replicates and columns corrspond to nucleotide location with respect to the total transcript. Each column is named by its genomic coordinate. |
counts |
A matrix object of read counts. Rows correspond to genes and columns correspond to replicates. |
psite.mapping |
The P-site (or A-site) mapping rule used to map RPFs to P-site positions. |
exons |
A List object of matrices. Each element contains the relative start and end positions of exons in the gene with respect to the total transcript for that given gene, as well as genomic start and end coordinates. |
Lauria, F., Tebaldi, T., Bernabò, P., Groen, E., Gillingwater, T. H., & Viero, G. (2018). riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data. PLoS computational biology, 14(8), e1006169.
library(BiocFileCache) file_names <- c("WT1.bam", "WT2.bam", "MUT1.bam", "MUT2.bam", "eg.gtf") url <- "https://github.com/jipingw/RiboDiPA-data/raw/master/" bfc <- BiocFileCache() bam_path <- bfcrpath(bfc,paste0(url,file_names)) classlabel <- data.frame( condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1) ) rownames(classlabel) <- c("mutant1","mutant2","wildtype1","wildtype2") data.psite <- psiteMapping(bam_file_list = bam_path[1:4], gtf_file = bam_path[5], psite.mapping = "auto", cores = 2)
library(BiocFileCache) file_names <- c("WT1.bam", "WT2.bam", "MUT1.bam", "MUT2.bam", "eg.gtf") url <- "https://github.com/jipingw/RiboDiPA-data/raw/master/" bfc <- BiocFileCache() bam_path <- bfcrpath(bfc,paste0(url,file_names)) classlabel <- data.frame( condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1) ) rownames(classlabel) <- c("mutant1","mutant2","wildtype1","wildtype2") data.psite <- psiteMapping(bam_file_list = bam_path[1:4], gtf_file = bam_path[5], psite.mapping = "auto", cores = 2)
An example output generated by the differential pattern analysis
function diffPatternTestExon
, including counts per exon,
exon-level differential pattern results, etc.
data("result.exon")
data("result.exon")
A list of 5
A list object of exon-level test results. Each element
of the list is the result from a gene, containing columns: pvalue
,
log2FoldChange
, the adjusted p-value by method "gtxr", and other exon
genomic information.
Gene-level differential pattern results, including T-value, p-value, and q-value
See diffPatternTestExon
Counts per exon, binned from data
, for
the final differential pattern analysis in the format of a list of named
matrices. In each element of the list, rows correspond to replicates,
columns correspond to exons.
The ribosome P-site coverage tracks in the format of a list of named matrices. In each element of the list, rows correspond to replicates, columns correspond to bps.
See diffPatternTestExon
The data was adapted from Kasari et al 2019.
data(result.exon) tracks.exon <- exonTrack(data = result.exon, gene = "YDR050C")
data(result.exon) tracks.exon <- exonTrack(data = result.exon, gene = "YDR050C")
An example output generated by the differential pattern analysis
function diffPatternTest
, including binned data, differential
pattern results, etc.
data("result.pst")
data("result.pst")
A list of size 5
A list object of codon/bin-level test results. Each element
of the list is the result from a gene, containing columns: pvalue
,
log2FoldChange
, and the adjusted p-value by method "gtxr"
Gene-level differential pattern results, including T-value, p-value, and q-value
See diffPatternTest
The input data for differential pattern analysis in the format of a list of named matrices. In each element of the list, rows correspond to replicates,columns correspond to bins.
See diffPatternTest
.
Names of genes without sufficient reads, not reported in
bin
and gene
.
The data was adapted from Kasari et al 2019.
data(result.pst) plotTrack(data = data.psite, genes.list = c("YDR050C", "YDR064W"), replicates = NULL, exons = FALSE)
data(result.pst) plotTrack(data = data.psite, genes.list = c("YDR050C", "YDR064W"), replicates = NULL, exons = FALSE)
A wrapper function for the RiboDiPA pipeline, that will call PsiteMapping, DataBinning, and DPTest in order. This function is provided for users' convenience and requires BAM files (one per biological replicate), a GTF file, and a classlabel object describing what comparisons to make. The minimal output from the function is a list of genes with signficant differential patterns.
RiboDiPA(bam_file_list, gtf_file, classlabel, psite.mapping = "auto", exon.binning = FALSE, bin.width = 0,zero.omit = FALSE, bin.from.5UTR = TRUE, method = c("gtxr", "qvalue"), cores = NULL)
RiboDiPA(bam_file_list, gtf_file, classlabel, psite.mapping = "auto", exon.binning = FALSE, bin.width = 0,zero.omit = FALSE, bin.from.5UTR = TRUE, method = c("gtxr", "qvalue"), cores = NULL)
bam_file_list |
A vector of bam file names to be tested. Users should include path names if not located in the current working directory. Index files (.bai) will be generated if not already present. |
gtf_file |
Annotation file used to generate the BAM alignments. Note that a GTF file sourced from one organization (e.g. Ensembl) cannot be used with BAM files aligned with a GTF file sourced from another organization (e.g. UCSC). |
classlabel |
For matrix input: a DataFrame or data.frame with at least one column
named |
psite.mapping |
Rules for P-site offsets, to map a given read length of RPF to a P-site.
See |
exon.binning |
Logical indicator. If |
bin.width |
Binning width per bin. |
zero.omit |
If this parameter is |
bin.from.5UTR |
When the coding region length is not any integer multiple of binning
width, and if value of |
method |
2-component character vector specifies the multiplicity correction
method for codon/bin-level p-value adjustment. The default See
|
cores |
The number of cores to use for parallel execution. If not specified,
the number of cores is set to the value of |
bin |
A List object of codon/bin-level results. Each element of
list is of a gene, containing codon/bin results columns: |
gene |
A DataFrame object of gene-level results. It contains columns:
|
small |
Names of genes without sufficient reads |
classlabel |
The same as input |
data |
Tracks of binned data of all genes reported in |
method |
The same as input |
coverage |
A list object of matrices. Each element is a matrix representing the P-site footprints of a gene. Rows corrspond to replicates and columns corrspond to nucleotide location with reference to the total transcript. |
counts |
A matrix object of read counts. Rows corrspond to genes and columns corrspond to replicates. |
exons |
A List object of matrices. Each element contains the relative start and end positions of exons in the gene with reference to the total transcript |
psite.mapping |
The P-site mapping rule or A-site mapping rule used. |
psiteMapping, dataBinning,
diffPatternTest, diffPatternTestExon
library(BiocFileCache) file_names <- c("WT1.bam", "WT2.bam", "MUT1.bam", "MUT2.bam", "eg.gtf") url <- "https://github.com/jipingw/RiboDiPA-data/raw/master/" bfc <- BiocFileCache() bam_path <- bfcrpath(bfc,paste0(url,file_names)) classlabel <- data.frame( condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1) ) rownames(classlabel) <- c("mutant1","mutant2","wildtype1","wildtype2") result.pip <- RiboDiPA(bam_path[1:4], bam_path[5], classlabel, cores=2)
library(BiocFileCache) file_names <- c("WT1.bam", "WT2.bam", "MUT1.bam", "MUT2.bam", "eg.gtf") url <- "https://github.com/jipingw/RiboDiPA-data/raw/master/" bfc <- BiocFileCache() bam_path <- bfcrpath(bfc,paste0(url,file_names)) classlabel <- data.frame( condition = c("mutant", "mutant", "wildtype", "wildtype"), comparison = c(2, 2, 1, 1) ) rownames(classlabel) <- c("mutant1","mutant2","wildtype1","wildtype2") result.pip <- RiboDiPA(bam_path[1:4], bam_path[5], classlabel, cores=2)