Title: | Assessment of duplication rates in RNA-Seq datasets |
---|---|
Description: | Duplication rate quality control for RNA-Seq datasets. |
Authors: | Sergi Sayols <[email protected]>, Holger Klein <[email protected]> |
Maintainer: | Sergi Sayols <[email protected]>, Holger Klein <[email protected]> |
License: | GPL-3 |
Version: | 1.37.0 |
Built: | 2024-12-18 05:41:44 UTC |
Source: | https://github.com/bioc/dupRadar |
analyzeDuprates
returns a data.frame with tag counts
analyzeDuprates( bam, gtf, stranded = 0, paired = FALSE, threads = 1, verbose = FALSE, ... )
analyzeDuprates( bam, gtf, stranded = 0, paired = FALSE, threads = 1, verbose = FALSE, ... )
bam |
The bam file containing the duplicate-marked reads |
gtf |
The gtf file describing the features |
stranded |
Whether the reads are strand specific |
paired |
Paired end experiment? |
threads |
The number of threads to be used for counting |
verbose |
Whether to output Rsubread messages into the console |
... |
Other params sent to featureCounts |
This function makes use of the Rsubread package to count tags on the GTF features in different scenarios. The scenarios are the 4 possible combinations of allowing multimappers (yes/no) and duplicate reads (yes/no).
A data.frame with counts on features, with and without taking into account multimappers/duplicated reads
bam <- system.file("extdata", "wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2_duprm.bam", package="dupRadar") gtf <- system.file("extdata","genes.gtf",package="dupRadar") stranded <- 2 # '0' (unstranded), '1' (stranded) and '2' (reverse) paired <- FALSE threads <- 4 # call the duplicate marker and analyze the reads dm <- analyzeDuprates(bam,gtf,stranded,paired,threads)
bam <- system.file("extdata", "wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2_duprm.bam", package="dupRadar") gtf <- system.file("extdata","genes.gtf",package="dupRadar") stranded <- 2 # '0' (unstranded), '1' (stranded) and '2' (reverse) paired <- FALSE threads <- 4 # call the duplicate marker and analyze the reads dm <- analyzeDuprates(bam,gtf,stranded,paired,threads)
bamutilMarkDuplicates
Mark duplicated reads from a BAM file by
calling bamutil
bamutilMarkDuplicates(bam, out, path, verbose)
bamutilMarkDuplicates(bam, out, path, verbose)
bam |
The bam file to mark duplicates from |
out |
Regular expression describing the transformation on the original filename to get the output filename. By default, a "_duprm" suffix is added before the bam extension |
path |
Path to the duplicate marker binaries |
verbose |
Redirect all the program output to the R console |
This function is supposed to be called through the markDuplicates
wrapper
The return code of the system call
cumulativeDuprateBarplot
Barplot showing the cumulative read counts
fraction
cumulativeDuprateBarplot(DupMat, stepSize = 0.05, ...)
cumulativeDuprateBarplot(DupMat, stepSize = 0.05, ...)
DupMat |
The duplication matrix calculated by |
stepSize |
The size of the windows used for plotting |
... |
Other params sent to barplot |
This function makes a barplot showing the cumulative read counts fraction
from the duplication matrix calculated by analyzeDuprates
.
nothing
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # call the plot cumulativeDuprateBarplot(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # call the plot cumulativeDuprateBarplot(DupMat=dm)
A dataset containing the duplication matrix of a good RNASeq experiment, in terms of duplicates. Comes from the GM12878 SR1x75 replicate 2 from Caltech (UCSC's table Browser name: wgEncodeCaltechRnaSeqGm12878R1x75dAlignsRep2V2)
data(dupRadar_examples)
data(dupRadar_examples)
A data frame with 23228 rows and 14 variables
A dataset containing the duplication matrix of a failed RNASeq experiment, containing unusual duplication rate. Comes from the HCT116 PE2x75 replicate 1 from Caltech (UCSC's table Browser name: wgEncodeCaltechRnaSeqHct116R2x75Il200AlignsRep1V2)
data(dupRadar_examples)
data(dupRadar_examples)
A data frame with 23228 rows and 14 variables
Precomputed duplication matrices for two RNASeq experiments used as examples of a good and a failed (in terms of high redundancy of reads) experiments. The experiments come from the ENCODE project, as a source of a broad variety of protocols, library types and sequencing facilities.
data(dupRadar_examples)
data(dupRadar_examples)
A list with two example duplication matrices
duprateExpBoxplot
Duplication rate ~ total reads per kilobase (RPK) boxplot
duprateExpBoxplot(DupMat, stepSize = 0.05, ...)
duprateExpBoxplot(DupMat, stepSize = 0.05, ...)
DupMat |
The duplication matrix calculated by |
stepSize |
Expression bin seze for the boxplot |
... |
Other params sent to boxplot |
This function makes a boxplot showing the distribution of per gene duplication rate versus the reads per kilobase (RPK) inside gene expression bins.
nothing
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate boxplot duprateExpBoxplot(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate boxplot duprateExpBoxplot(DupMat=dm)
duprateExpDensPlot
Duplication rate ~ total read count plot
duprateExpDensPlot( DupMat, pal = c("cyan", "blue", "green", "yellow", "red"), tNoAlternative = TRUE, tRPKM = TRUE, tRPKMval = 0.5, tFit = TRUE, addLegend = TRUE, ... )
duprateExpDensPlot( DupMat, pal = c("cyan", "blue", "green", "yellow", "red"), tNoAlternative = TRUE, tRPKM = TRUE, tRPKMval = 0.5, tFit = TRUE, addLegend = TRUE, ... )
DupMat |
The duplication matrix calculated by |
pal |
The color palette to use to display the density |
tNoAlternative |
Display threshold of 1000 reads per kilobase |
tRPKM |
Display threshold at a given RPKM level |
tRPKMval |
The given RPKM level |
tFit |
Whether to fit the model |
addLegend |
Whether to add a legend to the plot |
... |
Other parameters sent to plot() |
This function makes a scatter plot showing the per gene duplication rate versus the total read count.
nothing
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate plot duprateExpDensPlot(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate plot duprateExpDensPlot(DupMat=dm)
duprateExpDensPlot
Duplication rate ~ total read count fit model
duprateExpFit(DupMat)
duprateExpFit(DupMat)
DupMat |
The duplication matrix calculated by |
Fit a Generalized Linear Model using a logit function between thegene duplication rate and the total read count.
The GLM and the coefficients of the fitted logit function
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate plot duprateExpFit(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate plot duprateExpFit(DupMat=dm)
duprateExpPlot
duprateExpIdentify
Identify genes plotted by duprateExpPlot
duprateExpIdentify(DupMat, idCol = "ID")
duprateExpIdentify(DupMat, idCol = "ID")
DupMat |
The duplication matrix calculated by |
idCol |
The column from the duplication matrix containing the labels |
This function makes a barplot showing the cumulative read counts fraction
from the duplication matrix calculated by analyzeDuprates
.
The identified points. x
and y
values match the ones
from duprateExpPlot
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # call the plot and identify genes duprateExpPlot(DupMat=dm) duprateExpIdentify(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # call the plot and identify genes duprateExpPlot(DupMat=dm) duprateExpIdentify(DupMat=dm)
duprateExpPlot
Duplication rate ~ total read count plot
duprateExpPlot( DupMat, tNoAlternative = TRUE, tRPKM = TRUE, tRPKMval = 0.5, addLegend = TRUE, ... )
duprateExpPlot( DupMat, tNoAlternative = TRUE, tRPKM = TRUE, tRPKMval = 0.5, addLegend = TRUE, ... )
DupMat |
The duplication matrix calculated by |
tNoAlternative |
Display threshold of 1000 reads per kilobase |
tRPKM |
Display threshold at a given RPKM level |
tRPKMval |
The given RPKM level |
addLegend |
Whether to add a legend to the plot |
... |
Other parameters sent to smoothScatter() |
This function makes a smooth scatter plot showing the per gene duplication rate versus the total read count.
nothing
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate plot duprateExpPlot(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # duprate plot duprateExpPlot(DupMat=dm)
expressionHist
Draw histogram with the expression values
expressionHist(DupMat, value = "RPK", ...)
expressionHist(DupMat, value = "RPK", ...)
DupMat |
The duplication matrix calculated by |
value |
The column from the duplication matrix containing the expression values |
... |
Other parameters sent to hist() |
This function draws a histogram of the expression values from the
duplication matrix calculated by analyzeDuprates
.
nothing
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # histogram of expression values for annotation expressionHist(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # histogram of expression values for annotation expressionHist(DupMat=dm)
duprateExpBoxplot
getBinDuplication
get duplication rate for a subset of the
duplication matrix
getBinDuplication(p, stepSize, DupMat)
getBinDuplication(p, stepSize, DupMat)
p |
The vector of bins |
stepSize |
The window size |
DupMat |
The duplication matrix calculated by |
The duplication rate per bin
duprateExpBoxplot
getBinRpkMean
get mean duplication rate per bin
getBinRpkMean(p, stepSize, DupMat)
getBinRpkMean(p, stepSize, DupMat)
p |
The vector of bins |
stepSize |
The window size |
DupMat |
The duplication matrix calculated by |
The averaged RPK per bin
getBinDuplication
and
getBinRpkMean
getDupMatBin
get a subset of the matrix for values in a specific bin
defined by the upper bound p and stepSize
getDupMatBin(p, stepSize = 0.05, value = "allCounts", DupMat)
getDupMatBin(p, stepSize = 0.05, value = "allCounts", DupMat)
p |
The vector of bins |
stepSize |
The window size |
value |
The column to be subset |
DupMat |
The duplication matrix calculated by |
The subseted matrix
getDupMatStats
Report duplication stats based on the data calculated
in the duplication matrix
getDupMatStats(DupMat)
getDupMatStats(DupMat)
DupMat |
The duplication matrix calculated by |
A data.frame containing the stats about the number of genes covered (1+ tags) and the number of genes containing duplicates (1+)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # call the plot and identify genes getDupMatStats(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # call the plot and identify genes getDupMatStats(DupMat=dm)
getDynamicRange
Calculate the dynamic range of the RNAseq experiment
getDynamicRange(dm)
getDynamicRange(dm)
dm |
The duplication matrix calculated by |
This function calculates the dynamic range of the RNAseq eperiment
A list with 2 elements, containing the dynamic range counting all reads and the dynamic range after removing duplicates.
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # calculate the dynamic range getDynamicRange(dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # calculate the dynamic range getDynamicRange(dm)
readcountExpressionBoxplot
readcountExpressionBoxplot
Calculates the fraction of total reads in
a vector of bins
getRpkBinReadCountFraction(p, stepSize = stepSize, DupMat = DupMat)
getRpkBinReadCountFraction(p, stepSize = stepSize, DupMat = DupMat)
p |
The vector of bins |
stepSize |
The window size |
DupMat |
The duplication matrix calculated by |
The fraction of total reads in a vector of bins
readcountExpressionBoxplot
getRpkCumulativeReadCountFraction
get the cumulative read count
fraction
getRpkCumulativeReadCountFraction(p, DupMat = DupMat)
getRpkCumulativeReadCountFraction(p, DupMat = DupMat)
p |
The vector of bins |
DupMat |
The duplication matrix calculated by |
The cumulative read count fraction
markDuplicates
Mark duplicated reads from a BAM file by calling
widely used tools.
markDuplicates( dupremover = c("bamutil", "picard"), bam = NULL, out = gsub("\\.bam$", "_duprm.bam", bam), rminput = TRUE, path = ".", verbose = TRUE, ... )
markDuplicates( dupremover = c("bamutil", "picard"), bam = NULL, out = gsub("\\.bam$", "_duprm.bam", bam), rminput = TRUE, path = ".", verbose = TRUE, ... )
dupremover |
The tool to be called. Currently, "picard" and "bamutils" are supported |
bam |
The bam file to mark duplicates from |
out |
Regular expression describing the transformation on the original filename to get the output filename. By default, a "_duprm" suffix is added before the bam extension |
rminput |
Whether to keep the original, non duplicate-marked, bam file |
path |
Path to the duplicate marker binaries |
verbose |
Redirect all the program output to the R console |
... |
Other parameters sent to the caller function |
This function works as a wrapper for several tools widely adopted tr mark duplicated reads in a BAM file. Currently, it supports PICARD and BamUtils.
The output filename
## Not run: bam <- system.file("extdata","sample1Aligned.out.bam",package="dupRadar") gtf <- "genes.gtf" stranded <- 2 # '0' (unstranded), '1' (stranded) and '2' (reverse) paired <- FALSE threads <- 4 # call the duplicate marker and analyze the reads bamDuprm <- markDuplicates(dupremover="bamutil",bam, path="/opt/bamUtil-master/bin",rminput=FALSE) dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) ## End(Not run)
## Not run: bam <- system.file("extdata","sample1Aligned.out.bam",package="dupRadar") gtf <- "genes.gtf" stranded <- 2 # '0' (unstranded), '1' (stranded) and '2' (reverse) paired <- FALSE threads <- 4 # call the duplicate marker and analyze the reads bamDuprm <- markDuplicates(dupremover="bamutil",bam, path="/opt/bamUtil-master/bin",rminput=FALSE) dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) ## End(Not run)
picardMarkDuplicates
Mark duplicated reads from a BAM file by calling
picard tools
picardMarkDuplicates(bam, out, path, verbose, threads = 1, maxmem = "4g")
picardMarkDuplicates(bam, out, path, verbose, threads = 1, maxmem = "4g")
bam |
The bam file to mark duplicates from |
out |
Regular expression describing the transformation on the original filename to get the output filename. By default, a "_duprm" suffix is added before the bam extension |
path |
Path to the duplicate marker binaries |
verbose |
Redirect all the program output to the R console |
threads |
Number of threads to use |
maxmem |
Max memory assigned to the jvm |
This function is supposed to be called through the markDuplicates
wrapper
The return code of the system call
readcountExpBoxplot
Barplot of percentage of reads falling into
expression bins
readcountExpBoxplot(DupMat, stepSize = 0.05, ...)
readcountExpBoxplot(DupMat, stepSize = 0.05, ...)
DupMat |
The duplication matrix calculated by |
stepSize |
The number of bars to be shown |
... |
Other parameters sent to barplot() |
This function makes a barplot of percentage of reads falling into expression bins
nothing Other parameters sent to barplot()
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # barplot of percentage of reads falling into expression bins readcountExpBoxplot(DupMat=dm)
# dm is a duplication matrix calculated by analyzeDuprates: # R> dm <- analyzeDuprates(bamDuprm,gtf,stranded,paired,threads) attach(dupRadar_examples) # barplot of percentage of reads falling into expression bins readcountExpBoxplot(DupMat=dm)