Title: | Transcriptome Analysis of Differential Allelic Representation |
---|---|
Description: | This package provides functions to standardise the analysis of Differential Allelic Representation (DAR). DAR compromises the integrity of Differential Expression analysis results as it can bias expression, influencing the classification of genes (or transcripts) as being differentially expressed. DAR analysis results in an easy-to-interpret value between 0 and 1 for each genetic feature of interest, where 0 represents identical allelic representation and 1 represents complete diversity. This metric can be used to identify features prone to false-positive calls in Differential Expression analysis, and can be leveraged with statistical methods to alleviate the impact of such artefacts on RNA-seq data. |
Authors: | Lachlan Baer [aut, cre] , Stevie Pederson [aut] |
Maintainer: | Lachlan Baer <[email protected]> |
License: | GPL-3 |
Version: | 1.5.0 |
Built: | 2024-10-31 05:43:05 UTC |
Source: | https://github.com/bioc/tadar |
Assign DAR values to genomic features of interest by averaging the DAR values of ranges that overlap the feature range.
assignFeatureDar(
    dar,
    features,
    dar_val = c("origin", "region"),
    fill_missing = NA
)

## S4 method for signature 'GRangesList,GRanges'
assignFeatureDar(
    dar,
    features,
    dar_val = c("origin", "region"),
    fill_missing = NA
)
dar |
|
features |
|
dar_val |
|
fill_missing |
The DAR value to assign features with no overlaps.
Defaults to |
GRangesList with ranges representing features of interest that overlap at least one DAR range. Feature metadata columns are retained and an additional column is added for the assigned DAR value.
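The averaging step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data frames, the `mean_overlap_dar()` helper and the single-chromosome simplification are all assumptions made for illustration, and only the `fill_missing` name is borrowed from the usage above.

```r
# Hypothetical toy data on a single chromosome (not from the package)
dar_ranges <- data.frame(
    start = c(1, 50, 120, 300),
    end   = c(40, 110, 200, 400),
    dar   = c(0.10, 0.30, 0.50, 0.90)
)
features <- data.frame(
    gene  = c("geneA", "geneB", "geneC"),
    start = c(30, 150, 500),
    end   = c(130, 180, 600)
)

# Hypothetical helper: mean DAR of every range overlapping one feature.
# `fill_missing` is returned when no range overlaps the feature.
mean_overlap_dar <- function(f_start, f_end, ranges, fill_missing = NA_real_) {
    hits <- ranges$start <= f_end & ranges$end >= f_start
    if (!any(hits)) return(fill_missing)
    mean(ranges$dar[hits])
}

features$dar <- mapply(
    mean_overlap_dar, features$start, features$end,
    MoreArgs = list(ranges = dar_ranges)
)
features
```

Here geneA overlaps the first three ranges and receives their mean, geneB overlaps one range, and geneC overlaps none and so receives the fill value.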
data("chr1_genes")
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
counts_filt <- filterLoci(counts)
props <- countsToProps(counts_filt)
contrasts <- matrix(
    data = c(1, -1),
    dimnames = list(
        Levels = c("group1", "group2"),
        Contrasts = c("group1v2")
    )
)
dar <- dar(props, contrasts, region_loci = 5)
assignFeatureDar(dar, chr1_genes, dar_val = "origin")
dar_regions <- flipRanges(dar, extend_edges = TRUE)
assignFeatureDar(dar_regions, chr1_genes, dar_val = "region")
Gene features for example usage. Generation of this data is documented in system.file("data-raw/chr1_genes.R", package = "tadar").
data(chr1_genes)
An object of class GRanges of length 1456.
GRanges object with 1456 ranges and 2 metadata columns.
Ranges represent gene features for chromosome 1 of zebrafish GRCz11 genome.
P-values from differential expression testing for example usage.
data(chr1_tt)
An object of class tbl_df (inherits from tbl, data.frame) with 716 rows and 5 columns.
A 716 x 5 tibble object.
Summarise the alleles from genotype calls at each single nucleotide locus within each sample group.
countAlleles(genotypes, groups)

## S4 method for signature 'GRanges,list'
countAlleles(genotypes, groups)
genotypes |
|
groups |
Named |
GRangesList containing a summary of allele counts at each range. Each element of the list represents a distinct sample group.
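As a rough illustration of what summarising alleles within a group involves, the following base R sketch tallies alleles from unphased genotype strings at a single locus. The genotype vector is invented, and the `n_0` and `n_1` names are assumptions; only `n_called` and `n_missing` appear in this documentation (in the default `filterLoci()` filter).

```r
# Hypothetical unphased genotype calls for one sample group at one locus;
# "./." denotes a missing call
gt <- c(sample1 = "0/1", sample2 = "1/1", sample3 = "./.", sample4 = "0/0")

# Split each call into its two alleles
alleles <- unlist(strsplit(gt, "/", fixed = TRUE))

allele_summary <- c(
    n_called  = sum(gt != "./."),     # samples with allele information
    n_missing = sum(gt == "./."),     # samples without a call
    n_0       = sum(alleles == "0"),  # reference alleles (assumed name)
    n_1       = sum(alleles == "1")   # first alternate allele (assumed name)
)
allele_summary
```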
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
countAlleles(genotypes, groups)
Normalise allele-level counts across samples by converting to a proportion of total alleles in all samples.
countsToProps(counts)

## S4 method for signature 'GRangesList'
countsToProps(counts)
counts |
|
GRangesList containing a summary of normalised allele counts (i.e. as proportions) at each range. Each element of the list represents a distinct sample group.
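The normalisation itself amounts to dividing each allele count by the total number of alleles counted at the locus. A minimal base R sketch for one locus in one group, with invented counts and assumed names (`n_0` to `n_3`):

```r
# Hypothetical allele counts at one locus for one sample group
counts <- c(n_0 = 3, n_1 = 3, n_2 = 0, n_3 = 0)

# Convert counts to proportions of all alleles observed at the locus
props <- counts / sum(counts)
props
```

The resulting proportions sum to 1, which is what makes allelic representation comparable between groups of different sizes.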
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
counts_filt <- filterLoci(counts)
countsToProps(counts_filt)
Calculate DAR between two sample groups.
dar(props, contrasts, region_fixed = NULL, region_loci = NULL)

## S4 method for signature 'GRangesList,matrix'
dar(props, contrasts, region_fixed = NULL, region_loci = NULL)
props |
|
contrasts |
Contrast |
region_fixed |
|
region_loci |
|
DAR is calculated as the Euclidean distance between the allelic proportions (i.e. proportion of As, Cs, Gs and Ts) of two sample groups at a single nucleotide locus, scaled such that all values range inclusively between 0 and 1. A DAR value of 0 represents identical allelic representation between the two sample groups, while a DAR value of 1 represents complete diversity.
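The calculation described above can be sketched in a few lines of base R. The `dar_value()` helper is hypothetical, and the scaling constant is an assumption: the sketch divides by `sqrt(2)`, the largest possible Euclidean distance between two vectors of proportions that each sum to 1, which yields the stated 0 to 1 range.

```r
# Hypothetical helper: scaled Euclidean distance between two vectors of
# allelic proportions (A, C, G, T) at a single locus.
# Assumes the scaling constant is sqrt(2), the maximum possible distance.
dar_value <- function(p, q) {
    stopifnot(length(p) == length(q))
    sqrt(sum((p - q)^2)) / sqrt(2)
}

group1 <- c(A = 1,   C = 0,   G = 0, T = 0)
group2 <- c(A = 0,   C = 1,   G = 0, T = 0)
mixed  <- c(A = 0.5, C = 0.5, G = 0, T = 0)

dar_value(group1, group1)  # identical representation: 0
dar_value(group1, group2)  # complete diversity: 1
dar_value(group1, mixed)   # intermediate: 0.5
```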
GRangesList containing DAR values at each overlapping range between the contrasted sample groups.
Two types of DAR values are reported in the metadata columns of each GRanges object:
dar_origin: The raw DAR values calculated at single nucleotide positions (the origin) between sample groups. These values represent DAR estimates at a precise locus.
dar_region: The mean of raw DAR values in a specified region surrounding the origin. This is optionally returned using either of the region_fixed or region_loci arguments, which control the strategy and size for establishing regions (more information below). This option exists because eQTLs don't necessarily confer their effects on genes in close proximity. Therefore, DAR estimates that are representative of regions may be more suitable for downstream assignment of DAR values to genomic features.
Each element of the list represents a single contrast defined in the input contrast matrix.
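To make the region-level value concrete, the sketch below averages raw DAR values in a window of five consecutive loci centred on each origin, loosely mirroring `region_loci = 5`. It is a simplified illustration in base R with invented values; how the package treats loci near chromosome edges and how it builds fixed-width regions are not shown here and may differ.

```r
# Hypothetical raw DAR values at seven consecutive loci on one chromosome
dar_origin <- c(0.1, 0.2, 0.9, 0.3, 0.1, 0.0, 0.4)
region_loci <- 5

# Centred moving average over `region_loci` loci; loci without a full
# window on both sides are left as NA in this sketch
dar_region <- as.numeric(
    stats::filter(dar_origin, rep(1 / region_loci, region_loci), sides = 2)
)
data.frame(dar_origin, dar_region)
```

Averaging smooths out the isolated high value at the third locus, which is the behaviour that makes region-level estimates less sensitive to a single locus.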
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
counts_filt <- filterLoci(counts)
props <- countsToProps(counts_filt)
contrasts <- matrix(
    data = c(1, -1),
    dimnames = list(
        Levels = c("group1", "group2"),
        Contrasts = c("group1v2")
    )
)
dar(props, contrasts, region_loci = 5)
Filter loci based on allele count criteria.
filterLoci(counts, filter = n_called > n_missing)

## S4 method for signature 'GRangesList'
filterLoci(counts, filter = n_called > n_missing)
counts |
|
filter |
A logical expression indicating which rows to keep. Possible values include:
All values represent the sum of counts across all samples within the group. Defaults to returning loci where the number of samples containing allele information is greater than the number of samples with missing information. |
GRangesList containing a summary of allele counts at each range passing the filter criteria. Each element of the list represents a distinct sample group.
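The default filter expression, `n_called > n_missing`, can be illustrated on a plain data frame. This is only a sketch of the logic with invented counts; the package applies the filter to each element of a GRangesList rather than to a data frame.

```r
# Hypothetical per-locus summary for one sample group
counts <- data.frame(
    locus     = c("1:100", "1:250", "1:400"),
    n_called  = c(6, 2, 3),
    n_missing = c(0, 4, 3)
)

# Keep loci where called samples outnumber samples with missing calls
counts_filt <- subset(counts, n_called > n_missing)
counts_filt
```

Only the first locus is retained: the second has more missing than called samples, and the third is a tie, which the strict inequality excludes.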
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
filterLoci(counts)
Convert the ranges element associated with origin DAR values to ranges associated with the region DAR values. This function can also be used to revert back to the original object containing origin ranges if desired.
flipRanges(dar, extend_edges = FALSE)

## S4 method for signature 'GRangesList'
flipRanges(dar, extend_edges = FALSE)
dar |
|
extend_edges |
|
GRangesList with ranges that represent either DAR regions or DAR origins, depending on the ranges of the input object.
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
counts_filt <- filterLoci(counts)
props <- countsToProps(counts_filt)
contrasts <- matrix(
    data = c(1, -1),
    dimnames = list(
        Levels = c("group1", "group2"),
        Contrasts = c("group1v2")
    )
)

## Establish regions using an elastic sliding window
dar <- dar(props, contrasts, region_loci = 5)
## Convert ranges to regions associated with dar_region values
dar_regions <- flipRanges(dar)
## Optionally extend the outer regions to completely cover chromosomes
dar_regions <- flipRanges(dar, extend_edges = TRUE)
## Convert back to origin ranges associated with dar_origin values
flipRanges(dar_regions)

## Establish regions using a fixed sliding window
dar <- dar(props, contrasts, region_fixed = 1001)
## Convert ranges to regions associated with dar_region values
dar_regions <- flipRanges(dar)
## Convert back to origin ranges associated with dar_origin values
flipRanges(dar_regions)
Moderate p-values from DE testing using assigned DAR values.
modP(pvals, dar, slope = -1.8)

## S4 method for signature 'numeric,numeric'
modP(pvals, dar, slope = -1.8)
pvals |
|
dar |
|
slope |
|
numeric of DAR-moderated p-values of the same length as the input p-values.
data("chr1_genes")
data("chr1_tt")
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
counts_filt <- filterLoci(counts)
props <- countsToProps(counts_filt)
contrasts <- matrix(
    data = c(1, -1),
    dimnames = list(
        Levels = c("group1", "group2"),
        Contrasts = c("group1v2")
    )
)
dar <- dar(props, contrasts, region_loci = 5)
dar_regions <- flipRanges(dar, extend_edges = TRUE)
gene_dar <- assignFeatureDar(dar_regions, chr1_genes, dar_val = "region")
chr1_tt <- merge(chr1_tt, mcols(gene_dar$group1v2), sort = FALSE)
chr1_tt$darP <- modP(chr1_tt$PValue, chr1_tt$dar)
Use Gviz to plot the trend in DAR across a chromosomal region with the option to add features of interest as separate tracks.
plotChrDar(
    dar,
    dar_val = c("origin", "region"),
    chr,
    foi,
    foi_anno,
    foi_highlight = TRUE,
    features,
    features_anno,
    features_highlight = TRUE,
    title = ""
)

## S4 method for signature 'GRanges'
plotChrDar(
    dar,
    dar_val = c("origin", "region"),
    chr,
    foi,
    foi_anno,
    foi_highlight = TRUE,
    features,
    features_anno,
    features_highlight = TRUE,
    title = ""
)
dar |
|
dar_val |
|
chr |
Optional.
|
foi |
Optional.
|
foi_anno |
Optional.
|
foi_highlight |
|
features |
Optional.
|
features_anno |
Optional.
|
features_highlight |
|
title |
|
A Gviz object
set.seed(230822)
data("chr1_genes")
foi <- sample(chr1_genes, 1)
features <- sample(chr1_genes, 20)
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
genotypes <- readGenotypes(fl)
groups <- list(
    group1 = paste0("sample", 1:6),
    group2 = paste0("sample", 7:13)
)
counts <- countAlleles(genotypes, groups)
counts_filt <- filterLoci(counts)
props <- countsToProps(counts_filt)
contrasts <- matrix(
    data = c(1, -1),
    dimnames = list(
        Levels = c("group1", "group2"),
        Contrasts = c("group1v2")
    )
)
dar <- dar(props, contrasts, region_loci = 5)
plotChrDar(
    dar = dar$group1v2,
    dar_val = "region",
    chr = "1",
    foi = foi,
    foi_anno = "gene_name",
    foi_highlight = TRUE,
    features = features,
    features_anno = "gene_name",
    features_highlight = TRUE,
    title = "Example plot of DAR along Chromosome 1"
)
Plot the ECDF of DAR for each chromosome.
plotDarECDF(dar, dar_val = c("origin", "region"), highlight = NULL)

## S4 method for signature 'GRanges'
plotDarECDF(dar, dar_val = c("origin", "region"), highlight = NULL)
dar |
|
dar_val |
|
highlight |
|
A ggplot2 object.
set.seed(230704)

## Use simulated data that illustrates a commonly encountered scenario
simulate_dar <- function(n, mean) {
    vapply(
        rnorm(n = n, mean = mean),
        function(x) exp(x) / (1 + exp(x)),
        numeric(1)
    )
}
gr <- GRanges(
    paste0(rep(seq(1,25), each = 100), ":", seq(1,100)),
    dar_origin = c(simulate_dar(2400, -2), simulate_dar(100, 0.5))
)

## No highlighting, all chromosomes will be given individual colours
plotDarECDF(gr, dar_val = "origin") +
    theme_bw()

## With highlighting
plotDarECDF(gr, dar_val = "origin", highlight = "25") +
    scale_colour_manual(values = c("TRUE" = "red", "FALSE" = "grey")) +
    theme_bw()
Extract genotypes from a VCF file into a GRanges object for downstream DAR analysis.
readGenotypes(file, unphase = TRUE, ...)

## S4 method for signature 'character'
readGenotypes(file, unphase = TRUE, ...)

## S4 method for signature 'TabixFile'
readGenotypes(file, unphase = TRUE, ...)
file |
The file path of a VCF file containing genotype data.
Alternatively, a |
unphase |
A |
... |
Passed to readVcf. |
Extract genotypes from a VCF file with the option to remove phasing information for DAR analysis.
A GRanges object constructed from the CHROM, POS, ID and REF fields of the supplied VCF file. Genotype data for each sample present in the VCF file is added to the metadata columns.
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
readGenotypes(fl)
Remove phasing information from genotype calls.
unphaseGT(gt)

## S4 method for signature 'matrix'
unphaseGT(gt)

## S4 method for signature 'data.frame'
unphaseGT(gt)
gt |
|
Phasing information is not required for a simple DAR analysis. Removing this enables easy counting of alleles from genotype calls.
matrix containing unphased genotype calls.
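In VCF genotype strings, a `|` separator marks a phased call and `/` an unphased one. A minimal base R sketch of removing phasing, assuming it amounts to swapping the separator (the package function may do more, for example normalising allele order), using an invented genotype matrix:

```r
# Hypothetical genotype matrix: rows are loci, columns are samples
gt <- matrix(
    c("0|1", "1|1", "0/1", ".|."),
    nrow = 2,
    dimnames = list(c("1:100", "1:250"), c("sample1", "sample2"))
)

# Replace the phased separator with the unphased one; gsub() keeps the
# matrix dimensions and dimnames of its input
unphased <- gsub("|", "/", gt, fixed = TRUE)
unphased
```

With a single separator, alleles can then be counted by splitting every call on `/`.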
library(VariantAnnotation)
fl <- system.file("extdata", "chr1.vcf.bgz", package="tadar")
vcf <- readVcf(fl)
gt <- geno(vcf)$GT
unphaseGT(gt)