Title: | Investigates Allele Specific Expression |
---|---|
Description: | Provides a framework for allelic specific expression investigation using RNA-seq data. |
Authors: | Jesper R Gadin, Lasse Folkersen |
Maintainer: | Jesper R Gadin <[email protected]> |
License: | GPL-3 |
Version: | 1.45.0 |
Built: | 2024-11-29 03:43:16 UTC |
Source: | https://github.com/bioc/AllelicImbalance |
Package AllelicImbalance
has functions for importing, filtering and
plotting high-throughput data to make an allele specific expression
analysis. A major aim of this package is to provide functions to collect as
much information as possible from regions of choice, and to be able to
explore the allelic expression of that region in detail.
Package: | AllelicImbalance |
Type: | Package |
Version: | 1.2.0 |
Date: | 2014-08-24 |
License: | GPL-3 |
Start out creating a GRange
object defining the region of interest. This can also be done using
getAreaFromGeneNames
providing gene names as arguments. Then use
BamImpGAList
to import reads from that reagion and find potential
SNPs using scanForHeterozygotes
. Then retrieve the allele counts of
heterozygote sites by the function getAlleleCount
. With this data
create an ASEset
. At this point all pre-requisites for a 'basic'
allele specific expression analysis is available. Two ways to go on could be
to apply chisq.test
or barplot
on this ASEset
object.
Author: Jesper Robert Gadin Author: Lasse Folkersen
Maintainer: Jesper Robert Gadin <[email protected]>
Reference to published application note (work in progress)
code?ASEset
These functions acts as wrappers to retrieve information from annotation
database objects (annotationDb objects
) or (transcriptDb
objects
)
getGenesFromAnnotation( OrgDb, GR, leftFlank = 0, rightFlank = 0, getUCSC = FALSE, verbose = FALSE ) getGenesVector(OrgDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getExonsFromAnnotation( TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE ) getExonsVector(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getTranscriptsFromAnnotation( TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE ) getTranscriptsVector(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getCDSFromAnnotation(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getCDSVector(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getAnnotationDataFrame( GR, strand = "+", annotationType = NULL, OrgDb = NULL, TxDb = NULL, verbose = FALSE )
getGenesFromAnnotation( OrgDb, GR, leftFlank = 0, rightFlank = 0, getUCSC = FALSE, verbose = FALSE ) getGenesVector(OrgDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getExonsFromAnnotation( TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE ) getExonsVector(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getTranscriptsFromAnnotation( TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE ) getTranscriptsVector(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getCDSFromAnnotation(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getCDSVector(TxDb, GR, leftFlank = 0, rightFlank = 0, verbose = FALSE) getAnnotationDataFrame( GR, strand = "+", annotationType = NULL, OrgDb = NULL, TxDb = NULL, verbose = FALSE )
OrgDb |
An |
GR |
A |
leftFlank |
An |
rightFlank |
An |
getUCSC |
A |
verbose |
A |
TxDb |
A |
strand |
Two options,'+' or '-' |
annotationType |
select one or more from 'gene', 'exon', 'transcript', 'cds'. |
These functions retrieve regional annotation from OrgDb or TxDb objects, when given GRanges objects.
GRanges object
with ranges over the genes in the region.
The getGenesVector
function will return a character vector where each
element are gene symbols separated by comma
GRanges object
with ranges over the exons in the region.
The getTranscriptsFromAnnotation
function will return a GRanges
object
with ranges over the transcripts in the region.
The getCDSFromAnnotation
function will return a GRanges object
with ranges over the CDSFs in the region.
The getExonsVector
function will return a character vector where each
element are exons separated by comma
The getTranscriptsVector
function will return a character vector
where each element are transcripts separated by comma
The getCDSVector
function will return a character vector where each
element are CDSs separated by comma
The getAnnotationDataFrame
function will return a data.frame with
annotations. This function is used internally by i.e. the barplot-function
Jesper R. Gadin, Lasse Folkersen
data(ASEset) require(org.Hs.eg.db) require(TxDb.Hsapiens.UCSC.hg19.knownGene) OrgDb <- org.Hs.eg.db TxDb <- TxDb.Hsapiens.UCSC.hg19.knownGene #use for example BcfFiles as the source for SNPs of interest GR <- rowRanges(ASEset) #get annotation g <- getGenesFromAnnotation(OrgDb,GR) e <- getExonsFromAnnotation(TxDb,GR) t <- getTranscriptsFromAnnotation(TxDb,GR) c <- getCDSFromAnnotation(TxDb,GR)
data(ASEset) require(org.Hs.eg.db) require(TxDb.Hsapiens.UCSC.hg19.knownGene) OrgDb <- org.Hs.eg.db TxDb <- TxDb.Hsapiens.UCSC.hg19.knownGene #use for example BcfFiles as the source for SNPs of interest GR <- rowRanges(ASEset) #get annotation g <- getGenesFromAnnotation(OrgDb,GR) e <- getExonsFromAnnotation(TxDb,GR) t <- getTranscriptsFromAnnotation(TxDb,GR) c <- getCDSFromAnnotation(TxDb,GR)
adds a customizable annotation functionality for AllelicImbalance barplots.
annotationBarplot( strand, snp, lowerLeftCorner, annDfPlus, annDfMinus, cex = 0.7, ypos = 0, interspace = 1 )
annotationBarplot( strand, snp, lowerLeftCorner, annDfPlus, annDfMinus, cex = 0.7, ypos = 0, interspace = 1 )
strand |
strand, "+", "-", "*" or "both" |
snp |
integer for the described snp |
lowerLeftCorner |
position of the plot to add legend to (default c(0,0)) |
annDfPlus |
annotation dataframe plus strand |
annDfMinus |
annotation dataframe minus strand |
cex |
size of annotation text |
ypos |
relative y-axis position for the annotation text |
interspace |
space between each annotation block |
the function is preferably called from within the AllelicImbalance barplot method.
Jesper R. Gadin
#code placeholders #< create a barplot without annotation > #< add annotation >
#code placeholders #< create a barplot without annotation > #< add annotation >
Generates barplots for ASEset objects. Two levels of plotting detail are provided: a detailed barplot of read counts by allele useful for fewer samples and SNPs, and a less detailed barplot of the fraction of imbalance, useful for more samples and SNPs.
barplot(height, ...) ## S4 method for signature 'ASEset' barplot( height, type = "count", sampleColour.top = NULL, sampleColour.bot = NULL, legend = TRUE, pValue = TRUE, strand = "*", testValue = NULL, testValue2 = NULL, OrgDb = NULL, TxDb = NULL, annotationType = c("gene", "exon", "transcript"), main = NULL, ylim = NULL, yaxis = TRUE, xaxis = FALSE, ylab = TRUE, ylab.text = NULL, xlab.text = "samples", xlab = TRUE, legend.colnames = "", las.ylab = 1, las.xlab = 2, cex.main = 1, cex.pValue = 0.7, cex.ylab = 0.7, cex.xlab = 0.7, cex.legend = 0.6, add = FALSE, lowerLeftCorner = c(0, 0), size = c(1, 1), addHorizontalLine = 0.5, add.frame = TRUE, filter.pValue.fraction = 0.99, legend.fill.size = 1, legend.interspace = 1, verbose = FALSE, top.fraction.criteria = "maxcount", cex.annotation = 0.7, ypos.annotation = 0, annotation.interspace = 1, ... )
barplot(height, ...) ## S4 method for signature 'ASEset' barplot( height, type = "count", sampleColour.top = NULL, sampleColour.bot = NULL, legend = TRUE, pValue = TRUE, strand = "*", testValue = NULL, testValue2 = NULL, OrgDb = NULL, TxDb = NULL, annotationType = c("gene", "exon", "transcript"), main = NULL, ylim = NULL, yaxis = TRUE, xaxis = FALSE, ylab = TRUE, ylab.text = NULL, xlab.text = "samples", xlab = TRUE, legend.colnames = "", las.ylab = 1, las.xlab = 2, cex.main = 1, cex.pValue = 0.7, cex.ylab = 0.7, cex.xlab = 0.7, cex.legend = 0.6, add = FALSE, lowerLeftCorner = c(0, 0), size = c(1, 1), addHorizontalLine = 0.5, add.frame = TRUE, filter.pValue.fraction = 0.99, legend.fill.size = 1, legend.interspace = 1, verbose = FALSE, top.fraction.criteria = "maxcount", cex.annotation = 0.7, ypos.annotation = 0, annotation.interspace = 1, ... )
height |
An |
... |
for simpler generics when extending function |
type |
'count' or 'fraction' |
sampleColour.top |
User specified colours for top fraction |
sampleColour.bot |
User specified colours for bottom fraction |
legend |
Display legend |
pValue |
Display p-value |
strand |
four options, '+', '-', 'both' or '*' |
testValue |
if set, a matrix or vector with user p-values |
testValue2 |
if set, a matrix or vector with user p-values |
OrgDb |
an OrgDb object which provides annotation |
TxDb |
a TxDb object which provides annotation |
annotationType |
select one or more from 'gene','exon','transcript','cds'. |
main |
text to use as main label |
ylim |
set plot y-axis limit |
yaxis |
wheter the y-axis is to be displayed or not |
xaxis |
wheter the x-axis is to be displayed or not |
ylab |
showing labels for the tic marks |
ylab.text |
ylab text |
xlab.text |
xlab text |
xlab |
showing labels for the tic marks |
legend.colnames |
gives colnames to the legend matrix |
las.ylab |
orientation of ylab text |
las.xlab |
orientation of xlab text |
cex.main |
set main label size (max 2) |
cex.pValue |
set pValue label size |
cex.ylab |
set ylab label size |
cex.xlab |
set xlab label size |
cex.legend |
set legend label size |
add |
|
lowerLeftCorner |
integer that is only useful when |
size |
Used internally by locationplot. Rescales each small barplot window |
addHorizontalLine |
adds a horizontal line that marks the default fraction of 0.5 - 0.5 |
add.frame |
|
filter.pValue.fraction |
|
legend.fill.size |
size of the fill/boxes in the legend (default:NULL) |
legend.interspace |
set legend space between fills and text |
verbose |
Makes function more talkative |
top.fraction.criteria |
'maxcount', 'ref' or 'phase' |
cex.annotation |
size of annotation text |
ypos.annotation |
relative ypos for annotation text |
annotation.interspace |
space between annotation text |
filter.pValue.fraction
is intended to remove p-value annotation with
very large difference in frequency, which could just be a sequencing
mistake. This is to avoid p-values like 1e-235 or similar.
sampleColour
User specified colours, either given as named colours
('red', 'blue', etc) or as hexadecimal code. Can be either length 1 for all
samples, or else of a length corresponding to the number of samples for
individual colouring.
Jesper R. Gadin, Lasse Folkersen
The ASEset
class which the barplot
function can be called up on.
data(ASEset) barplot(ASEset[1])
data(ASEset) barplot(ASEset[1])
Object that holds allele counts, genomic positions and map-bias for a set of SNPs
alleleCounts(x, strand = "*", return.class = "list") ## S4 method for signature 'ASEset' alleleCounts(x, strand = "*", return.class = "list") alleleCounts(x, ...) <- value ## S4 replacement method for signature 'ASEset' alleleCounts(x, strand = "*", return.class = "array", ...) <- value mapBias(x, ...) ## S4 method for signature 'ASEset' mapBias(x, return.class = "list") fraction(x, ...) ## S4 method for signature 'ASEset' fraction( x, strand = "*", top.fraction.criteria = "maxcount", verbose = FALSE, ... ) arank(x, return.type = "names", return.class = "list", strand = "*", ...) frequency(x, ...) genotype(x, ...) ## S4 method for signature 'ASEset' genotype(x, return.class = "matrix") genotype(x) <- value ## S4 replacement method for signature 'ASEset' genotype(x) <- value countsPerSnp(x, ...) ## S4 method for signature 'ASEset' countsPerSnp(x, return.class = "matrix", return.type = "mean", strand = "*") countsPerSample(x, ...) ## S4 method for signature 'ASEset' countsPerSample(x, return.class = "matrix", return.type = "mean", strand = "*") phase(x, ...) ## S4 method for signature 'ASEset' phase(x, return.class = "matrix") phase(x) <- value ## S4 replacement method for signature 'ASEset' phase(x) <- value mapBias(x) <- value ## S4 replacement method for signature 'ASEset' mapBias(x) <- value refExist(x) ## S4 method for signature 'ASEset' refExist(x) ref(x) ## S4 method for signature 'ASEset' ref(x) ref(x) <- value ## S4 replacement method for signature 'ASEset,ANY' ref(x) <- value altExist(x) ## S4 method for signature 'ASEset' altExist(x) alt(x) ## S4 method for signature 'ASEset' alt(x) alt(x) <- value ## S4 replacement method for signature 'ASEset,ANY' alt(x) <- value aquals(x, ...) ## S4 method for signature 'ASEset' aquals(x) aquals(x) <- value ## S4 replacement method for signature 'ASEset' aquals(x) <- value maternalAllele(x, ...) ## S4 method for signature 'ASEset' maternalAllele(x) paternalAllele(x, ...) ## S4 method for signature 'ASEset' paternalAllele(x)
alleleCounts(x, strand = "*", return.class = "list") ## S4 method for signature 'ASEset' alleleCounts(x, strand = "*", return.class = "list") alleleCounts(x, ...) <- value ## S4 replacement method for signature 'ASEset' alleleCounts(x, strand = "*", return.class = "array", ...) <- value mapBias(x, ...) ## S4 method for signature 'ASEset' mapBias(x, return.class = "list") fraction(x, ...) ## S4 method for signature 'ASEset' fraction( x, strand = "*", top.fraction.criteria = "maxcount", verbose = FALSE, ... ) arank(x, return.type = "names", return.class = "list", strand = "*", ...) frequency(x, ...) genotype(x, ...) ## S4 method for signature 'ASEset' genotype(x, return.class = "matrix") genotype(x) <- value ## S4 replacement method for signature 'ASEset' genotype(x) <- value countsPerSnp(x, ...) ## S4 method for signature 'ASEset' countsPerSnp(x, return.class = "matrix", return.type = "mean", strand = "*") countsPerSample(x, ...) ## S4 method for signature 'ASEset' countsPerSample(x, return.class = "matrix", return.type = "mean", strand = "*") phase(x, ...) ## S4 method for signature 'ASEset' phase(x, return.class = "matrix") phase(x) <- value ## S4 replacement method for signature 'ASEset' phase(x) <- value mapBias(x) <- value ## S4 replacement method for signature 'ASEset' mapBias(x) <- value refExist(x) ## S4 method for signature 'ASEset' refExist(x) ref(x) ## S4 method for signature 'ASEset' ref(x) ref(x) <- value ## S4 replacement method for signature 'ASEset,ANY' ref(x) <- value altExist(x) ## S4 method for signature 'ASEset' altExist(x) alt(x) ## S4 method for signature 'ASEset' alt(x) alt(x) <- value ## S4 replacement method for signature 'ASEset,ANY' alt(x) <- value aquals(x, ...) ## S4 method for signature 'ASEset' aquals(x) aquals(x) <- value ## S4 replacement method for signature 'ASEset' aquals(x) <- value maternalAllele(x, ...) ## S4 method for signature 'ASEset' maternalAllele(x) paternalAllele(x, ...) ## S4 method for signature 'ASEset' paternalAllele(x)
x |
ASEset object |
strand |
which strand of '+', '-' or '*' |
return.class |
return 'list' or 'array' |
... |
additional arguments |
value |
replacement variable |
top.fraction.criteria |
'maxcount', 'ref' or 'phase' |
verbose |
makes function more talkative |
return.type |
return 'names', rank or 'counts' |
An ASEset object differs from a regular RangedSummarizedExperiment object in that the assays contains an array instead of matrix. This array has ranges on the rows, sampleNames on the columns and variants in the third dimension.
It is possible to use the commands barplot and locationplot on an ASEset
object see more details in barplot
and
locationplot
.
Three different alleleCount options are available. The simples one is the * option, and is for experiments where the strand information is not known e.g. non-stranded data. The unknown strand could also be for strand specific data when the aligner could not find any strand associated with the read, but this should normally not happen, and if it does probably having an extremely low mapping quality. Then there are an option too add plus and minus stranded data. When using this, it is essential to make sure that the RNA-seq experiment under analysis has in fact been created so that correct strand information was obtained. The most functions will by default have their strand argument set to '*'.
The phase information is stored by the convention of 'maternal chromosome|paternal chromosome', with 0 as reference allele and 1 as alternative allele. '|' when the phase is known and '/' when the phase is unknown. Internally the information will be stored as an three dimensional array, dim 1 for SNPs, dim 2 for Samples and dim 3 which is fixed and stores maternal chromosome, paternal chromosome and phased (1 equals TRUE).
An object of class ASEset containing location information and allele counts for a number of SNPs measured in a number of samples on various strand, as well as mapBias information. All data is stored in a manner similar to the SummarizedExperiment class.
table(...)
Arguments:
An ASEset object
that contains the
variants of interest
The generics for table does not easily allow more than one argument
so in respect to the different strand options, table
will
return a SimpleList with length 3, one element for each strand.
frequency(x, return.class = "list", strand = "*", threshold.count.sample = 15)
Arguments:
An ASEset object
that contains the
variants of interest
threshold.count.samples
if sample has fewer counts the function return NA.
ASEsetFromCountList(rowRanges, countListNonStranded = NULL, countListPlus = NULL, countListMinus = NULL, countListUnknown = NULL, colData = NULL, mapBiasExpMean = array(), verbose=FALSE, ...)
Arguments:
A GenomicRanges object
that contains the
variants of interest
A list
where each
entry is a matrix with allele counts as columns and sample counts as rows
A list
where each entry is a matrix with allele
counts as columns and sample counts as rows
A
list
where each entry is a matrix with allele counts as columns and
sample counts as rows
A list
where each
entry is a matrix with allele counts as columns and sample counts as rows
A DataFrame
object containing sample specific data
A 3D array
describing mapping bias. The SNPs
are in the 1st dimension, samples in the 2nd dimension and variants in the
3rd dimension.
Makes function more talkative
arguments passed on to SummarizedExperiment constructor
Jesper R. Gadin, Lasse Folkersen
#make example countList set.seed(42) countListPlus <- list() snps <- c('snp1','snp2','snp3','snp4','snp5') for(snp in snps){ count<-matrix(rep(0,16),ncol=4,dimnames=list( c('sample1','sample2','sample3','sample4'), c('A','T','G','C'))) #insert random counts in two of the alleles for(allele in sample(c('A','T','G','C'),2)){ count[,allele]<-as.integer(rnorm(4,mean=50,sd=10)) } countListPlus[[snp]] <- count } #make example rowRanges rowRanges <- GRanges( seqnames = Rle(c('chr1', 'chr2', 'chr1', 'chr3', 'chr1')), ranges = IRanges(1:5, width = 1, names = head(letters,5)), snp = paste('snp',1:5,sep='') ) #make example colData colData <- DataFrame(Treatment=c('ChIP', 'Input','Input','ChIP'), row.names=c('ind1','ind2','ind3','ind4')) #make ASEset a <- ASEsetFromCountList(rowRanges, countListPlus=countListPlus, colData=colData) #example phase matrix (simple form) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #generate ASEset from array snps <- 999 samples <-5 ar <-array(rep(unlist(lapply(1:snps, function(x){(sample(c(TRUE,FALSE,TRUE,FALSE), size = 4))})), samples), dim=c(4,snps,samples)) ar2 <- array(sample(50:300, 4*snps*samples,replace=TRUE), dim=c(4,snps,samples)) ar2[ar] <- 0 ar2 <- aperm(ar2, c(2, 3, 1)) dimnames(ar2) <- list(paste("snp",1:snps,sep=""),paste("sample",1:samples,sep=""), c("A","C","G","T")) gr <- GRanges(seqnames=c("chr2"), ranges=IRanges(start=1:dim(ar2)[1], width=1), strand="*") a <- ASEsetFromArrays(gr, countsUnknown=ar2)
#make example countList set.seed(42) countListPlus <- list() snps <- c('snp1','snp2','snp3','snp4','snp5') for(snp in snps){ count<-matrix(rep(0,16),ncol=4,dimnames=list( c('sample1','sample2','sample3','sample4'), c('A','T','G','C'))) #insert random counts in two of the alleles for(allele in sample(c('A','T','G','C'),2)){ count[,allele]<-as.integer(rnorm(4,mean=50,sd=10)) } countListPlus[[snp]] <- count } #make example rowRanges rowRanges <- GRanges( seqnames = Rle(c('chr1', 'chr2', 'chr1', 'chr3', 'chr1')), ranges = IRanges(1:5, width = 1, names = head(letters,5)), snp = paste('snp',1:5,sep='') ) #make example colData colData <- DataFrame(Treatment=c('ChIP', 'Input','Input','ChIP'), row.names=c('ind1','ind2','ind3','ind4')) #make ASEset a <- ASEsetFromCountList(rowRanges, countListPlus=countListPlus, colData=colData) #example phase matrix (simple form) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #generate ASEset from array snps <- 999 samples <-5 ar <-array(rep(unlist(lapply(1:snps, function(x){(sample(c(TRUE,FALSE,TRUE,FALSE), size = 4))})), samples), dim=c(4,snps,samples)) ar2 <- array(sample(50:300, 4*snps*samples,replace=TRUE), dim=c(4,snps,samples)) ar2[ar] <- 0 ar2 <- aperm(ar2, c(2, 3, 1)) dimnames(ar2) <- list(paste("snp",1:snps,sep=""),paste("sample",1:samples,sep=""), c("A","C","G","T")) gr <- GRanges(seqnames=c("chr2"), ranges=IRanges(start=1:dim(ar2)[1], width=1), strand="*") a <- ASEsetFromArrays(gr, countsUnknown=ar2)
useful genotype filters
hetFilt(x, ...) ## S4 method for signature 'ASEset' hetFilt(x, source = "genotype", ...)
hetFilt(x, ...) ## S4 method for signature 'ASEset' hetFilt(x, source = "genotype", ...)
x |
ASEset object |
... |
internal param |
source |
'genotype' or 'alleleCounts' |
hetFilt returns TRUE if the samples is heterozygote, based on stored genotype information present in the phase data.
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) a <- ASEset genotype(a) <- inferGenotypes(a) hets <- hetFilt(a)
#load example data data(ASEset) a <- ASEset genotype(a) <- inferGenotypes(a) hets <- hetFilt(a)
Generates gbarplots for ASEset objects. Two levels of plotting detail are provided: a detailed gbarplot of read counts by allele useful for fewer samples and SNPs, and a less detailed gbarplot of the fraction of imbalance, useful for more samples and SNPs.
gbarplot(x, type = "count", strand = "*", verbose = FALSE, ...)
gbarplot(x, type = "count", strand = "*", verbose = FALSE, ...)
x |
An |
type |
'count' or 'fraction' |
strand |
four options, '+', '-', 'both' or '*' |
verbose |
Makes function more talkative |
... |
for simpler generics when extending function |
This function serves the same purpose as the normal barplot, but with trellis graphics using lattice, to be able to integrate well with Gviz track functionality.
Jesper R. Gadin
data(ASEset) gbarplot(ASEset[1])
data(ASEset) gbarplot(ASEset[1])
plotting ASE effects over a specific genomic region using Gviz functionality
glocationplot( x, type = "fraction", strand = "*", BamGAL = NULL, GenomeAxisTrack = FALSE, trackNameDeAn = paste("deTrack", type), TxDb = NULL, sizes = NULL, add = FALSE, verbose = FALSE, ... )
glocationplot( x, type = "fraction", strand = "*", BamGAL = NULL, GenomeAxisTrack = FALSE, trackNameDeAn = paste("deTrack", type), TxDb = NULL, sizes = NULL, add = FALSE, verbose = FALSE, ... )
x |
an ASEset object. |
type |
'fraction' or 'count' |
strand |
'+','-','*' or 'both'. This argument determines which strand is
plotted. See |
BamGAL |
GAlignmentsList covering the same genomic region as the ASEset |
GenomeAxisTrack |
include an genomic axis track |
trackNameDeAn |
trackname for deAnnotation track |
TxDb |
a TxDb object which provides annotation |
sizes |
vector with the sum 1. Describes the size of the tracks |
add |
add to existing plot |
verbose |
if set to TRUE it makes function more talkative |
... |
arguments passed on to barplot function |
The glocationplot methods visualises the distribution of ASE over a larger
region on one chromosome. It takes and ASEset object as well as additional
information on plot type (see gbarplot
), strand type (see
getAlleleCounts
), Annotation tracks are created from the Gviz
packageh. It is obviously important to make sure that the genome build used
is set correctly, e.g. 'hg19'.
sizes has to be of the same length as the number of tracks used.
Jesper R. Gadin
The ASEset
class which the
glocationplot function can be called up on.
data(ASEset) genome(ASEset) <- 'hg19' glocationplot(ASEset,strand='+') #for ASEsets with fewer SNPs the 'count' type plot is useful glocationplot(ASEset,type='count',strand='+')
data(ASEset) genome(ASEset) <- 'hg19' glocationplot(ASEset,strand='+') #for ASEsets with fewer SNPs the 'count' type plot is useful glocationplot(ASEset,type='count',strand='+')
plotting ASE effects over a specific genomic region
ASEDAnnotationTrack( x, GR = rowRanges(x), type = "fraction", strand = "*", trackName = paste("deTrack", type), verbose = TRUE, ... ) ## S4 method for signature 'ASEset' ASEDAnnotationTrack( x, GR = rowRanges(x), type = "fraction", strand = "*", trackName = paste("deTrack", type), verbose = TRUE, ... ) CoverageDataTrack( x, GR = rowRanges(x), BamList = NULL, strand = NULL, start = NULL, end = NULL, trackNameVec = NULL, meanCoverage = FALSE, verbose = TRUE, ... )
ASEDAnnotationTrack( x, GR = rowRanges(x), type = "fraction", strand = "*", trackName = paste("deTrack", type), verbose = TRUE, ... ) ## S4 method for signature 'ASEset' ASEDAnnotationTrack( x, GR = rowRanges(x), type = "fraction", strand = "*", trackName = paste("deTrack", type), verbose = TRUE, ... ) CoverageDataTrack( x, GR = rowRanges(x), BamList = NULL, strand = NULL, start = NULL, end = NULL, trackNameVec = NULL, meanCoverage = FALSE, verbose = TRUE, ... )
x |
an ASEset object. |
GR |
genomic range of plotting |
type |
'fraction' or 'count' |
strand |
'+','-'. This argument determines which strand is plotted. |
trackName |
name of track (ASEDAnnotationTrack) |
verbose |
Setting |
... |
arguments passed on to barplot function |
BamList |
GAlignmnentsList object of reads from the same genomic region as the ASEset |
start |
start position of reads to be plotted |
end |
end position of reads to be plotted |
trackNameVec |
names of tracks (CoverageDataTrack) |
meanCoverage |
mean of coverage over samples (CoverageGataTrack) |
For information of how to use these tracks in more ways, visit the Gviz package manual.
Jesper R. Gadin
The ASEset
class which the functions
can be called up on.
data(ASEset) x <- ASEset[,1:2] r <- reads[1:2] genome(x) <- 'hg19' seqlevels(r) <- seqlevels(x) GR <- GRanges(seqnames=seqlevels(x), ranges=IRanges(start=min(start(x)),end=max(end(x))), strand='+', genome=genome(x)) deTrack <- ASEDAnnotationTrack(x, GR=GR, type='fraction',strand='+') covTracks <- CoverageDataTrack(x,BamList=r,strand='+') lst <- c(deTrack,covTracks) sizes <- c(0.5,rep(0.5/length(covTracks),length(covTracks))) #temporarily do not run this function #plotTracks(lst, from=min(start(x)), to=max(end(x)), #sizes=sizes, col.line = NULL, showId = FALSE, main='mainText', #cex.main=1, title.width=1, type='histogram')
data(ASEset) x <- ASEset[,1:2] r <- reads[1:2] genome(x) <- 'hg19' seqlevels(r) <- seqlevels(x) GR <- GRanges(seqnames=seqlevels(x), ranges=IRanges(start=min(start(x)),end=max(end(x))), strand='+', genome=genome(x)) deTrack <- ASEDAnnotationTrack(x, GR=GR, type='fraction',strand='+') covTracks <- CoverageDataTrack(x,BamList=r,strand='+') lst <- c(deTrack,covTracks) sizes <- c(0.5,rep(0.5/length(covTracks),length(covTracks))) #temporarily do not run this function #plotTracks(lst, from=min(start(x)), to=max(end(x)), #sizes=sizes, col.line = NULL, showId = FALSE, main='mainText', #cex.main=1, title.width=1, type='histogram')
plotting ASE effects over a specific genomic region
locationplot(x, ...) ## S4 method for signature 'ASEset' locationplot( x, type = "fraction", strand = "*", yaxis = TRUE, xaxis = FALSE, xlab = FALSE, ylab = TRUE, xlab.text = "", ylab.text = "", legend.colnames = "", size = c(0.8, 1), main = NULL, pValue = FALSE, cex.main = 0.7, cex.ylab = 0.6, cex.legend = 0.5, OrgDb = NULL, TxDb = NULL, verbose = TRUE, top.fraction.criteria = "maxcount", allow.whole.chromosome = FALSE, ... )
locationplot(x, ...) ## S4 method for signature 'ASEset' locationplot( x, type = "fraction", strand = "*", yaxis = TRUE, xaxis = FALSE, xlab = FALSE, ylab = TRUE, xlab.text = "", ylab.text = "", legend.colnames = "", size = c(0.8, 1), main = NULL, pValue = FALSE, cex.main = 0.7, cex.ylab = 0.6, cex.legend = 0.5, OrgDb = NULL, TxDb = NULL, verbose = TRUE, top.fraction.criteria = "maxcount", allow.whole.chromosome = FALSE, ... )
x |
an ASEset object. |
... |
arguments passed on to barplot function |
type |
'fraction' or 'count' |
strand |
'+','-','*' or 'both'. This argument determines
which strand is plotted. See |
yaxis |
wheter the y-axis is to be displayed or not |
xaxis |
wheter the x-axis is to be displayed or not |
xlab |
showing labels for the tic marks |
ylab |
showing labels for the tic marks |
xlab.text |
xlab text |
ylab.text |
ylab text |
legend.colnames |
gives colnames to the legend matrix |
size |
will give extra space in the margins of the inner plots |
main |
text to use as main label |
pValue |
Display p-value |
cex.main |
set main label size |
cex.ylab |
set ylab label size |
cex.legend |
set legend label size |
OrgDb |
an OrgDb object from which to plot a gene map. If given together with argument TxDb this will only be used to extract genesymbols. |
TxDb |
a TxDb object from which to plot an exon map. |
verbose |
Setting |
top.fraction.criteria |
'maxcount', 'ref' or 'phase' |
allow.whole.chromosome |
logical, overrides 200kb region limit, defaults to FALSE |
The locationplot methods visualises how fractions are distributed over a
larger region of genes on one chromosome. It takes and ASEset object as well
as additional information on plot type (see barplot
), strand
type (see getAlleleCounts
), colouring, as well as annotation.
The annotation is taken either from the bioconductor OrgDb-sets, the TxDb
sets or both. It is obviously important to make sure that the genome build
used is the same as used in aligning the RNA-seq data.
Jesper R. Gadin, Lasse Folkersen
The ASEset
class which the
locationplot function can be called up on.
data(ASEset) locationplot(ASEset) #SNPs are plotted in the order in which they are found. #This can be sorted according to location as follows: locationplot(ASEset[order(start(rowRanges(ASEset))),]) #for ASEsets with fewer SNPs the 'count' type plot is # useful for detailed visualization. locationplot(ASEset,type='count',strand='*')
data(ASEset) locationplot(ASEset) #SNPs are plotted in the order in which they are found. #This can be sorted according to location as follows: locationplot(ASEset[order(start(rowRanges(ASEset))),]) #for ASEsets with fewer SNPs the 'count' type plot is # useful for detailed visualization. locationplot(ASEset,type='count',strand='*')
Identifies the positions of SNPs found in BamGR reads.
scanForHeterozygotes(BamList, ...) ## S4 method for signature 'GAlignmentsList' scanForHeterozygotes( BamList, minimumReadsAtPos = 20, maximumMajorAlleleFrequency = 0.9, minimumMinorAlleleFrequency = 0.1, minimumBiAllelicFrequency = 0.9, verbose = TRUE, ... )
scanForHeterozygotes(BamList, ...) ## S4 method for signature 'GAlignmentsList' scanForHeterozygotes( BamList, minimumReadsAtPos = 20, maximumMajorAlleleFrequency = 0.9, minimumMinorAlleleFrequency = 0.1, minimumBiAllelicFrequency = 0.9, verbose = TRUE, ... )
BamList |
A |
... |
argument to pass on |
minimumReadsAtPos |
minimum number of reads required to call a SNP at a given position |
maximumMajorAlleleFrequency |
maximum frequency allowed for the most common allele. Setting this parameter lower will minimise the SNP calls resulting from technical read errors, at the cost of missing loci with potential strong ASE |
minimumMinorAlleleFrequency |
minimum frequency allowed for the second most common allele. Setting this parameter higher will minimise the SNP calls resulting from technical read errors, at the cost of missing loci with potential strong ASE |
minimumBiAllelicFrequency |
minimum frequency allowed for the first and second most common allele. Setting a Lower value for this parameter will minimise the identification of loci with three or more alleles in one sample. This is useful if sequencing errors are suspected to be common. |
verbose |
logical indicating if process information should be displayed |
This function scans all reads stored in a GAlignmentsList
for
possible heterozygote positions. The user can balance the sensitivity of the
search by modifying the minimumReadsAtPos, maximumMajorAlleleFrequency and
minimumBiAllelicFrequency arguments.
scanForHeterozygotes
returns a GRanges object with the SNPs
for the BamList object that was used as input.
Jesper R. Gadin, Lasse Folkersen
The getAlleleCounts
which is a
function that count the number of reads overlapping a site.
data(reads) s <- scanForHeterozygotes(reads,verbose=FALSE)
data(reads) s <- scanForHeterozygotes(reads,verbose=FALSE)
old version of an ASEset which needs to be updated
Jesper R. Gadin, Lasse Folkersen
##load eample data (Not Run) #data(ASEset.old)
##load eample data (Not Run) #data(ASEset.old)
ASEset with simulated data with SNPs within the first 200bp of chromosome 17, which is required to have example data for the refAllele function.
Jesper R. Gadin, Lasse Folkersen
##load eample data (Not Run) #data(ASEset.sim)
##load eample data (Not Run) #data(ASEset.sim)
count alleles and create an ASEset direct from bam file instead of reading into R first.
ASEsetFromBam(gr, ...) ## S4 method for signature 'GRanges' ASEsetFromBam( gr, pathToDir, PE = TRUE, flagsMinusStrand = c(83, 163), flagsPlusStrand = c(99, 147), strandUnknown = FALSE, ... )
ASEsetFromBam(gr, ...) ## S4 method for signature 'GRanges' ASEsetFromBam( gr, pathToDir, PE = TRUE, flagsMinusStrand = c(83, 163), flagsPlusStrand = c(99, 147), strandUnknown = FALSE, ... )
gr |
GenomicRanges of SNPs to create ASEset for |
... |
passed on to ASEsetFromBam function |
pathToDir |
Directory of bam files with index in same directory |
PE |
if paired end or not (default: TRUE) |
flagsMinusStrand |
flags that mark reads coming from minus strand |
flagsPlusStrand |
flags that mark reads coming from plus strand |
strandUnknown |
default: FALSE |
counts the alleles in a bam file based on GRanges positions.
Jesper R. Gadin
data(GRvariants) gr <- GRvariants ##no execution at the moment #pathToDir <- system.file('inst/extdata/ERP000101_subset', package='AllelicImbalance') #a <- ASEsetFromBam(gr, pathToDir)
data(GRvariants) gr <- GRvariants ##no execution at the moment #pathToDir <- system.file('inst/extdata/ERP000101_subset', package='AllelicImbalance') #a <- ASEsetFromBam(gr, pathToDir)
Generates lattice barplots for ASEset objects. Two levels of plotting detail are provided: a detailed barplot of read counts by allele useful for fewer samples and SNPs, and a less detailed barplot of the fraction of imbalance, useful for more samples and SNPs.
barplotLatticeFraction(identifier, ...) barplotLatticeCounts(identifier, ...)
barplotLatticeFraction(identifier, ...) barplotLatticeCounts(identifier, ...)
identifier |
the single snp name to plot |
... |
used to pass on variables |
filter.pValue.fraction
is intended to remove p-value annotation with
very large difference in frequency, which could just be a sequencing
mistake. This is to avoid p-values like 1e-235 or similar.
sampleColour
User specified colours, either given as named colours
('red', 'blue', etc) or as hexadecimal code. Can be either length 1 for all
samples, or else of a length corresponding to the number of samples for
individual colouring.
Jesper R. Gadin, Lasse Folkersen
The ASEset
class which the barplot
function can be called up on.
a <- ASEset name <- rownames(a)[1] barplotLatticeFraction(identifier=name, x=a, astrand="+") barplotLatticeCounts(identifier=name, x=a, astrand="+")
a <- ASEset name <- rownames(a)[1] barplotLatticeFraction(identifier=name, x=a, astrand="+") barplotLatticeCounts(identifier=name, x=a, astrand="+")
Performs a binomial test on an ASEset object.
## S4 method for signature 'ASEset' binom.test(x, n = "*")
## S4 method for signature 'ASEset' binom.test(x, n = "*")
x |
|
n |
strand option |
the test can only be applied to one strand at the time.
binom.test
returns a matrix
Jesper R. Gadin, Lasse Folkersen
The chisq.test
which is another test
that can be applied on an ASEset object.
#load example data data(ASEset) #make a binomial test binom.test(ASEset,'*')
#load example data data(ASEset) #make a binomial test binom.test(ASEset,'*')
Performs a chisq.test on an ASEset object.
## S4 method for signature 'ASEset' chisq.test(x, y = "*")
## S4 method for signature 'ASEset' chisq.test(x, y = "*")
x |
|
y |
strand option |
The test is performed on one strand in an ASEset object.
chisq.test
returns a matrix with the chisq.test P-value for
each SNP and sample
Jesper R. Gadin, Lasse Folkersen
The binom.test
which is another test
that can be applied on an ASEset object.
#load example data data(ASEset) #make a chi-square test on default non-stranded strand chisq.test(ASEset)
#load example data data(ASEset) #make a chi-square test on default non-stranded strand chisq.test(ASEset)
count alleles before creating ASEse.
countAllelesFromBam(gr, ...) ## S4 method for signature 'GRanges' countAllelesFromBam( gr, pathToDir, flag = NULL, scanBamFlag = NULL, return.class = "array", verbose = TRUE, ... )
countAllelesFromBam(gr, ...) ## S4 method for signature 'GRanges' countAllelesFromBam( gr, pathToDir, flag = NULL, scanBamFlag = NULL, return.class = "array", verbose = TRUE, ... )
gr |
GRanges that contains SNPs of interest |
... |
arguments to pass on |
pathToDir |
path to directory of bam files |
flag |
specify one flag to use as filter, default is no filtering. allowed flags are 99, 147, 83 and 163 |
scanBamFlag |
set a custom flag to use as filter |
return.class |
type of class for the returned object |
verbose |
makes funciton more talkative |
counts the alleles in a bam file based on GRanges positions.
Important excerpt from the details section of the internal applyPileups function: Regardless of 'param' values, the algorithm follows samtools by excluding reads flagged as unmapped, secondary, duplicate, or failing quality control.
Jesper R. Gadin
data(GRvariants) gr <- GRvariants ##not run at the moment #pathToDir <- system.file('inst/extdata/ERP000101_subset', package='AllelicImbalance') #ar <- countAllelesFromBam(gr, pathToDir)
data(GRvariants) gr <- GRvariants ##not run at the moment #pathToDir <- system.file('inst/extdata/ERP000101_subset', package='AllelicImbalance') #ar <- countAllelesFromBam(gr, pathToDir)
Get coverage per nucleotide for reads covering a region
coverageMatrixListFromGAL(BamList, ...) ## S4 method for signature 'GAlignmentsList' coverageMatrixListFromGAL(BamList, strand = "*", ignore.empty.bam.row = TRUE)
coverageMatrixListFromGAL(BamList, ...) ## S4 method for signature 'GAlignmentsList' coverageMatrixListFromGAL(BamList, strand = "*", ignore.empty.bam.row = TRUE)
BamList |
GAlignmentsList containing reads over the region to calculate coverage |
... |
arguments to pass on |
strand |
strand has to be '+' or '-' |
ignore.empty.bam.row |
argument not in use atm |
a convenience function to get the coverage from a list of reads stored in GAlignmnetsList, and returns by default a list with one matrix, and information about the genomic start and stop positions.
Jesper R. Gadin
r <- reads seqlevels(r) <- '17' covMatList <- coverageMatrixListFromGAL(BamList=r, strand='+')
r <- reads seqlevels(r) <- '17' covMatList <- coverageMatrixListFromGAL(BamList=r, strand='+')
Create mapbias array from genotype matrix requires genotype information
defaultMapBias(x, ...) ## S4 method for signature 'ASEset' defaultMapBias(x, return.class = "array")
defaultMapBias(x, ...) ## S4 method for signature 'ASEset' defaultMapBias(x, return.class = "array")
x |
|
... |
internal arguments |
return.class |
"array" or "ASEset" |
Default mapbias will be 0.5 for bi-allelic snps and 1 for homozygots. For genotypes with NA, 0.5 will be placed on all four alleles. Therefore tri-allelic can not be used atm. Genotype information has to be placed in the genotype(x) assay.
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset.sim) fasta <- system.file('extdata/hg19.chr17.subset.fa', package='AllelicImbalance') refAllele(ASEset.sim,fasta=fasta) a <- refAllele(ASEset.sim,fasta=fasta)
#load example data data(ASEset.sim) fasta <- system.file('extdata/hg19.chr17.subset.fa', package='AllelicImbalance') refAllele(ASEset.sim,fasta=fasta) a <- refAllele(ASEset.sim,fasta=fasta)
used to populate the phase slot in an ASEset object
defaultPhase(i, ...) ## S4 method for signature 'numeric' defaultPhase(i, j, ...)
defaultPhase(i, ...) ## S4 method for signature 'numeric' defaultPhase(i, j, ...)
i |
number of rows |
... |
arguments to forward to internal functions |
j |
number of columns |
will set everything to 0
Jesper R. Gadin, Lasse Folkersen
i <- 5 j <- 10 defaultPhase(i,j)
i <- 5 j <- 10 defaultPhase(i,j)
detection of AllelicImbalance
detectAI(x, ...) ## S4 method for signature 'ASEset' detectAI( x, return.class = "DetectedAI", strand = "*", threshold.frequency = 0, threshold.count.sample = 1, threshold.delta.frequency = 0, threshold.pvalue = 0.05, inferGenotype = FALSE, random.ref = FALSE, function.test = "binom.test", verbose = TRUE, gc = FALSE, biasMatrix = FALSE )
detectAI(x, ...) ## S4 method for signature 'ASEset' detectAI( x, return.class = "DetectedAI", strand = "*", threshold.frequency = 0, threshold.count.sample = 1, threshold.delta.frequency = 0, threshold.pvalue = 0.05, inferGenotype = FALSE, random.ref = FALSE, function.test = "binom.test", verbose = TRUE, gc = FALSE, biasMatrix = FALSE )
x |
ASEset |
... |
internal arguments |
return.class |
class to return (atm only class 'logical') |
strand |
strand to infer from |
threshold.frequency |
least fraction to classify (see details) |
threshold.count.sample |
least amount of counts to try to infer allele |
threshold.delta.frequency |
minimum of frequency difference from 0.5 (or mapbias adjusted value) |
threshold.pvalue |
pvalue over this number will be filtered out |
inferGenotype |
infer genotypes based on count data in ASEset object |
random.ref |
set the reference as random if you dont know. Affects interpretation of results. |
function.test |
At the moment the only available option is 'binomial.test' |
verbose |
makes function more talkative |
gc |
use garbage collection when possible to save space |
biasMatrix |
use biasMatrix in ASEset, or use default expected frequency of 0.5 for all sites |
threshold.frequency is the least fraction needed to classify as bi tri or quad allelic SNPs. If 'all' then all of bi tri and quad allelic SNPs will use the same threshold. Everything under the treshold will be regarded as noise. 'all' will return a matrix with snps as rows and uni bi tri and quad will be columns. For this function Anything that will return TRUE for tri-allelicwill also return TRUE for uni and bi-allelic for the same SNP an Sample.
return.type 'ref' return only AI when reference allele is more expressed. 'alt' return only AI when alternative allele is more expressed or 'all' for both 'ref' and 'alt' alleles. Reference allele is the one present in the reference genome on the forward strand.
threshold.delta.frequency and function.test will use the value in mapBias(x) as expected value.
function.test will use the two most expressed alleles for testing. Make therefore sure there are no tri-allelic SNPs or somatic mutations among the SNPs in the ASEset.
inferGenotype(), set TRUE it should be used with as much samples as possible. If you split up the samples and run detectAI() on each sample separately, please make sure you have inferred the genotypes in before hand, alternatively used the genotypes detected by another variantCaller or chip-genotypes. Use ONLY biallelic genotypes.
Jesper R. Gadin
#load example data data(ASEset) a <- ASEset dai <- detectAI(a)
#load example data data(ASEset) a <- ASEset dai <- detectAI(a)
Object that holds results from AI detection.
referenceFrequency(x, ...) ## S4 method for signature 'DetectedAI' referenceFrequency(x, return.class = "array") thresholdFrequency(x, ...) ## S4 method for signature 'DetectedAI' thresholdFrequency(x, return.class = "array") thresholdCountSample(x, ...) ## S4 method for signature 'DetectedAI' thresholdCountSample(x, return.class = "array") thresholdDeltaFrequency(x, ...) ## S4 method for signature 'DetectedAI' thresholdDeltaFrequency(x, return.class = "array") thresholdPvalue(x, ...) ## S4 method for signature 'DetectedAI' thresholdPvalue(x, return.class = "array")
referenceFrequency(x, ...) ## S4 method for signature 'DetectedAI' referenceFrequency(x, return.class = "array") thresholdFrequency(x, ...) ## S4 method for signature 'DetectedAI' thresholdFrequency(x, return.class = "array") thresholdCountSample(x, ...) ## S4 method for signature 'DetectedAI' thresholdCountSample(x, return.class = "array") thresholdDeltaFrequency(x, ...) ## S4 method for signature 'DetectedAI' thresholdDeltaFrequency(x, return.class = "array") thresholdPvalue(x, ...) ## S4 method for signature 'DetectedAI' thresholdPvalue(x, return.class = "array")
x |
ASEset object or list of ASEsets |
... |
pass arguments to internal functions |
return.class |
type of class returned eg. "list or ""array". |
The DetectedAI-class contains
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset dai <- detectAI(a) #summary(gba) #write.tables(dai)
data(ASEset) a <- ASEset dai <- detectAI(a) #summary(gba) #write.tables(dai)
plot functions for the DetectedAI-class
frequency_vs_threshold_variable_plot(x, ...) ## S4 method for signature 'DetectedAI' frequency_vs_threshold_variable_plot( x, var = "threshold.count.sample", hetOverlay = TRUE, smoothscatter = FALSE ) detectedAI_vs_threshold_variable_plot(x, ...) ## S4 method for signature 'DetectedAI' detectedAI_vs_threshold_variable_plot( x, var = "threshold.count.sample", summaryOverSamples = "sum", hetOverlay = TRUE, smoothscatter = FALSE ) reference_frequency_density_vs_threshold_variable_plot(x, ...) ## S4 method for signature 'DetectedAI' reference_frequency_density_vs_threshold_variable_plot( x, var = "threshold.count.sample" ) detectedAI_vs_threshold_variable_multigraph_plot(x, ...) ## S4 method for signature 'DetectedAI' detectedAI_vs_threshold_variable_multigraph_plot(x, ncol = 2, ...) frequency_vs_threshold_variable_multigraph_plot(x, ...) ## S4 method for signature 'DetectedAI' frequency_vs_threshold_variable_multigraph_plot(x, ncol = 2, ...) reference_frequency_density_vs_threshold_variable_multigraph_plot(x, ...) ## S4 method for signature 'DetectedAI' reference_frequency_density_vs_threshold_variable_multigraph_plot( x, ncol = 2, ... )
frequency_vs_threshold_variable_plot(x, ...) ## S4 method for signature 'DetectedAI' frequency_vs_threshold_variable_plot( x, var = "threshold.count.sample", hetOverlay = TRUE, smoothscatter = FALSE ) detectedAI_vs_threshold_variable_plot(x, ...) ## S4 method for signature 'DetectedAI' detectedAI_vs_threshold_variable_plot( x, var = "threshold.count.sample", summaryOverSamples = "sum", hetOverlay = TRUE, smoothscatter = FALSE ) reference_frequency_density_vs_threshold_variable_plot(x, ...) ## S4 method for signature 'DetectedAI' reference_frequency_density_vs_threshold_variable_plot( x, var = "threshold.count.sample" ) detectedAI_vs_threshold_variable_multigraph_plot(x, ...) ## S4 method for signature 'DetectedAI' detectedAI_vs_threshold_variable_multigraph_plot(x, ncol = 2, ...) frequency_vs_threshold_variable_multigraph_plot(x, ...) ## S4 method for signature 'DetectedAI' frequency_vs_threshold_variable_multigraph_plot(x, ncol = 2, ...) reference_frequency_density_vs_threshold_variable_multigraph_plot(x, ...) ## S4 method for signature 'DetectedAI' reference_frequency_density_vs_threshold_variable_multigraph_plot( x, ncol = 2, ... )
x |
detectedAI object |
... |
pass on variables internally |
var |
string, see details for available options |
hetOverlay |
logical, if TRUE show nr of het SNPs used to calculate the reference allele frequency mean |
smoothscatter |
boolean, smoothscatter over the means |
summaryOverSamples |
'mean' or 'sum' |
ncol |
nr of columns for multiplots |
plot helper functions. The documentation will be improved before next release.
Jesper R. Gadin, Lasse Folkersen
#some example code here #generate example data(ASEset) a <- ASEset dai <- detectAI(a, threshold.count.sample=1:50, threshold.frequency=seq(0,0.5,by=0.01), threshold.delta.frequency=seq(0,0.5,by=0.01), threshold.pvalue=rev(seq(0.001,0.05, by=0.005)) ) frequency_vs_threshold_variable_plot(dai) detectedAI_vs_threshold_variable_plot(dai) detectedAI_vs_threshold_variable_multigraph_plot(dai) frequency_vs_threshold_variable_multigraph_plot(dai)
#some example code here #generate example data(ASEset) a <- ASEset dai <- detectAI(a, threshold.count.sample=1:50, threshold.frequency=seq(0,0.5,by=0.01), threshold.delta.frequency=seq(0,0.5,by=0.01), threshold.pvalue=rev(seq(0.001,0.05, by=0.005)) ) frequency_vs_threshold_variable_plot(dai) detectedAI_vs_threshold_variable_plot(dai) detectedAI_vs_threshold_variable_multigraph_plot(dai) frequency_vs_threshold_variable_multigraph_plot(dai)
Summary helper functions for the DetectedAI-class
frequency_vs_threshold_variable_summary(x, ...) ## S4 method for signature 'DetectedAI' frequency_vs_threshold_variable_summary( x, var = "threshold.count.sample", return.class = "matrix", ... ) detectedAI_vs_threshold_variable_summary(x, ...) ## S4 method for signature 'DetectedAI' detectedAI_vs_threshold_variable_summary(x, var = "threshold.count.sample") usedSNPs_vs_threshold_variable_summary(x, ...) ## S4 method for signature 'DetectedAI' usedSNPs_vs_threshold_variable_summary(x, var = "threshold.count.sample")
frequency_vs_threshold_variable_summary(x, ...) ## S4 method for signature 'DetectedAI' frequency_vs_threshold_variable_summary( x, var = "threshold.count.sample", return.class = "matrix", ... ) detectedAI_vs_threshold_variable_summary(x, ...) ## S4 method for signature 'DetectedAI' detectedAI_vs_threshold_variable_summary(x, var = "threshold.count.sample") usedSNPs_vs_threshold_variable_summary(x, ...) ## S4 method for signature 'DetectedAI' usedSNPs_vs_threshold_variable_summary(x, var = "threshold.count.sample")
x |
detectedAI object |
... |
pass on variables internally |
var |
string, see details for available options |
return.class |
'matrix' or 'array' |
Summary helper functions. The documentation will be improved before next release.
Jesper R. Gadin, Lasse Folkersen
#some example code here #generate example data(ASEset) a <- ASEset dai <- detectAI(a, threshold.count.sample=1:50, threshold.frequency=seq(0,0.5,by=0.01), threshold.delta.frequency=seq(0,0.5,by=0.01), threshold.pvalue=rev(seq(0.001,0.05, by=0.005)) ) frequency_vs_threshold_variable_summary(dai)
#some example code here #generate example data(ASEset) a <- ASEset dai <- detectAI(a, threshold.count.sample=1:50, threshold.frequency=seq(0,0.5,by=0.01), threshold.delta.frequency=seq(0,0.5,by=0.01), threshold.pvalue=rev(seq(0.001,0.05, by=0.005)) ) frequency_vs_threshold_variable_summary(dai)
Summarizes information to ease creating plots
fractionPlotDf(x, snp, strand = "*", top.fraction.criteria = "maxcount", ...) ## S4 method for signature 'ASEset' fractionPlotDf(x, snp, strand = "*", top.fraction.criteria = "maxcount", ...)
fractionPlotDf(x, snp, strand = "*", top.fraction.criteria = "maxcount", ...) ## S4 method for signature 'ASEset' fractionPlotDf(x, snp, strand = "*", top.fraction.criteria = "maxcount", ...)
x |
ASEset |
snp |
rownames identifier for ASEset or row number |
strand |
'+', '-' or '*' |
top.fraction.criteria |
'maxcount', 'ref' or 'phase' |
... |
arguments to forward to internal functions |
Main purpose is to reduce the amount of overall code and ease maintenance.
top.fraction.criteria can take three options, maxcount, ref and phase. The top allele will be every second row in the data frame, with start from row 2. The maxcount argument will put the allele with most reads on top of the bivariate fraction. Similarly the ref argument will put always the reference allele on top. The phase arguments puts the maternal phase always on top. The top.fraction.criteria for the ref or phase arguments requires that both ref and alt is set in mcols(ASEset).
Jesper R. Gadin, Lasse Folkersen
#test on example ASEset data(ASEset) a <- ASEset df <- fractionPlotDf(a, 1, strand="+")
#test on example ASEset data(ASEset) a <- ASEset df <- fractionPlotDf(a, 1, strand="+")
A wrapper to make a global analysis based on paths for BAM, VCF and GFF files
gba(pathBam, ...) ## S4 method for signature 'character' gba(pathBam, pathVcf, pathGFF = NULL, verbose)
gba(pathBam, ...) ## S4 method for signature 'character' gba(pathBam, pathVcf, pathGFF = NULL, verbose)
pathBam |
path to bam file |
... |
arguments to pass on |
pathVcf |
path to vcf file |
pathGFF |
path to gff file |
verbose |
makes function more talkative |
Jesper R. Gadin
#empty as function doesn't exist
#empty as function doesn't exist
genomatrix is an example of a matrix with genotypes
Jesper R. Gadin, Lasse Folkersen
##load eample data (Not Run) #data(genomatrix)
##load eample data (Not Run) #data(genomatrix)
used to convert the genomatrix from the visually friendly matrix to phase array.
genotype2phase(x, ...) ## S4 method for signature 'matrix' genotype2phase( x, ref = NULL, return.class = "array", levels = c("A", "C", "G", "T"), ... )
genotype2phase(x, ...) ## S4 method for signature 'matrix' genotype2phase( x, ref = NULL, return.class = "array", levels = c("A", "C", "G", "T"), ... )
x |
matrix see examples |
... |
pass on additional param |
ref |
reference alleles |
return.class |
'array' or 'list' |
levels |
vector of expected alleles |
To not introduce redundant information in the ASEset object, the genotype matrix is translated to a phase matrix, containing the same information. Does not allow tri-allelic or multi-allelic SNPs, and if present the multi-allelic SNPs will lose the least occuring genotype.
This function can handle indels, but if the reference allele is not provided, the rank matrix which is temporary created might use lots of memory, depending on the amount of indels among the genotypes. As conclusion, it is preferable to send in reference genome when converting to phase.
levels information is only important if the reference allele has to be guessed, and so if reference information is provided, the levels argument can be ignored.
Jesper R. Gadin, Lasse Folkersen
#load example data data(genomatrix) data(ASEset) p <- genotype2phase(genomatrix, ref(ASEset))
#load example data data(genomatrix) data(ASEset) p <- genotype2phase(genomatrix, ref(ASEset))
Given the positions of known SNPs, this function returns allele counts from a BamGRL object
getAlleleCounts(BamList, ...) ## S4 method for signature 'GAlignmentsList' getAlleleCounts( BamList, GRvariants, strand = "*", return.class = "list", verbose = TRUE, ... )
getAlleleCounts(BamList, ...) ## S4 method for signature 'GAlignmentsList' getAlleleCounts( BamList, GRvariants, strand = "*", return.class = "list", verbose = TRUE, ... )
BamList |
A |
... |
parameters to pass on |
GRvariants |
A |
strand |
A length 1 |
return.class |
'list' or 'array' |
verbose |
Setting |
This function is used to retrieve the allele counts from specified positions
in a set of RNA-seq reads. The BamList
argument will typically have
been created using the impBamGAL
function on bam-files. The
GRvariants
is either a GRanges with user-specified locations or else
it is generated through scanning the same bam-files as in BamList
for
heterozygote locations (e.g. using scanForHeterozygotes
). The
GRvariants will currently only accept locations having width=1,
corresponding to bi-allelic SNPs. In the strand
argument, specifying
'*' is the same as retrieving the sum count of '+' and '-' reads
(and unknown strand reads in case these are found in the bam file). '*' is
the default behaviour and can be used when the RNA-seq experiments strand
information is not available.
getAlleleCounts
returns a list of several data.frame objects,
each storing the count data for one SNP.
Jesper R. Gadin, Lasse Folkersen
The scanForHeterozygotes
which is a
function to find possible heterozygote sites in a
GenomicAlignments object
#load example data data(reads) data(GRvariants) #get counts at the three positions specified in GRvariants alleleCount <- getAlleleCounts(BamList=reads,GRvariants, strand='*') #if the reads had contained stranded data, these two calls would #have given the correct input objects for getAlleleCounts alleleCountPlus <- getAlleleCounts(BamList=reads,GRvariants, strand='+') alleleCountMinus <- getAlleleCounts(BamList=reads,GRvariants, strand='-')
#load example data data(reads) data(GRvariants) #get counts at the three positions specified in GRvariants alleleCount <- getAlleleCounts(BamList=reads,GRvariants, strand='*') #if the reads had contained stranded data, these two calls would #have given the correct input objects for getAlleleCounts alleleCountPlus <- getAlleleCounts(BamList=reads,GRvariants, strand='+') alleleCountMinus <- getAlleleCounts(BamList=reads,GRvariants, strand='-')
Given the positions of known SNPs, this function returns allele quality from a BamGRL object
getAlleleQuality(BamList, ...) ## S4 method for signature 'GAlignmentsList' getAlleleQuality( BamList, GRvariants, fastq.format = "illumina.1.8", return.class = "array", verbose = TRUE, ... )
getAlleleQuality(BamList, ...) ## S4 method for signature 'GAlignmentsList' getAlleleQuality( BamList, GRvariants, fastq.format = "illumina.1.8", return.class = "array", verbose = TRUE, ... )
BamList |
A |
... |
parameters to pass on |
GRvariants |
A |
fastq.format |
default 'illumina.1.8' |
return.class |
'list' or 'array' |
verbose |
Setting |
This function is used to retrieve the allele quality strings from specified positions
in a set of RNA-seq reads. The BamList
argument will typically have
been created using the impBamGAL
function on bam-files. The
GRvariants
is either a GRanges with user-specified locations or else
it is generated through scanning the same bam-files as in BamList
for
heterozygote locations (e.g. using scanForHeterozygotes
). The
GRvariants will currently only accept locations having width=1,
corresponding to bi-allelic SNPs. The strand type information will be kept in the
returned object. If the strand is marked as unknown "*", it will be forced to the "+"
strand.
quaity information is extracted from the BamList object, and requires the presence of mcols(BamList)[["qual"]] to contain quality sequences.
getAlleleQuality
returns a list of several data.frame objects,
each storing the count data for one SNP.
Jesper R. Gadin, Lasse Folkersen
#load example data data(reads) data(GRvariants) #get counts at the three positions specified in GRvariants alleleQualityArray <- getAlleleQuality(BamList=reads,GRvariants) #place in ASEset object alleleCountsArray <- getAlleleCounts(BamList=reads,GRvariants, strand='*', return.class="array") a <- ASEsetFromArrays(GRvariants, countsUnknown = alleleCountsArray) aquals(a) <- alleleQualityArray
#load example data data(reads) data(GRvariants) #get counts at the three positions specified in GRvariants alleleQualityArray <- getAlleleQuality(BamList=reads,GRvariants) #place in ASEset object alleleCountsArray <- getAlleleCounts(BamList=reads,GRvariants, strand='*', return.class="array") a <- ASEsetFromArrays(GRvariants, countsUnknown = alleleCountsArray) aquals(a) <- alleleQualityArray
Given a character vector with genesymbols and an OrgDb object, this function returns a GRanges giving the coordinates of the genes.
getAreaFromGeneNames(genesymbols, ...) ## S4 method for signature 'character' getAreaFromGeneNames( genesymbols, OrgDb, leftFlank = 0, rightFlank = 0, na.rm = FALSE, verbose = TRUE )
getAreaFromGeneNames(genesymbols, ...) ## S4 method for signature 'character' getAreaFromGeneNames( genesymbols, OrgDb, leftFlank = 0, rightFlank = 0, na.rm = FALSE, verbose = TRUE )
genesymbols |
A character vector that contains genesymbols of genes from which we wish to retrieve the coordinates |
... |
arguments to pass on |
OrgDb |
An |
leftFlank |
A |
rightFlank |
A |
na.rm |
A |
verbose |
Setting |
This function is a convenience function that can be used to determine which
genomic coordinates to specify to e.g. impBamGAL
when retrieving
reads.
The function cannot handle genes that do not exist in the annotation. To remove these please set the na.rm=TRUE.
getAreaFromGeneNames
returns a GRanges object with genomic
coordinates around the specified genes
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) #get counts at the three positions specified in GRvariants library(org.Hs.eg.db ) searchArea<-getAreaFromGeneNames(c('PAX8','TLR7'), org.Hs.eg.db)
#load example data data(ASEset) #get counts at the three positions specified in GRvariants library(org.Hs.eg.db ) searchArea<-getAreaFromGeneNames(c('PAX8','TLR7'), org.Hs.eg.db)
an allele frequency array
getDefaultMapBiasExpMean(alleleCountList, ...) getDefaultMapBiasExpMean3D(alleleCountList, ...) ## S4 method for signature 'list' getDefaultMapBiasExpMean(alleleCountList) ## S4 method for signature 'ANY' getDefaultMapBiasExpMean3D(alleleCountList)
getDefaultMapBiasExpMean(alleleCountList, ...) getDefaultMapBiasExpMean3D(alleleCountList, ...) ## S4 method for signature 'list' getDefaultMapBiasExpMean(alleleCountList) ## S4 method for signature 'ANY' getDefaultMapBiasExpMean3D(alleleCountList)
alleleCountList |
A |
... |
parameters to pass on |
This function will assume there is no bias that comes from the mapping of reads, and therefore create a matrix with expected frequency of 0.5 for each allele.
getDefaultMapBiasExpMean
returns a matrix with a default
expected mean of 0.5 for every element.
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) #access SnpAfList alleleCountList <- alleleCounts(ASEset) #get default map bias exp mean matExpMean <- getDefaultMapBiasExpMean(alleleCountList)
#load example data data(ASEset) #access SnpAfList alleleCountList <- alleleCounts(ASEset) #get default map bias exp mean matExpMean <- getDefaultMapBiasExpMean(alleleCountList)
Given a GRanges object of SNPs and a SNPlocs annotation, this function attempts to replace the names of the GRanges object entries with rs-IDs.
getSnpIdFromLocation(GR, ...) ## S4 method for signature 'GRanges' getSnpIdFromLocation(GR, SNPloc, return.vector = FALSE, verbose = TRUE)
getSnpIdFromLocation(GR, ...) ## S4 method for signature 'GRanges' getSnpIdFromLocation(GR, SNPloc, return.vector = FALSE, verbose = TRUE)
GR |
A |
... |
arguments to pass on |
SNPloc |
A |
return.vector |
Setting |
verbose |
Setting |
This function is used to try to identify the rs-IDs of SNPs in a GRanges object.
getSnpIdFromLocation
returns the same GRanges object it was
given with, but with updated with rs.id information.
Jesper R. Gadin, Lasse Folkersen
is_32bit_windows <- .Platform$OS.type == "windows" && .Platform$r_arch == "i386" if (!is_32bit_windows && require(SNPlocs.Hsapiens.dbSNP144.GRCh37)) { #load example data data(ASEset) #get counts at the three positions specified in GRvariants updatedGRanges <- getSnpIdFromLocation(rowRanges(ASEset), SNPlocs.Hsapiens.dbSNP144.GRCh37) }
is_32bit_windows <- .Platform$OS.type == "windows" && .Platform$r_arch == "i386" if (!is_32bit_windows && require(SNPlocs.Hsapiens.dbSNP144.GRCh37)) { #load example data data(ASEset) #get counts at the three positions specified in GRvariants updatedGRanges <- getSnpIdFromLocation(rowRanges(ASEset), SNPlocs.Hsapiens.dbSNP144.GRCh37) }
Object that holds results from a global AI analysis including reference bias estimations and AI detection.
x |
ASEset object or list of ASEsets |
TxDb |
A |
... |
pass arguments to internal functions |
The GlobalAnalysis-class contains summaries and "pre-configured and pre-calculated lattice plots" needed to create an AI-report
An object of class GlobalAnalysis containing all data to make report.
Jesper R. Gadin, Lasse Folkersen
data(ASEset) #a <- ASEset #gba <- gba(a) #report(gba) #write.tables(gba) #graphs(gba) #as.list(gba)
data(ASEset) #a <- ASEset #gba <- gba(a) #report(gba) #write.tables(gba) #graphs(gba) #as.list(gba)
this data is a GRanges
object that contains the ranges for three
example SNPs.
Jesper R. Gadin, Lasse Folkersen
The reads
which is another example
object
#load example data data(GRvariants)
#load example data data(GRvariants)
uses base graphics hist plot
## S4 method for signature 'ASEset' hist(x, strand = "*", type = "mean", log = 1, ...)
## S4 method for signature 'ASEset' hist(x, strand = "*", type = "mean", log = 1, ...)
x |
|
strand |
'+','-' or '*' |
type |
'mean' (only one option atm) |
log |
an integer to log each value (integer 10 for log10) |
... |
arguments to forward to interal boxplots function |
The histogram will show the density over frequencies for each sample
Jesper R. Gadin, Lasse Folkersen
##load example data #data(ASEset) #a <- ASEset #hist(a)
##load example data #data(ASEset) #a <- ASEset #hist(a)
apply on list of variables to be put in the local environment
implodeList.old(x)
implodeList.old(x)
x |
list of variables |
help the propagation of e.g. graphical paramters
Jesper R. Gadin
lst <- list(hungry='yes', thirsty='no') implodeList.old(lst) #the check ls() ls()
lst <- list(hungry='yes', thirsty='no') implodeList.old(lst) #the check ls() ls()
Imports a specified genomic region from a bam file using a GRanges object as search area.
impBamGAL(UserDir, ...) ## S4 method for signature 'character' impBamGAL( UserDir, searchArea, files = NULL, XStag = FALSE, verbose = TRUE, ... )
impBamGAL(UserDir, ...) ## S4 method for signature 'character' impBamGAL( UserDir, searchArea, files = NULL, XStag = FALSE, verbose = TRUE, ... )
UserDir |
The relative or full path of folder containing bam files. |
... |
arguments to pass on |
searchArea |
A |
files |
use character vector to specify one or more files to import. The default imports all bam files from the directory. |
XStag |
Setting |
verbose |
makes the function more talkative. |
If the sequence data is strand-specific you may want to set XStag=TRUE. The strand specific information has then to be stored in the meta columns with column name 'XS'. If the aligner did not set the XS-tag and the data is strand- specific it is still be possible to infer the strand from the bit flags after importing the reads to R. Depending on the strand-specific protocol different combinations of the flags will have to be used. For illumina fr-secondstrand, 83 and 163 are minus strand reads and 99 and 147 are plus strand reads.
Jesper R. Gadin, Lasse Folkersen
#Declare searchArea searchArea <- GRanges(seqnames=c('17'), ranges=IRanges(79478301,79478361)) #Relative or full path pathToFiles <- system.file('extdata/ERP000101_subset', package='AllelicImbalance') #all files in directory reads <- impBamGAL(pathToFiles,searchArea,verbose=FALSE) #specified files in directory reads <- impBamGAL(pathToFiles,searchArea, files=c("ERR009160.bam", "ERR009167.bam"),verbose=FALSE)
#Declare searchArea searchArea <- GRanges(seqnames=c('17'), ranges=IRanges(79478301,79478361)) #Relative or full path pathToFiles <- system.file('extdata/ERP000101_subset', package='AllelicImbalance') #all files in directory reads <- impBamGAL(pathToFiles,searchArea,verbose=FALSE) #specified files in directory reads <- impBamGAL(pathToFiles,searchArea, files=c("ERR009160.bam", "ERR009167.bam"),verbose=FALSE)
Imports bla bal bal a specified genomic region from a bam file using a GenomicRanges object as search area.
impBamGRL.old(UserDir, searchArea, verbose = TRUE)
impBamGRL.old(UserDir, searchArea, verbose = TRUE)
UserDir |
The relative or full path of folder containing bam files. |
searchArea |
A |
verbose |
Setting |
These functions are right on tahea wrappers to import bam files into R and store them into either GRanges, GAlignments or GappedAlignmentpairs objects.
It is recommended to use the impBamGAL() which takes information of gaps into account. It is also possible to use the other variants as well, but then pre-filtering becomes important keps to understand because gapped, intron-spanning reads will cause problems. This is because the GRanges objects can not handle if gaps are present and will then give a wrong result when calculating the allele (SNP) count table.
impBamGRL
returns a GRangesList object containing the RNA-seq
reads in the region defined by the searchArea
argument.
impBamGAL
returns a list with GAlignments objects containing the
RNA-seq reads in the region defined by the searchArea
argument.
funImpBamGAPL
returns a list with GappedAlignmentPairs object
containing the RNA-seq reads in the region defined by the searchArea
argument.
Jesper R. Gadin, Lasse Folkersen
#Declare searchArea searchArea <- GRanges(seqnames=c('17'), ranges=IRanges(79478301,79478361)) #Relative or full path pathToFiles <- system.file('extdata/ERP000101_subset', package='AllelicImbalance')
#Declare searchArea searchArea <- GRanges(seqnames=c('17'), ranges=IRanges(79478301,79478361)) #Relative or full path pathToFiles <- system.file('extdata/ERP000101_subset', package='AllelicImbalance')
Imports a selection of a bcf file or files specified by a GenomicRanges object as search area.
impBcfGRL(UserDir, ...) ## S4 method for signature 'character' impBcfGRL(UserDir, searchArea = NULL, verbose = TRUE, ...) impBcfGR(UserDir, ...) ## S4 method for signature 'character' impBcfGR(UserDir, searchArea = NULL, verbose = TRUE, ...)
impBcfGRL(UserDir, ...) ## S4 method for signature 'character' impBcfGRL(UserDir, searchArea = NULL, verbose = TRUE, ...) impBcfGR(UserDir, ...) ## S4 method for signature 'character' impBcfGR(UserDir, searchArea = NULL, verbose = TRUE, ...)
UserDir |
The relative or full path of folder containing bam files. |
... |
parameters to pass on |
searchArea |
A |
verbose |
Setting |
A wrapper to import bcf files into R in the form of GenomicRanges objects.
BcfImpGRList
returns a GRangesList object. BcfImpGR
returns one GRanges object of all unique entries from one or more bcf files.
Make sure there is a complementary index file *.bcf.csi
for
each bcf file in UserDir
. If there is not, then the functions
impBcfGRL
and impBcfGR
will try to create them.
Jesper R. Gadin, Lasse Folkersen
The impBamGRL for importing bam files
The getAlleleCounts
for how to get allele(SNP) counts
The scanForHeterozygotes
for how to find possible
heterozygote positions
#Declare searchArea searchArea <- GRanges(seqnames=c('17'), ranges=IRanges(79478301,79478361)) #Relative or full path pathToFiles <- system.file('extdata/ERP000101_subset', package='AllelicImbalance') #import reads <- impBcfGRL(pathToFiles, searchArea, verbose=FALSE)
#Declare searchArea searchArea <- GRanges(seqnames=c('17'), ranges=IRanges(79478301,79478361)) #Relative or full path pathToFiles <- system.file('extdata/ERP000101_subset', package='AllelicImbalance') #import reads <- impBcfGRL(pathToFiles, searchArea, verbose=FALSE)
inference of SNPs
inferAlleles( x, strand = "*", return.type = "bi", threshold.frequency = 0, threshold.count.sample = 1, inferOver = "eachSample", allow.NA = FALSE )
inferAlleles( x, strand = "*", return.type = "bi", threshold.frequency = 0, threshold.count.sample = 1, inferOver = "eachSample", allow.NA = FALSE )
x |
ASEset |
strand |
strand to infer from |
return.type |
'uni' 'bi' 'tri' 'quad' 'all' |
threshold.frequency |
least fraction to classify (see details) |
threshold.count.sample |
least amount of counts to try to infer allele |
inferOver |
'eachSample' or 'allSamples' |
allow.NA |
treat NA as zero when TRUE |
threshold.frequency is the least fraction needed to classify as bi tri or quad allelic SNPs. If 'all' then all of bi tri and quad allelic SNPs will use the same threshold. Everything under the treshold will be regarded as noise. 'all' will return a matrix with snps as rows and uni bi tri and quad will be columns. For this function Anything that will return TRUE for tri-allelicwill also return TRUE for uni and bi-allelic for the same SNP an Sample.
Jesper R. Gadin
data(ASEset) i <- inferAlleles(ASEset)
data(ASEset) i <- inferAlleles(ASEset)
inference of the alternate allele based on count data
x |
|
return.class |
class of returned object |
allele.source |
'arank' |
verbose |
make function more talkative |
... |
arguments to forward to internal functions |
The inference essentially ranks all alleles and the most expressed allele not declared as reference will be inferred as the alternative allele. At the moment only inference of bi-allelic alternative alleles are available.
Jesper R. Gadin, Lasse Folkersen
#load data data(ASEset) alt <- inferAltAllele(ASEset)
#load data data(ASEset) alt <- inferAltAllele(ASEset)
inference of genotypes
inferGenotypes( x, strand = "*", return.class = "matrix", return.allele.allowed = "bi", threshold.frequency = 0, threshold.count.sample = 1 )
inferGenotypes( x, strand = "*", return.class = "matrix", return.allele.allowed = "bi", threshold.frequency = 0, threshold.count.sample = 1 )
x |
ASEset |
strand |
strand to infer from |
return.class |
'matrix' or 'vector' |
return.allele.allowed |
vector with 'bi' 'tri' or 'quad'. 'uni' Always gets returned |
threshold.frequency |
least fraction to classify (see details) |
threshold.count.sample |
least amount of counts to try to infer allele |
Oftern necessary information to link AI to SNPs outside coding region
Jesper R. Gadin
data(ASEset) g <- inferGenotypes(ASEset)
data(ASEset) g <- inferGenotypes(ASEset)
Functions to construct ASEset objects
ASEsetFromCountList( rowRanges, countListUnknown = NULL, countListPlus = NULL, countListMinus = NULL, colData = NULL, mapBiasExpMean = NULL, phase = NULL, aquals = NULL, verbose = FALSE, ... ) ASEsetFromArrays( rowRanges, countsUnknown = NULL, countsPlus = NULL, countsMinus = NULL, colData = NULL, mapBiasExpMean = NULL, phase = NULL, genotype = NULL, aquals = NULL, verbose = FALSE, ... )
ASEsetFromCountList( rowRanges, countListUnknown = NULL, countListPlus = NULL, countListMinus = NULL, colData = NULL, mapBiasExpMean = NULL, phase = NULL, aquals = NULL, verbose = FALSE, ... ) ASEsetFromArrays( rowRanges, countsUnknown = NULL, countsPlus = NULL, countsMinus = NULL, colData = NULL, mapBiasExpMean = NULL, phase = NULL, genotype = NULL, aquals = NULL, verbose = FALSE, ... )
rowRanges |
A |
countListUnknown |
A |
countListPlus |
A |
countListMinus |
A |
colData |
A |
mapBiasExpMean |
A 3D |
phase |
A |
aquals |
A 4-D array containing the countinformation, see details |
verbose |
Makes function more talkative |
... |
arguments passed on to SummarizedExperiment constructor |
countsUnknown |
An array containing the countinformation |
countsPlus |
An array containing the countinformation |
countsMinus |
An array containing the countinformation |
genotype |
matrix |
The resulting ASEset object is based on the RangedSummarizedExperiment class, and will therefore inherit the same accessors and ranges operations.
If both countListPlus and countListMinus are given they will be used to calculate countListUnknown, which is the sum of the plus and minus strands.
countListPlus, countListMinus and countListUnknown are i.e. the outputs from the getAlleleCounts function.
aquals is new for the devel branch and will be changed slighly before the relase to include better granularity.
ASEsetFromCountList
returns an ASEset
object.
ASEsetFromCountList
requires the same input data as a
RangedSummarizedExperiment, but with minimum one assay for the allele counts.
Jesper R. Gadin, Lasse Folkersen
#make example alleleCountListPlus set.seed(42) countListPlus <- list() snps <- c('snp1','snp2','snp3','snp4','snp5') for(snp in snps){ count<-matrix(rep(0,16),ncol=4,dimnames=list( c('sample1','sample2','sample3','sample4'), c('A','T','G','C'))) #insert random counts in two of the alleles for(allele in sample(c('A','T','G','C'),2)){ count[,allele]<-as.integer(rnorm(4,mean=50,sd=10)) } countListPlus[[snp]] <- count } #make example alleleCountListMinus countListMinus <- list() snps <- c('snp1','snp2','snp3','snp4','snp5') for(snp in snps){ count<-matrix(rep(0,16),ncol=4,dimnames=list( c('sample1','sample2','sample3','sample4'), c('A','T','G','C'))) #insert random counts in two of the alleles for(allele in sample(c('A','T','G','C'),2)){ count[,allele]<-as.integer(rnorm(4,mean=50,sd=10)) } countListMinus[[snp]] <- count } #make example rowRanges rowRanges <- GRanges( seqnames = Rle(c('chr1', 'chr2', 'chr1', 'chr3', 'chr1')), ranges = IRanges(1:5, width = 1, names = head(letters,5)), snp = paste('snp',1:5,sep='') ) #make example colData colData <- DataFrame(Treatment=c('ChIP', 'Input','Input','ChIP'), row.names=c('ind1','ind2','ind3','ind4')) #make ASEset a <- ASEsetFromCountList(rowRanges, countListPlus=countListPlus, countListMinus=countListMinus, colData=colData)
#make example alleleCountListPlus set.seed(42) countListPlus <- list() snps <- c('snp1','snp2','snp3','snp4','snp5') for(snp in snps){ count<-matrix(rep(0,16),ncol=4,dimnames=list( c('sample1','sample2','sample3','sample4'), c('A','T','G','C'))) #insert random counts in two of the alleles for(allele in sample(c('A','T','G','C'),2)){ count[,allele]<-as.integer(rnorm(4,mean=50,sd=10)) } countListPlus[[snp]] <- count } #make example alleleCountListMinus countListMinus <- list() snps <- c('snp1','snp2','snp3','snp4','snp5') for(snp in snps){ count<-matrix(rep(0,16),ncol=4,dimnames=list( c('sample1','sample2','sample3','sample4'), c('A','T','G','C'))) #insert random counts in two of the alleles for(allele in sample(c('A','T','G','C'),2)){ count[,allele]<-as.integer(rnorm(4,mean=50,sd=10)) } countListMinus[[snp]] <- count } #make example rowRanges rowRanges <- GRanges( seqnames = Rle(c('chr1', 'chr2', 'chr1', 'chr3', 'chr1')), ranges = IRanges(1:5, width = 1, names = head(letters,5)), snp = paste('snp',1:5,sep='') ) #make example colData colData <- DataFrame(Treatment=c('ChIP', 'Input','Input','ChIP'), row.names=c('ind1','ind2','ind3','ind4')) #make ASEset a <- ASEsetFromCountList(rowRanges, countListPlus=countListPlus, countListMinus=countListMinus, colData=colData)
Functions to construct DetectedAI objects
DetectedAIFromArray( x = "ASEset", strand = "*", reference.frequency = NULL, threshold.frequency = NULL, threshold.count.sample = NULL, threshold.delta.frequency = NULL, threshold.pvalue = NULL, threshold.frequency.names = NULL, threshold.count.sample.names = NULL, threshold.delta.frequency.names = NULL, threshold.pvalue.names = NULL, ... )
DetectedAIFromArray( x = "ASEset", strand = "*", reference.frequency = NULL, threshold.frequency = NULL, threshold.count.sample = NULL, threshold.delta.frequency = NULL, threshold.pvalue = NULL, threshold.frequency.names = NULL, threshold.count.sample.names = NULL, threshold.delta.frequency.names = NULL, threshold.pvalue.names = NULL, ... )
x |
|
strand |
set strand to detectAI over "+","-","*" |
reference.frequency |
frequencies of reference alleles based allele counts |
threshold.frequency |
logical array for frequency thresholds |
threshold.count.sample |
logical array for per sample allele count thresholds |
threshold.delta.frequency |
logical array for delta frequency thresholds. |
threshold.pvalue |
logical array for pvalue thresholds (max 1, min 0) |
threshold.frequency.names |
character vector |
threshold.count.sample.names |
character vector |
threshold.delta.frequency.names |
character vector |
threshold.pvalue.names |
character vector |
... |
internal arguments |
produces a class container for reference bias calculations
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset dai <- detectAI(a)
data(ASEset) a <- ASEset dai <- detectAI(a)
Functions to construct GlobalAnalysis objects
GAnalysis(x = "ASEset", ...)
GAnalysis(x = "ASEset", ...)
x |
|
... |
internal arguments |
produces a class container for a global analysis
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset # gba <- gba(a)
data(ASEset) a <- ASEset # gba <- gba(a)
Functions to construct RiskVariant objects
RiskVariantFromGRangesAndPhaseArray(x, phase, ...)
RiskVariantFromGRangesAndPhaseArray(x, phase, ...)
x |
GRanges object for the SNPs |
phase |
array with phaseinfo |
... |
internal arguments |
produces a class container for reference bias calculations
Jesper R. Gadin, Lasse Folkersen
data(ASEset) #p <- getPhaseFromSomewhere #rv <- RiskVariantFromGRangesAndPhaseArray(x=GRvariants, phase=p)
data(ASEset) #p <- getPhaseFromSomewhere #rv <- RiskVariantFromGRangesAndPhaseArray(x=GRvariants, phase=p)
adds a very customizable legend function for AllelicImbalance barplots.
legendBarplot( lowerLeftCorner, size, rownames, colnames, boxsize = 1, boxspace = 1, fgCol, bgCol, ylegendPos = 1, xlegendPos = 0.96, cex = 1 )
legendBarplot( lowerLeftCorner, size, rownames, colnames, boxsize = 1, boxspace = 1, fgCol, bgCol, ylegendPos = 1, xlegendPos = 0.96, cex = 1 )
lowerLeftCorner |
position of the plot to add legend to (default c(0,0)) |
size |
scale the plot, default is 1 |
rownames |
rownames in legend |
colnames |
colnames in legend |
boxsize |
size of each box fill |
boxspace |
space inbetween the box fill |
fgCol |
color for allele1 |
bgCol |
color for allele2 |
ylegendPos |
placement of the legend within the plot for y |
xlegendPos |
placement of the legend within the plot for x |
cex |
size of legend text |
the function is preferably called from within the AllelicImbalance barplot method.
Jesper R. Gadin
#code placeholders #< create a barplot with legend > #< add legend >
#code placeholders #< create a barplot with legend > #< add legend >
Object that holds results from AI detection.
pvalue(x, ...) ## S4 method for signature 'LinkVariantAlmlof' pvalue(x)
pvalue(x, ...) ## S4 method for signature 'LinkVariantAlmlof' pvalue(x)
x |
LinkVariantAlmlof object |
... |
pass arguments to internal functions |
The LinkVariantAlmlof-class contains
Jesper R. Gadin, Lasse Folkersen
#some code
#some code
plot an object of type LinkVariantAlmlof
plot(x, y, ...) ## S4 method for signature 'LinkVariantAlmlof,ANY' plot(x, y, ...)
plot(x, y, ...) ## S4 method for signature 'LinkVariantAlmlof,ANY' plot(x, y, ...)
x |
LinkVariantAlmlof object |
y |
not used |
... |
pass on arguments to internal methods |
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) #init risk variants p.ar <- phaseMatrix2Array(p) rv <- RiskVariantFromGRangesAndPhaseArray(x=GRvariants, phase=p.ar) #colnames has to be samea and same order in ASEset and RiskVariant colnames(a) <- colnames(rv) # in this example each and every snp in the ASEset defines a region r1 <- granges(a) # in this example two overlapping subsets of snps in the ASEset defines the region r2 <- split(granges(a)[c(1,2,2,3)],c(1,1,2,2)) # link variant almlof (lva) lv1 <- lva(a, rv, r1) lv2 <- lva(a, rv, r2) plot(lv2[1])
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) #init risk variants p.ar <- phaseMatrix2Array(p) rv <- RiskVariantFromGRangesAndPhaseArray(x=GRvariants, phase=p.ar) #colnames has to be samea and same order in ASEset and RiskVariant colnames(a) <- colnames(rv) # in this example each and every snp in the ASEset defines a region r1 <- granges(a) # in this example two overlapping subsets of snps in the ASEset defines the region r2 <- split(granges(a)[c(1,2,2,3)],c(1,1,2,2)) # link variant almlof (lva) lv1 <- lva(a, rv, r1) lv2 <- lva(a, rv, r2) plot(lv2[1])
make an almlof regression for arrays
lva(x, ...) ## S4 method for signature 'ASEset' lva( x, rv, region, settings = list(), return.class = "LinkVariantAlmlof", type = "lm", verbose = FALSE, covariates = matrix(), ... )
lva(x, ...) ## S4 method for signature 'ASEset' lva( x, rv, region, settings = list(), return.class = "LinkVariantAlmlof", type = "lm", verbose = FALSE, covariates = matrix(), ... )
x |
ASEset object with phase and 'ref'/'alt' allele information |
... |
arguments to forward to internal functions |
rv |
RiskVariant object with phase and 'ref'/'alt' allele information |
region |
RiskVariant object with phase and alternative allele information |
settings |
RiskVariant object with phase and alternative allele information |
return.class |
'LinkVariantAlmlof' (more options in future) |
type |
"lm" or "nlme", "nlme" needs subject information |
verbose |
logical, if set TRUE, then function will be more talkative |
covariates |
add data.frame with covariates (only integers and numeric) |
internal method that takes one array with results from regionSummary and one matrix with group information for each risk SNP (based on phase)
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) #init risk variants p.ar <- phaseMatrix2Array(p) rv <- RiskVariantFromGRangesAndPhaseArray(x=GRvariants, phase=p.ar) #colnames has to be samea and same order in ASEset and RiskVariant colnames(a) <- colnames(rv) # in this example each and every snp in the ASEset defines a region r1 <- granges(a) #use GRangesList to merge and use regions defined by each element of the #GRangesList r1b <- GRangesList(r1) r1c <- GRangesList(r1, r1) # in this example two overlapping subsets of snps in the ASEset defines the region r2 <- split(granges(a)[c(1,2,2,3)],c(1,1,2,2)) # link variant almlof (lva) lva(a, rv, r1) lva(a, rv, r1b) lva(a, rv, r1c) lva(a, rv, r2) # Use covariates (integers or nuemric) cov <- data.frame(age=sample(20:70, ncol(a)), sex=rep(c(1,2), each=ncol(a)/2), row.names=colnames(a)) lva(a, rv, r1, covariates=cov) lva(a, rv, r1b, covariates=cov) lva(a, rv, r1c, covariates=cov) lva(a, rv, r2, covariates=cov) # link variant almlof (lva), using nlme a2 <- a ac <- assays(a2)[["countsPlus"]] jit <- sample(c(seq(-0.10,0,length=5), seq(0,0.10,length=5)), size=length(ac) , replace=TRUE) assays(a2, withDimnames=FALSE)[["countsPlus"]] <- round(ac * (1+jit),0) ab <- cbind(a, a2) colData(ab)[["subject.group"]] <- c(1:ncol(a),1:ncol(a)) rv2 <- rv[,c(1:ncol(a),1:ncol(a))] colnames(ab) <- colnames(rv2) lva(ab, rv2, r1, type="nlme") lva(ab, rv2, r1b, type="nlme") lva(ab, rv2, r1c, type="nlme") lva(ab, rv2, r2, type="nlme")
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) #init risk variants p.ar <- phaseMatrix2Array(p) rv <- RiskVariantFromGRangesAndPhaseArray(x=GRvariants, phase=p.ar) #colnames has to be samea and same order in ASEset and RiskVariant colnames(a) <- colnames(rv) # in this example each and every snp in the ASEset defines a region r1 <- granges(a) #use GRangesList to merge and use regions defined by each element of the #GRangesList r1b <- GRangesList(r1) r1c <- GRangesList(r1, r1) # in this example two overlapping subsets of snps in the ASEset defines the region r2 <- split(granges(a)[c(1,2,2,3)],c(1,1,2,2)) # link variant almlof (lva) lva(a, rv, r1) lva(a, rv, r1b) lva(a, rv, r1c) lva(a, rv, r2) # Use covariates (integers or nuemric) cov <- data.frame(age=sample(20:70, ncol(a)), sex=rep(c(1,2), each=ncol(a)/2), row.names=colnames(a)) lva(a, rv, r1, covariates=cov) lva(a, rv, r1b, covariates=cov) lva(a, rv, r1c, covariates=cov) lva(a, rv, r2, covariates=cov) # link variant almlof (lva), using nlme a2 <- a ac <- assays(a2)[["countsPlus"]] jit <- sample(c(seq(-0.10,0,length=5), seq(0,0.10,length=5)), size=length(ac) , replace=TRUE) assays(a2, withDimnames=FALSE)[["countsPlus"]] <- round(ac * (1+jit),0) ab <- cbind(a, a2) colData(ab)[["subject.group"]] <- c(1:ncol(a),1:ncol(a)) rv2 <- rv[,c(1:ncol(a),1:ncol(a))] colnames(ab) <- colnames(rv2) lva(ab, rv2, r1, type="nlme") lva(ab, rv2, r1b, type="nlme") lva(ab, rv2, r1c, type="nlme") lva(ab, rv2, r2, type="nlme")
make an almlof regression for arrays (internal function)
lva.internal(x, ...) ## S4 method for signature 'array' lva.internal( x, grp, element = 3, type = "lm", subject = NULL, covariates = matrix(), ... )
lva.internal(x, ...) ## S4 method for signature 'array' lva.internal( x, grp, element = 3, type = "lm", subject = NULL, covariates = matrix(), ... )
x |
regionSummary array phased for maternal allele |
... |
arguments to forward to internal functions |
grp |
group 1-3 (1 for 0:0, 2 for 1:0 or 0:1, and 3 for 1:1) |
element |
which column in x contains the values to use with lm. |
type |
which column in x contains the values to use with lm. |
subject |
which samples belongs to the same individual |
covariates |
add data.frame with covariates (only integers and numeric) |
internal method that takes one array with results from regionSummary and one matrix with group information for each risk SNP (based on phase). Input and output objects can change format slightly in future.
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) # in this example two overlapping subsets of snps in the ASEset defines the region region <- split(granges(a)[c(1,2,2,3)], c(1,1,2,2)) rs <- regionSummary(a, region, return.class="array", return.meta=FALSE) # use (change to generated riskSNP phase later) phs <- array(c(phase(a,return.class="array")[1,,c(1, 2)], phase(a,return.class="array")[2,,c(1, 2)]), dim=c(20,2,2)) grp <- matrix(2, nrow=dim(phs)[1], ncol=dim(phs)[2]) grp[(phs[,,1] == 0) & (phs[,,2] == 0)] <- 1 grp[(phs[,,1] == 1) & (phs[,,2] == 1)] <- 3 #only use mean.fr at the moment, which is col 3 lva.internal(x=assays(rs)[["rs1"]],grp=grp, element=3)
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) # in this example two overlapping subsets of snps in the ASEset defines the region region <- split(granges(a)[c(1,2,2,3)], c(1,1,2,2)) rs <- regionSummary(a, region, return.class="array", return.meta=FALSE) # use (change to generated riskSNP phase later) phs <- array(c(phase(a,return.class="array")[1,,c(1, 2)], phase(a,return.class="array")[2,,c(1, 2)]), dim=c(20,2,2)) grp <- matrix(2, nrow=dim(phs)[1], ncol=dim(phs)[2]) grp[(phs[,,1] == 0) & (phs[,,2] == 0)] <- 1 grp[(phs[,,1] == 1) & (phs[,,2] == 1)] <- 3 #only use mean.fr at the moment, which is col 3 lva.internal(x=assays(rs)[["rs1"]],grp=grp, element=3)
Replaces all selected positions in a fasta file with the character N
makeMaskedFasta(fastaIn, ...) ## S4 method for signature 'character' makeMaskedFasta( fastaIn, fastaOut, posToReplace, splitOnSeqlevels = TRUE, verbose = TRUE )
makeMaskedFasta(fastaIn, ...) ## S4 method for signature 'character' makeMaskedFasta( fastaIn, fastaOut, posToReplace, splitOnSeqlevels = TRUE, verbose = TRUE )
fastaIn |
character string of the path for the fasta file to be used |
... |
arguments to pass on |
fastaOut |
character string of the path for the masked fasta file (no extension) |
posToReplace |
GRanges object with the genomic ranges to replace |
splitOnSeqlevels |
write on file for each seqlevel to save memory |
verbose |
makes function more talkative |
Jesper R. Gadin
data(ASEset.sim) gr <- rowRanges(ASEset.sim) fastaIn <- system.file('extdata/hg19.chr17.subset.fa', package='AllelicImbalance') makeMaskedFasta(fastaIn=fastaIn, fastaOut="fastaOut",posToReplace=gr)
data(ASEset.sim) gr <- rowRanges(ASEset.sim) fastaIn <- system.file('extdata/hg19.chr17.subset.fa', package='AllelicImbalance') makeMaskedFasta(fastaIn=fastaIn, fastaOut="fastaOut",posToReplace=gr)
Create a matrix of bias for the reference allele
mapBiasRef(x, ...) ## S4 method for signature 'ASEset' mapBiasRef(x)
mapBiasRef(x, ...) ## S4 method for signature 'ASEset' mapBiasRef(x)
x |
|
... |
internal arguments |
select the expected frequency for the reference allele
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) a <- ASEset mat <- mapBiasRef(a)
#load example data data(ASEset) a <- ASEset mat <- mapBiasRef(a)
filter on minCountFilt snps
minCountFilt(x, ...) ## S4 method for signature 'ASEset' minCountFilt( x, strand = "*", threshold.counts = 1, sum = "all", replace.with = "zero", return.class = "ASEset" )
minCountFilt(x, ...) ## S4 method for signature 'ASEset' minCountFilt( x, strand = "*", threshold.counts = 1, sum = "all", replace.with = "zero", return.class = "ASEset" )
x |
|
... |
internal param |
strand |
strand to infer from |
threshold.counts |
cutoff for read counts (see details) |
sum |
'each' or 'all' |
replace.with |
only option 'zero' |
return.class |
'ASEset', 'array' or 'matrix' |
Description info here
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) a <- ASEset minCountFilt(a)
#load example data data(ASEset) a <- ASEset minCountFilt(a)
filter on minFreqFilt snps
minFreqFilt(x, ...) ## S4 method for signature 'ASEset' minFreqFilt( x, strand = "*", threshold.frequency = 0.1, replace.with = "zero", return.class = "ASEset", sum = "all" )
minFreqFilt(x, ...) ## S4 method for signature 'ASEset' minFreqFilt( x, strand = "*", threshold.frequency = 0.1, replace.with = "zero", return.class = "ASEset", sum = "all" )
x |
|
... |
internal param |
strand |
strand to infer from |
threshold.frequency |
least fraction to classify (see details) |
replace.with |
only option 'zero' |
return.class |
'ASEset', 'array' or 'matrix' |
sum |
'each' or 'all' |
Description info here
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) a <- ASEset minFreqFilt(a)
#load example data data(ASEset) a <- ASEset minFreqFilt(a)
filter on multiallelic snps
multiAllelicFilt(x, ...) ## S4 method for signature 'ASEset' multiAllelicFilt( x, strand = "*", threshold.count.sample = 10, threshold.frequency = 0.1, filterOver = "eachSample" )
multiAllelicFilt(x, ...) ## S4 method for signature 'ASEset' multiAllelicFilt( x, strand = "*", threshold.count.sample = 10, threshold.frequency = 0.1, filterOver = "eachSample" )
x |
|
... |
internal param |
strand |
strand to infer from |
threshold.count.sample |
least amount of counts to try to infer allele |
threshold.frequency |
least fraction to classify (see details) |
filterOver |
'eachSample' or 'allSamples' |
based on the allele counts for all four variants A, T, G and C and returns true if there is counts enough suggesting a third or more alleles. The sensitivity can be specified using 'threshold.count.sample' and 'threshold.frequency'.
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) a <- ASEset multiAllelicFilt(a)
#load example data data(ASEset) a <- ASEset multiAllelicFilt(a)
Convert the phase from the internally stored phase, ref and alt information
phase2genotype(x, ...) ## S4 method for signature 'array' phase2genotype(x, ref, alt, return.class = "matrix", ...)
phase2genotype(x, ...) ## S4 method for signature 'array' phase2genotype(x, ref, alt, return.class = "matrix", ...)
x |
array see examples |
... |
pass on additional param |
ref |
reference allele vector |
alt |
alternative allele vector |
return.class |
'matrix' or 'array' |
To not introduce redundant information in the ASEset object, the genotype matrix is accessed from the phase matrix, which together with ref and alt allele information contains the same information(not taken into account three-allelic or more SNPs).
The genotype matrix retrieved from an ASEset object can differ from the genotype matrix stored in the object if reference and alternative alleles were not used or has changed since the phase genotype matrix was stored. Basically, it is preferable to provide reference and alternative information when storing the genotype matrix.
If possible, it is better to not use a genotype matrix, but instead relying completely on storing a phase matrix(or array) together with reference and alternative allele information.
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset) data(genomatrix) p <- genotype2phase(genomatrix, ref(ASEset), return.class="array") ref <- ref(ASEset) alt <- inferAltAllele(ASEset) gt <- phase2genotype(p, ref, alt, return.class="matrix")
#load example data data(ASEset) data(genomatrix) p <- genotype2phase(genomatrix, ref(ASEset), return.class="array") ref <- ref(ASEset) alt <- inferAltAllele(ASEset) gt <- phase2genotype(p, ref, alt, return.class="matrix")
used to convert the phase from the visually friendly matrix to array.
phaseArray2phaseMatrix(x, ...) ## S4 method for signature 'array' phaseArray2phaseMatrix(x, ...)
phaseArray2phaseMatrix(x, ...) ## S4 method for signature 'array' phaseArray2phaseMatrix(x, ...)
x |
array see examples |
... |
arguments to forward to internal functions |
A more effectice way of store the phase data in the ASEset object
Jesper R. Gadin, Lasse Folkersen
#load data data(ASEset) a <- ASEset #example phase matrix p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) ar <- phaseMatrix2Array(p) #Convert back mat <- phaseArray2phaseMatrix(ar)
#load data data(ASEset) a <- ASEset #example phase matrix p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) ar <- phaseMatrix2Array(p) #Convert back mat <- phaseArray2phaseMatrix(ar)
used to convert the phase from the visually friendly matrix to array.
phaseMatrix2Array(x, ...) ## S4 method for signature 'matrix' phaseMatrix2Array(x, dimnames = NULL, ...)
phaseMatrix2Array(x, ...) ## S4 method for signature 'matrix' phaseMatrix2Array(x, dimnames = NULL, ...)
x |
matrix see examples |
... |
arguments to forward to internal functions |
dimnames |
list with dimnames |
A more effectice way of store the phase data in the ASEset object
Jesper R. Gadin, Lasse Folkersen
#load data data(ASEset) a <- ASEset #example phase matrix p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) ar <- phaseMatrix2Array(p)
#load data data(ASEset) a <- ASEset #example phase matrix p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) ar <- phaseMatrix2Array(p)
Create a vector of random reference alleles
randomRef(x, ...) ## S4 method for signature 'ASEset' randomRef(x, source = "alleleCounts", ...)
randomRef(x, ...) ## S4 method for signature 'ASEset' randomRef(x, source = "alleleCounts", ...)
x |
|
... |
internal arguments |
source |
'alleleCounts' |
Randomly shuffles which of the two alleles for each genotype that is indicated as reference allele, based on either allele count information or previous ref and alt alleles.
When the source is 'alleleCounts', the two most expressed alleles are taken as reference and alternative allele.
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset.sim) a <- ASEset.sim ref(a) <- randomRef(a, source = 'alleleCounts')
#load example data data(ASEset.sim) a <- ASEset.sim ref(a) <- randomRef(a, source = 'alleleCounts')
This data set corresponds to the BAM-file data import illustrated in the vignette. The data set consists of a chromosome 17 region from 20 RNA-seq experiments of HapMap samples.
Jesper R. Gadin, Lasse Folkersen
Montgomery SB et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010 Apr 1;464(7289):773-7.
The GRvariants
which is another
example object
##load eample data (Not Run) #data(reads)
##load eample data (Not Run) #data(reads)
Extract the allele based on SNP location from the reference fasta file
refAllele(x, fasta)
refAllele(x, fasta)
x |
|
fasta |
path to fasta file, index should be located in the same folder |
The alleles will be placed in the rowRanges() meta column 'ref'
Jesper R. Gadin, Lasse Folkersen
#load example data data(ASEset.sim) fasta <- system.file('extdata/hg19.chr17.subset.fa', package='AllelicImbalance') a <- refAllele(ASEset.sim,fasta=fasta)
#load example data data(ASEset.sim) fasta <- system.file('extdata/hg19.chr17.subset.fa', package='AllelicImbalance') a <- refAllele(ASEset.sim,fasta=fasta)
Gives a summary of AI-consistency for a transcript
regionSummary(x, ...) ## S4 method for signature 'ASEset' regionSummary(x, region, strand = "*", return.class = "RegionSummary", ...)
regionSummary(x, ...) ## S4 method for signature 'ASEset' regionSummary(x, region, strand = "*", return.class = "RegionSummary", ...)
x |
ASEset object |
... |
arguments to forward to internal functions |
region |
to summmarize over, the object can be a GRanges, GRangesList |
strand |
can be "+", "-" or "*" |
return.class |
"array" or "list". |
From a given set of e.g. transcripts exon ranges the function will return a summary for the sum of all exons. Phase information, reference and alternative allele is required.
A limitation comes to the strand-specificness. At the moment it is not possible to call over more than one strand type using the strands in region. This will be improved before going to release.
to calculate the direction and binomial p-values of AI the mapbias stored in the ASEset is used. see '?mapBias'.
Jesper R. Gadin, Lasse Folkersen
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) # in this example each and all snps in the ASEset defines the region region <- granges(a) t <- regionSummary(a, region) # in this example two overlapping subsets of snps in the ASEset defines the region region <- split(granges(a)[c(1,2,2,3)],c(1,1,2,2)) t <- regionSummary(a, region)
data(ASEset) a <- ASEset # Add phase set.seed(1) p1 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p2 <- matrix(sample(c(1,0),replace=TRUE, size=nrow(a)*ncol(a)),nrow=nrow(a), ncol(a)) p <- matrix(paste(p1,sample(c("|","|","/"), size=nrow(a)*ncol(a), replace=TRUE), p2, sep=""), nrow=nrow(a), ncol(a)) phase(a) <- p #add alternative allele information mcols(a)[["alt"]] <- inferAltAllele(a) # in this example each and all snps in the ASEset defines the region region <- granges(a) t <- regionSummary(a, region) # in this example two overlapping subsets of snps in the ASEset defines the region region <- split(granges(a)[c(1,2,2,3)],c(1,1,2,2)) t <- regionSummary(a, region)
Object that holds results from the regionSummary method
sumnames(x, ...) ## S4 method for signature 'RegionSummary' sumnames(x) basic(x, ...) ## S4 method for signature 'RegionSummary' basic(x)
sumnames(x, ...) ## S4 method for signature 'RegionSummary' sumnames(x) basic(x, ...) ## S4 method for signature 'RegionSummary' basic(x)
x |
RegionSummary object |
... |
pass arguments to internal functions |
The RegionSummary-class objects contains summaries for specified regions
Jesper R. Gadin, Lasse Folkersen
#some code
#some code
Object that holds results from AI detection.
## S4 method for signature 'RiskVariant' ref(x) ## S4 replacement method for signature 'RiskVariant,ANY' ref(x) <- value ## S4 method for signature 'RiskVariant' alt(x) ## S4 replacement method for signature 'RiskVariant,ANY' alt(x) <- value ## S4 method for signature 'RiskVariant' phase(x, return.class = "matrix") ## S4 replacement method for signature 'RiskVariant' phase(x) <- value
## S4 method for signature 'RiskVariant' ref(x) ## S4 replacement method for signature 'RiskVariant,ANY' ref(x) <- value ## S4 method for signature 'RiskVariant' alt(x) ## S4 replacement method for signature 'RiskVariant,ANY' alt(x) <- value ## S4 method for signature 'RiskVariant' phase(x, return.class = "matrix") ## S4 replacement method for signature 'RiskVariant' phase(x) <- value
x |
RiskVariant object or list of RiskVariants |
value |
argument used for replacement |
return.class |
type of class returned eg. "list or ""array". |
The RiskVariant-class contains
Jesper R. Gadin, Lasse Folkersen
#some code
#some code
Identifies the positions of SNPs found in BamGR reads.
scanForHeterozygotes.old( BamList, minimumReadsAtPos = 20, maximumMajorAlleleFrequency = 0.9, minimumBiAllelicFrequency = 0.9, maxReads = 15000, verbose = TRUE )
scanForHeterozygotes.old( BamList, minimumReadsAtPos = 20, maximumMajorAlleleFrequency = 0.9, minimumBiAllelicFrequency = 0.9, maxReads = 15000, verbose = TRUE )
BamList |
A |
minimumReadsAtPos |
minimum number of reads required to call a SNP at a given position |
maximumMajorAlleleFrequency |
maximum frequency allowed for the most common allele. Setting this parameter lower will minimise the SNP calls resulting from technical read errors, at the cost of missing loci with potential strong ASE |
minimumBiAllelicFrequency |
minimum frequency allowed for the first and second most common allele. Setting a Lower value for this parameter will minimise the identification of loci with three or more alleles in one sample. This is useful if sequencing errors are suspected to be common. |
maxReads |
max number of reads of one list-element allowed |
verbose |
logical indicating if process information should be displayed |
This function scans all reads stored in a GAlignmentsList
for
possible heterozygote positions. The user can balance the sensitivity of the
search by modifying the minimumReadsAtPos, maximumMajorAlleleFrequency and
minimumBiAllelicFrequency arguments.
scanForHeterozygotes.old
returns a GRanges object with the SNPs
for the BamList object that was used as input.
Jesper R. Gadin, Lasse Folkersen
The getAlleleCounts
which is a
function that count the number of reads overlapping a site.
data(reads) s <- scanForHeterozygotes.old(reads,verbose=FALSE)
data(reads) s <- scanForHeterozygotes.old(reads,verbose=FALSE)