Title: | iASeq: integrating multiple sequencing datasets for detecting allele-specific events |
---|---|
Description: | It fits a correlation motif model to multiple RNAseq or ChIPseq studies to improve detection of allele-specific events and to describe correlation patterns across studies. |
Authors: | Yingying Wei, Hongkai Ji |
Maintainer: | Yingying Wei <[email protected]> |
License: | GPL-2 |
Version: | 1.51.0 |
Built: | 2024-11-29 06:24:41 UTC |
Source: | https://github.com/bioc/iASeq |
In diploid organisms, certain genes can be expressed, methylated or regulated in an allele-specific manner, corresponding to allele-specific expression, allele-specific methylation and allele-specific binding. These allele-specific events (AS) are of high interest for phenotypic diversity and disease susceptibility. Next generation sequencing technologies provide opportunities to study AS globally. However, little is known about the mechanism of AS. For instance, the patterns of allele-specific binding across different transcription factors (TFs) and histone modifications (HMs) are unclear. Moreover, the limited number of reads on heterozygotic SNPs results in a low signal-to-noise ratio when calling AS. Here, we propose a Bayesian hierarchical model to study AS by jointly analyzing multiple ChIPseq, RNAseq or MeDIPseq studies. The model is able to learn the patterns of AS across studies and substantially improves the calling of AS.
Package: | iASeq |
Type: | Package |
Version: | 0.99.0 |
Date: | 2012-02-13 |
License: | GPL-2 |
Yingying Wei, Hongkai Ji
Maintainer: Yingying Wei <[email protected]>
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding.
iASeqmotif, plotBIC, plotMotif, sampleASE, ASErawfit, singleEMfit
This function produces standard statistics for allele-specific events based on a single RNAseq or ChIPseq study. It first pools replicates within a given study by summing the read counts for the reference allele and for the non-reference allele. Then, based on the pooled read counts, it calculates a naive z statistic, a naive Bayes statistic and an empirical Bayes statistic.
ASErawfit(exprs,studyid,repid,refid)
exprs |
A matrix, each row of the matrix corresponds to a heterozygotic SNP and each column of the matrix corresponds to the reads count for either the reference allele or non-reference allele in a replicate of a study. |
studyid |
The group label for each column of exprs matrix. All columns in the same study have the same studyid. |
repid |
The sample label for each column of exprs matrix. The two columns within the same sample, one for reference allele and the other for non-reference allele, have the same repid. In other words, repid discriminates the different replicates within the same study. |
refid |
The reference allele label for each column of exprs matrix. Please code 0 for reference allele columns and 1 for non-reference allele columns to make the interpretation of over expressed (or bound) to be skewing to the reference allele. Otherwise, just interpret the other way round. |
One should indicate the studyid, repid and refid for each column clearly.
z |
Naive z statistic. A matrix, each row of the matrix corresponds to a heterozygotic SNP of the input matrix ('exprs') and each column corresponds to a study. |
b |
Naive Bayes statistic. A matrix, each row of the matrix corresponds to a heterozygotic SNP of the input matrix ('exprs') and each column corresponds to a study. |
B |
Empirical Bayes statistic. A matrix, each row of the matrix corresponds to a heterozygotic SNP of the input matrix ('exprs') and each column corresponds to a study. |
c0d |
|
d0d |
|
p0d |
Mean of the null beta prior distribution for pooled counts for each study. A vector whose length equals the number of studies. |
p0dz |
Raw mean of the reference allele proportion. A vector whose length equals the number of studies. |
Yingying Wei
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding.
data(sampleASE)
raw.fitted <- ASErawfit(sampleASE_exprs, sampleASE_studyid, sampleASE_repid, sampleASE_refid)
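The pooling step that ASErawfit is described as performing can be illustrated in base R without calling the package. This is a minimal sketch on an invented count matrix; the names `toy_exprs` and `pool_counts` and the label encoding are assumptions made for illustration, not the package's internal code.

```r
# Toy data: 3 SNPs, 1 study with 2 replicates, each replicate having a
# reference (0) and a non-reference (1) column. All counts are invented.
toy_exprs <- matrix(c(10, 2, 8, 3,
                       5, 5, 6, 4,
                       1, 9, 0, 7),
                    nrow = 3, byrow = TRUE)
toy_studyid <- c(1, 1, 1, 1)
toy_repid   <- c(1, 1, 2, 2)
toy_refid   <- c(0, 1, 0, 1)

# Sum counts over replicates, separately for each study and allele.
# (Hypothetical helper, not part of iASeq.)
pool_counts <- function(exprs, studyid, refid, allele) {
  studies <- unique(studyid)
  pooled <- sapply(studies, function(s) {
    rowSums(exprs[, studyid == s & refid == allele, drop = FALSE])
  })
  matrix(pooled, nrow = nrow(exprs),
         dimnames = list(NULL, as.character(studies)))
}

ref_pooled    <- pool_counts(toy_exprs, toy_studyid, toy_refid, allele = 0)
nonref_pooled <- pool_counts(toy_exprs, toy_studyid, toy_refid, allele = 1)

# Pooled reference-allele proportion per SNP (rows) and study (columns).
ref_prop <- ref_pooled / (ref_pooled + nonref_pooled)
```

The statistics returned by ASErawfit are computed from pooled counts of this kind; the sketch stops at the pooled proportion because the exact formulas are not given here.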
These functions are not part of the package application programming interface and are not recommended to be used by the users.
f0.loglike fup.loglike fdown.loglike iASeqmotiffit
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding.
This function fits the Correlation Motif model to multiple RNAseq or ChIPseq studies. It gives the fitted values for the probability distribution of each motif, the fitted values of the given correlation matrix and the posterior probability for each SNP to be allele-specific events (allele-specific expression or allele-specific binding).
iASeqmotif(exprs,studyid,repid,refid,K,iter.max=100,tol=1e-3)
exprs |
A matrix, each row of the matrix corresponds to a heterozygotic SNP and each column of the matrix corresponds to the reads count for either the reference allele or non-reference allele in a replicate of a study. |
studyid |
The group label for each column of exprs matrix. All columns in the same study have the same studyid. |
repid |
The sample label for each column of exprs matrix. The two columns within the same sample, one for reference allele and the other for non-reference allele, have the same repid. In other words, repid discriminates the different replicates within the same study. |
refid |
The reference allele label for each column of exprs matrix. Please code 0 for reference allele columns and 1 for non-reference allele columns to make the interpretation of over expressed (or bound) to be skewing to the reference allele. Otherwise, just interpret the other way round. |
K |
A vector, each element specifying the number of non-null motifs a model wants to fit. |
tol |
The relative tolerance level of error. |
iter.max |
Maximum number of iterations. |
For the i^th element of K, the function fits a total of K[i]+1 motifs, K[i] non-null motifs and the null motif, to the data. Each SNP can belong to one of the K[i]+1 possible motifs according to the prior probability distribution, motif.prior. For SNPs in motif j (j>=1), the probability that they are over expressed (or bound) for the reference allele in study d is motif.qup(j,d) and the probability that they are under expressed (or bound) is motif.qdown(j,d). One should indicate the studyid, repid and refid for each column clearly.
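The structure described in this paragraph can be written compactly. The notation is an assumption introduced here for illustration: z_i is the motif of SNP i, pi_j the prior probability of motif j (motif.prior), q^up_{jd} and q^down_{jd} the entries of motif.qup and motif.qdown, and L_j(x_i) an unspecified likelihood of the data for SNP i under motif j. The third line assumes that over expression, under expression and no allele-specific event are the only three states in a study; the last line is Bayes' rule for motif membership.

```latex
\begin{align*}
\Pr(z_i = j) &= \pi_j, \qquad j = 0, 1, \dots, K, \\
\Pr(\text{over in study } d \mid z_i = j) &= q^{\mathrm{up}}_{jd}, \qquad
\Pr(\text{under in study } d \mid z_i = j) = q^{\mathrm{down}}_{jd}, \qquad j \ge 1, \\
\Pr(\text{no AS in study } d \mid z_i = j) &= 1 - q^{\mathrm{up}}_{jd} - q^{\mathrm{down}}_{jd}, \\
\Pr(z_i = j \mid x_i) &= \frac{\pi_j \, L_j(x_i)}{\sum_{k=0}^{K} \pi_k \, L_k(x_i)}.
\end{align*}
```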
bestmotif$p.post |
The posterior probability for each SNP to be an allele-specific event. A vector whose length corresponds to the number of SNPs. |
bestmotif$motif.prior |
Fitted values of the probability distribution of the motifs for the best fitted model. |
bestmotif$motif.qup |
Fitted values of the over expressed (or bound) correlation motif matrix for the best fitted model. Each row corresponds to a non-null motif and each column corresponds to a study. |
bestmotif$motif.qdown |
Fitted values of the under expressed (or bound) correlation motif matrix for the best fitted model. Each row corresponds to a non-null motif and each column corresponds to a study. |
bestmotif$clustlike |
Posterior probability for a SNP to belong to a specific motif based on the best fitted model. Each row corresponds to a SNP and each column corresponds to a motif class. |
bestmotif$c0j |
|
bestmotif$d0j |
|
bestmotif$loglike |
The log-likelihood for the best fitted model. |
bic |
The BIC values of all fitted models. A matrix whose first column is the same as input motif number vector ('K') and the second column corresponds to the BIC value of model given by the motif number in the first column in the same row. |
loglike |
The log-likelihood of all fitted models. A matrix whose first column is the same as input motif number vector ('K') and the second column corresponds to the log likelihood value of the model given by the motif number in the first column in the same row. |
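A short sketch of how a table shaped like the documented bic value could be used to pick a motif number. The BIC values below are invented, and the rule "lower BIC is better" is the usual convention, assumed here rather than taken from the package source.

```r
# Hypothetical BIC table shaped like the documented 'bic' value:
# column 1 = number of non-null motifs, column 2 = BIC (invented numbers).
toy_bic <- cbind(K = 1:4, bic = c(5210.4, 5098.7, 5102.3, 5120.9))

# Assumed convention: the model with the smallest BIC is preferred.
best_row <- which.min(toy_bic[, 2])
best_K   <- toy_bic[best_row, 1]
```

With a real fit, the same indexing would be applied to the bic component of the object returned by iASeqmotif.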
Yingying Wei, Hongkai Ji
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding.
data(sampleASE)
# fit 1 to 2 non-null correlation motifs to the data
motif.fitted <- iASeqmotif(sampleASE_exprs, sampleASE_studyid, sampleASE_repid, sampleASE_refid, K=1:2, iter.max=2, tol=1e-3)
This function plots BIC values for all fitted motif models.
plotBIC(fitted_cormotif)
fitted_cormotif |
The object returned by iASeqmotif. |
Yingying Wei
iASeqmotif, plotMotif, sampleASE
example(iASeqmotif) # compute 'motif.fitted'
plotBIC(motif.fitted)
This function plots the Correlation Motif patterns, the associated prior probability distributions and the number of SNPs called for each motif based on posterior probability.
plotMotif(bestmotif,title="",cutoff)
bestmotif |
The bestmotif obtained from iASeqmotif. |
title |
The title on the figure. |
cutoff |
The posterior probability cutoff for calling a SNP belonging to certain motif. |
Each row in all graphs corresponds to one motif pattern. The first graph shows motif.qup, the correlation motif pattern of over expression (binding). The second graph shows motif.qdown, the correlation motif pattern of under expression (binding). The grey color scale of cell (j,d) indicates the probability that motif j is over or under expressed in study d. Each row of the two bar charts corresponds to the motif pattern in the same row of the left two pattern graphs. The length of the bar in the first bar chart estimates the number of SNPs of the given pattern in the dataset according to motif frequency, which is equal to motif.prior multiplied by the total number of SNPs. The length of the bar in the second bar chart shows the number of SNPs called for the given pattern according to the cutoff of posterior probability.
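The two bar-chart quantities can be sketched in base R. The inputs below are invented and only shaped like the documented bestmotif components; the ordering of motifs in `toy_prior` and the use of `>=` for the cutoff comparison are assumptions, not taken from the plotMotif source.

```r
# Invented inputs shaped like bestmotif$motif.prior and bestmotif$clustlike.
toy_prior <- c(0.7, 0.2, 0.1)   # assumed order: null motif, motif 1, motif 2
toy_clustlike <- rbind(c(0.95, 0.03, 0.02),
                       c(0.05, 0.92, 0.03),
                       c(0.40, 0.35, 0.25),
                       c(0.01, 0.04, 0.95))
cutoff <- 0.9

# First bar chart: expected SNPs per motif = prior * total number of SNPs.
expected_snps <- toy_prior * nrow(toy_clustlike)

# Second bar chart: SNPs whose posterior membership reaches the cutoff.
called_snps <- colSums(toy_clustlike >= cutoff)
```

In this toy case the third SNP is not called for any motif, because none of its posterior memberships reaches the cutoff.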
Yingying Wei, Hongkai Ji
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding.
iASeqmotif, plotBIC, sampleASE
example(iASeqmotif) # compute 'motif.fitted'
plotMotif(motif.fitted$bestmotif, cutoff=0.9)
Here we present four files needed for the various iASeq fit functions.
sampleASE consists of five ChIP-seq studies from the ENCODE GM12878 cell line with 5504 heterozygotic SNPs. Each study has two replicates. Each replicate's fastq reads file was aligned to the hg18 whole genome using MAQ (Version 0.7.1) with default parameters. Unique alignments were extracted by requiring a mapping quality above 0. Alignment can also be done using other alignment tools such as Bowtie. The GM12878 genotype data was downloaded from the website http://alleleseq.gersteinlab.org/downloads.html [Rozowsky J et al.]. The reads aligned to each allele of a heterozygotic SNP were counted correspondingly. sampleASE_exprs saves the read counts. sampleASE_studyid gives the study label for each column; sampleASE_repid gives the sample label for each column; sampleASE_refid shows whether each column corresponds to the reference allele or the non-reference allele.
sampleASE_exprs |
The read count matrix for the example dataset used by iASeq package. Each row of the matrix corresponds to a heterozygotic SNP and each column of the matrix corresponds to the reads count for either the reference allele or non-reference allele in a replicate of a study. |
sampleASE_studyid |
The group label for each column of sampleASE_exprs matrix. All columns in the same study have the same studyid and there are five ChIP-seq studies in this example. |
sampleASE_repid |
The sample label for each column of sampleASE_exprs matrix. The two columns within the same sample, one for reference allele and the other for non-reference allele, have the same repid. In other words, repid discriminates the different replicates within the same study. Here each study has two replicates. |
sampleASE_refid |
The reference allele label for each column of sampleASE_exprs matrix. 0 is coded for reference allele columns and 1 is coded for non-reference allele columns. |
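Because every column needs a consistent studyid, repid and refid, it can help to check the labels before fitting. The following is a minimal sketch with a hypothetical helper `check_labels` and an invented layout of two studies with two replicates each; the actual encoding used in sampleASE is not assumed.

```r
# Hypothetical labels for 2 studies x 2 replicates x 2 alleles (8 columns).
lab_studyid <- rep(1:2, each = 4)
lab_repid   <- rep(rep(1:2, each = 2), times = 2)
lab_refid   <- rep(c(0, 1), times = 4)

# Each (study, replicate) pair should have exactly one reference column
# (refid 0) and one non-reference column (refid 1).
check_labels <- function(studyid, repid, refid) {
  stopifnot(length(studyid) == length(repid),
            length(repid) == length(refid),
            all(refid %in% c(0, 1)))
  pair <- paste(studyid, repid, sep = "_")
  n_ref    <- tapply(refid == 0, pair, sum)
  n_nonref <- tapply(refid == 1, pair, sum)
  all(n_ref == 1) && all(n_nonref == 1)
}

labels_ok <- check_labels(lab_studyid, lab_repid, lab_refid)
```

The same check could be run on sampleASE_studyid, sampleASE_repid and sampleASE_refid after data(sampleASE).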
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol 7:522.
iASeqmotif, plotBIC, plotMotif, ASErawfit, singleEMfit
data(sampleASE)
# fit 1 to 2 non-null correlation motifs to the data
motif.fitted <- iASeqmotif(sampleASE_exprs, sampleASE_studyid, sampleASE_repid, sampleASE_refid, K=1:2, iter.max=2, tol=1e-3)
plotBIC(motif.fitted)
plotMotif(motif.fitted$bestmotif, cutoff=0.9)
This function runs an EM algorithm for allele-specific events based on a single RNAseq or ChIPseq study. It first pools replicates within a given study to sum the read counts for the reference allele and the non-reference allele. Then based on the pooled read counts, it fits an EM algorithm with three mixture components, the null distribution, the reference allele over expressed (bound) and under expressed (bound) distributions to the data.
singleEMfit(exprs,studyid,repid,refid,iter.max=100,tol=1e-3)
exprs |
A matrix, each row of the matrix corresponds to a heterozygotic SNP and each column of the matrix corresponds to the reads count for either the reference allele or non-reference allele in a replicate of a study. |
studyid |
The group label for each column of exprs matrix. All columns in the same study have the same studyid. |
repid |
The sample label for each column of exprs matrix. The two columns within the same sample, one for reference allele and the other for non-reference allele, have the same repid. In other words, repid discriminates the different replicates within the same study. |
refid |
The reference allele label for each column of exprs matrix. Please code 0 for reference allele columns and 1 for non-reference allele columns to make the interpretation of over expressed (or bound) to be skewing to the reference allele. Otherwise, just interpret the other way round. |
tol |
The relative tolerance level of error. |
iter.max |
Maximum number of iterations. |
p.study |
The posterior probability for each SNP to be allele-specific event within each study. A matrix where each row corresponds to a SNP and each column corresponds to a study. |
motif.qup |
Fitted values of probability for the reference allele of each SNP to be over expressed (or bound) within each study. A matrix where each row corresponds to a SNP and each column corresponds to a study. |
motif.qdown |
Fitted values of probability for the reference allele of each SNP to be under expressed (or bound) within each study. A matrix where each row corresponds to a SNP and each column corresponds to a study. |
condlike |
A list where each element is a matrix and corresponds to a study. Each row of each matrix corresponds to a SNP. The three columns of each matrix represent the posterior probability for a SNP to belong to the null distribution, the over expressed distribution and the under expressed distribution within the given study. |
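One way to read a single condlike matrix is sketched below on invented values. The column names, the assumption that each row sums to one, and the idea that the allele-specific probability equals one minus the null column are illustrative assumptions, not statements about how singleEMfit computes p.study.

```r
# Invented posterior matrix for one study; columns follow the documented
# order: null, over expressed, under expressed.
toy_condlike <- rbind(c(0.90, 0.08, 0.02),
                      c(0.10, 0.85, 0.05),
                      c(0.20, 0.05, 0.75))
colnames(toy_condlike) <- c("null", "over", "under")

# Most probable mixture component for each SNP.
component <- colnames(toy_condlike)[max.col(toy_condlike, ties.method = "first")]

# Probability of an allele-specific event in either direction,
# assuming the three columns sum to one per SNP.
p_as <- 1 - toy_condlike[, "null"]
```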
Yingying Wei
Yingying Wei, Xia Li, Qianfei Wang, Hongkai Ji (2012) iASeq: integrating multiple ChIP-seq datasets for detecting allele-specific binding.
data(sampleASE)
singleEM.fitted <- singleEMfit(sampleASE_exprs, sampleASE_studyid, sampleASE_repid, sampleASE_refid, iter.max=2, tol=1e-3)