Title: | Multiple Beta t-Tests |
---|---|
Description: | MBttest method was developed from beta t-test method of Baggerly et al(2003). Compared to baySeq (Hard castle and Kelly 2010), DESeq (Anders and Huber 2010) and exact test (Robinson and Smyth 2007, 2008) and the GLM of McCarthy et al(2012), MBttest is of high work efficiency,that is, it has high power, high conservativeness of FDR estimation and high stability. MBttest is suit- able to transcriptomic data, tag data, SAGE data (count data) from small samples or a few replicate libraries. It can be used to identify genes, mRNA isoforms or tags differentially expressed between two conditions. |
Authors: | Yuan-De Tan |
Maintainer: | Yuan-De Tan <[email protected]> |
License: | GPL-3 |
Version: | 1.35.0 |
Built: | 2024-10-30 08:19:52 UTC |
Source: | https://github.com/bioc/MBttest |
This package is used to peform multiple beta t-test analyses of real data and gives heatmap of differential expressions of genes or differential splicings. The results listing geneid or isoformid, gene name, the other information, t-value, p-value, adjusted p-value, adjusted alpha value, rho, and symb are saved in csv file.
Package: | MBttest |
Type: | Package |
Version: | 1.0 |
Date: | 2015-01-02 |
License: | GPL-3 |
Yuan-De Tan
Maintainer: Yuan-De Tan [email protected]
Baggerly KA, Deng L, Morris JS, Aldaz CM (2003) Differential expression in SAGE: accounting for normal between-library variation. Bioinformatics, 19: 1477-1483.
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One,10.1371/journal.pone.0123658.
betaparametab
, betaparametVP
, betaparametw
, betattest
, mbetattest
, maplot
, myheatmap
, oddratio
, pratio
, simulat
, smbetattest
, mtprocedure
, mtpvadjust
data(jkttcell) mbetattest(X=jkttcell[1:500,],na=3,nb=3,W=1,alpha=0.05,file="jurkat_NS_48h_tag_mbetattest.csv")
data(jkttcell) mbetattest(X=jkttcell[1:500,],na=3,nb=3,W=1,alpha=0.05,file="jurkat_NS_48h_tag_mbetattest.csv")
parameters alpha(a) and beta (b) in beta distribution are estimated by using an iteration algorithm.
betaparametab(xn, w, P, V)
betaparametab(xn, w, P, V)
xn |
column vector, a set of library sizes. |
w |
column vector, a set of weights |
P |
proportion of counts of a gene or an isoform |
V |
variance for proportions of counts of a gene or an isoform over m replicate libraries in a condition |
return parameters a and b.
Yuan-De Tan [email protected]
Baggerly KA, Deng L, Morris JS, Aldaz CM (2003) Differential expression in SAGE: accounting for normal between-library variation. Bioinformatics 19: 1477-1483.
XX<-c(2000,2000,2000) p<-0.15 V=0.004 w<-c(0.3,0.3,0.3) betaparametab(xn=XX,w=w,P=p,V=V) #[1] 1.145868 6.493254
XX<-c(2000,2000,2000) p<-0.15 V=0.004 w<-c(0.3,0.3,0.3) betaparametab(xn=XX,w=w,P=p,V=V) #[1] 1.145868 6.493254
This function is used to estimate parameters P and V by optimalizing estimation of parameters: alpha and beta.
betaparametVP(X, NX)
betaparametVP(X, NX)
X |
count dataset derived from m replicate libraries in one condition. |
NX |
vector of m library sizes. Library size is sum of counts over the whole library. |
Count data of RNA reads are assumed to follow binomial distribution with parameters (P
) and (V
), while P
is assumed to follow beta distribution with parameters alpha
(a
) and beta
(b
). Parameters P and V
are estimated by optimal estimation of parameters a and b. The optimal method is an iteration method drived by weighting proportion of gene or isoform in each replicate library. This is a large-scale method for estimating these parameters. Estimation of parameters P and V
is core of the multiple beta t-test method because P
and V
will be used to calculate t-value.
return a list:
P |
N proportions estimated. |
V |
N variances estimated. |
betaparametVP requres functions betaparametab and betaparametw.
Yuan-DE Tan [email protected]
Baggerly KA, Deng L, Morris JS, Aldaz CM (2003) Differential expression in SAGE: accounting for normal between-library variation. Bioinformatics, 19: 1477-1483.
Yuan-De Tan, Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One,10.1371/journal.pone.0123658.
data(jkttcell) X<-jkttcell[1:500,] na<-3 nb<-3 cn<-length(X[1,]) rn<-length(X[,1]) XC<-X[,1:(cn-na-nb)] XX<-X[,(cn-na-nb+1):cn] n<-na+nb XA<-XX[,1:na] SA<-apply(XA,2,sum) PA<-betaparametVP(XA,SA)
data(jkttcell) X<-jkttcell[1:500,] na<-3 nb<-3 cn<-length(X[1,]) rn<-length(X[,1]) XC<-X[,1:(cn-na-nb)] XX<-X[,(cn-na-nb+1):cn] n<-na+nb XA<-XX[,1:na] SA<-apply(XA,2,sum) PA<-betaparametVP(XA,SA)
Function betaparametw is used to calculate weight.
betaparametw(xn, a, b)
betaparametw(xn, a, b)
xn |
vector of m library sizes. Library size is sum of counts over the whole library. |
a |
parameter alpha in beta distribution derived from output of function betaparametab |
b |
parameter beta in beta distribution derived from output of function betaparametab |
alpha and beta are used to calculate weight. Then weight is in turn used to correct bias of estimation of alpha and beta in betaparametab function.
return weight(W)
Yuan-De Tan [email protected]
Baggerly KA, Deng L, Morris JS, Aldaz CM (2003) Differential expression in SAGE: accounting for normal between-library variation. Bioinformatics, 19: 1477-1483.
Yuan-De Tan, Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One. 2015 DOI: 10.1371/journal.pone.0123658.
XX<-c(2000,2000,2000) a<-1.1458 b<-6.4932 betaparametw(xn=XX,a=a,b=b) #[1] 0.3333333 0.3333333 0.3333333
XX<-c(2000,2000,2000) a<-1.1458 b<-6.4932 betaparametw(xn=XX,a=a,b=b) #[1] 0.3333333 0.3333333 0.3333333
Beta t-test and degree of freedom for each gene or isoform are calculated in this function.
betattest(X, na, nb)
betattest(X, na, nb)
X |
count data of RNA reads containing N genes (or isoforms). |
na |
number of replicate libraries in condition A |
nb |
number of replicate libraries in condition B |
In beta t-test,
where and
are proportions of a gene or an isoform in conditions A and B,
and
are variances estimated in conditions A and B. They are outputted by betaparametVP.
return two lists:
t |
t-value list. |
df |
df list. df is degree of freedom. |
If pooled standard error is zero, then the t-value is not defined and set to be zero.
Yuan-De Tan [email protected]
Baggerly KA, Deng L, Morris JS, Aldaz CM (2003) Differential expression in SAGE: accounting for normal between-library variation. Bioinformatics, 19: 1477-1483.
Yuan-De Tan, Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One. 2015 DOI: 10.1371/journal.pone.0123658.
data(jkttcell) X<-jkttcell[1:1000,] na<-3 nb<-3 cn<-ncol(X) rn<-nrow(X) XC<-X[,1:(cn-na-nb)] XX<-X[,(cn-na-nb+1):cn] betattest<-betattest(XX,na=3,nb=3)
data(jkttcell) X<-jkttcell[1:1000,] na<-3 nb<-3 cn<-ncol(X) rn<-nrow(X) XC<-X[,1:(cn-na-nb)] XX<-X[,(cn-na-nb+1):cn] betattest<-betattest(XX,na=3,nb=3)
t-value and rho are results ouputed by mbttest.
data("dat")
data("dat")
A data frame with 13409 observations on the following 16 variables.
tagid
a numeric vector
geneid
a numeric vector
name
a string vector
chr
a string vector
strand
a character vector
pos
a numeric vector
anno
a string vector
Jurk.NS.A
a numeric vector
Jurk.NS.B
a numeric vector
Jurk.NS.C
a numeric vector
Jurk.48h.A
a numeric vector
Jurk.48h.B
a numeric vector
Jurk.48h.C
a numeric vector
beta_t
a numeric vector
rho
a numeric vector
symb
a character vector
t-values (beta_t)
and means over all replicate libraries in two conditions are used to make MA plot. The count data of DE isoforms are selected by symb ="+"
and W(omega) and used to make heatmap using myheatmap function.
ID, information, count data of RNA reads,t-value
and rho-value
, symbol.
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One. DOI: 10.1371/journal.pone.0123658.
data(dat) ## maybe str(dat) ; plot(dat) ...
data(dat) ## maybe str(dat) ; plot(dat) ...
The data are transcriptomic count data of RNA reads generated by next generation sequencing from Jurkat T-cells.
data("jkttcell")
data("jkttcell")
A data frame with 13409 observations on the following 13 variables.
tagid
a numeric vector
geneid
a numeric vector
name
a string vector
chr
a string vector
strand
a charactor vector
pos
a numeric vector
anno
a string vector
Jurk.NS.A
a numeric vector
Jurk.NS.B
a numeric vector
Jurk.NS.C
a numeric vector
Jurk.48h.A
a numeric vector
Jurk.48h.B
a numeric vector
Jurk.48h.C
a numeric vector
The data are count data generated by next generation sequencing from Jurkat T-cells. The T-cells were treated by resting and stimulating with CD3/CD28 for 48 hours. The data have 7 columns for the information of poly(A) site: tagid, geneid, gene name, chromosome, strand,poly(A) site position, poly(A) site annotation and 6 columns for data: Jurk.NS.A
, Jurk.NS.B
, Jurk.NS.C
, Jurk.48h.A
, Jurk.48h.B
, Jurk.48h.C
. where NS means Normal state and 48h means 48 hours after CD3/CD28 stimulatuin of T-cells. 13409 RNA isoforms were detected to have alternative poly(A) sites.
ID, information, count data of RNA reads
Real transcriptomic count data
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One. DOI: 10.1371/journal.pone.0123658.
data(jkttcell) ## maybe str(jkttcell) ; plot(jkttcell) ...
data(jkttcell) ## maybe str(jkttcell) ; plot(jkttcell) ...
This function is to display MA plot of t-value against log mean.
maplot(dat, r1, r2, TT, matitle)
maplot(dat, r1, r2, TT, matitle)
dat |
object outputted by mbetattest containing data ordered by absolution of t-value and rho ( |
r1 |
number of replicate libraries in condition 1. |
r2 |
number of replicate libraries in condition 2. |
TT |
a numeric parameter that gives truncate value of t-values. |
matitle |
string for MA plot title. |
In MA plot, t-value is in y-axis and log mean in x-axis; Black points gathered nearby zero along log mean are genes without differential expressions or differential splicings while red points scattered out of black points are those of being differentially expressed or differentially spliced.
no return value
Yuan-De Tan [email protected]
data(dat) maplot(dat=dat,r1=3,r2=3,TT=350,matitle="MA plot") maplot(dat=dat,r1=3,r2=3,TT=50,matitle="MA plot")
data(dat) maplot(dat=dat,r1=3,r2=3,TT=350,matitle="MA plot") maplot(dat=dat,r1=3,r2=3,TT=50,matitle="MA plot")
This function is to peform multiple beta t-test method on real data. The result lists geneid or isoformid, gene name, the other information, t-value, p-value, adjusted p-value, adjusted alpha value, rho (), and symb. All these lists are ordered by absolution of t-values.
mbetattest(X, na, nb, W, alpha=0.05, file)
mbetattest(X, na, nb, W, alpha=0.05, file)
X |
count data of RNA reads with na replicates in condition A ans nb replicates in condition B. |
na |
number of replicate libraries in condition A. |
nb |
number of replicate libraries in condition B. |
W |
numeric parameter, called omega ( |
alpha |
the probabilistic threshold. User can set alpha ( |
file |
a csv file. User needs to give file name and specify direction path. But if user uses setwd function, drive is not necessarily specified in file. |
t-statistic is defined as t-statistic multiplied by (rho/omega), that is,
where
where
is a constant as threshold estimated from null data.
return a dat list: the data ordered by abs(t) contain information cloumns, data columns, t-values, rho and symb that are used to make heatmap and MAplot.
Yuan-De Tan [email protected]
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One, 10.1371/journal.pone.0123658.
data(jkttcell) dat<-mbetattest(X=jkttcell[1:1000,],na=3,nb=3,W=1,alpha=0.05,file="jurkat_NS_48h_tag_mbetattest.csv")
data(jkttcell) dat<-mbetattest(X=jkttcell[1:1000,],na=3,nb=3,W=1,alpha=0.05,file="jurkat_NS_48h_tag_mbetattest.csv")
Similiar to Benjamini-Hochberg multiple-test procedure, alpha is adjusted to be a set of values.
mtprocedure(alpha, N, C)
mtprocedure(alpha, N, C)
alpha |
probabilistic threshold and is usually set to be 0.05 or 0.01. Default value is 0.05 |
N |
numeric constant, number of genes to be detected in transcriptome. |
C |
numeric constant, it can be taken from 0 to N. C is used to choose multiple-test procedure. Default value is 0.01. This procedure is single test with C=0, Benjamini-Hochberg procedure with C=1.22 and Bonfroni procedure with C=N. |
This is a multiple-test procedure family including Benjamini-Hochberg procedure, Bonferroni procedure and single-test procedure. By choosing C-value, it can generat a multiple-test procedure for controling the false discovery rate, the expected proportion of false discoveries amongst the rejected hypotheses.
return a list of adjusted alpha values.
Yuan-De Tan [email protected]
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.
Yuan-De Tan and Hongyan Xu A general method for accurate estimation of false discovery rates in identification of differentially expressed genes. Bioinformatics (2014) 30 (14): 2018-2025. doi: 10.1093/bioinformatics/btu124.
mtprocedure(alpha=0.5,N=200,C=1.22) # [1] 0.007501404 0.011906423 0.015914688 0.019682621 0.023284917 0.026763656 # [7] 0.030145311 0.033447843 0.036684127 0.039863779 0.042994217 0.046081313 # ..... #[175] 0.444073506 0.446322519 0.448570478 0.450817390 0.453063265 0.455308110 #[181] 0.457551933 0.459794741 0.462036542 0.464277343 0.466517153 0.468755977 #[187] 0.470993825 0.473230701 0.475466614 0.477701571 0.479935578 0.482168642 #[193] 0.484400770 0.486631969 0.488862244 0.491091603 0.493320052 0.495547597 #[199] 0.497774244 0.500000000
mtprocedure(alpha=0.5,N=200,C=1.22) # [1] 0.007501404 0.011906423 0.015914688 0.019682621 0.023284917 0.026763656 # [7] 0.030145311 0.033447843 0.036684127 0.039863779 0.042994217 0.046081313 # ..... #[175] 0.444073506 0.446322519 0.448570478 0.450817390 0.453063265 0.455308110 #[181] 0.457551933 0.459794741 0.462036542 0.464277343 0.466517153 0.468755977 #[187] 0.470993825 0.473230701 0.475466614 0.477701571 0.479935578 0.482168642 #[193] 0.484400770 0.486631969 0.488862244 0.491091603 0.493320052 0.495547597 #[199] 0.497774244 0.500000000
Given a set of N p-values, it returns a set of N p-values adjusted by choosing C-value
mtpvadjust(pv, C)
mtpvadjust(pv, C)
pv |
numeric vector of p-values. |
C |
numeric constant, the value can be taken from any number > 0 or equal to 0. C is used to choose multiple-test procedure. |
This is a multiple-test procedure family including Benjamini-Hochberg procedure, Bonferroni procedure and single-test procedure. By choosing C-value, it can generate a multiple-test procedure for controling the false discovery rate, the expected proportion of false discoveries amongst the rejected hypotheses. Benjamini-Hochberg procedure is given with C=1.22, Bonferroni procedure is given with C = N and single-test procedure can be given with C=0.
return a list of adjusted p-values.
p-value must be ordered from the largest value to the smallest value before executing tan_pvadjust.
Yuan-De Tan [email protected]
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.
Yuan-De Tan and Hongyan Xu A general method for accurate estimation of false discovery rates in identification of differentially expressed genes. Bioinformatics (2014) 30 (14): 2018-2025. doi: 10.1093/bioinformatics/btu124.
set.seed(123) x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25))) p <- 2*pnorm(sort(-abs(x))) round(mtpvadjust(pv=p, C=1.22),4) # [1] 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 #[11] 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.6875 0.6174 0.4588 #[21] 0.4115 0.3644 0.2216 0.1554 0.1443 0.1249 0.1027 0.0964 0.0763 0.0319 #[31] 0.0166 0.0135 0.0123 0.0096 0.0091 0.0068 0.0045 0.0041 0.0020 0.0007 #[41] 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000
set.seed(123) x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25))) p <- 2*pnorm(sort(-abs(x))) round(mtpvadjust(pv=p, C=1.22),4) # [1] 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 #[11] 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.6875 0.6174 0.4588 #[21] 0.4115 0.3644 0.2216 0.1554 0.1443 0.1249 0.1027 0.0964 0.0763 0.0319 #[31] 0.0166 0.0135 0.0123 0.0096 0.0091 0.0068 0.0045 0.0041 0.0020 0.0007 #[41] 0.0004 0.0003 0.0002 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000
This function is used to display heatmap of differential expressions of genes or isoforms or differential splicings of genes detected by the multiple beta t-test method in the real data.
myheatmap(dat, r1, r2, W, colrs, tree, method, rwangle, clangle, maptitle)
myheatmap(dat, r1, r2, W, colrs, tree, method, rwangle, clangle, maptitle)
dat |
data outputted by mbetattest, includes information columns, data columns, t-value, rho and symbol columns; |
r1 |
numeric argument: number of replicate libraries in condition 1. |
r2 |
numeric argument: number of replicate libraries in condition 2 |
W |
numeric argument: threshold for choosing genes or isoforms for heatmap. W value can be set to be 0 to any large number. If user sets W = 0, then the function will select all differentially expressed genes with symb="+". To choose a appropriate W, user needs to refere to rho values in the result file. Default W=1. |
colrs |
heatmap colors. User has 5 options: "redgreen", "greenred", "redblue", "bluered" and "heat.colors". Default colrs="redgreen". |
tree |
object of heatmap. User has four options: "both" for row and column trees,"row" for only row tree,"column" for only column tree, and "none" for no tree specified. Default tree="both". |
method |
method to be chosen to calculate distance between columns or rows. It has four options: "euclidean", "pearson","spearman" and "kendall". The latter three are d=1-cc where cc is correlation coefficients. Default="euclidean". |
rwangle |
angle of xlab under heatmap. Default value is 30. |
clangle |
angle of ylab. Default value is 30 |
maptitle |
string for heatmap title. |
This function uses W (omega) and "symb" to choose genes or isoforms in the data ordered by t-values and then to normalize the selected data by using z-scale. This function has multiple options to select map color, distance, cluster and x- and y-lab angles. The heatmap was designed for publication and presentation, that is, zoom of the figure can be reduced without impacting solution.
no return value but create a heatmap.
myheatmap requres gplots
Yuan-De Tan [email protected]
#require(gplots) data(dat) #dat<-mbetattest(X=jkttcell,na=3,nb=3,W=1,alpha=0.05, #file="C:/mBeta_ttest/R_package/jurkat_NS_48h_tag_mbetattest.csv") # data(mtcars) #x <-as.matrix(mtcars) #myheatmap(dat=x,r1=3,r2=3, maptitle="mtcars_heatmap") myheatmap(dat=dat,r1=3,r2=3,maptitle="Jurkat T-cell heatmap2") myheatmap(dat=dat,r1=3,r2=3,tree="none",maptitle="Jurkat T-cell heatmap")
#require(gplots) data(dat) #dat<-mbetattest(X=jkttcell,na=3,nb=3,W=1,alpha=0.05, #file="C:/mBeta_ttest/R_package/jurkat_NS_48h_tag_mbetattest.csv") # data(mtcars) #x <-as.matrix(mtcars) #myheatmap(dat=x,r1=3,r2=3, maptitle="mtcars_heatmap") myheatmap(dat=dat,r1=3,r2=3,maptitle="Jurkat T-cell heatmap2") myheatmap(dat=dat,r1=3,r2=3,tree="none",maptitle="Jurkat T-cell heatmap")
)
Zeta () is used to measure homogeneity intensity of two subdatasets. If
, these two subdatasets have good homogeneity; otherwise,
indicates that two subdatasets have poor homogeneity (big noise).
oddratio(XX, na, nb)
oddratio(XX, na, nb)
XX |
count data of RNA reads generated by next generation sequencing. |
na |
number of replicate libraries in condition A. |
nb |
number of replicate libraries in condition B. |
Zeta is defined as
where is different from
. If two subdatasets have big a gap and good homogeneity, then
value has much larger than 1.
oddrat |
list of zeta values |
Yuan-De Tan [email protected]
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One. 2015 DOI: 10.1371/journal.pone.0123658.
XX<-matrix(NA,2,8) XX[1,]<-c(112,122, 108,127,302, 314, 322, 328) XX[2,]<-c(511, 230, 754, 335,771, 842, 1014,798) #XX # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #[1,] 112 122 108 127 302 314 322 328 #[2,] 511 230 754 335 771 842 1014 798 oddratio(XX=XX,na=4,nb=4) #[1] 3.9432676 0.8762017 # see example in mbetattest
XX<-matrix(NA,2,8) XX[1,]<-c(112,122, 108,127,302, 314, 322, 328) XX[2,]<-c(511, 230, 754, 335,771, 842, 1014,798) #XX # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #[1,] 112 122 108 127 302 314 322 328 #[2,] 511 230 754 335 771 842 1014 798 oddratio(XX=XX,na=4,nb=4) #[1] 3.9432676 0.8762017 # see example in mbetattest
)
Psi is also called polar ratio.
.
pratio(xx, na, nb)
pratio(xx, na, nb)
xx |
count data of RNA reads generated by next generation sequencing. |
na |
number of replicate libraries in condition A. |
nb |
number of replicate libraries in condition B. |
Psi is defined as
It is used to measure overlap of two subdatasets. , these two subdatasets have a gap, not overlap.
indicates that two subdatasets overlap.
pratio |
pratio list |
Yuan-De Tan [email protected]
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One. 2015 DOI: 10.1371/journal.pone.0123658.
XX<-matrix(NA,2,8) XX[1,]<-c(112,122, 108,127,302, 314, 322, 328) XX[2,]<-c(511, 230, 754, 335,771, 842, 1014,798) #XX # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #[1,] 112 122 108 127 302 314 322 328 #[2,] 511 230 754 335 771 842 1014 798 pratio(xx=XX,na=4,nb=4)
XX<-matrix(NA,2,8) XX[1,]<-c(112,122, 108,127,302, 314, 322, 328) XX[2,]<-c(511, 230, 754, 335,771, 842, 1014,798) #XX # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #[1,] 112 122 108 127 302 314 322 328 #[2,] 511 230 754 335 771 842 1014 798 pratio(xx=XX,na=4,nb=4)
This function uses negative binomial (NB) pseudorandom generator to create any count datasets of RNA isoform reads based on real data.
simulat(yy, nci, r1, r2, p, q, A)
simulat(yy, nci, r1, r2, p, q, A)
yy |
real count data |
nci |
numeric argument: column number of information related to genes or isoforms. |
r1 |
numeric argument: number of replicate libraries in condition 1. |
r2 |
numeric argument: number of replicate libraries in condition 2. |
p |
numeric argument: proportion of genes or isoforms differentially expressed. The value is in range of 0 ~1. Default value is 0. |
q |
numeric argument: proportion of genes or isoforms artificially noised. The value is in range of 0 ~1. Default value is 0. |
A |
numeric argument: conditional effect value. The value is larger than or equal to 0. Default value is 0. |
Null count data are created by using R negative binomial pseudorandom generator rnbinom with mu and size. Parameters mu and size are given by mean and variance drawn from real read counts of a gene or an isoforms in a condition. Condition (or treatment) effect on differential transcription of isoforms is linearly and randomly assigned to genes or isoforms. The conditional effect = AU where U is uniform variable and A is an input constant. P percent of genes or isoforms are set to be differentially expressed or differentially spliced. Q percent of genes or isoforms have technical noise. If P = 0, then simulation is null simulation, the data are null data or baseline data.
Return count data.
Yuan-De Tan [email protected]
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One, 10.1371/journal.pone.0123658.
data(jkttcell) jknull<-simulat(yy=jkttcell[1:500,],nci=7,r1=3,r2=3,p=0,q=0.2,A=0)
data(jkttcell) jknull<-simulat(yy=jkttcell[1:500,],nci=7,r1=3,r2=3,p=0,q=0.2,A=0)
The dataset generated by using R negative binomial pseudorandom generator rnbinom is used as an example for calculating omega.
data("skjt")
data("skjt")
A data frame with 13409 observations on the following 14 variables.
geneid
a string vector
tagid
a numeric vector
geneid.1
a numeric vector
name
a string vector
chr
a string vector
strand
a character vector
pos
a numeric vector
anno
a string vector
Jurk.NS.A
a numeric vector
Jurk.NS.B
a numeric vector
Jurk.NS.C
a numeric vector
Jurk.48h.A
a numeric vector
Jurk.48h.B
a numeric vector
Jurk.48h.C
a numeric vector
The dataset skjt was generated by using R negative binomial pseudorandom generator rnbinom with mu and size. Parameters mu and size are given by mean and variance drawn from real Jurkat T cell transcriptomic count data . Condition (or treatment) effect on differential transcription of isoforms was set to zero. The data have 13409 genes and 7 information columns: geneid tagid name chr
,strand
,pos
,anno
, and 6 data columns: Jurk.NS.A
,Jurk.NS.B
,Jurk.NS.C
,Jurk.48h.A
,Jurk.48h.B
,Jurk.48h.C
.
ID, information, count data of RNA reads
Simulation.
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis. Plos One. DOI: 10.1371/journal.pone.0123658.
data(skjt) ## maybe str(skjt) ; plot(skjt) ...
data(skjt) ## maybe str(skjt) ; plot(skjt) ...
This function is to peform mBeta t-test with rho=1 and omega=1 on simulated data. The result lists differentially expressed genes or isoforms marked by symbol="+" and their rho values. The rho values are used to calculate omega value for performance of mBeta t-tests on the real data.
smbetattest(X, na, nb, alpha)
smbetattest(X, na, nb, alpha)
X |
simulated count data with N genes or isoforms. |
na |
number of replicate libraries in condition A. |
nb |
number of replicate libraries in condition B. |
alpha |
statistical probabilistic threshold, default value is 0.05. |
Before performing mbeta t-test on real data, user needs omega (w) value for the threshold of rho(). To determine omega value, user is requred to simulate null data having the same gene or isoform number and the same numbers of replicate libraries in two conditions and then performs mbeta t-test on the simulated null data by setting rho =1 and omega =1. To calculate accurately omega value, user needs such performance on 4-6 simulated null
datasets. Manual provides method for omega calculation.
Return results from multple beta t-tests on simulated data.
Yuan-De Tan [email protected]
Yuan-De Tan Anita M. Chandler, Arindam Chaudhury, and Joel R. Neilson(2015) A Powerful Statistical Approach for Large-scale Differential Transcription Analysis.Plos One,10.1371/journal.pone.0123658.
See Also as mbetattest
data(skjt) mysim<-smbetattest(X=skjt[1:500,],na=3,nb=3,alpha=0.05)
data(skjt) mysim<-smbetattest(X=skjt[1:500,],na=3,nb=3,alpha=0.05)