Title: | Global Test for Counts |
---|---|
Description: | The method may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size. Useful for testing for association between RNA-Seq and high-dimensional data. |
Authors: | Armin Rauschenberger [aut, cre] |
Maintainer: | Armin Rauschenberger <[email protected]> |
License: | GPL-3 |
Version: | 1.35.0 |
Built: | 2024-12-01 05:25:55 UTC |
Source: | https://github.com/bioc/globalSeq |
Testing for association between RNA-Seq and other genomic data is challenging due to high variability of the former and high dimensionality of the latter.
Using the negative binomial distribution and a random effects model, we developed an omnibus test that overcomes both difficulties. It may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size.
The proposed method can detect genetic and epigenetic alterations that affect gene expression. It can examine complex regulatory mechanisms of gene expression.
omnibus | tests entire covariate sets |
proprius | shows individual contributions |
cursus | analyses the whole genome |
The following command opens the vignette: utils::vignette("globalSeq")
A Rauschenberger, MA Jonker, MA van de Wiel, and RX Menezes (2016). "Testing for association between RNA-Seq and high-dimensional data", BMC Bioinformatics. 17:118. (open access)
This function tests for associations between gene expression or exon abundance (Y) and genetic or epigenetic alterations (X). Using the locations of genes (Yloc) and the locations of genetic or epigenetic alterations (Xloc), the expression of each gene is tested for associations with alterations on the same chromosome that are closer to the gene than a given distance (window).
cursus(Y, Yloc, X, Xloc, window, Ychr = NULL, Xchr = NULL, offset = NULL, group = NULL, perm = 1000, nodes = 2, phi = NULL, kind = 0.01)
Y |
RNA-Seq data:
numeric matrix with |
Yloc |
location RNA-Seq:
numeric vector of length |
X |
genomic profile:
numeric matrix with |
Xloc |
location covariates:
numeric vector of length |
window |
maximum distance: non-negative real number |
Ychr |
chromosome RNA-Seq:
factor of length |
Xchr |
chromosome covariates:
factor of length |
offset |
numeric vector of length |
group |
confounding variable:
factor of length |
perm |
number of iterations: positive integer |
nodes |
number of cluster nodes for parallel computation |
phi |
dispersion parameters: vector of length |
kind |
computation: number between 0 and 1 |
Note that Yloc, Xloc and window must be given in the same unit, usually in base pairs. If Yloc indicates interval locations, and window is zero, then only covariates between the start and end location of the gene are of interest. Typically window is larger than one million base pairs.
If Y and X include data from a single chromosome, Ychr and Xchr are redundant. If Y or X includes data from multiple chromosomes, Ychr and Xchr should be specified in order to prevent confusion between chromosomes.
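The selection rule described above can be illustrated in base R. This is only a sketch: select_local is a hypothetical helper, not a function of globalSeq, and treating a covariate at exactly the distance window as included is an assumption.

```r
# Hypothetical helper (not part of globalSeq): indices of covariates that lie
# on the same chromosome as a gene and within `window` of its location.
select_local <- function(gene_loc, gene_chr, Xloc, Xchr, window) {
  which(Xchr == gene_chr & abs(Xloc - gene_loc) <= window)
}

# Toy data: six covariates on two chromosomes, locations in base pairs
Xloc <- c(100, 500, 2000, 150, 900, 5000)
Xchr <- factor(c("1", "1", "1", "2", "2", "2"))

# A gene at position 400 on chromosome 1, with a window of 500 base pairs
selected <- select_local(gene_loc = 400, gene_chr = "1",
                         Xloc = Xloc, Xchr = Xchr, window = 500)
selected
```

Without the chromosome condition, the covariates at positions 150 and 900 on chromosome 2 would also fall inside the window, which is the confusion that Ychr and Xchr are meant to prevent.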
For the simultaneous analysis of multiple genomic profiles, X should be a list of numeric matrices with n columns (samples), Xloc a list of numeric vectors, and window a list of non-negative real numbers. If provided, Xchr should be a list of numeric vectors.
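As a rough sketch of the list layout for two profiles, the following base R code builds the arguments and checks that they agree with each other. The dimensions p1 and p2 and the window values are arbitrary assumptions, and the final call is shown as a comment rather than executed.

```r
set.seed(1)
n <- 30; q <- 10
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)
Yloc <- seq(0, 1, length.out = q)

# Two hypothetical genomic profiles measured on the same n samples
p1 <- 100; p2 <- 50
X <- list(matrix(rnorm(p1 * n), nrow = p1, ncol = n),
          matrix(rnorm(p2 * n), nrow = p2, ncol = n))
Xloc <- list(seq(0, 1, length.out = p1),
             seq(0, 1, length.out = p2))
window <- list(1, 0.5)

# Consistency checks: one entry per profile, samples in columns,
# one location per covariate
stopifnot(length(X) == length(Xloc), length(X) == length(window))
stopifnot(all(sapply(X, ncol) == ncol(Y)))
stopifnot(all(sapply(X, nrow) == lengths(Xloc)))

# The call then has the same form as for a single profile (not run here):
# globalSeq::cursus(Y, Yloc, X, Xloc, window)
```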
The offset is meant to account for different library sizes. By default the offset is calculated based on Y. Different library sizes can be ignored by setting the offset to rep(1,n).
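To make the role of the offset concrete, here is a minimal sketch of one possible library-size offset. The normalisation used (column sums divided by their mean) is an assumption chosen for illustration and is not necessarily the default computed by the package.

```r
set.seed(1)
n <- 30; q <- 10
Y <- matrix(rnbinom(q * n, mu = 10, size = 1 / 0.25), nrow = q, ncol = n)

# Assumed normalisation (not necessarily the package default):
# library size of each sample relative to the average library size
lib_size <- colSums(Y)
offset_libsize <- lib_size / mean(lib_size)

# Ignoring different library sizes, as described in the text
offset_none <- rep(1, n)

round(offset_libsize, 2)
```

Either vector has one entry per sample and could be passed as the offset argument.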
The user can provide the confounding variable group. Note that each level of group must appear at least twice in order to allow stratified permutations.
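The requirement that each level appears at least twice can be seen in a small sketch of a stratified permutation. The helper permute_within is hypothetical and is not the package's internal code.

```r
# Hypothetical helper (not the package's internal code): permute sample
# indices within each level of `group`.
permute_within <- function(group) {
  idx <- seq_along(group)
  for (level in levels(group)) {
    members <- which(group == level)
    # index explicitly, because sample(x) on a single number x draws from 1:x
    idx[members] <- members[sample.int(length(members))]
  }
  idx
}

set.seed(1)
group <- factor(c("a", "a", "a", "b", "b", "c"))
perm_idx <- permute_within(group)

# Group labels are unchanged by the permutation
all(group[perm_idx] == group)
# The only sample in level "c" can never move
perm_idx[6] == 6
```

A level with a single sample contributes nothing to the permutation distribution, because that sample always stays in place.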
Efficient alternatives to classical permutation (kind=1) are the method of control variates (kind=0) and permutation in chunks (0 < kind < 1).
The function returns a dataframe, with the p-values in the first row and the test statistics in the second row.
A Rauschenberger, MA Jonker, MA van de Wiel, and RX Menezes (2016). "Testing for association between RNA-Seq and high-dimensional data", BMC Bioinformatics. 17:118. (open access)
RX Menezes, M Boetzer, M Sieswerda, GJB van Ommen, and JM Boer (2009). "Integrated analysis of DNA copy number and gene expression microarray data using gene sets", BMC Bioinformatics. 10:203. (open access)
The function omnibus tests for associations between an overdispersed response variable and a high-dimensional covariate set.
The function proprius calculates the contributions of individual samples or covariates to the test statistic.
All other functions of the R package globalSeq are internal.
# simulate high-dimensional data
n <- 30; q <- 10; p <- 100
Y <- matrix(rnbinom(q*n,mu=10,size=1/0.25),nrow=q,ncol=n)
X <- matrix(rnorm(p*n),nrow=p,ncol=n)
Yloc <- seq(0,1,length.out=q)
Xloc <- seq(0,1,length.out=p)
window <- 1
# hypothesis testing
cursus(Y,Yloc,X,Xloc,window)
Test of association between a count response and one or more covariate sets. This test may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed, and where the number of explanatory variables (p) exceeds the sample size (n). The negative binomial distribution accounts for overdispersion and a random effect model accounts for high dimensionality (p >> n).
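A short base R sketch shows why a classical test of overall significance is not available when p exceeds n. Ordinary least squares is used here only as a familiar stand-in for a regression fit; it is not part of the method.

```r
set.seed(1)
n <- 30; p <- 100
y <- rnbinom(n, mu = 10, size = 1 / 0.25)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# With more covariates than samples the fit is saturated
fit <- lm(y ~ X)
n_estimable <- sum(!is.na(coef(fit)))   # coefficients that can be estimated
n_missing <- sum(is.na(coef(fit)))      # coefficients returned as NA
c(estimable = n_estimable, missing = n_missing,
  residual_df = df.residual(fit))
```

At most n coefficients can be estimated and no residual degrees of freedom remain, so an F-test of overall significance cannot be computed from this fit.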
omnibus(y, X, offset = NULL, group = NULL, mu = NULL, phi = NULL, perm = 1000, kind = 1)
y |
response variable:
numeric vector of length |
X |
one covariate set:
numeric matrix with |
offset |
numeric vector of length |
group |
confounding variable:
factor of length |
mu |
mean parameters:
numeric vector of length |
phi |
dispersion parameter: non-negative real number |
perm |
number of iterations: positive integer |
kind |
computation: number between 0 and 1 |
The user can provide a common mu for all samples or sample-specific mu, and a common phi. Setting phi equal to zero is equivalent to using the Poisson model. If mu is missing, then mu is estimated from y. If phi is missing, then mu and phi are estimated from y.
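The relation between mu, phi and the simulated data in the examples can be checked directly. In the parameterisation of rnbinom, size corresponds to 1/phi, so the variance is mu + phi * mu^2 and reduces to the Poisson variance mu as phi approaches zero. The method-of-moments estimate below is an assumption for illustration and is not necessarily the estimator used by the package.

```r
set.seed(1)
mu <- 10; phi <- 0.25

# rnbinom() takes size = 1 / phi, giving variance mu + phi * mu^2
y <- rnbinom(1e5, mu = mu, size = 1 / phi)
c(theoretical = mu + phi * mu^2, empirical = var(y))

# Method-of-moments estimates (assumed for illustration only)
mu_hat <- mean(y)
phi_hat <- max(0, (var(y) - mu_hat) / mu_hat^2)
c(mu_hat = mu_hat, phi_hat = phi_hat)
```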
The offset is only taken into account for estimating mu or phi. By default the offset is rep(1,n).
The user can provide the confounding variable group. Note that each level of group must appear at least twice in order to allow stratified permutations.
Efficient alternatives to classical permutation (kind=1) are the method of control variates (kind=0) and permutation in chunks (0 < kind < 1).
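For orientation, classical permutation can be sketched in a few lines of base R. The statistic below is a simple stand-in chosen for illustration, not the globalSeq test statistic, and the sketch covers neither the method of control variates nor permutation in chunks.

```r
set.seed(1)
n <- 30; p <- 100
y <- rnbinom(n, mu = 10, size = 1 / 0.25)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Stand-in statistic (an assumption, not the globalSeq statistic):
# sum of squared cross-products between the centred response and each covariate
stat <- function(y, X) sum(crossprod(X, y - mean(y))^2)

perm <- 1000
observed <- stat(y, X)
# Permuting y breaks any association with X but keeps both margins intact
permuted <- replicate(perm, stat(sample(y), X))

# Proportion of permuted statistics at least as large as the observed one
p_value <- mean(permuted >= observed)
p_value
```

Every permutation requires a full recomputation of the statistic, which is the cost that the alternatives mentioned above aim to reduce.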
The function returns a dataframe, with the p-value in the first column, and the test statistic in the second column.
A Rauschenberger, MA Jonker, MA van de Wiel, and RX Menezes (2016). "Testing for association between RNA-Seq and high-dimensional data", BMC Bioinformatics. 17:118. (open access)
RX Menezes, L Mohammadi, JJ Goeman, and JM Boer (2016). "Analysing multiple types of molecular profiles simultaneously: connecting the needles in the haystack", BMC Bioinformatics. 17:77. (open access)
S le Cessie, and HC van Houwelingen (1995). "Testing the fit of a regression model via score tests in random effects models", Biometrics. 51:600-614. (restricted access)
The function proprius calculates the contributions of individual samples or covariates to the test statistic.
The function cursus tests for association between RNA-Seq and local genetic or epigenetic alterations across the whole genome.
All other functions of the R package globalSeq are internal.
# simulate high-dimensional data
n <- 30; p <- 100
y <- rnbinom(n,mu=10,size=1/0.25)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
# hypothesis testing
omnibus(y,X)
Even though the function omnibus tests a single hypothesis on a whole covariate set, this function calculates the individual contributions of n samples or p covariates to the test statistic.
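The idea of splitting one statistic into per-sample or per-covariate parts can be illustrated with a generic quadratic form. The statistic u below is a simplified stand-in (an assumption), not the statistic computed by proprius; the point is only that the same total can be written as a sum over samples or as a sum over covariates.

```r
set.seed(1)
n <- 30; p <- 100
y <- rnbinom(n, mu = 10, size = 1 / 0.25)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Stand-in quadratic statistic (assumption): u = r' R r with R = X X' / p
r <- y - mean(y)
R <- tcrossprod(X) / p
u <- drop(t(r) %*% R %*% r)

# Contribution of each covariate: squared cross-product with the residuals
by_covariate <- drop(crossprod(X, r))^2 / p   # length p
# Contribution of each sample: its residual times its entry of R r
by_sample <- r * drop(R %*% r)                # length n

c(total = u, covariates = sum(by_covariate), samples = sum(by_sample))
```

In this stand-in, covariate contributions are non-negative by construction, whereas sample contributions can be negative.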
proprius(y, X, type, offset = NULL, group = NULL, mu = NULL, phi = NULL, alpha = NULL, perm = 1000, plot = TRUE)
y |
response variable:
numeric vector of length |
X |
covariate set:
numeric matrix with |
type |
character 'covariates' or 'samples' |
offset |
numeric vector of length |
group |
confounding variable:
factor of length |
mu |
mean parameters:
numeric vector of length |
phi |
dispersion parameter: non-negative real number |
alpha |
significance level: real number between 0 and 1 |
perm |
number of iterations: positive integer |
plot |
plot of results: logical |
The user can provide a common mu for all samples or sample-specific mu, and a common phi. Setting phi equal to zero is equivalent to using the Poisson model. If mu is missing, then mu is estimated from y. If phi is missing, then mu and phi are estimated from y.
The offset is only taken into account for estimating mu or phi.
The user can provide the confounding variable group. Note that each level of group must appear at least twice in order to allow stratified permutations.
If alpha=NULL, then the function returns a numeric vector; otherwise it returns a list of numeric vectors.
A Rauschenberger, MA Jonker, MA van de Wiel, and RX Menezes (2016). "Testing for association between RNA-Seq and high-dimensional data", BMC Bioinformatics. 17:118. (open access)
JJ Goeman, SA van de Geer, F de Kort, and HC van Houwelingen (2004). "A global test for groups of genes: testing association with a clinical outcome", Bioinformatics. 20:93-99. (open access)
The function omnibus tests for associations between an overdispersed response variable and a high-dimensional covariate set.
The function cursus tests for association between RNA-Seq and local genetic or epigenetic alterations across the whole genome.
All other functions of the R package globalSeq are internal.
# simulate high-dimensional data
n <- 30; p <- 100
y <- rnbinom(n,mu=10,size=1/0.25)
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
# decomposition
proprius(y,X,type="samples")
proprius(y,X,type="covariates")