Title: | Significant Gene Expression Profile Differences in Time Course Gene Expression Data |
---|---|
Description: | maSigPro is a regression based approach to find genes for which there are significant gene expression profile differences between experimental groups in time course microarray and RNA-Seq experiments. |
Authors: | Ana Conesa and Maria Jose Nueda |
Maintainer: | Maria Jose Nueda <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.79.0 |
Built: | 2024-12-18 03:08:04 UTC |
Source: | https://github.com/bioc/maSigPro |
average.rows
matches rownames of a matrix to a match
vector and performs averaging of the rows
by the index provided by an index
vector.
average.rows(x, index, match, r = 0.7)
average.rows(x, index, match, r = 0.7)
x |
a matrix |
index |
index vector indicating how rows must be averaged |
match |
match vector for indexing rows |
r |
minimal correlation value between rows to compute average |
rows will be averaged only if the pearson correlation coefficient between all rows of each given index is greater than r. If not, that group of rows is discarded in the result matrix.
a matrix of averaged rows
Ana Conesa and Maria Jose Nueda, [email protected]
## create data matrix for row averaging x <- matrix(rnorm(30), nrow = 6,ncol = 5) rownames(x) <- paste("ID", c(1, 2, 11, 12, 19, 20), sep = "") i <- paste("g", rep(c(1:10), each = 2), sep = "") # index vector m <- paste("ID", c(1:20), sep = "") # match vector average.rows(x, i, m, r = 0)
## create data matrix for row averaging x <- matrix(rnorm(30), nrow = 6,ncol = 5) rownames(x) <- paste("ID", c(1, 2, 11, 12, 19, 20), sep = "") i <- paste("g", rep(c(1:10), each = 2), sep = "") # index vector m <- paste("ID", c(1:20), sep = "") # match vector average.rows(x, i, m, r = 0)
data.abiotic
contains gene expression of a time course microarray experiment where potato plants were submitted to 3 different abiotic stresses.
data(data.abiotic)
data(data.abiotic)
A data frame with 1000 observations on the following 36 variables.
Control_3H_1
a numeric vector
Control_3H_2
a numeric vector
Control_3H_3
a numeric vector
Control_9H_1
a numeric vector
Control_9H_2
a numeric vector
Control_9H_3
a numeric vector
Control_27H_1
a numeric vector
Control_27H_2
a numeric vector
Control_27H_3
a numeric vector
Cold_3H_1
a numeric vector
Cold_3H_2
a numeric vector
Cold_3H_3
a numeric vector
Cold_9H_1
a numeric vector
Cold_9H_2
a numeric vector
Cold_9H_3
a numeric vector
Cold_27H_1
a numeric vector
Cold_27H_2
a numeric vector
Cold_27H_3
a numeric vector
Heat_3H_1
a numeric vector
Heat_3H_2
a numeric vector
Heat_3H_3
a numeric vector
Heat_9H_1
a numeric vector
Heat_9H_2
a numeric vector
Heat_9H_3
a numeric vector
Heat_27H_1
a numeric vector
Heat_27H_2
a numeric vector
Heat_27H_3
a numeric vector
Salt_3H_1
a numeric vector
Salt_3H_2
a numeric vector
Salt_3H_3
a numeric vector
Salt_9H_1
a numeric vector
Salt_9H_2
a numeric vector
Salt_9H_3
a numeric vector
Salt_27H_1
a numeric vector
Salt_27H_2
a numeric vector
Salt_27H_3
a numeric vector
This data set is part of a larger experiment in wich gene expression was monitored in both roots and leaves using a 11K cDNA potato chip. This example data set contains a ramdom subset of 1000 genes of the leave study.
Rensink WA, Iobst S, Hart A, Stegalkina S, Liu J, Buell CR. Gene expression profiling of potato responses to cold, heat, and salt stress. Funct Integr Genomics. 2005 Apr 22.
data(data.abiotic)
data(data.abiotic)
edesign.abiotic
contains experimental set up of a time course microarray experiment where potato plants were submitted to 3 different abiotic stresses.
data(edesign.abiotic)
data(edesign.abiotic)
A matrix with 36 rows and 6 columns
rows [1:36] "Control 3h 1" "Control 3h 2" "Control 3h 3" "Control 9h 1" ...
columns [1:6] "Time" "Replicates" "Control" "Cold" "Heat" "Salt"
Arrays are given in rows and experiment descriptors are given in columns. Row names contain array names.
"Time"
indicates the values that variable Time takes in each
hybridization.
"Replicates"
is an index indicating replicate hyridizations,
i.e. hybridizations are numbered, giving replicates the same number.
"Control"
, "Cold"
, "Heat"
and "Salt"
columns
indicate array assigment to experimental groups, coding with 1 and 0
whether each array belongs to that group or not.
Rensink WA, Iobst S, Hart A, Stegalkina S, Liu J, Buell CR. Gene expression profiling of potato responses to cold, heat, and salt stress. Funct Integr Genomics. 2005 Apr 22.
data(edesignCR)
data(edesignCR)
edesignCT
contains the experimental set up of a time course
microarray experiment where there is a common starting point for the
different experimental groups.
data(edesignCT)
data(edesignCT)
A matrix with 32 rows and 7 colums
rows [1:32] "Array1" "Array2" "Array3" "Array4" ...
columns [1:7] "Time" "Replicates" "Control" "Tissue1" "Tissue2" "Tissue3" "Tissue4"
Arrays are given in rows and experiment descriptors are given in columns. Row names contain array names.
"Time"
indicates the values that variable Time takes in each
hybridization. There are 4 time points, which allows an up to 3 degree
regression polynome.
"Replicates"
is an index indicating replicate hyridizations, i.e. hybridizations are numbered, giving replicates the same number.
"Control"
, "Tissue1"
, "Tissue2"
, "Tissue3"
and
"Tissue4"
columns indicate array assigment to experimental groups,
coding with 1 and 0 whether each array belongs to that group or not.
data(edesignCT)
data(edesignCT)
edesignDR
contains experimental set up of a replicated time course microarray experiment where rats were submitted to 3 different dosis of a toxic compound. A control and an placebo treatments are also present in the experiment.
data(edesignDR)
data(edesignDR)
A matrix with 54 rows and 7 columns
rows [1:54] "Array1" "Array2" "Array3" "Array4" ...
columns [1:7] "Time" "Replicates" "Control" "Placebo" "Low" "Medium" "High"
Arrays are given in rows and experiment descriptors are given in columns. Row names contain array names.
"Time"
indicates the values that variable Time takes in each hybridization.
"Replicates"
is an index indicating replicate hyridizations, i.e. hybridizations are numbered, giving replicates the same number.
"Control"
, "Placebo"
, "Low"
, "Medium"
and "High"
columns indicate array assigment to experimental groups, coding with 1 and 0 whether each array belongs to that group or not.
Heijne, W.H.M.; Stierum, R.; Slijper, M.; van Bladeren P.J. and van Ommen B.(2003). Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics and proteomics approach. Biochemical Pharmacology 65 857-875.
data(edesignDR)
data(edesignDR)
This function creates lists of significant genes for a set of variables whose significance value has been computed with the T.fit
function.
get.siggenes(tstep, rsq = 0.7, add.IDs = FALSE, IDs = NULL, matchID.col = 1, only.names = FALSE, vars = c("all", "each", "groups"), significant.intercept = "dummy", groups.vector = NULL, trat.repl.spots = "none", index = IDs[, (matchID.col + 1)], match = IDs[, matchID.col], r = 0.7)
get.siggenes(tstep, rsq = 0.7, add.IDs = FALSE, IDs = NULL, matchID.col = 1, only.names = FALSE, vars = c("all", "each", "groups"), significant.intercept = "dummy", groups.vector = NULL, trat.repl.spots = "none", index = IDs[, (matchID.col + 1)], match = IDs[, matchID.col], r = 0.7)
tstep |
a |
rsq |
cut-off level at the R-squared value for the stepwise regression fit. Only genes with R-squared more than rsq are selected |
add.IDs |
logical indicating whether to include additional gene id's in the result |
IDs |
matrix contaning additional gene id information (required when |
matchID.col |
number of matching column in matrix IDs for adding genes ids |
only.names |
logical. If |
vars |
variables for which to extract significant genes (see details) |
significant.intercept |
experimental groups for which significant intercept coefficients are considered (see details) |
groups.vector |
required when |
trat.repl.spots |
treatment given to replicate spots. Possible values are |
index |
argument of the |
match |
argument of the |
r |
minimun pearson correlation coefficient for replicated spots profiles to be averaged |
There are 3 possible values for the vars argument:
"all"
: generates one single matrix or gene list with all
significant genes.
"each"
: generates as many significant genes extractions as
variables in the general regression model. Each extraction contains the
significant genes for that variable.
"groups"
: generates a significant genes extraction for each
experimental group.
The difference between "each"
and "groups"
is that in the
first case the variables of the same group (e.g. "TreatmentA"
and
"time*TreatmentA"
) will be extracted separately and in the second
case jointly.
When add.IDs
is TRUE
, a matrix of gene ids must be provided
as argument of IDs, the matchID.col
column of which having same levels as in the row names of
sig.profiles
. The option only.names
is TRUE
will
generate a vector of significant genes or a matrix when add.IDs
is
set also to TRUE
.
When trat.repl.spots
is "average"
, match
and index
vectors are required for the average.rows
function.
In gene expression data context, the index
vector would contain geneIDs and indicate which spots
are replicates. The match
vector is used to match these genesIDs to rows in the significant genes
matrix, and must have the same levels as the row names of sig.profiles
.
The argument significant.intercept
modulates the treatment for intercept coefficients to apply for selecting significant genes
when vars
equals "groups"
. There are three possible values: "none"
, no significant intercept (differences) are
considered for significant gene selection, "dummy"
, includes genes with significant intercept differences between control and experimental
groups, and "all"
when both significant intercept coefficient for the control group and significant intercept
differences are considered for selecting significant genes.
add.IDs
= TRUE and trat.repl.spots
= "average"
are not compatible argumet values.
add.IDs
= TRUE and only.names
= TRUE
are compatible argumet values.
summary |
a vector or matrix listing significant genes for the variables given by the function parameters |
sig.genes |
a list with detailed information on the significant genes found for the variables given by the function parameters. Each element of the list is also a list containing:
|
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01) tc.tstep <- T.fit(data = tc.p , alfa = 0.05) ## This will obtain sigificant genes per experimental group ## which have a regression model Rsquared > 0.9 tc.sigs <- get.siggenes (tc.tstep, rsq = 0.9, vars = "groups") ## This will obtain all sigificant genes regardless the Rsquared value. ## Replicated genes are averaged. IDs <- rbind(paste("feature", c(1:300), sep = ""), rep(paste("gene", c(1:150), sep = ""), each = 2)) tc.sigs.ALL <- get.siggenes (tc.tstep, rsq = 0, vars = "all", IDs = IDs) tc.sigs.groups <- get.siggenes (tc.tstep, rsq = 0, vars = "groups", significant.intercept="dummy")
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01) tc.tstep <- T.fit(data = tc.p , alfa = 0.05) ## This will obtain sigificant genes per experimental group ## which have a regression model Rsquared > 0.9 tc.sigs <- get.siggenes (tc.tstep, rsq = 0.9, vars = "groups") ## This will obtain all sigificant genes regardless the Rsquared value. ## Replicated genes are averaged. IDs <- rbind(paste("feature", c(1:300), sep = ""), rep(paste("gene", c(1:150), sep = ""), each = 2)) tc.sigs.ALL <- get.siggenes (tc.tstep, rsq = 0, vars = "all", IDs = IDs) tc.sigs.groups <- get.siggenes (tc.tstep, rsq = 0, vars = "groups", significant.intercept="dummy")
getDS
creates lists of significant isoforms from Differentially Spliced Genes (DSG)
getDS(Model, vars="all", rsq=0.4)
getDS(Model, vars="all", rsq=0.4)
Model |
a |
vars |
argument of the |
rsq |
cut-off level at the R-squared value for the stepwise regression fit. Only isoforms with R-squared more than rsq are selected |
There are 3 possible values for the vars argument: "all", "each" and "groups". See get.siggenes
.
In the console a summary of the selection is printed.
Model |
a |
get2 |
a |
DSG |
Names of the selected genes: Differentially Spliced Genes |
DET |
Names of the selected Isoforms: Differentally Expressed Transcripts |
List0 |
a list with the names of Differentially Spliced Genes without Isoforms with R-squared higher than rsq |
NumIso.by.gene |
Number of selected Isoforms for each Differentially Spliced Gene |
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.
Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) Myget$DSG Myget$DET see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) Myget$DSG Myget$DET see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
getDSPatterns
is a function that makes a list with the names of genes identified with tableDS
function.
getDSPatterns(tableDS, Cluster.Major, Cluster.minor)
getDSPatterns(tableDS, Cluster.Major, Cluster.minor)
tableDS |
a |
Cluster.Major |
Number of the cluster where the major isoform belongs to |
Cluster.minor |
Number(s) of the cluster(s) where the minor isoform(s) belongs to (see details) |
When minor isoforms belong to different clusters, tableDS
codifies them using "&". For instance: clusters 1 and 2,
will be represented as "1&2". In such cases quotation marks must be used (see examples).
When minor isoforms are only in one cluster there is no need to use quotation marks.
A vector with the names of the genes.
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.
Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable getDSPatterns(table, 1, 4) getDSPatterns(table, "1", "4") #will give the same result. getDSPatterns(table, 1, "1&5")
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable getDSPatterns(table, 1, 4) getDSPatterns(table, "1", "4") #will give the same result. getDSPatterns(table, 1, "1&5")
Ranks the values in a vector to sucessive values. Ties are given the same value.
i.rank(x)
i.rank(x)
x |
vector |
Vector of ranked values
Ana Conesa and Maria Jose Nueda, [email protected]
i.rank(c(1, 1, 1, 3, 3, 5, 7, 7, 7))
i.rank(c(1, 1, 1, 3, 3, 5, 7, 7, 7))
ISOdata
contains an example of RNA-Seq data at Isoform level.
data(ISOdata)
data(ISOdata)
A data frame with 2782 rows and 37 columns with RNA-Seq data.
Rows correspond to 2782 isoforms belonging to 1000 gene.
First column is the name of the gene each isoform belongs to.
Remaining columns are the RNA-Seq data samples asociated to 3 replicates of 12 experimental conditions.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
ISOdesign
is the experimental design to apply ISOmaSigPro to ISOdata dataset example.
data(ISOdesign)
data(ISOdesign)
A matrix with 36 rows and 4 colums
rownames(ISOdesign) "Gr1_0h_1" "Gr1_0h_2" "Gr1_0h_3" "Gr1_2h_1" "Gr1_2h_2" "Gr1_2h_3" "Gr1_6h_1" "Gr1_6h_2" "Gr1_6h_3" "Gr1_12h_1" "Gr1_12h_2" "Gr1_12h_3" "Gr1_18h_1" "Gr1_18h_2" "Gr1_18h_3" "Gr1_24h_1" "Gr1_24h_2" "Gr1_24h_3" "Gr2_0h_1" "Gr2_0h_2" "Gr2_0h_3" "Gr2_2h_1" "Gr2_2h_2" "Gr2_2h_3" "Gr2_6h_1" "Gr2_6h_2" "Gr2_6h_3" "Gr2_12h_1" "Gr2_12h_2" "Gr2_12h_3" "Gr2_18h_1" "Gr2_18h_2" "Gr2_18h_3" "Gr2_24h_1" "Gr2_24h_2" "Gr2_24h_3"
colnames(ISOdesign) "time" "replicate" "Group1" "Group2"
Samples are given in rows and experiment descriptors are given in columns. Row names contain sample names.
"time"
indicates the values that variable Time takes in each
experimental condition. There are 6 time points.
"replicate"
is an index indicating the same experimental condition.
"Group1"
and "Group2"
columns indicate assigment to experimental groups,
coding with 1 and 0 whether each sample belongs to that group or not.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
IsoModel
Performs a model comparison for each gene to detect genes with different trends in time course experiments and applies maSigPro to the Isoforms belonging to selected genes.
IsoModel(data, gen, design = NULL, Q = 0.05, min.obs = 6, minorFoldfilter = NULL, counts = FALSE, family = NULL, theta = 10, epsilon = 1e-05)
IsoModel(data, gen, design = NULL, Q = 0.05, min.obs = 6, minorFoldfilter = NULL, counts = FALSE, family = NULL, theta = 10, epsilon = 1e-05)
data |
matrix containing isoform expression. Isoforms must be in rows and experimental conditions in columns |
gen |
vector with the name of the gene each isoform belongs to |
design |
design matrix for the regression fit such as that generated by the |
Q |
significance level |
min.obs |
cases with less than this number of true numerical values will be excluded from the analysis. Minimum value to estimate the model is (degree+1)xGroups+1. Default is 6. |
minorFoldfilter |
fold expression difference between minor isoforms and the most expressed isoform to exclude minor isoforms from analysis. Default NULL |
counts |
a logical indicating whether your data are counts |
family |
the distribution function to be used in the glm model. It must be specified as a function: gaussian(), poisson(), negative.binomial(theta)... If NULL family will be negative.binomial(theta) when counts=TRUE or gaussian() when counts=FALSE |
theta |
theta parameter for negative.binomial family |
epsilon |
argument to pass to |
rownames(design)
and colnames(data)
must be identical vectors and indicate experimental condition names.
rownames(data)
should contain unique isoform IDs.
colnames(design)
are the given names for the variables in the regression model.
data |
input data matrix to be used in the following steps |
gen |
input gen vector to be used in the following steps |
design |
input design matrix to be used in the following steps |
DSG |
Names of the selected genes: Differentially Spliced Genes |
pvector |
p.vector output of isoforms that belong to selected.genes |
Tfit |
Tfit output of isoforms that belong to selected.genes |
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.
Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
This function makes a plot with the isoforms of a specific gene splitting the different experimental groups.
IsoPlot(get, name, only.sig.iso=FALSE, ylim=NULL, xlab = "Time", ylab = "Expression value", points=TRUE, cex.main=3,cex.legend=1.5)
IsoPlot(get, name, only.sig.iso=FALSE, ylim=NULL, xlab = "Time", ylab = "Expression value", points=TRUE, cex.main=3,cex.legend=1.5)
get |
a |
name |
Name of the specific gen to show in the plot |
only.sig.iso |
TRUE when the plot is made only with statistically significant isoforms. |
ylim |
Range of the y axis of the desired plot. If it is NULL it will be computed automatically. |
xlab |
label for the x axis |
ylab |
label for the y axis |
points |
TRUE to plot points and lines. FALSE to plot only lines. |
cex.main |
graphical parameter magnification to be used for main |
cex.legend |
graphical parameter magnification to be used for legend |
The plot can be made with all the available isoforms or only with the statistilly significant ones.
Plot of isoform profiles of a specific gene by groups.
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.
Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) IsoPlot(Myget,"Gene1005",only.sig.iso=FALSE,cex.main=2,cex.legend=1)
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) IsoPlot(Myget,"Gene1005",only.sig.iso=FALSE,cex.main=2,cex.legend=1)
make.design.matrix
creates the design matrix of dummies for fitting time series micorarray
gene expression experiments.
make.design.matrix(edesign, degree = 2, time.col = 1, repl.col = 2, group.cols = c(3:ncol(edesign)))
make.design.matrix(edesign, degree = 2, time.col = 1, repl.col = 2, group.cols = c(3:ncol(edesign)))
edesign |
matrix describing experimental design. Rows must be arrays and columns experiment descriptors |
degree |
the degree of the regression fit polynome. |
time.col |
column number in edesign containing time values. Default is first column |
repl.col |
column number in edesign containing coding for replicate arrays. Default is second column |
group.cols |
column numbers in edesign indicating the coding for each experimental group (treatment, tissue, ...). See details |
rownames of edesign object should contain the arrays naming (i.e. array1, array2, ...). colnames of edesign must contain the names of experiment descriptors(i.e. "Time", "Replicates", "Treatment A", "Treatment B", etc.). for each experimental group a different column must be present in edesign, coding with 1 and 0 whether each array belongs to that group or not.
make.design.matrix
returns a design matrix where rows represent arrays and column variables of time, dummies and their interactions for up to the degree
given.
Dummies show the relative effect of each experimental group related to the first one. Single dummies indicate the abcissa component of each group.
$Time*dummy$ variables indicate slope changes, $Time^2*dummy$ indicates curvature changes. Higher grade values could model complex responses.
In case experimental groups share a initial state (i.e. common time 0), no single dummies are modeled.
dis |
design matrix of dummies for fitting time series |
groups.vector |
vector coding the experimental group to which each variable belongs to |
edesign |
|
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102
data(edesign.abiotic, edesignCT) make.design.matrix(edesign.abiotic) # quadratic model make.design.matrix(edesignCT, degree = 3) # cubic model with common starting time point
data(edesign.abiotic, edesignCT) make.design.matrix(edesign.abiotic) # quadratic model make.design.matrix(edesignCT, degree = 3) # cubic model with common starting time point
Finds the location of the maSigPro User's Guide and opens it.
maSigProUsersGuide(view=TRUE)
maSigProUsersGuide(view=TRUE)
view |
logical, to specify if the document is opened using the PDF document reader. |
The function vignette("maSigPro")
will find the short maSigPro Vignette which describes how to obtain the maSigPro User's Guide.
The User's Guide is not itself a true vignette because it is not automatically generated using Sweave during the package build process.
This means that it cannot be found using vignette, hence the need for this special function.
If the operating system is other than Windows, then the PDF viewer used is that given by Sys.getenv("R_PDFVIEWER")
.
The PDF viewer can be changed using Sys.putenv(R_PDFVIEWER=)
.
If vignette(view=TRUE)
, the PDF document reader is started and the User's Guide is opened.
If vignette(view=FALSE)
, returns the file location.
maSigProUsersGuide() maSigProUsersGuide(view=FALSE)
maSigProUsersGuide() maSigProUsersGuide(view=FALSE)
NBdata
contains a subset of a bigger normalized negative binomial simulated dataset.
data(NBdata)
data(NBdata)
A data frame with 100 observations on 36 numeric variables.
This dataset is part of a larger simulated and normalized dataset with 2 experimental groups, 6 time-points and 3 replicates. Simulation has been done by using a negative binomial distribution. The first 20 genes are simulated with changes among time.
data(NBdata)
data(NBdata)
NBdesign
contains a subset of a bigger normalized negative binomial simulated dataset.
data(NBdesign)
data(NBdesign)
A matrix with 36 rows and 4 colums
rows [1:36] "G1.T1.1" "G1.T1.2" "G1.T1.3" "G1.T2.1" ...
columns [1:6] [1] "Time" "Replicates" "Group.1" "Group.2"
Samples are given in rows and experiment descriptors are given in columns. Row names contain sample names.
"Time"
indicates the values that variable Time takes in each
experimental condition. There are 6 time points.
"Replicates"
is an index indicating the same experimental condition.
"Group.1"
and "Group.2"
columns indicate assigment to experimental groups,
coding with 1 and 0 whether each sample belongs to that group or not.
data(NBdesign)
data(NBdesign)
p.vector
performs a regression fit for each gene taking all variables present in the model given by a regression matrix
and returns a list of FDR corrected significant genes.
p.vector(data, design, Q = 0.05, MT.adjust = "BH", min.obs = 6, counts=FALSE, family=NULL, theta=10, epsilon=0.00001, item="gene")
p.vector(data, design, Q = 0.05, MT.adjust = "BH", min.obs = 6, counts=FALSE, family=NULL, theta=10, epsilon=0.00001, item="gene")
data |
matrix containing normalized gene expression data. Genes must be in rows and arrays in columns |
design |
design matrix for the regression fit such as that generated by the |
Q |
significance level |
MT.adjust |
argument to pass to |
min.obs |
genes with less than this number of true numerical values will be excluded from the analysis. Minimum value to estimate the model is (degree+1)xGroups+1. Default is 6. |
counts |
a logical indicating whether your data are counts |
family |
the distribution function to be used in the glm model. It must be specified as a function: gaussian(), poisson(), negative.binomial(theta)... If NULL family will be negative.binomial(theta) when counts=TRUE or gaussian() when counts=FALSE |
theta |
theta parameter for negative.binomial family |
epsilon |
argument to pass to |
item |
Name of the analysed item to show in the screen while p.vector is in process |
rownames(design)
and colnames(data)
must be identical vectors
and indicate array naming.
rownames(data)
should contain unique gene IDs.
colnames(design)
are the given names for the variables in the regression model.
SELEC |
matrix containing the expression values for significant genes |
p.vector |
vector containing the computed p-values |
G |
total number of input genes |
g |
number of genes taken in the regression fit |
FDR |
p-value at FDR |
i |
number of significant genes |
dis |
design matrix used in the regression fit |
dat |
matrix of expression value data used in the regression fit |
... |
additional values from input parameters |
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102
#### GENERATE TIME COURSE DATA ## generates n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c2 = 1.3, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.05) tc.p$i # number of significant genes tc.p$SELEC # expression value of signficant genes tc.p$FDR # p.value at FDR control tc.p$p.adjusted# adjusted p.values
#### GENERATE TIME COURSE DATA ## generates n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c2 = 1.3, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.05) tc.p$i # number of significant genes tc.p$SELEC # expression value of signficant genes tc.p$FDR # p.value at FDR control tc.p$p.adjusted# adjusted p.values
This function displays the gene expression profile for each experimental group in a time series gene expression experiment.
PlotGroups(data, edesign = NULL, time = edesign[, 1], groups = edesign[,c(3:ncol(edesign))], repvect = edesign[, 2], show.lines = TRUE, show.fit = FALSE, dis = NULL, step.method = "backward", min.obs = 2, alfa = 0.05, nvar.correction = FALSE, summary.mode = "median", groups.vector = NULL, main = NULL, sub = NULL, xlab = "Time", ylab = "Expression value", item = NULL, ylim = NULL, pch = 21, col = NULL, legend = TRUE, cex.legend = 1,lty.legend = NULL,... )
PlotGroups(data, edesign = NULL, time = edesign[, 1], groups = edesign[,c(3:ncol(edesign))], repvect = edesign[, 2], show.lines = TRUE, show.fit = FALSE, dis = NULL, step.method = "backward", min.obs = 2, alfa = 0.05, nvar.correction = FALSE, summary.mode = "median", groups.vector = NULL, main = NULL, sub = NULL, xlab = "Time", ylab = "Expression value", item = NULL, ylim = NULL, pch = 21, col = NULL, legend = TRUE, cex.legend = 1,lty.legend = NULL,... )
data |
vector or matrix containing the gene expression data |
edesign |
matrix describing experimental design. Rows must be arrays and columns experiment descriptors |
time |
vector indicating time assigment for each array |
groups |
matrix indicating experimental group to which each array is assigned |
repvect |
index vector indicating experimental replicates |
show.lines |
logical indicating whether a line must be drawn joining plotted data points for reach group |
show.fit |
logical indicating whether regression fit curves must be plotted |
dis |
regression design matrix |
step.method |
stepwise regression method to fit models for cluster mean profiles. It can be either |
min.obs |
minimal number of observations for a gene to be included in the analysis |
alfa |
significance level used for variable selection in the stepwise regression |
nvar.correction |
argument for correcting stepwise regression significance level. See |
summary.mode |
the method to condensate expression information when more than one gene is present in the data. Possible values are |
groups.vector |
vector indicating experimental group to which each variable belongs |
main |
plot main title |
sub |
plot subtitle |
xlab |
label for the x axis |
ylab |
label for the y axis |
item |
name of the analysed items to show |
ylim |
range of the y axis |
pch |
integer specifying type of points to plot |
col |
a vector specifying colours to plot. If missing first naturals will be used |
legend |
logical indicating whether legend must be added when plotting profiles |
cex.legend |
Expansion factor for legend |
lty.legend |
To add a coloured line in the legend |
... |
other graphical function argument |
To compute experimental groups either a edesign object must be provided,
or separate values must be given for the time
, repvect
and
groups
arguments.
When data is a matrix, the average expression value is displayed.
When there are array replicates in the data (as indicated by
repvect
), values are averaged by repvect
.
PlotGroups plots one single expression profile for each experimental
group even if there are more that one genes in the data set. The way
data is condensated for this is given by summary.mode
. When this
argument takes the value "representative"
, the gene with the
lowest distance to all genes in the cluster will be plotted. When the
argument is "median"
, then median expression value is computed.
When show.fit
is TRUE
the stepwise regression fit for the
data will be computed and the regression curves will be displayed.
If data is a matrix of genes and summary.mode
is "median"
,
the regression fit will be computed for the median expression value.
Plot of gene expression profiles by-group.
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) rownames(tc.DATA) <- paste("gene", c(1:10), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Ctl <- c(rep(1, 9), rep(0, 27)) Tr1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Tr2 <- c(rep(0, 18), rep(1, 9), rep(0, 9)) Tr3 <- c(rep(0, 27), rep(1, 9)) PlotGroups (tc.DATA, time = Time, repvect = Replicates, groups = cbind(Ctl, Tr1, Tr2, Tr3))
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) rownames(tc.DATA) <- paste("gene", c(1:10), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Ctl <- c(rep(1, 9), rep(0, 27)) Tr1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Tr2 <- c(rep(0, 18), rep(1, 9), rep(0, 9)) Tr3 <- c(rep(0, 27), rep(1, 9)) PlotGroups (tc.DATA, time = Time, repvect = Replicates, groups = cbind(Ctl, Tr1, Tr2, Tr3))
PlotProfiles
displays the expression profiles of a group of genes.
PlotProfiles(data, cond, cex.axis = 0.5, ylim = NULL, repvect, main = NULL, sub = NULL, color.mode = "rainbow", item = NULL)
PlotProfiles(data, cond, cex.axis = 0.5, ylim = NULL, repvect, main = NULL, sub = NULL, color.mode = "rainbow", item = NULL)
data |
a matrix containing the gene expression data |
cond |
vector for x axis labeling, typically array names |
cex.axis |
graphical parameter maginfication to be used for x axis in plotting functions |
ylim |
index vector indicating experimental replicates |
repvect |
index vector indicating experimental replicates |
main |
plot main title |
sub |
plot subtitle |
color.mode |
color scale for plotting profiles. Can be either |
item |
Name of the analysed items to show |
The repvect
argument is used to indicate with vertical lines groups of replicated arrays.
Plot of experiment-wide gene expression profiles.
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) rownames(tc.DATA) <- paste("gene", c(1:10), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") PlotProfiles (tc.DATA, cond = colnames(tc.DATA), main = "Time Course", repvect = rep(c(1:12), each = 3))
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) rownames(tc.DATA) <- paste("gene", c(1:10), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") PlotProfiles (tc.DATA, cond = colnames(tc.DATA), main = "Time Course", repvect = rep(c(1:12), each = 3))
This function provides lists of genes that have different Major isoforms (most expressed) when different intervals of the experimental conditions are considered.
The subrange of the experimental conditions can be chosen as a specific point, all the points of a specific experimental group or at any point.
PodiumChange(get, only.sig.iso=FALSE, comparison=c("any", "groups","specific"), group.name="Ctr", time.points=0)
PodiumChange(get, only.sig.iso=FALSE, comparison=c("any", "groups","specific"), group.name="Ctr", time.points=0)
get |
a |
only.sig.iso |
TRUE when changes are looked for only through statistically significant isoforms. |
comparison |
Type of search to do: any, groups or specific (see details). |
group.name |
required when |
time.points |
required when |
There are 3 possible values for the comparison argument:
"any"
: Detects genes with Major Isoform changes in at least one experimental condition.
"groups"
: Detects genes with different Major Isoform for different experimental groups.
"specific"
: Detects genes with Major Isoform changes in a specific time interval, especified in time.points
argument and a specific experimental group, especified in group.name
argument.
L |
Names of the genes with PodiumChange Isoforms |
data.L |
Data values of all the isoforms belonging to the genes in L |
gen.L |
gen vector with the name of the gene of each isoform |
edesign |
matrix describing experimental design needed to visualize PodiumChange selection with |
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526. Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) PC <- PodiumChange(Myget, only.sig.iso=TRUE, comparison="specific", group.name="Group2", time.points=c(18,24)) PC$L
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) PC <- PodiumChange(Myget, only.sig.iso=TRUE, comparison="specific", group.name="Group2", time.points=c(18,24)) PC$L
Finds the column position of a character variable in the column names of a data frame.
position(matrix, vari)
position(matrix, vari)
matrix |
matrix or data.frame with character column names |
vari |
character variable |
numerical. Column position for the given variable.
Ana Conesa and Maria Jose Nueda, [email protected]
x <- matrix(c(1, 1, 2, 2, 3, 3),ncol = 3,nrow = 2) colnames(x) <- c("one", "two", "three") position(x, "one")
x <- matrix(c(1, 1, 2, 2, 3, 3),ncol = 3,nrow = 2) colnames(x) <- c("one", "two", "three") position(x, "one")
reg.coeffs
calculates back regression coefficients for true variables (experimental groups) from dummy variables regression coefficients.
reg.coeffs(coefficients, indepen = groups.vector[nchar(groups.vector)==min(nchar(groups.vector))][1], groups.vector, group)
reg.coeffs(coefficients, indepen = groups.vector[nchar(groups.vector)==min(nchar(groups.vector))][1], groups.vector, group)
coefficients |
vector of regression coefficients obtained from a regression model with dummy variables |
indepen |
idependent variable of the regression formula |
groups.vector |
vector indicating the true variable of each variable in |
group |
true variable for which regression coefficients are to be computed |
regression coefficients in coefficients vector should be ordered by polynomial degree in a regression formula, ie: intercept, $x$ term, $x^2$ term, $x^3$ term, and so on...
reg.coeff |
vector of calculated regression coefficients |
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
groups.vector <-c("CT", "T1vsCT", "T2vsCT", "CT", "T1vsCT","T2vsCT", "CT", "T1vsCT", "T2vsCT") coefficients <- c(0.1, 1.2, -0.8, 1.7, 3.3, 0.4, 0.0, 2.1, -0.9) ## calculate true regression coefficients for variable "T1" reg.coeffs(coefficients, groups.vector = groups.vector, group = "T1")
groups.vector <-c("CT", "T1vsCT", "T2vsCT", "CT", "T1vsCT","T2vsCT", "CT", "T1vsCT", "T2vsCT") coefficients <- c(0.1, 1.2, -0.8, 1.7, 3.3, 0.4, 0.0, 2.1, -0.9) ## calculate true regression coefficients for variable "T1" reg.coeffs(coefficients, groups.vector = groups.vector, group = "T1")
This function provides visualisation tools for gene expression values in a time course experiment. The function first calls the heatmap function for a general overview of experiment results. Next a partioning of the data is generated using a clustering method. The results of the clustering are visualized both as gene expression profiles extended along all arrays in the experiment, as provided by the plot.profiles function, and as summary expression profiles for comparison among experimental groups.
see.genes(data, edesign = data$edesign, time.col = 1, repl.col = 2, group.cols = c(3:ncol(edesign)), names.groups = colnames(edesign)[3:ncol(edesign)], cluster.data = 1, groups.vector = data$groups.vector, k = 9, k.mclust=FALSE, cluster.method = "hclust", distance = "cor", agglo.method = "ward.D", show.lines = TRUE, show.fit = FALSE, dis = NULL, step.method = "backward", min.obs = 3, alfa = 0.05, nvar.correction = FALSE, iter.max = 500, summary.mode = "median", color.mode = "rainbow", ylim = NULL, item = "genes", legend = TRUE, cex.legend = 1, lty.legend = NULL,...)
see.genes(data, edesign = data$edesign, time.col = 1, repl.col = 2, group.cols = c(3:ncol(edesign)), names.groups = colnames(edesign)[3:ncol(edesign)], cluster.data = 1, groups.vector = data$groups.vector, k = 9, k.mclust=FALSE, cluster.method = "hclust", distance = "cor", agglo.method = "ward.D", show.lines = TRUE, show.fit = FALSE, dis = NULL, step.method = "backward", min.obs = 3, alfa = 0.05, nvar.correction = FALSE, iter.max = 500, summary.mode = "median", color.mode = "rainbow", ylim = NULL, item = "genes", legend = TRUE, cex.legend = 1, lty.legend = NULL,...)
data |
either matrix or a list containing the gene expression data, typically a |
edesign |
matrix of experimental design |
time.col |
column in edesign containing time values. Default is first column |
repl.col |
column in edesign containing coding for replicates arrays. Default is second column |
group.cols |
columns indicating the coding for each group (treatment, tissue,...) in the experiment (see details) |
names.groups |
names for experimental groups |
cluster.data |
type of data used by the cluster algorithm (see details) |
groups.vector |
vector indicating the experimental group to which each variable belongs |
k |
number of clusters for data partioning |
k.mclust |
TRUE for computing the optimal number of clusters with Mclust algorithm |
cluster.method |
clustering method for data partioning. Currently |
distance |
distance measurement function when |
agglo.method |
aggregation method used when |
show.lines |
logical indicating whether a line must be drawn joining plotted data points for reach group |
show.fit |
logical indicating whether regression fit curves must be plotted |
dis |
regression design matrix |
step.method |
stepwise regression method to fit models for cluster mean profiles. Can be either |
min.obs |
minimal number of observations for a gene to be included in the analysis |
alfa |
significance level used for variable selection in the stepwise regression |
nvar.correction |
argument for correcting |
iter.max |
maximum number of iterations when |
summary.mode |
the method |
color.mode |
color scale for plotting profiles. Can be either |
ylim |
range of the y axis to be used by |
item |
Name of the analysed items to show |
legend |
logical indicating whether legend must be added when plotting profiles |
cex.legend |
Expansion factor for legend |
lty.legend |
To add a coloured line in the legend |
... |
other graphical function argument |
Data can be provided either as a single data matrix of expression values, or a get.siggenes
object. In the later case
the other argument of the fuction can be taken directly from data
.
Data clustering can be done on the basis of either the original expression values, the regression coefficients,
or the t.scores. In case data
is a get.siggenes
object, this is given by providing the element names of the list
c("sig.profiles","coefficients","t.score")
of their list position (1,2 or 3).
Experiment wide gene profiles and by group profiles plots are generated for each data cluster in the graphical device.
cut |
vector indicating gene partioning into clusters |
c.algo.used |
clustering algorith used for data partioning |
groups |
groups matrix used for plotting functions |
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") see.genes(tc.DATA, edesign = edesign, k = 4) # This will show the regression fit curve dise <- make.design.matrix(edesign) see.genes(tc.DATA, edesign = edesign, k = 4, show.fit = TRUE, dis = dise$dis, groups.vector = dise$groups.vector, distance = "euclidean")
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") see.genes(tc.DATA, edesign = edesign, k = 4) # This will show the regression fit curve dise <- make.design.matrix(edesign) see.genes(tc.DATA, edesign = edesign, k = 4, show.fit = TRUE, dis = dise$dis, groups.vector = dise$groups.vector, distance = "euclidean")
seeDS
This function provides visualisation tools for Significant Isoforms in a time course
experiment. The function calls the see.genes
function for selected Isoforms. This cluster will be the reference in tableDS
function to identify the trends that follows the isoforms of a specific gene.
seeDS(get, rsq=0.4, cluster.all=TRUE, plot.mDSG=FALSE, k=6, cluster.method="hclust", k.mclust=FALSE, ...)
seeDS(get, rsq=0.4, cluster.all=TRUE, plot.mDSG=FALSE, k=6, cluster.method="hclust", k.mclust=FALSE, ...)
get |
a |
rsq |
Required when cluster.all=TRUE. It is the cut-off level at the R-squared value for detecting significant isoforms of all the genome. |
cluster.all |
TRUE to make the cluster with significant isoforms of all the genome. FALSE to make the cluster with significant isoforms of Differentially Spliced Genes. |
plot.mDSG |
TRUE to make a cluster with the Isoforms belonging to monoIsoform genes |
k |
number of clusters for data partioning |
cluster.method |
clustering method for data partioning. Currently |
k.mclust |
TRUE for computing the optimal number of clusters with Mclust algorithm |
... |
other graphical function argument |
The cluster reference can be made with significant isoforms of all the genome or with the isoforms belonging to the Differentially Spliced Genes.
Alternatively a cluster of monoIsoforms can be asked.
Next a partioning of the data is generated using a clustering method.
The results of the clustering are visualized in two plots as in see.genes
.
Experiment wide Isoform profiles and by group profiles plots are generated for each data cluster in the graphical device.
Model |
a |
get |
a |
NumIso.by.gene |
Number of selected Isoforms for each Differentially Spliced Gene |
cut |
vector indicating gene partioning into clusters |
names.genes |
vector with the name of the gene each selected isoform belongs to |
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.
Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
stepback
fits a linear regression model applying a backward-stepwise strategy.
stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
y |
dependent variable |
d |
data frame containing by columns the set of variables that could be in the selected model |
alfa |
significance level to decide if a variable stays or not in the model |
family |
the distribution function to be used in the glm model |
epsilon |
argument to pass to |
The strategy begins analysing a model with all the variables included in d. If all variables are statistically significant (all variables have a p-value less than alfa) this model will be the result. If not, the less statistically significant variable will be removed and the model is re-calculated. The process is repeated up to find a model with all the variables statistically significant.
stepback
returns an object of the class lm
, where the model uses
y
as dependent variable and all the selected variables from d
as independent variables.
The function summary
are used to obtain a summary and analysis of variance table of the results.
The generic accessor functions coefficients
, effects
,
fitted.values
and residuals
extract various useful features of the value returned by lm
.
Ana Conesa, [email protected]; Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
lm
, step
, stepfor
, two.ways.stepback
, two.ways.stepfor
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- stepback(y = y, d = dis) summary(s.fit)
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- stepback(y = y, d = dis) summary(s.fit)
stepfor
fits a linear regression model applying forward-stepwise strategy.
stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
y |
dependent variable |
d |
data frame containing by columns the set of variables that could be in the selected model |
alfa |
significance level to decide if a variable stays or not in the model |
family |
the distribution function to be used in the glm model |
epsilon |
argument to pass to |
The strategy begins analysing all the possible models with only one of the variables included in d
.
The most statistically significant variable (with the lowest p-value) is included in the model and
then it is considered to introduce in the model another variable analysing all the possible models
with two variables (the selected variable in the previous step plus a new variable). Again the most
statistically significant variable (with lowest p-value) is included in the model. The process is
repeated till there are no more statistically significant variables to include.
stepfor
returns an object of the class lm
, where the model uses
y
as dependent variable and all the selected variables from d
as independent variables.
The function summary
are used to obtain a summary and analysis of variance table of the results.
The generic accessor functions coefficients
, effects
,
fitted.values
and residuals
extract various useful features of the value returned by lm
.
Ana Conesa, [email protected]; Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
lm
, step
, stepback
, two.ways.stepback
, two.ways.stepfor
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- stepfor(y = y, d = dis) summary(s.fit)
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- stepfor(y = y, d = dis) summary(s.fit)
suma2Venn
transforms a matrix or a data frame with characters into a list to draw and display a Venn diagram with up to 7 sets
suma2Venn(x, size = 30, cexil = 0.9, cexsn = 1, zcolor = heat.colors(ncol(x)), ...)
suma2Venn(x, size = 30, cexil = 0.9, cexsn = 1, zcolor = heat.colors(ncol(x)), ...)
x |
matrix or data frame of character values |
size |
Plot size, in centimeters |
cexil |
Character expansion for the intersection labels |
cexsn |
Character expansion for the set names |
zcolor |
A vector of colors for the custom zones |
... |
Additional plotting arguments for the venn function |
suma2Venn
creates a list with the columns of a matrix or a data frame of characters which can be taken by the
venn
to generate a Venn Diagram
suma2Venn
returns a Venn Plot such as that created by the
venn
function
Ana Conesa and Maria Jose Nueda, [email protected]
A <- c("a","b","c", "d", "e", NA, NA) B <- c("a","b","f", NA, NA, NA, NA) C <- c("a","b","e","f", "h", "i", "j", "k") x <- cbind(A, B, C) suma2Venn(x)
A <- c("a","b","c", "d", "e", NA, NA) B <- c("a","b","f", NA, NA, NA, NA) C <- c("a","b","e","f", "h", "i", "j", "k") x <- cbind(A, B, C) suma2Venn(x)
T.fit
selects the best regression model for each gene using stepwise regression.
T.fit(data, design = data$dis, step.method = "backward", min.obs = data$min.obs, alfa = data$Q, nvar.correction = FALSE, family = gaussian(), epsilon=0.00001, item="gene")
T.fit(data, design = data$dis, step.method = "backward", min.obs = data$min.obs, alfa = data$Q, nvar.correction = FALSE, family = gaussian(), epsilon=0.00001, item="gene")
data |
can either be a |
design |
design matrix for the regression fit such as that generated by the |
step.method |
argument to be passed to the step function. Can be either |
min.obs |
genes with less than this number of true numerical values will be excluded from the analysis |
alfa |
significance level used for variable selection in the stepwise regression |
nvar.correction |
argument for correcting T.fit significance level. See details |
family |
the distribution function to be used in the glm model. It must be the same used in p.vector |
epsilon |
argument to pass to |
item |
Name of the analysed item to show in the screen while T.fit is in process |
In the maSigPro approach p.vector
and T.fit
are subsequent steps, meaning that significant genes are
first selected on the basis of a general model and then the significant variables for each gene are found by step-wise regression.
The step regression can be "backward"
or "forward"
indicating whether the step procedure starts from the
model with all or none variables. With the "two.ways.backward"
or "two.ways.forward"
options the variables are both allowed to get in and out.
At each step the p-value of each variable is computed and variables get in/out the model when this p-value is
lower or higher than given threshold alfa. When nva.correction is TRUE the given significance level is corrected by the number of variables in the model
sol |
matrix for summary results of the stepwise regression. For each selected gene the following values are given:
|
sig.profiles |
expression values for the genes contained in |
coefficients |
matrix containing regression coefficients for the adjusted models |
groups.coeffs |
matrix containing the coefficients of the impiclit models of each experimental group |
variables |
variables in the complete regression model |
G |
total number of input genes |
g |
number of genes taken in the regression fit |
dat |
input analysis data matrix |
dis |
regression design matrix |
step.method |
imputed step method for stepwise regression |
edesign |
matrix of experimental design |
influ.info |
data frame of genes containing influencial data |
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") ## run T.fit from a p.vector object tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01) tc.tstep <- T.fit(data = tc.p , alfa = 0.05) ## run T.fit from a data matrix and a design matrix dise <- make.design.matrix(edesign) tc.tstep <- T.fit (data = tc.DATA[271:300,], design = dise$dis, step.method = "two.ways.backward", min.obs = 10, alfa = 0.05) tc.tstep$sol # gives the p.values of the significant # regression coefficients of the optimized models
#### GENERATE TIME COURSE DATA ## generate n random gene expression profiles of a data set with ## one control plus 3 treatments, 3 time points and r replicates per time point. tc.GENE <- function(n, r, var11 = 0.01, var12 = 0.01,var13 = 0.01, var21 = 0.01, var22 = 0.01, var23 =0.01, var31 = 0.01, var32 = 0.01, var33 = 0.01, var41 = 0.01, var42 = 0.01, var43 = 0.01, a1 = 0, a2 = 0, a3 = 0, a4 = 0, b1 = 0, b2 = 0, b3 = 0, b4 = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0) { tc.dat <- NULL for (i in 1:n) { Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group gene <- c(Ctl, Tr1, Tr2, Tr3) tc.dat <- rbind(tc.dat, gene) } tc.dat } ## Create 270 flat profiles flat <- tc.GENE(n = 270, r = 3) ## Create 10 genes with profile differences between Ctl and Tr1 groups twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3) ## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2) ## Create 10 genes with profile differences between Ctl and Tr2 and different variance vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03) ## Create dataset tc.DATA <- rbind(flat, twodiff, threediff, vardiff) rownames(tc.DATA) <- paste("feature", c(1:300), sep = "") colnames(tc.DATA) <- paste("Array", c(1:36), sep = "") tc.DATA [sample(c(1:(300*36)), 300)] <- NA # introduce missing values #### CREATE EXPERIMENTAL DESIGN Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") ## run T.fit from a p.vector object tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01) tc.tstep <- T.fit(data = tc.p , alfa = 0.05) ## run T.fit from a data matrix and a design matrix dise <- make.design.matrix(edesign) tc.tstep <- T.fit (data = tc.DATA[271:300,], design = dise$dis, step.method = "two.ways.backward", min.obs = 10, alfa = 0.05) tc.tstep$sol # gives the p.values of the significant # regression coefficients of the optimized models
tableDS
identifies for each Differentialy Spliced Gene (DSG) the clusters where their isoforms belong to, labelling gene transcripts as mayor (or most expressed) and minor.
tableDS(seeDS)
tableDS(seeDS)
seeDS |
a |
This table includes DSG with 2 or more Isoforms. Mono isoform genes are useful to determine the trends of the cluster. However, as they have only one Isoform, there is not the possibility of comparing minor and major DETs.
IsoTable |
A classification table that indicates the distribution of isoforms across diferent clusters |
IsoClusters |
A data.frame with genes in rows and two columns: first indicates the number of cluster of the major isoform and second the number(s) of cluster(s) of the minor isoforms. |
Maria Jose Nueda, [email protected]
Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.
Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
data(ISOdata) data(ISOdesign) mdis <- make.design.matrix(ISOdesign) MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE) Myget <- getDS(MyIso) see <- seeDS(Myget, cluster.all=FALSE, k=6) table <- tableDS(see) table$IsoTable
two.ways.stepback
fits a linear regression model applying backward-stepwise strategy.
two.ways.stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001)
two.ways.stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001)
y |
dependent variable |
d |
data frame containing by columns the set of variables that could be in the selected model |
alfa |
significance level to decide if a variable stays or not in the model |
family |
the distribution function to be used in the glm model |
epsilon |
argument to pass to |
The strategy begins analysing a model with all the variables included in d. If all the variables are statistically significant (all the variables have a p-value less than alfa) this model will be the result. If not, the less statistically significant variable will be removed and the model is re-calculated. The process is repeated up to find a model with all the variables statistically significant (p-value < alpha). Each time that a variable is removed from the model, it is considered the possibility of one or more removed variables to come in again.
two.ways.stepback
returns an object of the class lm
, where the model uses
y
as dependent variable and all the selected variables from d
as independent variables.
The function summary
are used to obtain a summary and analysis of variance table of the results.
The generic accessor functions coefficients
, effects
,
fitted.values
and residuals
extract various useful features of the value returned by lm
.
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
lm
, step
, stepfor
, stepback
, two.ways.stepfor
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- two.ways.stepback(y = y, d = dis) summary(s.fit)
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- two.ways.stepback(y = y, d = dis) summary(s.fit)
two.ways.stepfor
fits a linear regression model applying forward-stepwise strategy.
two.ways.stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
two.ways.stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
y |
dependent variable |
d |
data frame containing by columns the set of variables that could be in the selected model |
alfa |
significance level to decide if a variable stays or not in the model |
family |
the distribution function to be used in the glm model |
epsilon |
argument to pass to |
The strategy begins analysing all the possible models with only one of the variables included in d
.
The most statistically significant variable (with the lowest p-value) is included in the model and then
it is considered to introduce in the model another variable analysing all the possible models with two variables
(the selected variable in the previous step plus a new variable). Again the most statistically significant variable
(with lowest p-value) is included in the model. The process is repeated till there are no more statistically significant
variables to include. Each time that a variable enters the model, the p-values of the current model vairables is recalculated and non significant variables will be removed.
two.ways.stepfor
returns an object of the class lm
, where the model uses
y
as dependent variable and all the selected variables from d
as independent variables.
The function summary
are used to obtain a summary and analysis of variance table of the results.
The generic accessor functions coefficients
, effects
,
fitted.values
and residuals
extract various useful features of the value returned by lm
.
Ana Conesa and Maria Jose Nueda, [email protected]
Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
lm
, step
, stepback
, stepfor
, two.ways.stepback
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- two.ways.stepfor(y = y, d = dis) summary(s.fit)
## create design matrix Time <- rep(c(rep(c(1:3), each = 3)), 4) Replicates <- rep(c(1:12), each = 3) Control <- c(rep(1, 9), rep(0, 27)) Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18)) Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9)) Treat3 <- c(rep(0, 27), rep(1, 9)) edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3) rownames(edesign) <- paste("Array", c(1:36), sep = "") dise <- make.design.matrix(edesign) dis <- as.data.frame(dise$dis) ## expression vector y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, -0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931, -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463) s.fit <- two.ways.stepfor(y = y, d = dis) summary(s.fit)