Package 'maSigPro'

Title:	Significant Gene Expression Profile Differences in Time Course Gene Expression Data
Description:	maSigPro is a regression based approach to find genes for which there are significant gene expression profile differences between experimental groups in time course microarray and RNA-Seq experiments.
Authors:	Ana Conesa and Maria Jose Nueda
Maintainer:	Maria Jose Nueda <[email protected]>
License:	GPL (>= 2)
Version:	1.79.0
Built:	2025-02-16 03:00:17 UTC
Source:	https://github.com/bioc/maSigPro

Help Index

Average rows by match and index
Gene expression data potato abiotic stress
Experimental design potato abiotic stress
Experimental design with a shared time
Experimental design with different replicates
Extract significant genes for sets of variables in time series gene expression experiments
Extract lists of significant isoforms from Differentially Spliced Genes (DSG)
Lists of genes with Isoforms in different clusters
Ranks a vector to index
RNA-Seq dataset example for isoforms
Experimental design for ISOdata dataset example
Detection of genes with Isoforms with different gene expression in time course experiments
Plotting the isoform profiles of a specific gene by groups
Make a design matrix for regression fit of time series gene expression experiments
View maSigPro User's Guide
RNA-Seq dataset example
Experimental design for RNA-Seq example
Make regression fit for time series gene expression experiments
Function for plotting gene expression profile at different experimental groups
Function for visualization of gene expression profiles
Detection of Genes with switchs of their major isoforms
Column position of a variable in a data frame
Calculate true variables regression coefficients
Wrapper function for visualization of gene expression values of time course experiments
Wrapper function for visualization of significant isoforms from Differentially Spliced Genes
Fitting a linear model by backward-stepwise regression
Fitting a linear model by forward-stepwise regression
Creates a Venn Diagram from a matrix of characters
Makes a stepwise regression fit for time series gene expression experiments
Identification of Mayor and minor Isoforms in the clusters
Fitting a linear model by backward-stepwise regression
Fitting a linear model by forward-stepwise regression

Average rows by match and index

Description

average.rows matches rownames of a matrix to a match vector and performs averaging of the rows by the index provided by an index vector.

Usage

average.rows(x, index, match, r = 0.7)
average.rows(x, index, match, r = 0.7)

Arguments

`x`	a matrix
`index`	index vector indicating how rows must be averaged
`match`	match vector for indexing rows
`r`	minimal correlation value between rows to compute average

Details

rows will be averaged only if the pearson correlation coefficient between all rows of each given index is greater than r. If not, that group of rows is discarded in the result matrix.

Value

a matrix of averaged rows

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

Examples


## create data matrix for row averaging
x <- matrix(rnorm(30), nrow = 6,ncol = 5)
rownames(x) <- paste("ID", c(1, 2, 11, 12, 19, 20), sep = "")
i <- paste("g", rep(c(1:10), each = 2), sep = "")  # index vector
m <- paste("ID", c(1:20), sep = "")  # match vector
average.rows(x, i, m, r = 0)

## create data matrix for row averaging
x <- matrix(rnorm(30), nrow = 6,ncol = 5)
rownames(x) <- paste("ID", c(1, 2, 11, 12, 19, 20), sep = "")
i <- paste("g", rep(c(1:10), each = 2), sep = "")  # index vector
m <- paste("ID", c(1:20), sep = "")  # match vector
average.rows(x, i, m, r = 0)

Gene expression data potato abiotic stress

Description

data.abiotic contains gene expression of a time course microarray experiment where potato plants were submitted to 3 different abiotic stresses.

Usage

data(data.abiotic)data(data.abiotic)

Format

A data frame with 1000 observations on the following 36 variables.

Control_3H_1: a numeric vector
Control_3H_2: a numeric vector
Control_3H_3: a numeric vector
Control_9H_1: a numeric vector
Control_9H_2: a numeric vector
Control_9H_3: a numeric vector
Control_27H_1: a numeric vector
Control_27H_2: a numeric vector
Control_27H_3: a numeric vector
Cold_3H_1: a numeric vector
Cold_3H_2: a numeric vector
Cold_3H_3: a numeric vector
Cold_9H_1: a numeric vector
Cold_9H_2: a numeric vector
Cold_9H_3: a numeric vector
Cold_27H_1: a numeric vector
Cold_27H_2: a numeric vector
Cold_27H_3: a numeric vector
Heat_3H_1: a numeric vector
Heat_3H_2: a numeric vector
Heat_3H_3: a numeric vector
Heat_9H_1: a numeric vector
Heat_9H_2: a numeric vector
Heat_9H_3: a numeric vector
Heat_27H_1: a numeric vector
Heat_27H_2: a numeric vector
Heat_27H_3: a numeric vector
Salt_3H_1: a numeric vector
Salt_3H_2: a numeric vector
Salt_3H_3: a numeric vector
Salt_9H_1: a numeric vector
Salt_9H_2: a numeric vector
Salt_9H_3: a numeric vector
Salt_27H_1: a numeric vector
Salt_27H_2: a numeric vector
Salt_27H_3: a numeric vector

Details

This data set is part of a larger experiment in wich gene expression was monitored in both roots and leaves using a 11K cDNA potato chip. This example data set contains a ramdom subset of 1000 genes of the leave study.

References

Rensink WA, Iobst S, Hart A, Stegalkina S, Liu J, Buell CR. Gene expression profiling of potato responses to cold, heat, and salt stress. Funct Integr Genomics. 2005 Apr 22.

Examples

data(data.abiotic)

data(data.abiotic)

Experimental design potato abiotic stress

Description

edesign.abiotic contains experimental set up of a time course microarray experiment where potato plants were submitted to 3 different abiotic stresses.

Usage

data(edesign.abiotic)data(edesign.abiotic)

Format

A matrix with 36 rows and 6 columns

rows [1:36] "Control 3h 1" "Control 3h 2" "Control 3h 3" "Control 9h 1" ...

columns [1:6] "Time" "Replicates" "Control" "Cold" "Heat" "Salt"

Details

Arrays are given in rows and experiment descriptors are given in columns. Row names contain array names.

"Time" indicates the values that variable Time takes in each hybridization.

"Replicates" is an index indicating replicate hyridizations, i.e. hybridizations are numbered, giving replicates the same number.

"Control", "Cold", "Heat" and "Salt" columns indicate array assigment to experimental groups, coding with 1 and 0 whether each array belongs to that group or not.

References

Rensink WA, Iobst S, Hart A, Stegalkina S, Liu J, Buell CR. Gene expression profiling of potato responses to cold, heat, and salt stress. Funct Integr Genomics. 2005 Apr 22.

Examples

data(edesignCR)
data(edesignCR)

Experimental design with a shared time

Description

edesignCT contains the experimental set up of a time course microarray experiment where there is a common starting point for the different experimental groups.

Usage

data(edesignCT)data(edesignCT)

Format

A matrix with 32 rows and 7 colums

rows [1:32] "Array1" "Array2" "Array3" "Array4" ...

columns [1:7] "Time" "Replicates" "Control" "Tissue1" "Tissue2" "Tissue3" "Tissue4"

Details

Arrays are given in rows and experiment descriptors are given in columns. Row names contain array names.

"Time" indicates the values that variable Time takes in each hybridization. There are 4 time points, which allows an up to 3 degree regression polynome.

"Replicates" is an index indicating replicate hyridizations, i.e. hybridizations are numbered, giving replicates the same number.

"Control", "Tissue1", "Tissue2", "Tissue3" and "Tissue4" columns indicate array assigment to experimental groups, coding with 1 and 0 whether each array belongs to that group or not.

Examples

data(edesignCT)

data(edesignCT)

Experimental design with different replicates

Description

edesignDR contains experimental set up of a replicated time course microarray experiment where rats were submitted to 3 different dosis of a toxic compound. A control and an placebo treatments are also present in the experiment.

Usage

data(edesignDR)data(edesignDR)

Format

A matrix with 54 rows and 7 columns

rows [1:54] "Array1" "Array2" "Array3" "Array4" ...

columns [1:7] "Time" "Replicates" "Control" "Placebo" "Low" "Medium" "High"

Details

Arrays are given in rows and experiment descriptors are given in columns. Row names contain array names.

"Time" indicates the values that variable Time takes in each hybridization.

"Replicates" is an index indicating replicate hyridizations, i.e. hybridizations are numbered, giving replicates the same number.

"Control", "Placebo", "Low", "Medium" and "High" columns indicate array assigment to experimental groups, coding with 1 and 0 whether each array belongs to that group or not.

References

Heijne, W.H.M.; Stierum, R.; Slijper, M.; van Bladeren P.J. and van Ommen B.(2003). Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics and proteomics approach. Biochemical Pharmacology 65 857-875.

Examples

data(edesignDR)

data(edesignDR)

Extract significant genes for sets of variables in time series gene expression experiments

Description

This function creates lists of significant genes for a set of variables whose significance value has been computed with the T.fit function.

Usage

get.siggenes(tstep, rsq = 0.7, add.IDs = FALSE, IDs = NULL, matchID.col = 1,
             only.names = FALSE, vars = c("all", "each", "groups"),
	     significant.intercept = "dummy",

             groups.vector = NULL, trat.repl.spots = "none",
             index = IDs[, (matchID.col + 1)], match = IDs[, matchID.col],
	     r = 0.7)

get.siggenes(tstep, rsq = 0.7, add.IDs = FALSE, IDs = NULL, matchID.col = 1,
             only.names = FALSE, vars = c("all", "each", "groups"),
	     significant.intercept = "dummy",

             groups.vector = NULL, trat.repl.spots = "none",
             index = IDs[, (matchID.col + 1)], match = IDs[, matchID.col],
	     r = 0.7)

Arguments

`tstep`	a `T.fit` object
`rsq`	cut-off level at the R-squared value for the stepwise regression fit. Only genes with R-squared more than rsq are selected
`add.IDs`	logical indicating whether to include additional gene id's in the result
`IDs`	matrix contaning additional gene id information (required when `add.IDs` is `TRUE`)
`matchID.col`	number of matching column in matrix IDs for adding genes ids
`only.names`	logical. If `TRUE`, expression values are ommited in the results
`vars`	variables for which to extract significant genes (see details)
`significant.intercept`	experimental groups for which significant intercept coefficients are considered (see details)
`groups.vector`	required when `vars` is `"groups"`.
`trat.repl.spots`	treatment given to replicate spots. Possible values are `"none"` and `"average"`
`index`	argument of the `average.rows` function to use when `trat.repl.spots` is `"average"`
`match`	argument of the `average.rows` function to use when `trat.repl.spots` is `"average"`
`r`	minimun pearson correlation coefficient for replicated spots profiles to be averaged

Details

There are 3 possible values for the vars argument:

"all": generates one single matrix or gene list with all significant genes.

"each": generates as many significant genes extractions as variables in the general regression model. Each extraction contains the significant genes for that variable.

"groups": generates a significant genes extraction for each experimental group.

The difference between "each" and "groups" is that in the first case the variables of the same group (e.g. "TreatmentA" and "time*TreatmentA" ) will be extracted separately and in the second case jointly.

When add.IDs is TRUE, a matrix of gene ids must be provided as argument of IDs, the matchID.col column of which having same levels as in the row names of sig.profiles. The option only.names is TRUE will generate a vector of significant genes or a matrix when add.IDs is set also to TRUE.

When trat.repl.spots is "average", match and index vectors are required for the average.rows function. In gene expression data context, the index vector would contain geneIDs and indicate which spots are replicates. The match vector is used to match these genesIDs to rows in the significant genes matrix, and must have the same levels as the row names of sig.profiles.

The argument significant.intercept modulates the treatment for intercept coefficients to apply for selecting significant genes when vars equals "groups". There are three possible values: "none", no significant intercept (differences) are considered for significant gene selection, "dummy", includes genes with significant intercept differences between control and experimental groups, and "all" when both significant intercept coefficient for the control group and significant intercept differences are considered for selecting significant genes.

add.IDs = TRUE and trat.repl.spots = "average" are not compatible argumet values. add.IDs = TRUE and only.names = TRUE are compatible argumet values.

Value

summary

a vector or matrix listing significant genes for the variables given by the function parameters

sig.genes

a list with detailed information on the significant genes found for the variables given by the function parameters. Each element of the list is also a list containing:

sig.profiles:: expression values of significant genes
coefficients:: regression coefficients of the adjusted models
groups.coeffs:: regression coefficients of the impiclit models of each experimental group
sig.pvalues:: p-values of the regression coefficients for significant genes
g:: number of genes
...:: arguments passed by previous functions

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102

Examples


#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}
## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01)
tc.tstep <- T.fit(data = tc.p , alfa = 0.05)

## This will obtain sigificant genes per experimental group
## which have a regression model Rsquared > 0.9
tc.sigs <- get.siggenes (tc.tstep, rsq = 0.9, vars = "groups")

## This will obtain all sigificant genes regardless the Rsquared value.
## Replicated genes are averaged.
IDs <- rbind(paste("feature", c(1:300), sep = ""),
       rep(paste("gene", c(1:150), sep = ""), each = 2))
tc.sigs.ALL <- get.siggenes (tc.tstep, rsq = 0, vars = "all", IDs = IDs)
tc.sigs.groups <- get.siggenes (tc.tstep, rsq = 0, vars = "groups", significant.intercept="dummy")

#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}
## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01)
tc.tstep <- T.fit(data = tc.p , alfa = 0.05)

## This will obtain sigificant genes per experimental group
## which have a regression model Rsquared > 0.9
tc.sigs <- get.siggenes (tc.tstep, rsq = 0.9, vars = "groups")

## This will obtain all sigificant genes regardless the Rsquared value.
## Replicated genes are averaged.
IDs <- rbind(paste("feature", c(1:300), sep = ""),
       rep(paste("gene", c(1:150), sep = ""), each = 2))
tc.sigs.ALL <- get.siggenes (tc.tstep, rsq = 0, vars = "all", IDs = IDs)
tc.sigs.groups <- get.siggenes (tc.tstep, rsq = 0, vars = "groups", significant.intercept="dummy")

Extract lists of significant isoforms from Differentially Spliced Genes (DSG)

Description

getDS creates lists of significant isoforms from Differentially Spliced Genes (DSG)

Usage

  getDS(Model, vars="all", rsq=0.4)
getDS(Model, vars="all", rsq=0.4)

Arguments

`Model`	a `IsoModel` object
`vars`	argument of the `get.siggenes` function applied to isoforms
`rsq`	cut-off level at the R-squared value for the stepwise regression fit. Only isoforms with R-squared more than rsq are selected

Details

There are 3 possible values for the vars argument: "all", "each" and "groups". See get.siggenes.

Value

In the console a summary of the selection is printed.

`Model`	a `IsoModel` object to be used in the following steps
`get2`	a `get.siggenes` object to be used in the following steps
`DSG`	Names of the selected genes: Differentially Spliced Genes
`DET`	Names of the selected Isoforms: Differentally Expressed Transcripts
`List0`	a list with the names of Differentially Spliced Genes without Isoforms with R-squared higher than rsq
`NumIso.by.gene`	Number of selected Isoforms for each Differentially Spliced Gene

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.

Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)

Myget <- getDS(MyIso)
Myget$DSG
Myget$DET

see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable
data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)

Myget <- getDS(MyIso)
Myget$DSG
Myget$DET

see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable

Lists of genes with Isoforms in different clusters

Description

getDSPatterns is a function that makes a list with the names of genes identified with tableDS function.

Usage

getDSPatterns(tableDS, Cluster.Major, Cluster.minor)
getDSPatterns(tableDS, Cluster.Major, Cluster.minor)

Arguments

`tableDS`	a `tableDS` object
`Cluster.Major`	Number of the cluster where the major isoform belongs to
`Cluster.minor`	Number(s) of the cluster(s) where the minor isoform(s) belongs to (see details)

Details

When minor isoforms belong to different clusters, tableDS codifies them using "&". For instance: clusters 1 and 2, will be represented as "1&2". In such cases quotation marks must be used (see examples). When minor isoforms are only in one cluster there is no need to use quotation marks.

Value

A vector with the names of the genes.

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.

Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable

getDSPatterns(table, 1, 4)
getDSPatterns(table, "1", "4") #will give the same result.

getDSPatterns(table, 1, "1&5")
data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable

getDSPatterns(table, 1, 4)
getDSPatterns(table, "1", "4") #will give the same result.

getDSPatterns(table, 1, "1&5")

Ranks a vector to index

Description

Ranks the values in a vector to sucessive values. Ties are given the same value.

Usage

i.rank(x)
i.rank(x)

Arguments

x

vector

Value

Vector of ranked values

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

Examples

     i.rank(c(1, 1, 1, 3, 3, 5, 7, 7, 7))

i.rank(c(1, 1, 1, 3, 3, 5, 7, 7, 7))

RNA-Seq dataset example for isoforms

Description

ISOdata contains an example of RNA-Seq data at Isoform level.

Usage

data(ISOdata)data(ISOdata)

Format

A data frame with 2782 rows and 37 columns with RNA-Seq data.

Details

Rows correspond to 2782 isoforms belonging to 1000 gene.

First column is the name of the gene each isoform belongs to.

Remaining columns are the RNA-Seq data samples asociated to 3 replicates of 12 experimental conditions.

Examples

data(ISOdata)
data(ISOdesign)

mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
data(ISOdata)
data(ISOdesign)

mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)

Experimental design for ISOdata dataset example

Description

ISOdesign is the experimental design to apply ISOmaSigPro to ISOdata dataset example.

Usage

data(ISOdesign)data(ISOdesign)

Format

A matrix with 36 rows and 4 colums

rownames(ISOdesign) "Gr1_0h_1" "Gr1_0h_2" "Gr1_0h_3" "Gr1_2h_1" "Gr1_2h_2" "Gr1_2h_3" "Gr1_6h_1" "Gr1_6h_2" "Gr1_6h_3" "Gr1_12h_1" "Gr1_12h_2" "Gr1_12h_3" "Gr1_18h_1" "Gr1_18h_2" "Gr1_18h_3" "Gr1_24h_1" "Gr1_24h_2" "Gr1_24h_3" "Gr2_0h_1" "Gr2_0h_2" "Gr2_0h_3" "Gr2_2h_1" "Gr2_2h_2" "Gr2_2h_3" "Gr2_6h_1" "Gr2_6h_2" "Gr2_6h_3" "Gr2_12h_1" "Gr2_12h_2" "Gr2_12h_3" "Gr2_18h_1" "Gr2_18h_2" "Gr2_18h_3" "Gr2_24h_1" "Gr2_24h_2" "Gr2_24h_3"

colnames(ISOdesign) "time" "replicate" "Group1" "Group2"

Details

Samples are given in rows and experiment descriptors are given in columns. Row names contain sample names.

"time" indicates the values that variable Time takes in each experimental condition. There are 6 time points.

"replicate" is an index indicating the same experimental condition.

"Group1" and "Group2" columns indicate assigment to experimental groups, coding with 1 and 0 whether each sample belongs to that group or not.

Examples

data(ISOdata)
data(ISOdesign)

mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)

data(ISOdata)
data(ISOdesign)

mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)

Detection of genes with Isoforms with different gene expression in time course experiments

Description

IsoModel Performs a model comparison for each gene to detect genes with different trends in time course experiments and applies maSigPro to the Isoforms belonging to selected genes.

Usage

IsoModel(data, gen, design = NULL, Q = 0.05, min.obs = 6, minorFoldfilter = NULL,
    counts = FALSE, family = NULL, theta = 10, epsilon = 1e-05)
IsoModel(data, gen, design = NULL, Q = 0.05, min.obs = 6, minorFoldfilter = NULL,
    counts = FALSE, family = NULL, theta = 10, epsilon = 1e-05)

Arguments

`data`	matrix containing isoform expression. Isoforms must be in rows and experimental conditions in columns
`gen`	vector with the name of the gene each isoform belongs to
`design`	design matrix for the regression fit such as that generated by the `make.design.matrix` function
`Q`	significance level
`min.obs`	cases with less than this number of true numerical values will be excluded from the analysis. Minimum value to estimate the model is (degree+1)xGroups+1. Default is 6.
`minorFoldfilter`	fold expression difference between minor isoforms and the most expressed isoform to exclude minor isoforms from analysis. Default NULL
`counts`	a logical indicating whether your data are counts
`family`	the distribution function to be used in the glm model. It must be specified as a function: gaussian(), poisson(), negative.binomial(theta)... If NULL family will be negative.binomial(theta) when counts=TRUE or gaussian() when counts=FALSE
`theta`	theta parameter for negative.binomial family
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model

Details

rownames(design) and colnames(data) must be identical vectors and indicate experimental condition names.

rownames(data) should contain unique isoform IDs.

colnames(design) are the given names for the variables in the regression model.

Value

`data`	input data matrix to be used in the following steps
`gen`	input gen vector to be used in the following steps
`design`	input design matrix to be used in the following steps
`DSG`	Names of the selected genes: Differentially Spliced Genes
`pvector`	p.vector output of isoforms that belong to selected.genes
`Tfit`	Tfit output of isoforms that belong to selected.genes

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.

Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Conesa, A., Nueda M.J., Ferrer, A., Talon, T. 2006. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments. Bioinformatics 22, 1096-1102.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable
data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable

Plotting the isoform profiles of a specific gene by groups

Description

This function makes a plot with the isoforms of a specific gene splitting the different experimental groups.

Usage

IsoPlot(get, name, only.sig.iso=FALSE, ylim=NULL, xlab = "Time",
ylab = "Expression value", points=TRUE, cex.main=3,cex.legend=1.5)
IsoPlot(get, name, only.sig.iso=FALSE, ylim=NULL, xlab = "Time",
ylab = "Expression value", points=TRUE, cex.main=3,cex.legend=1.5)

Arguments

`get`	a `getDS` object a cluster of flat Isoform
`name`	Name of the specific gen to show in the plot
`only.sig.iso`	TRUE when the plot is made only with statistically significant isoforms.
`ylim`	Range of the y axis of the desired plot. If it is NULL it will be computed automatically.
`xlab`	label for the x axis
`ylab`	label for the y axis
`points`	TRUE to plot points and lines. FALSE to plot only lines.
`cex.main`	graphical parameter magnification to be used for main
`cex.legend`	graphical parameter magnification to be used for legend

Details

The plot can be made with all the available isoforms or only with the statistilly significant ones.

Value

Plot of isoform profiles of a specific gene by groups.

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.

Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)

IsoPlot(Myget,"Gene1005",only.sig.iso=FALSE,cex.main=2,cex.legend=1)

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)

IsoPlot(Myget,"Gene1005",only.sig.iso=FALSE,cex.main=2,cex.legend=1)

Make a design matrix for regression fit of time series gene expression experiments

Description

make.design.matrix creates the design matrix of dummies for fitting time series micorarray gene expression experiments.

Usage

make.design.matrix(edesign, degree = 2, time.col = 1,
                   repl.col = 2, group.cols = c(3:ncol(edesign)))
make.design.matrix(edesign, degree = 2, time.col = 1,
                   repl.col = 2, group.cols = c(3:ncol(edesign)))

Arguments

`edesign`	matrix describing experimental design. Rows must be arrays and columns experiment descriptors
`degree`	the degree of the regression fit polynome. `degree` = 1 returns linear regression, `degree` = 2 returns quadratic regression, etc
`time.col`	column number in edesign containing time values. Default is first column
`repl.col`	column number in edesign containing coding for replicate arrays. Default is second column
`group.cols`	column numbers in edesign indicating the coding for each experimental group (treatment, tissue, ...). See details

Details

rownames of edesign object should contain the arrays naming (i.e. array1, array2, ...). colnames of edesign must contain the names of experiment descriptors(i.e. "Time", "Replicates", "Treatment A", "Treatment B", etc.). for each experimental group a different column must be present in edesign, coding with 1 and 0 whether each array belongs to that group or not.

make.design.matrix returns a design matrix where rows represent arrays and column variables of time, dummies and their interactions for up to the degree given. Dummies show the relative effect of each experimental group related to the first one. Single dummies indicate the abcissa component of each group. $Time*dummy$ variables indicate slope changes, $Time^2*dummy$ indicates curvature changes. Higher grade values could model complex responses. In case experimental groups share a initial state (i.e. common time 0), no single dummies are modeled.

Value

`dis`	design matrix of dummies for fitting time series
`groups.vector`	vector coding the experimental group to which each variable belongs to
`edesign`	`edesign` value passed as argument

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Examples

data(edesign.abiotic, edesignCT)
make.design.matrix(edesign.abiotic)  # quadratic model
make.design.matrix(edesignCT, degree = 3)  # cubic model with common starting time point
data(edesign.abiotic, edesignCT)
make.design.matrix(edesign.abiotic)  # quadratic model
make.design.matrix(edesignCT, degree = 3)  # cubic model with common starting time point

View maSigPro User's Guide

Description

Finds the location of the maSigPro User's Guide and opens it.

Usage

maSigProUsersGuide(view=TRUE)maSigProUsersGuide(view=TRUE)

Arguments

view

logical, to specify if the document is opened using the PDF document reader.

Details

The function vignette("maSigPro") will find the short maSigPro Vignette which describes how to obtain the maSigPro User's Guide. The User's Guide is not itself a true vignette because it is not automatically generated using Sweave during the package build process. This means that it cannot be found using vignette, hence the need for this special function.

If the operating system is other than Windows, then the PDF viewer used is that given by Sys.getenv("R_PDFVIEWER"). The PDF viewer can be changed using Sys.putenv(R_PDFVIEWER=).

Value

If vignette(view=TRUE), the PDF document reader is started and the User's Guide is opened. If vignette(view=FALSE), returns the file location.

Examples

maSigProUsersGuide()
maSigProUsersGuide(view=FALSE)
maSigProUsersGuide()
maSigProUsersGuide(view=FALSE)

RNA-Seq dataset example

Description

NBdata contains a subset of a bigger normalized negative binomial simulated dataset.

Usage

data(NBdata)data(NBdata)

Format

A data frame with 100 observations on 36 numeric variables.

Details

This dataset is part of a larger simulated and normalized dataset with 2 experimental groups, 6 time-points and 3 replicates. Simulation has been done by using a negative binomial distribution. The first 20 genes are simulated with changes among time.

Examples

data(NBdata)

data(NBdata)

Experimental design for RNA-Seq example

Description

NBdesign contains a subset of a bigger normalized negative binomial simulated dataset.

Usage

data(NBdesign)data(NBdesign)

Format

A matrix with 36 rows and 4 colums

rows [1:36] "G1.T1.1" "G1.T1.2" "G1.T1.3" "G1.T2.1" ...

columns [1:6] [1] "Time" "Replicates" "Group.1" "Group.2"

Details

Samples are given in rows and experiment descriptors are given in columns. Row names contain sample names.

"Time" indicates the values that variable Time takes in each experimental condition. There are 6 time points.

"Replicates" is an index indicating the same experimental condition.

"Group.1" and "Group.2" columns indicate assigment to experimental groups, coding with 1 and 0 whether each sample belongs to that group or not.

Examples

data(NBdesign)

data(NBdesign)

Make regression fit for time series gene expression experiments

Description

p.vector performs a regression fit for each gene taking all variables present in the model given by a regression matrix and returns a list of FDR corrected significant genes.

Usage

p.vector(data, design, Q = 0.05, MT.adjust = "BH", min.obs = 6,
counts=FALSE, family=NULL, theta=10, epsilon=0.00001, item="gene")
p.vector(data, design, Q = 0.05, MT.adjust = "BH", min.obs = 6,
counts=FALSE, family=NULL, theta=10, epsilon=0.00001, item="gene")

Arguments

`data`	matrix containing normalized gene expression data. Genes must be in rows and arrays in columns
`design`	design matrix for the regression fit such as that generated by the `make.design.matrix` function
`Q`	significance level
`MT.adjust`	argument to pass to `p.adjust` function indicating the method for multiple testing adjustment of p.value
`min.obs`	genes with less than this number of true numerical values will be excluded from the analysis. Minimum value to estimate the model is (degree+1)xGroups+1. Default is 6.
`counts`	a logical indicating whether your data are counts
`family`	the distribution function to be used in the glm model. It must be specified as a function: gaussian(), poisson(), negative.binomial(theta)... If NULL family will be negative.binomial(theta) when counts=TRUE or gaussian() when counts=FALSE
`theta`	theta parameter for negative.binomial family
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model
`item`	Name of the analysed item to show in the screen while p.vector is in process

Details

rownames(design) and colnames(data) must be identical vectors and indicate array naming.

rownames(data) should contain unique gene IDs.

colnames(design) are the given names for the variables in the regression model.

Value

`SELEC`	matrix containing the expression values for significant genes
`p.vector`	vector containing the computed p-values
`G`	total number of input genes
`g`	number of genes taken in the regression fit
`FDR`	p-value at FDR `Q` control when Benajamini & Holderberg (BH) correction is used
`i`	number of significant genes
`dis`	design matrix used in the regression fit
`dat`	matrix of expression value data used in the regression fit
`...`	additional values from input parameters

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Examples

#### GENERATE TIME COURSE DATA
## generates n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c2 = 1.3, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.05)
tc.p$i # number of significant genes
tc.p$SELEC # expression value of signficant genes
tc.p$FDR # p.value at FDR control
tc.p$p.adjusted# adjusted p.values

#### GENERATE TIME COURSE DATA
## generates n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c2 = 1.3, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.05)
tc.p$i # number of significant genes
tc.p$SELEC # expression value of signficant genes
tc.p$FDR # p.value at FDR control
tc.p$p.adjusted# adjusted p.values

Function for plotting gene expression profile at different experimental groups

Description

This function displays the gene expression profile for each experimental group in a time series gene expression experiment.

Usage

PlotGroups(data, edesign = NULL, time = edesign[, 1],
groups = edesign[,c(3:ncol(edesign))], repvect = edesign[, 2],
show.lines = TRUE, show.fit = FALSE, dis = NULL,
step.method = "backward", min.obs = 2, alfa = 0.05,
nvar.correction = FALSE, summary.mode = "median",
groups.vector = NULL, main = NULL, sub = NULL, xlab = "Time",
ylab = "Expression value", item = NULL, ylim = NULL, pch = 21,
col = NULL, legend = TRUE, cex.legend = 1,lty.legend = NULL,... )
PlotGroups(data, edesign = NULL, time = edesign[, 1],
groups = edesign[,c(3:ncol(edesign))], repvect = edesign[, 2],
show.lines = TRUE, show.fit = FALSE, dis = NULL,
step.method = "backward", min.obs = 2, alfa = 0.05,
nvar.correction = FALSE, summary.mode = "median",
groups.vector = NULL, main = NULL, sub = NULL, xlab = "Time",
ylab = "Expression value", item = NULL, ylim = NULL, pch = 21,
col = NULL, legend = TRUE, cex.legend = 1,lty.legend = NULL,... )

Arguments

`data`	vector or matrix containing the gene expression data
`edesign`	matrix describing experimental design. Rows must be arrays and columns experiment descriptors
`time`	vector indicating time assigment for each array
`groups`	matrix indicating experimental group to which each array is assigned
`repvect`	index vector indicating experimental replicates
`show.lines`	logical indicating whether a line must be drawn joining plotted data points for reach group
`show.fit`	logical indicating whether regression fit curves must be plotted
`dis`	regression design matrix
`step.method`	stepwise regression method to fit models for cluster mean profiles. It can be either `"backward"`, `"forward"`, `"two.ways.backward"` or `"two.ways.forward"`
`min.obs`	minimal number of observations for a gene to be included in the analysis
`alfa`	significance level used for variable selection in the stepwise regression
`nvar.correction`	argument for correcting stepwise regression significance level. See `T.fit`
`summary.mode`	the method to condensate expression information when more than one gene is present in the data. Possible values are `"representative"` and `"median"`
`groups.vector`	vector indicating experimental group to which each variable belongs
`main`	plot main title
`sub`	plot subtitle
`xlab`	label for the x axis
`ylab`	label for the y axis
`item`	name of the analysed items to show
`ylim`	range of the y axis
`pch`	integer specifying type of points to plot
`col`	a vector specifying colours to plot. If missing first naturals will be used
`legend`	logical indicating whether legend must be added when plotting profiles
`cex.legend`	Expansion factor for legend
`lty.legend`	To add a coloured line in the legend
`...`	other graphical function argument

Details

To compute experimental groups either a edesign object must be provided, or separate values must be given for the time, repvect and groups arguments.

When data is a matrix, the average expression value is displayed.

When there are array replicates in the data (as indicated by repvect), values are averaged by repvect.

PlotGroups plots one single expression profile for each experimental group even if there are more that one genes in the data set. The way data is condensated for this is given by summary.mode. When this argument takes the value "representative", the gene with the lowest distance to all genes in the cluster will be plotted. When the argument is "median", then median expression value is computed.

When show.fit is TRUE the stepwise regression fit for the data will be computed and the regression curves will be displayed.

If data is a matrix of genes and summary.mode is "median", the regression fit will be computed for the median expression value.

Value

Plot of gene expression profiles by-group.

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples


#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
rownames(tc.DATA) <- paste("gene", c(1:10), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Ctl <- c(rep(1, 9), rep(0, 27))
Tr1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Tr2 <- c(rep(0, 18), rep(1, 9), rep(0, 9))
Tr3 <- c(rep(0, 27), rep(1, 9))

PlotGroups (tc.DATA, time = Time, repvect = Replicates, groups = cbind(Ctl, Tr1, Tr2, Tr3))

#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
rownames(tc.DATA) <- paste("gene", c(1:10), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Ctl <- c(rep(1, 9), rep(0, 27))
Tr1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Tr2 <- c(rep(0, 18), rep(1, 9), rep(0, 9))
Tr3 <- c(rep(0, 27), rep(1, 9))

PlotGroups (tc.DATA, time = Time, repvect = Replicates, groups = cbind(Ctl, Tr1, Tr2, Tr3))

Function for visualization of gene expression profiles

Description

PlotProfiles displays the expression profiles of a group of genes.

Usage

PlotProfiles(data, cond, cex.axis = 0.5, ylim = NULL, repvect,
main = NULL, sub = NULL, color.mode = "rainbow", item = NULL)
PlotProfiles(data, cond, cex.axis = 0.5, ylim = NULL, repvect,
main = NULL, sub = NULL, color.mode = "rainbow", item = NULL)

Arguments

`data`	a matrix containing the gene expression data
`cond`	vector for x axis labeling, typically array names
`cex.axis`	graphical parameter maginfication to be used for x axis in plotting functions
`ylim`	index vector indicating experimental replicates
`repvect`	index vector indicating experimental replicates
`main`	plot main title
`sub`	plot subtitle
`color.mode`	color scale for plotting profiles. Can be either `"rainblow"` or `"gray"`
`item`	Name of the analysed items to show

Details

The repvect argument is used to indicate with vertical lines groups of replicated arrays.

Value

Plot of experiment-wide gene expression profiles.

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples

#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
rownames(tc.DATA) <- paste("gene", c(1:10), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")

PlotProfiles (tc.DATA, cond = colnames(tc.DATA), main = "Time Course",
              repvect = rep(c(1:12), each = 3))

#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
tc.DATA <- tc.GENE(n = 10,r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
rownames(tc.DATA) <- paste("gene", c(1:10), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")

PlotProfiles (tc.DATA, cond = colnames(tc.DATA), main = "Time Course",
              repvect = rep(c(1:12), each = 3))

Detection of Genes with switchs of their major isoforms

Description

This function provides lists of genes that have different Major isoforms (most expressed) when different intervals of the experimental conditions are considered.

The subrange of the experimental conditions can be chosen as a specific point, all the points of a specific experimental group or at any point.

Usage

PodiumChange(get, only.sig.iso=FALSE,  comparison=c("any",
"groups","specific"), group.name="Ctr", time.points=0)
PodiumChange(get, only.sig.iso=FALSE,  comparison=c("any",
"groups","specific"), group.name="Ctr", time.points=0)

Arguments

`get`	a `getDS` object a cluster of flat Isoform
`only.sig.iso`	TRUE when changes are looked for only through statistically significant isoforms.
`comparison`	Type of search to do: any, groups or specific (see details).
`group.name`	required when `comparison` is `"specific"`.
`time.points`	required when `comparison` is `"specific"`.

Details

There are 3 possible values for the comparison argument:

"any": Detects genes with Major Isoform changes in at least one experimental condition.

"groups": Detects genes with different Major Isoform for different experimental groups.

"specific": Detects genes with Major Isoform changes in a specific time interval, especified in time.points argument and a specific experimental group, especified in group.name argument.

Value

`L`	Names of the genes with PodiumChange Isoforms
`data.L`	Data values of all the isoforms belonging to the genes in L
`gen.L`	gen vector with the name of the gene of each isoform
`edesign`	matrix describing experimental design needed to visualize PodiumChange selection with `IsoPlot` function. It is the input of `make.design.matrix`.

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526. Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis,
counts=TRUE)
Myget <- getDS(MyIso)

PC <- PodiumChange(Myget, only.sig.iso=TRUE,
comparison="specific", group.name="Group2", time.points=c(18,24))
PC$L

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis,
counts=TRUE)
Myget <- getDS(MyIso)

PC <- PodiumChange(Myget, only.sig.iso=TRUE,
comparison="specific", group.name="Group2", time.points=c(18,24))
PC$L

Column position of a variable in a data frame

Description

Finds the column position of a character variable in the column names of a data frame.

Usage

position(matrix, vari)
position(matrix, vari)

Arguments

`matrix`	matrix or data.frame with character column names
`vari`	character variable

Value

numerical. Column position for the given variable.

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

Examples

x <- matrix(c(1, 1, 2, 2, 3, 3),ncol = 3,nrow = 2)
colnames(x) <- c("one", "two", "three")
position(x, "one")

x <- matrix(c(1, 1, 2, 2, 3, 3),ncol = 3,nrow = 2)
colnames(x) <- c("one", "two", "three")
position(x, "one")

Calculate true variables regression coefficients

Description

reg.coeffs calculates back regression coefficients for true variables (experimental groups) from dummy variables regression coefficients.

Usage

reg.coeffs(coefficients,
indepen = groups.vector[nchar(groups.vector)==min(nchar(groups.vector))][1],
groups.vector,  group)
reg.coeffs(coefficients,
indepen = groups.vector[nchar(groups.vector)==min(nchar(groups.vector))][1],
groups.vector,  group)

Arguments

`coefficients`	vector of regression coefficients obtained from a regression model with dummy variables
`indepen`	idependent variable of the regression formula
`groups.vector`	vector indicating the true variable of each variable in `coefficients`
`group`	true variable for which regression coefficients are to be computed

Details

regression coefficients in coefficients vector should be ordered by polynomial degree in a regression formula, ie: intercept, $x$ term, $x^2$ term, $x^3$ term, and so on...

Value

reg.coeff

vector of calculated regression coefficients

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples

groups.vector <-c("CT", "T1vsCT", "T2vsCT", "CT", "T1vsCT","T2vsCT", "CT", "T1vsCT", "T2vsCT")
coefficients <- c(0.1, 1.2, -0.8, 1.7, 3.3, 0.4, 0.0, 2.1, -0.9)
## calculate true regression coefficients for variable "T1"
reg.coeffs(coefficients, groups.vector = groups.vector, group = "T1")
groups.vector <-c("CT", "T1vsCT", "T2vsCT", "CT", "T1vsCT","T2vsCT", "CT", "T1vsCT", "T2vsCT")
coefficients <- c(0.1, 1.2, -0.8, 1.7, 3.3, 0.4, 0.0, 2.1, -0.9)
## calculate true regression coefficients for variable "T1"
reg.coeffs(coefficients, groups.vector = groups.vector, group = "T1")

Wrapper function for visualization of gene expression values of time course experiments

Description

This function provides visualisation tools for gene expression values in a time course experiment. The function first calls the heatmap function for a general overview of experiment results. Next a partioning of the data is generated using a clustering method. The results of the clustering are visualized both as gene expression profiles extended along all arrays in the experiment, as provided by the plot.profiles function, and as summary expression profiles for comparison among experimental groups.

Usage

see.genes(data, edesign = data$edesign, time.col = 1, repl.col = 2, 
    group.cols = c(3:ncol(edesign)), names.groups = colnames(edesign)[3:ncol(edesign)], 
    cluster.data = 1, groups.vector = data$groups.vector, k = 9, k.mclust=FALSE,   
    cluster.method = "hclust", distance = "cor", agglo.method = "ward.D",
    show.lines = TRUE, show.fit = FALSE, dis = NULL, step.method = "backward", 
    min.obs = 3, alfa = 0.05, nvar.correction = FALSE, iter.max = 500, 
    summary.mode = "median", color.mode = "rainbow", ylim = NULL, item = "genes", 
    legend = TRUE, cex.legend = 1, lty.legend = NULL,...) 
see.genes(data, edesign = data$edesign, time.col = 1, repl.col = 2, 
    group.cols = c(3:ncol(edesign)), names.groups = colnames(edesign)[3:ncol(edesign)], 
    cluster.data = 1, groups.vector = data$groups.vector, k = 9, k.mclust=FALSE,   
    cluster.method = "hclust", distance = "cor", agglo.method = "ward.D",
    show.lines = TRUE, show.fit = FALSE, dis = NULL, step.method = "backward", 
    min.obs = 3, alfa = 0.05, nvar.correction = FALSE, iter.max = 500, 
    summary.mode = "median", color.mode = "rainbow", ylim = NULL, item = "genes", 
    legend = TRUE, cex.legend = 1, lty.legend = NULL,...)

Arguments

`data`	either matrix or a list containing the gene expression data, typically a `get.siggenes` object
`edesign`	matrix of experimental design
`time.col`	column in edesign containing time values. Default is first column
`repl.col`	column in edesign containing coding for replicates arrays. Default is second column
`group.cols`	columns indicating the coding for each group (treatment, tissue,...) in the experiment (see details)
`names.groups`	names for experimental groups
`cluster.data`	type of data used by the cluster algorithm (see details)
`groups.vector`	vector indicating the experimental group to which each variable belongs
`k`	number of clusters for data partioning
`k.mclust`	TRUE for computing the optimal number of clusters with Mclust algorithm
`cluster.method`	clustering method for data partioning. Currently `"hclust"`, `"kmeans"` and `"Mclust"` are supported
`distance`	distance measurement function when `cluster.method` is `hclust`
`agglo.method`	aggregation method used when `cluster.method` is `hclust`
`show.lines`	logical indicating whether a line must be drawn joining plotted data points for reach group
`show.fit`	logical indicating whether regression fit curves must be plotted
`dis`	regression design matrix
`step.method`	stepwise regression method to fit models for cluster mean profiles. Can be either `"backward"`, `"forward"`, `"two.ways.backward"` or `"two.ways.forward"`
`min.obs`	minimal number of observations for a gene to be included in the analysis
`alfa`	significance level used for variable selection in the stepwise regression
`nvar.correction`	argument for correcting `T.fit`significance level. See `T.fit`
`iter.max`	maximum number of iterations when `cluster.method` is `kmeans`
`summary.mode`	the method `PlotGroups` takes to condensate expression information when more than one gene is present in the data. Possible values are `"representative"` and `"median"`
`color.mode`	color scale for plotting profiles. Can be either `"rainblow"` or `"gray"`
`ylim`	range of the y axis to be used by `PlotProfiles` and `PlotGroups`
`item`	Name of the analysed items to show
`legend`	logical indicating whether legend must be added when plotting profiles
`cex.legend`	Expansion factor for legend
`lty.legend`	To add a coloured line in the legend
`...`	other graphical function argument

Details

Data can be provided either as a single data matrix of expression values, or a get.siggenes object. In the later case the other argument of the fuction can be taken directly from data.

Data clustering can be done on the basis of either the original expression values, the regression coefficients, or the t.scores. In case data is a get.siggenes object, this is given by providing the element names of the list c("sig.profiles","coefficients","t.score") of their list position (1,2 or 3).

Value

Experiment wide gene profiles and by group profiles plots are generated for each data cluster in the graphical device.

`cut`	vector indicating gene partioning into clusters
`c.algo.used`	clustering algorith used for data partioning
`groups`	groups matrix used for plotting functions

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Examples


#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with 
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

see.genes(tc.DATA, edesign = edesign, k = 4)

# This will show the regression fit curve
dise <- make.design.matrix(edesign)
see.genes(tc.DATA, edesign = edesign, k = 4, show.fit = TRUE, 
          dis = dise$dis, groups.vector = dise$groups.vector, distance = "euclidean")

#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with 
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

see.genes(tc.DATA, edesign = edesign, k = 4)

# This will show the regression fit curve
dise <- make.design.matrix(edesign)
see.genes(tc.DATA, edesign = edesign, k = 4, show.fit = TRUE, 
          dis = dise$dis, groups.vector = dise$groups.vector, distance = "euclidean")

Wrapper function for visualization of significant isoforms from Differentially Spliced Genes

Description

seeDS This function provides visualisation tools for Significant Isoforms in a time course experiment. The function calls the see.genes function for selected Isoforms. This cluster will be the reference in tableDS function to identify the trends that follows the isoforms of a specific gene.

Usage

 seeDS(get, rsq=0.4, cluster.all=TRUE, plot.mDSG=FALSE, k=6,
 cluster.method="hclust", k.mclust=FALSE, ...)
seeDS(get, rsq=0.4, cluster.all=TRUE, plot.mDSG=FALSE, k=6,
 cluster.method="hclust", k.mclust=FALSE, ...)

Arguments

`get`	a `getDS` object a cluster of flat Isoform
`rsq`	Required when cluster.all=TRUE. It is the cut-off level at the R-squared value for detecting significant isoforms of all the genome.
`cluster.all`	TRUE to make the cluster with significant isoforms of all the genome. FALSE to make the cluster with significant isoforms of Differentially Spliced Genes.
`plot.mDSG`	TRUE to make a cluster with the Isoforms belonging to monoIsoform genes
`k`	number of clusters for data partioning
`cluster.method`	clustering method for data partioning. Currently `"hclust"`, `"kmeans"` and `"Mclust"` are supported
`k.mclust`	TRUE for computing the optimal number of clusters with Mclust algorithm
`...`	other graphical function argument

Details

The cluster reference can be made with significant isoforms of all the genome or with the isoforms belonging to the Differentially Spliced Genes.

Alternatively a cluster of monoIsoforms can be asked.

Next a partioning of the data is generated using a clustering method.

The results of the clustering are visualized in two plots as in see.genes.

Value

Experiment wide Isoform profiles and by group profiles plots are generated for each data cluster in the graphical device.

`Model`	a `IsoModel` object to be used in the following steps
`get`	a `get.siggenes` object to be used in the following steps
`NumIso.by.gene`	Number of selected Isoforms for each Differentially Spliced Gene
`cut`	vector indicating gene partioning into clusters
`names.genes`	vector with the name of the gene each selected isoform belongs to

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.

Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)

table <- tableDS(see)
table$IsoTable
data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)

table <- tableDS(see)
table$IsoTable

Fitting a linear model by backward-stepwise regression

Description

stepback fits a linear regression model applying a backward-stepwise strategy.

Usage

stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )

Arguments

`y`	dependent variable
`d`	data frame containing by columns the set of variables that could be in the selected model
`alfa`	significance level to decide if a variable stays or not in the model
`family`	the distribution function to be used in the glm model
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model

Details

The strategy begins analysing a model with all the variables included in d. If all variables are statistically significant (all variables have a p-value less than alfa) this model will be the result. If not, the less statistically significant variable will be removed and the model is re-calculated. The process is repeated up to find a model with all the variables statistically significant.

Value

stepback returns an object of the class lm, where the model uses y as dependent variable and all the selected variables from d as independent variables.

The function summary are used to obtain a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.

Author(s)

Ana Conesa, [email protected]; Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples


## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, 
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- stepback(y = y, d = dis)
summary(s.fit)


## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, 
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- stepback(y = y, d = dis)
summary(s.fit)

Fitting a linear model by forward-stepwise regression

Description

stepfor fits a linear regression model applying forward-stepwise strategy.

Usage

stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001  )
stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001  )

Arguments

`y`	dependent variable
`d`	data frame containing by columns the set of variables that could be in the selected model
`alfa`	significance level to decide if a variable stays or not in the model
`family`	the distribution function to be used in the glm model
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model

Details

The strategy begins analysing all the possible models with only one of the variables included in d. The most statistically significant variable (with the lowest p-value) is included in the model and then it is considered to introduce in the model another variable analysing all the possible models with two variables (the selected variable in the previous step plus a new variable). Again the most statistically significant variable (with lowest p-value) is included in the model. The process is repeated till there are no more statistically significant variables to include.

Value

stepfor returns an object of the class lm, where the model uses y as dependent variable and all the selected variables from d as independent variables.

Author(s)

Ana Conesa, [email protected]; Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples


## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, 
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- stepfor(y = y, d = dis)
summary(s.fit)

## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040, 
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- stepfor(y = y, d = dis)
summary(s.fit)

Creates a Venn Diagram from a matrix of characters

Description

suma2Venn transforms a matrix or a data frame with characters into a list to draw and display a Venn diagram with up to 7 sets

Usage

suma2Venn(x, size = 30, cexil = 0.9, cexsn = 1, zcolor = heat.colors(ncol(x)), ...)
suma2Venn(x, size = 30, cexil = 0.9, cexsn = 1, zcolor = heat.colors(ncol(x)), ...)

Arguments

`x`	matrix or data frame of character values
`size`	Plot size, in centimeters
`cexil`	Character expansion for the intersection labels
`cexsn`	Character expansion for the set names
`zcolor`	A vector of colors for the custom zones
`...`	Additional plotting arguments for the venn function

Details

suma2Venn creates a list with the columns of a matrix or a data frame of characters which can be taken by the venn to generate a Venn Diagram

Value

suma2Venn returns a Venn Plot such as that created by the venn function

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

Examples

A <- c("a","b","c", "d", "e", NA, NA)
B <- c("a","b","f", NA, NA, NA, NA)
C <- c("a","b","e","f", "h", "i", "j", "k")
x <- cbind(A, B, C)
suma2Venn(x)

A <- c("a","b","c", "d", "e", NA, NA)
B <- c("a","b","f", NA, NA, NA, NA)
C <- c("a","b","e","f", "h", "i", "j", "k")
x <- cbind(A, B, C)
suma2Venn(x)

Makes a stepwise regression fit for time series gene expression experiments

Description

T.fit selects the best regression model for each gene using stepwise regression.

Usage

T.fit(data, design = data$dis, step.method = "backward",
      min.obs = data$min.obs, alfa = data$Q,
      nvar.correction = FALSE, family = gaussian(),
      epsilon=0.00001, item="gene")
T.fit(data, design = data$dis, step.method = "backward",
      min.obs = data$min.obs, alfa = data$Q,
      nvar.correction = FALSE, family = gaussian(),
      epsilon=0.00001, item="gene")

Arguments

`data`	can either be a `p.vector` object or a matrix containing expression data with the same requirements as for the `p.vector` function
`design`	design matrix for the regression fit such as that generated by the `make.design.matrix` function. If data is a `p.vector` object, the same design matrix is used by default
`step.method`	argument to be passed to the step function. Can be either `"backward"`, `"forward"`, `"two.ways.backward"` or `"two.ways.forward"`
`min.obs`	genes with less than this number of true numerical values will be excluded from the analysis
`alfa`	significance level used for variable selection in the stepwise regression
`nvar.correction`	argument for correcting T.fit significance level. See details
`family`	the distribution function to be used in the glm model. It must be the same used in p.vector
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model
`item`	Name of the analysed item to show in the screen while T.fit is in process

Details

In the maSigPro approach p.vector and T.fit are subsequent steps, meaning that significant genes are first selected on the basis of a general model and then the significant variables for each gene are found by step-wise regression.

The step regression can be "backward" or "forward" indicating whether the step procedure starts from the model with all or none variables. With the "two.ways.backward" or "two.ways.forward" options the variables are both allowed to get in and out. At each step the p-value of each variable is computed and variables get in/out the model when this p-value is lower or higher than given threshold alfa. When nva.correction is TRUE the given significance level is corrected by the number of variables in the model

Value

`sol`	matrix for summary results of the stepwise regression. For each selected gene the following values are given: p-value of the regression ANOVA R-squared of the model p-value of the regression coefficients of the selected variables
`sig.profiles`	expression values for the genes contained in `sol`
`coefficients`	matrix containing regression coefficients for the adjusted models
`groups.coeffs`	matrix containing the coefficients of the impiclit models of each experimental group
`variables`	variables in the complete regression model
`G`	total number of input genes
`g`	number of genes taken in the regression fit
`dat`	input analysis data matrix
`dis`	regression design matrix
`step.method`	imputed step method for stepwise regression
`edesign`	matrix of experimental design
`influ.info`	data frame of genes containing influencial data

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Examples


#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

## run T.fit from a p.vector object
tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01)
tc.tstep <- T.fit(data = tc.p , alfa = 0.05)

## run T.fit from a data matrix and a design matrix
dise <- make.design.matrix(edesign)
tc.tstep <- T.fit (data = tc.DATA[271:300,], design = dise$dis,
                   step.method = "two.ways.backward", min.obs = 10, alfa = 0.05)
tc.tstep$sol # gives the p.values of the significant
             # regression coefficients of the optimized models
#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.

tc.GENE <- function(n, r,
             var11 = 0.01, var12 = 0.01,var13 = 0.01,
             var21 = 0.01, var22 = 0.01, var23 =0.01,
             var31 = 0.01, var32 = 0.01, var33 = 0.01,
             var41 = 0.01, var42 = 0.01, var43 = 0.01,
             a1 = 0, a2 = 0, a3 = 0, a4 = 0,
             b1 = 0, b2 = 0, b3 = 0, b4 = 0,
             c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{

  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13))  # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23))  # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33))  # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43))  # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}

## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE (n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA [sample(c(1:(300*36)), 300)] <- NA  # introduce missing values

#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")

## run T.fit from a p.vector object
tc.p <- p.vector(tc.DATA, design = make.design.matrix(edesign), Q = 0.01)
tc.tstep <- T.fit(data = tc.p , alfa = 0.05)

## run T.fit from a data matrix and a design matrix
dise <- make.design.matrix(edesign)
tc.tstep <- T.fit (data = tc.DATA[271:300,], design = dise$dis,
                   step.method = "two.ways.backward", min.obs = 10, alfa = 0.05)
tc.tstep$sol # gives the p.values of the significant
             # regression coefficients of the optimized models

Identification of Mayor and minor Isoforms in the clusters

Description

tableDS identifies for each Differentialy Spliced Gene (DSG) the clusters where their isoforms belong to, labelling gene transcripts as mayor (or most expressed) and minor.

Usage

 tableDS(seeDS)
tableDS(seeDS)

Arguments

seeDS

a seeDS object

Details

This table includes DSG with 2 or more Isoforms. Mono isoform genes are useful to determine the trends of the cluster. However, as they have only one Isoform, there is not the possibility of comparing minor and major DETs.

Value

`IsoTable`	A classification table that indicates the distribution of isoforms across diferent clusters
`IsoClusters`	A data.frame with genes in rows and two columns: first indicates the number of cluster of the major isoform and second the number(s) of cluster(s) of the minor isoforms.

Author(s)

Maria Jose Nueda, [email protected]

References

Nueda, M.J., Martorell, J., Marti, C., Tarazona, S., Conesa, A. 2018. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics. 34, 3, 524-526.

Nueda, M.J., Tarazona, S., Conesa, A. 2014. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics, 30, 2598-602.

Examples

data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable
data(ISOdata)
data(ISOdesign)
mdis <- make.design.matrix(ISOdesign)
MyIso <- IsoModel(data=ISOdata[,-1], gen=ISOdata[,1], design=mdis, counts=TRUE)
Myget <- getDS(MyIso)
see <- seeDS(Myget, cluster.all=FALSE, k=6)
table <- tableDS(see)
table$IsoTable

Fitting a linear model by backward-stepwise regression

Description

two.ways.stepback fits a linear regression model applying backward-stepwise strategy.

Usage

two.ways.stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001)
two.ways.stepback(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001)

Arguments

`y`	dependent variable
`d`	data frame containing by columns the set of variables that could be in the selected model
`alfa`	significance level to decide if a variable stays or not in the model
`family`	the distribution function to be used in the glm model
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model

Details

The strategy begins analysing a model with all the variables included in d. If all the variables are statistically significant (all the variables have a p-value less than alfa) this model will be the result. If not, the less statistically significant variable will be removed and the model is re-calculated. The process is repeated up to find a model with all the variables statistically significant (p-value < alpha). Each time that a variable is removed from the model, it is considered the possibility of one or more removed variables to come in again.

Value

two.ways.stepback returns an object of the class lm, where the model uses y as dependent variable and all the selected variables from d as independent variables.

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples


## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040,
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- two.ways.stepback(y = y, d = dis)
summary(s.fit)
## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040,
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- two.ways.stepback(y = y, d = dis)
summary(s.fit)

Fitting a linear model by forward-stepwise regression

Description

two.ways.stepfor fits a linear regression model applying forward-stepwise strategy.

Usage

two.ways.stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )
two.ways.stepfor(y = y, d = d, alfa = 0.05, family = gaussian(), epsilon=0.00001 )

Arguments

`y`	dependent variable
`d`	data frame containing by columns the set of variables that could be in the selected model
`alfa`	significance level to decide if a variable stays or not in the model
`family`	the distribution function to be used in the glm model
`epsilon`	argument to pass to `glm.control`, convergence tolerance in the iterative process to estimate de glm model

Details

Value

two.ways.stepfor returns an object of the class lm, where the model uses y as dependent variable and all the selected variables from d as independent variables.

Author(s)

Ana Conesa and Maria Jose Nueda, [email protected]

References

Conesa, A., Nueda M.J., Alberto Ferrer, A., Talon, T. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.

Examples


## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040,
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- two.ways.stepfor(y = y, d = dis)
summary(s.fit)
## create design matrix
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
dise <- make.design.matrix(edesign)
dis <- as.data.frame(dise$dis)


## expression vector
y <- c(0.082, 0.021, 0.010, 0.113, 0.013, 0.077, 0.068, 0.042, -0.056, -0.232, -0.014, -0.040,
-0.055, 0.150, -0.027, 0.064, -0.108, -0.220, 0.275, -0.130, 0.130, 1.018, 1.005, 0.931,
 -1.009, -1.101, -1.014, -0.045, -0.110, -0.128, -0.643, -0.785, -1.077, -1.187, -1.249, -1.463)

s.fit <- two.ways.stepfor(y = y, d = dis)
summary(s.fit)

Package 'maSigPro'

Help Index

Average rows by match and index

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Gene expression data potato abiotic stress

Description

Usage

Format

Details

References

Examples

Experimental design potato abiotic stress

Description

Usage

Format

Details

References

Examples

Experimental design with a shared time

Description

Usage

Format

Details

Examples

Experimental design with different replicates

Description

Usage

Format

Details

References

Examples

Extract significant genes for sets of variables in time series gene expression experiments

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Extract lists of significant isoforms from Differentially Spliced Genes (DSG)

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Lists of genes with Isoforms in different clusters

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Ranks a vector to index

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

RNA-Seq dataset example for isoforms

Description

Usage

Format

Details

Examples