Title: | An R package for integrated analysis of temporal RNA-seq data with multiple biological conditions |
---|---|
Description: | Our R package MultiRNAflow provides an easy to use unified framework allowing to automatically make both unsupervised and supervised (DE) analysis for datasets with an arbitrary number of biological conditions and time points. In particular, our code makes a deep downstream analysis of DE information, e.g. identifying temporal patterns across biological conditions and DE genes which are specific to a biological condition for each time. |
Authors: | Rodolphe Loubaton [aut, cre] , Nicolas Champagnat [aut, ths] , Laurent Vallat [aut, ths] , Pierre Vallois [aut] , Région Grand Est [fnd], Cancéropôle Est [fnd] |
Maintainer: | Rodolphe Loubaton <[email protected]> |
License: | GPL-3 | file LICENSE |
Version: | 1.5.0 |
Built: | 2024-12-04 06:19:35 UTC |
Source: | https://github.com/bioc/MultiRNAflow |
Transformation of a vector of integers into a vector of class "character" so that lexicographic order of characters corresponds to the numerical order of time measurements.
CharacterNumbers(Vect.number)
CharacterNumbers(Vect.number)
Vect.number |
Vector of integers. |
An appropriate number of character "0" is added in front of
the numerical characters corresponding to the decimal writing of
each integer in Vect.number
so that the order of elements of
the vector is preserved. For example, "9">"11", but "09"<"11".
A vector where each integer is transformed in class "character".
The function is called by
ColnamesToFactors()
.
CharacterNumbers(Vect.number=c(0,1,9,11,90,99,100,101)) CharacterNumbers(Vect.number=0:11) CharacterNumbers(Vect.number=1:8)
CharacterNumbers(Vect.number=c(0,1,9,11,90,99,100,101)) CharacterNumbers(Vect.number=0:11) CharacterNumbers(Vect.number=1:8)
This function generates new reduced column names according to the presence
of biological conditions and/or time points, and extract the different
factors (individual's names, time measurements, biological conditions)
from the column names of the dataset (see Details
).
ColnamesToFactors( ExprData, Column.gene, Group.position, Time.position, Individual.position )
ColnamesToFactors( ExprData, Column.gene, Group.position, Time.position, Individual.position )
ExprData |
Data.frame with |
Column.gene |
Integer indicating the column where gene names are given.
Set |
Group.position |
Integer indicating the position of group information
in the string of characters in each sample names (see |
Time.position |
Integer indicating the position of time measurement
information in the string of characters in each sample names
(see |
Individual.position |
Integer indicating the position of the name of
the individual (e.g patient, replicate, mouse, yeasts culture ...)
in the string of characters in each sample names (see |
The column names of ExprData
must be a vector of strings
of characters containing
a string of characters (if ) which is the label of the column
containing gene names.
sample names which must be strings of characters containing
at least : the name of the individual (e.g patient, mouse, yeasts culture),
its biological condition (if there is at least two) and
the time where data have been collected if there is at least two;
(must be either 't0', 'T0' or '0' for time 0,
't1', 'T1' or '1' for time 1, ...).
All these sample information must be separated by underscores in the sample name. For instance 'CLL_P_t0_r1', corresponds to the patient 'r1' belonging to the biological condition 'P' and where data were collected at time 't0'. I this example, 'CLL' describe the type of cells (here chronic lymphocytic leukemia) and is not used in our analysis.
In the string of characters 'CLL_P_t0_r1',
'r1' is localized after the third underscore,
so Individual.position=4
,
'P' is localized after the first underscore, so Group.position=2
and
't0' is localized after the second underscore, so Time.position=3
.
The function returns new column names of the dataset, a vector indicating the name of the individual for each sample, a vector indicating the time for each sample and/or a vector indicating the biological condition for each sample.
The ColnamesToFactors()
function
is used by the following
functions of our package :
DATAprepSE()
,
PCApreprocessing()
,
MFUZZclustersNumber()
and
MFUZZanalysis()
.
## Data simulated with our function RawCountsSimulation() Data.sim <- RawCountsSimulation(Nb.Group=3, Nb.Time=2, Nb.per.GT=3, Nb.Gene=10) ##------------------------------------------------------------------------## res.test.colnames <- ColnamesToFactors(ExprData=Data.sim$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) print(res.test.colnames)
## Data simulated with our function RawCountsSimulation() Data.sim <- RawCountsSimulation(Nb.Group=3, Nb.Time=2, Nb.per.GT=3, Nb.Gene=10) ##------------------------------------------------------------------------## res.test.colnames <- ColnamesToFactors(ExprData=Data.sim$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) print(res.test.colnames)
From raw counts, this function realizes one of
the three methods of normalization of the package DESeq2
:
Relative Log Expression (rle) transformation
(see BiocGenerics::estimateSizeFactors()
)
Regularized Log (rlog) transformation
(see DESeq2::rlog()
)
Variance Stabilizing Transformation (vst) transformation
(see DESeq2::vst()
)
DATAnormalization( SEres, Normalization = "vst", Blind.rlog.vst = FALSE, Plot.Boxplot = TRUE, Colored.By.Factors = FALSE, Color.Group = NULL, Plot.genes = FALSE, path.result = NULL, Name.folder.norm = NULL )
DATAnormalization( SEres, Normalization = "vst", Blind.rlog.vst = FALSE, Plot.Boxplot = TRUE, Colored.By.Factors = FALSE, Color.Group = NULL, Plot.genes = FALSE, path.result = NULL, Name.folder.norm = NULL )
SEres |
Results of the function
|
Normalization |
"rle", "vst", "rlog" and "rpkm".
"rle", "vst" and "rlog" correspond to a method of normalization proposed
by |
Blind.rlog.vst |
|
Plot.Boxplot |
|
Colored.By.Factors |
|
Color.Group |
|
Plot.genes |
|
path.result |
Character or |
Name.folder.norm |
Character or |
All results are built from the results of the function
DATAprepSE()
.
The function returns a SummarizedExperiment object (SEresNorm
)
identical as SEres
but
with the normalized count data saved in assays(SEresNORM)[[2]]
with the boxplot of normalized count
saved in the metadata Results[[1]][[1]]
of SEresNorm
.
The boxplot is plotted if Plot.Boxplot=TRUE
.
The DATAnormalization()
function calls our R function
DATAprepSE()
,
and the R functions
BiocGenerics::estimateSizeFactors()
,
DESeq2::rlog()
and
DESeq2::vst()
in order to realized the normalization.
data(RawCounts_Antoszewski2022_MOUSEsub500) ##------------------------------------------------------------------------## resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_Antoszewski2022_MOUSEsub500, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=TRUE, Colored.By.Factors=TRUE)
data(RawCounts_Antoszewski2022_MOUSEsub500) ##------------------------------------------------------------------------## resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_Antoszewski2022_MOUSEsub500, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=TRUE, Colored.By.Factors=TRUE)
From the results of either our R function
DATAprepSE()
or our R function
DATAnormalization()
(raw counts or normalized raw counts),
the function plots the distribution of all gene expressions using
a boxplot for each sample.
DATAplotBoxplotSamples( SEres, Log2.transformation = TRUE, Colored.By.Factors = FALSE, Color.Group = NULL, Plot.genes = FALSE, y.label = NULL )
DATAplotBoxplotSamples( SEres, Log2.transformation = TRUE, Colored.By.Factors = FALSE, Color.Group = NULL, Plot.genes = FALSE, y.label = NULL )
SEres |
Results of the function
|
Log2.transformation |
|
Colored.By.Factors |
|
Color.Group |
|
Plot.genes |
|
y.label |
|
The boxplot allows to visualize six summary statistics
(see ggplot2::geom_boxplot()
):
the median
two hinges: first and third quartiles denoted Q1 and Q3.
two whiskers: and
with
, the interquartile range.
outliers: data beyond the end of the whiskers are called "outlying" points and are plotted in black.
For better visualization of the six summary statistics described above,
raw counts must be transformed using the function .
This transformation is automatically performed by other functions of
the package, such as
DATAnormalization()
.
Log2.transformation
will be set as TRUE in
DATAnormalization()
if Normalization ="rle"
, otherwise Log2.transformation=FALSE
.
The function returns a graph which plots the distribution of all
gene expressions using a boxplot for each sample
(see ggplot2::geom_boxplot()
).
The DATAplotBoxplotSamples()
function
is used by the following function of our package:
DATAnormalization()
.
calls the R functions ggplot2::geom_boxplot and ggplot2::geom_jitter in order to print the boxplot.
data(RawCounts_Antoszewski2022_MOUSEsub500) ##------------------------------------------------------------------------## resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_Antoszewski2022_MOUSEsub500, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## DATAplotBoxplotSamples(SEres=resDATAprepSE, Log2.transformation=TRUE, Colored.By.Factors=TRUE, Color.Group=NULL, Plot.genes=FALSE, y.label=NULL)
data(RawCounts_Antoszewski2022_MOUSEsub500) ##------------------------------------------------------------------------## resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_Antoszewski2022_MOUSEsub500, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## DATAplotBoxplotSamples(SEres=resDATAprepSE, Log2.transformation=TRUE, Colored.By.Factors=TRUE, Color.Group=NULL, Plot.genes=FALSE, y.label=NULL)
The function allows to plot the gene expression profile of one gene only according to time and/or biological conditions.
DATAplotExpression1Gene( SEres, row.gene = 1, Color.Group = NULL, ylabel = "Expression" )
DATAplotExpression1Gene( SEres, row.gene = 1, Color.Group = NULL, ylabel = "Expression" )
SEres |
Results of either our R function
|
row.gene |
Non negative integer indicating the row of the gene to be plotted. |
Color.Group |
NULL or a data.frame with |
ylabel |
Character corresponding to the label of the y-axis.
By default, |
All results are built from either the results of our R function
DATAprepSE()
or the results of our R function
DATAnormalization()
.
The function plots for the gene selected with
the input row.gene
In the case where samples belong to different time points only : the evolution of the expression of each replicate across time and the evolution of the mean and the standard deviation of the expression across time.
In the case where samples belong to different biological conditions only:
a violin plot
(see ggplot2::geom_violin()
),
and error bars (standard deviation)
(see ggplot2::geom_errorbar()
)
for each biological condition.
In the case where samples belong to different time points and different biological conditions : the evolution of the expression of each replicate across time and the evolution of the mean and the standard deviation of the expression across time for each biological condition.
The DATAplotExpression1Gene()
function is used by the following function of our package:
DATAplotExpressionGenes()
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ##------------------------------------------------------------------------## resEVO1gene <- DATAplotExpression1Gene(SEres=resDATAprepSE, row.gene=1, Color.Group=NULL) print(resEVO1gene)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ##------------------------------------------------------------------------## resEVO1gene <- DATAplotExpression1Gene(SEres=resDATAprepSE, row.gene=1, Color.Group=NULL) print(resEVO1gene)
The function allows to plot gene expression profiles according to time and/or biological conditions.
DATAplotExpressionGenes( SEresNorm, Vector.row.gene, DATAnorm = TRUE, Color.Group = NULL, Plot.Expression = TRUE, path.result = NULL, Name.folder.profile = NULL )
DATAplotExpressionGenes( SEresNorm, Vector.row.gene, DATAnorm = TRUE, Color.Group = NULL, Plot.Expression = TRUE, path.result = NULL, Name.folder.profile = NULL )
SEresNorm |
Results of the function
|
Vector.row.gene |
Vector of non negative integers indicating the rows of the genes to be plotted. |
DATAnorm |
|
Color.Group |
|
Plot.Expression |
|
path.result |
Character or |
Name.folder.profile |
Character or |
All results are built from the results of our function
DATAnormalization()
.
The function returns the same SummarizedExperiment class object
SEresNorm
with the a graph for each gene
(depending on the experimental design)
selected with the input Vector.row.gene
(saved in the metadata Results[[1]][[5]]
of SEresNorm
)
In the case where samples belong to different time points only : the evolution of the expression of each replicate across time and the evolution of the mean and the standard deviation of the expression across time.
In the case where samples belong to different biological conditions only:
a violin plot
(see ggplot2::geom_violin()
),
and error bars (standard deviation)
(see ggplot2::geom_errorbar()
)
for each biological condition.
In the case where samples belong to different time points and different biological conditions : the evolution of the expression of each replicate across time and the evolution of the mean and the standard deviation of the expression across time for each biological condition.
The function plots the different graph if
Plot.Expression=TRUE
.
The function calls our R function
DATAnormalization()
fisrt, then
DATAplotExpression1Gene()
for each selected genes with Vector.row.gene
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##-----------------------------------------------------------------------## resEVOgenes <- DATAplotExpressionGenes(SEresNorm=resNorm, Vector.row.gene=c(1,3), DATAnorm=TRUE, Color.Group=NULL, Plot.Expression=TRUE, path.result=NULL, Name.folder.profile=NULL)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##-----------------------------------------------------------------------## resEVOgenes <- DATAplotExpressionGenes(SEresNorm=resNorm, Vector.row.gene=c(1,3), DATAnorm=TRUE, Color.Group=NULL, Plot.Expression=TRUE, path.result=NULL, Name.folder.profile=NULL)
This function creates automatically a SummarizedExperiment (SE) object from raw counts data to store
information for exploratory (unsupervised) analysis using the R function
SummarizedExperiment::SummarizedExperiment()
a DESeq2 object from raw counts data in order to store all information
for statistical (supervised) analysis using the R function
DESeq2::DESeqDataSetFromMatrix()
.
DATAprepSE( RawCounts, Column.gene, Group.position, Time.position, Individual.position, colData = NULL, VARfilter = 0, SUMfilter = 0, RNAlength = NULL )
DATAprepSE( RawCounts, Column.gene, Group.position, Time.position, Individual.position, colData = NULL, VARfilter = 0, SUMfilter = 0, RNAlength = NULL )
RawCounts |
Data.frame with |
Column.gene |
Integer indicating the column where gene names are given.
Set |
Group.position |
Integer indicating the position of group information
in the string of characters in each sample names (see |
Time.position |
Integer indicating the position of time measurement
information in the string of characters in each sample names
(see |
Individual.position |
Integer indicating the position of the name of
the individual (e.g patient, replicate, mouse, yeasts culture ...)
in the string of characters in each sample names (see |
colData |
|
VARfilter |
Positive numeric value, 0 as default.
All rows of |
SUMfilter |
Positive numeric value, 0 as default.
All rows of |
RNAlength |
If |
The column names of RawCounts
must be a vector of strings
of characters containing
a string of characters (if ) which is the label of the column
containing gene names.
sample names which must be strings of characters containing
at least : the name of the individual (e.g patient, mouse, yeasts culture),
its biological condition (if there is at least two) and
the time where data have been collected if there is at least two;
(must be either 't0', 'T0' or '0' for time 0,
't1', 'T1' or '1' for time 1, ...).
All these sample information must be separated by underscores in the sample name. For instance 'CLL_P_t0_r1', corresponds to the patient 'r1' belonging to the biological condition 'P' and where data were collected at time 't0'. I this example, 'CLL' describe the type of cells (here chronic lymphocytic leukemia) and is not used in our analysis.
In the string of characters 'CLL_P_t0_r1',
'r1' is localized after the third underscore,
so Individual.position=4
,
'P' is localized after the first underscore, so Group.position=2
and
't0' is localized after the second underscore, so Time.position=3
.
If the user does not have all these sample information separated by
underscores in the sample name, the user can build the data.frame
colData
describing the samples.
The function returns a SummarizedExperiment object containing all information for exploratory (unsupervised) analysis and DE statistical analysis.
The DATAprepSE()
function
is used by the following functions of our package :
DATAnormalization()
,
DEanalysisGlobal()
.
calls the R function
DESeq2::DESeqDataSetFromMatrix()
in order to create the DESeq2 object and
SummarizedExperiment::SummarizedExperiment()
in order to create the SummarizedExperiment object
BgCdEx <- rep(c("P", "NP"), each=27) TimeEx <- rep(paste0("t", seq_len(9) - 1), times=6) IndvEx <- rep(paste0("pcl", seq_len(6)), each=9) SampleNAMEex <- paste(BgCdEx, IndvEx, TimeEx, sep="_") RawCountEx <- data.frame(Gene.name=paste0("Name", seq_len(10)), matrix(sample(seq_len(100), length(SampleNAMEex)*10, replace=TRUE), ncol=length(SampleNAMEex), nrow=10)) colnames(RawCountEx) <- c("Gene.name", SampleNAMEex) ##------------------------------------------------------------------------## resDATAprepSE <- DATAprepSE(RawCounts=RawCountEx, Column.gene=1, Group.position=1, Time.position=3, Individual.position=2) ## ## colDataEx <- data.frame(Group=BgCdEx, Time=TimeEx, ID=IndvEx)
BgCdEx <- rep(c("P", "NP"), each=27) TimeEx <- rep(paste0("t", seq_len(9) - 1), times=6) IndvEx <- rep(paste0("pcl", seq_len(6)), each=9) SampleNAMEex <- paste(BgCdEx, IndvEx, TimeEx, sep="_") RawCountEx <- data.frame(Gene.name=paste0("Name", seq_len(10)), matrix(sample(seq_len(100), length(SampleNAMEex)*10, replace=TRUE), ncol=length(SampleNAMEex), nrow=10)) colnames(RawCountEx) <- c("Gene.name", SampleNAMEex) ##------------------------------------------------------------------------## resDATAprepSE <- DATAprepSE(RawCounts=RawCountEx, Column.gene=1, Group.position=1, Time.position=3, Individual.position=2) ## ## colDataEx <- data.frame(Group=BgCdEx, Time=TimeEx, ID=IndvEx)
The function realizes the DE analysis in three cases: either samples belonging to different time measurements, or samples belonging to different biological conditions, or samples belonging to different time measurements and different biological conditions.
DEanalysisGlobal( SEres, pval.min = 0.05, pval.vect.t = NULL, log.FC.min = 1, LRT.supp.info = FALSE, Plot.DE.graph = TRUE, path.result = NULL, Name.folder.DE = NULL )
DEanalysisGlobal( SEres, pval.min = 0.05, pval.vect.t = NULL, log.FC.min = 1, LRT.supp.info = FALSE, Plot.DE.graph = TRUE, path.result = NULL, Name.folder.DE = NULL )
SEres |
Results of either our R function
|
pval.min |
Numeric value between 0 and 1. A gene will be considered as
differentially expressed (DE) between two biological conditions if
its Benjamini-Hochberg adjusted p-value
(see |
pval.vect.t |
|
log.FC.min |
Non negative numeric value.
If the |
LRT.supp.info |
|
Plot.DE.graph |
|
path.result |
Character or |
Name.folder.DE |
Character or |
All results are built from the results of either our R function
DATAprepSE()
,
or our R function
DATAnormalization()
.
The function returns the same SummarizedExperiment class object
SEres
with the rle normalized count data (cf DATAnormalization()
)
automatically realized by
DESeq2::DESeq()
and saved in assays(SEresNORM)$rle
, and with the following results
saved in the metadata Results[[2]][[2]]
of SEres
,
depending on the experimental design.
If samples belong to different biological conditions only
(see DEanalysisGroup()
),
the function returns
a data.frame (output rowData(SEres)
) which contains
pvalues, log2 fold change and DE genes between each pairs of biological conditions.
a binary column (1 and 0) where 1 means the gene is DE between at least one pair of biological conditions.
binary columns,
where
is the number of biological conditions,
which gives the specific genes for each biological condition.
A '1' in one of these columns means the gene is specific to
the biological condition associated to the given column. 0 otherwise.
A gene is called specific to a given biological condition BC1,
if the gene is DE between BC1 and any other biological conditions,
but not DE between any pair of other biological conditions.
columns filled with -1, 0 and 1, one per biological
condition.
A '1' in one of these columns means the gene is up-regulated
(or over-expressed) for the biological condition associated to the
given column.
A gene is called up-regulated for a given biological condition BC1 if
the gene is specific to the biological condition BC1 and expressions
in BC1 are higher than in the other biological conditions.
A '-1' in one of these columns means the gene is down-regulated
(or under-expressed) for the biological condition associated to the
given column.
A gene is called down-regulated for a given biological condition BC1 if
the gene is specific to the biological condition BC1 and expressions
in BC1 are lower than in the other biological conditions.
A '0' in one of these columns means the gene is not specific to the
biological condition associated to the given column.
an UpSet plot (Venn diagram displayed as a barplot) which gives the
number of genes for each possible intersection
(see DEplotVennBarplotGroup()
).
We consider that a set of pairs of biological conditions forms an
intersection if there is at least one gene which is DE for each of these
pairs of biological conditions, but not for the others.
a barplot which gives the number of genes categorized as "Upregulated"
and "DownRugulated", per biological condition
(see DEplotBarplot()
).
a barplot which gives the number of genes categorized as "Upregulated",
"DownRugulated" and "Other", per biological condition
(see DEplotBarplot()
).
A gene is categorized as 'Other', for a given biological condition,
if the gene is not specific to the given biological condition.
So this barplot, only plotted when there are strictly more than two
biological conditions, is similar to the previous barplot but with
the category "Other".
a list (output List.Glossary
) containing the glossary of
the column names of DE.results
.
a list (output Summary.Inputs
) containing a summary of sample
information and inputs of
DEanalysisGlobal()
.
If data belong to different time points only
(see DEanalysisTime()
),
the function returns
a data.frame (output rowData(SEres)
) which contains
gene names
pvalues, log2 fold change and DE genes between each time ti versus the reference time t0.
a binary column (1 and 0) where 1 means the gene is DE at at least between one time ti versus the reference time t0.
a column where each element is succession of 0 and 1. The positions of '1' indicate the set of times ti such that the gene is DE between ti and the reference time t0.
an alluvial graph of differentially expressed (DE) genes
(see DEplotAlluvial()
)
a graph showing the number of DE genes as a function of time for
each temporal group
(see DEplotAlluvial()
).
By temporal group, we mean the sets of genes which are first DE at
the same time.
a barplot which gives the number of DE genes per time
(see DEplotBarplotTime()
)
an UpSet plot which gives the number of genes per temporal pattern
(see DEplotVennBarplotTime()
).
By temporal pattern, we mean the set of times ti such that the gene is
DE between ti and the reference time t0.
a similar UpSet plot where each bar is split in different colors corresponding to all possible numbers of DE times where genes are over expressed in a given temporal pattern.
a list (output List.Glossary
) containing the glossary of
the column names of DE.results
.
a list (output Summary.Inputs
) containing a summary of sample
information and inputs of
DEanalysisGlobal()
.
If data belong to different time points and different biological
conditions
(see DEanalysisTimeAndGroup()
),
the function returns
a data.frame (output rowData(SEres)
) which contains
gene names
Results from the temporal statistical analysis
pvalues, log2 fold change and DE genes between each pairs of biological conditions for each fixed time.
binary columns (0 and 1), one per biological condition
(with
the number of biological conditions).
A 1 in one of these two columns means the gene is DE at least between
one time ti versus the reference time t0, for the biological condition
associated to the given column.
columns, one per biological condition, where each
element is succession of 0 and 1. The positions of 1 in one of these
two columns, indicate the set of times ti such that the gene is DE
between ti and the reference time t0, for the biological condition
associated to the given column.
Results from the statistical analysis by biological condition
pvalues, log2 fold change and DE genes between each time ti and the reference time t0 for each biological condition.
binary columns (0 and 1), one per time
(with
the number of time measurements).
A 1 in one of these columns, means the gene is DE between at least
one pair of biological conditions, for the fixed time associated
to the given column.
binary columns, which give the genes specific
for each biological condition at each time ti.
A 1 in one of these columns means the gene is specific to the
biological condition at a fixed time associated to the given column.
0 otherwise. A gene is called specific to a given biological condition
BC1 at a time ti, if the gene is DE between BC1 and any other
biological conditions at time ti, but not DE between any pair of
other biological conditions at time ti.
columns filled with -1, 0 and 1.
A 1 in one of these columns means the gene is up-regulated
(or over-expressed) for the biological condition at a fixed time
associated to the given column. A gene is called up-regulated for a
given biological condition BC1 at time ti if the gene is specific to
the biological condition BC1 at time ti and expressions in BC1 at time
ti are higher than in the other biological conditions at time ti.
A -1 in one of these columns means the gene is down-regulated
(or under-expressed) for the biological condition at a fixed time
associated to the given column. A gene is called down-regulated for a
given biological condition at a time ti BC1 if the gene is specific to
the biological condition BC1 at time ti and expressions in BC1 at time
ti are lower than in the other biological conditions at time ti.
A 0 in one of these columns means the gene is not specific to the
biological condition at a fixed time associated to the given column.
binary columns (0 and 1). A 1 in one of these columns
means the gene is specific at at least one time ti, for the biological
condition associated to the given column. 0 otherwise.
Results from the combination of temporal and biological statistical analysis
binary columns, which give the signatures
genes for each biological condition at each time ti.
A 1 in one of these columns means the gene is signature gene to the
biological condition at a fixed time associated to the given column.
0 otherwise. A gene is called signature of a biological condition
BC1 at a given time ti, if the gene is specific to the biological
condition BC1 at time ti and DE between ti versus the reference time
t0 for the biological condition BC1.
binary columns (0 and 1). A 1 in one of these columns
means the gene is signature at at least one time ti, for the
biological condition associated to the given column. 0 otherwise.
the following plots from the temporal statistical analysis
a barplot which gives the number of DE genes between ti and the
reference time t0, for each time ti (except the reference time t0) and
biological condition
(see DEplotBarplotFacetGrid()
).
alluvial graphs of DE genes
(see
DEplotAlluvial()
),
one per biological condition.
graphs showing the number of DE genes as a function of
time for each temporal group, one per biological condition.
By temporal group, we mean the sets of genes which are first DE at the
same time.
UpSet plot showing the number of DE genes
belonging to each DE temporal pattern, for each biological condition.
By temporal pattern, we mean the set of times ti such that the gene is
DE between ti and the reference time t0
(see
DEplotVennBarplotTime()
).
an alluvial graph for DE genes which are DE at least one time for each group.
the following plots from the statistical analysis by biological condition
a barplot which gives the number of specific DE genes for each
biological condition and time
(see DEplotBarplotFacetGrid()
).
UpSet plot which give the number of genes
for each possible intersection (set of pairs of biological conditions),
one per time
(see
DEplotVennBarplotGroup()
).
an alluvial graph of genes which are specific at least one time
(see DEplotAlluvial()
).
the following plots from the combination of temporal and biological statistical analysis
a barplot which gives the number of signature genes for each
biological condition and time
(see DEplotBarplotFacetGrid()
).
a barplot showing the number of genes which are DE at at least one time, specific at at least one time and signature at at least one time, for each biological condition.
an alluvial graph of genes which are signature at least one time
(see DEplotAlluvial()
).
data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ##------------------------------------------------------------------------## ## Preprocessing resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDE <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, Name.folder.DE=NULL)
data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ##------------------------------------------------------------------------## ## Preprocessing resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDE <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, Name.folder.DE=NULL)
The function realizes from the
DESeq2::DESeq()
output the analysis of DE genes between all pairs of biological conditions.
DEanalysisGroup( DESeq.result, pval.min = 0.05, log.FC.min = 1, LRT.supp.info = TRUE, Plot.DE.graph = TRUE, path.result = NULL, SubFile.name = NULL )
DEanalysisGroup( DESeq.result, pval.min = 0.05, log.FC.min = 1, LRT.supp.info = TRUE, Plot.DE.graph = TRUE, path.result = NULL, SubFile.name = NULL )
DESeq.result |
Output from the function
|
pval.min |
Numeric value between 0 and 1. A gene will be considered as
differentially expressed (DE) between two biological conditions if
its Benjamini-Hochberg adjusted p-value
(see |
log.FC.min |
Non negative numeric value.
If the log2 fold change between biological conditions or times has
an absolute value below the threshold |
LRT.supp.info |
|
Plot.DE.graph |
|
path.result |
Character or |
SubFile.name |
Character or |
The function returns the same DESeqDataSet class object
DESeq.result
with the following results,
saved in the metadata DEresultsGroup
of DESeq.result
:
a data.frame (output DEsummary
of DEresultsGroup
)
which contains
gene names
pvalues, log2 fold change and DE genes between each pairs of biological conditions.
a binary column (1 and 0) where 1 means the gene is DE between at least one pair of biological conditions.
binary columns, where
is the number of
biological conditions, which gives the specific genes for each
biological condition.
A '1' in one of these columns means the gene is specific to the
biological condition associated to the given column. 0 otherwise.
A gene is called specific to a given biological condition BC1,
if the gene is DE between BC1 and any other biological conditions,
but not DE between any pair of other biological conditions.
columns filled with -1, 0 and 1, one per biological
condition. A '1' in one of these columns means the gene is up-regulated
(or over-expressed) for the biological condition associated to the
given column. A gene is called up-regulated for a given biological
condition BC1 if the gene is specific to the biological condition BC1
and expressions in BC1 are higher than in the other
biological conditions.
A '-1' in one of these columns means the gene is down-regulated
(or under-expressed) for the biological condition associated to the
given column.
A gene is called down-regulated for a given biological condition BC1 if
the gene is specific to the biological condition BC1 and expressions in
BC1 are lower than in the other biological conditions.
A '0' in one of these columns means the gene is not specific to the
biological condition associated to the given column.
A contingency matrix (output Summary.DEanalysis
of DEresultsGroup
) which gives for each biological condition
the number of genes categorized as
"Upregulated", "DownRugulated" and "Other".
A gene is categorized as 'Other', for a given biological condition,
if the gene is not specific to the given biological condition.
The category 'Other' does not exist when there are only two biological
conditions.
an UpSet plot (Venn diagram displayed as a barplot) which gives the
number of genes for each possible intersection
(see DEplotVennBarplotGroup()
).
We consider that a set of pairs of biological conditions forms an
intersection if there is at least one gene which is DE for each of
these pairs of biological conditions, but not for the others.
a barplot which gives the number of genes categorized as "Upregulated"
and "DownRugulated", per biological condition
(see DEplotBarplot()
).
a barplot which gives the number of genes categorized as "Upregulated",
"DownRugulated" and "Other", per biological condition
(see DEplotBarplot()
).
So this barplot, only plotted when there are strictly more than
two biological conditions, is similar to the previous barplot but with
the category "Other".
The outputs of the function are used by the main function
DEanalysisGlobal()
.
## Data data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt<-RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing step resDATAprepSEmus1<- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEmus1)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE.G <- DESeq2::DESeq(DESeq2obj, quiet=TRUE, betaPrior=FALSE) res.sum.group <- DEanalysisGroup(DESeq.result=dds.DE.G, pval.min=0.01, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, SubFile.name=NULL)
## Data data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt<-RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing step resDATAprepSEmus1<- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEmus1)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE.G <- DESeq2::DESeq(DESeq2obj, quiet=TRUE, betaPrior=FALSE) res.sum.group <- DEanalysisGroup(DESeq.result=dds.DE.G, pval.min=0.01, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, SubFile.name=NULL)
From the results from our function
DEanalysisGlobal()
,
the function extracts from the SummarizedExperiment class outputs of
the subset of genes selected with the inputs Set.Operation
and
ColumnsCriteria
, and saves them in a SummarizeExperiment object.
DEanalysisSubData( SEresDE, ColumnsCriteria = 1, Set.Operation = "union", Save.SubData = FALSE )
DEanalysisSubData( SEresDE, ColumnsCriteria = 1, Set.Operation = "union", Save.SubData = FALSE )
SEresDE |
A SummarizedExperiment class object. Output from
|
ColumnsCriteria |
A vector of integers where each integer indicates
a column of |
Set.Operation |
A character.
The user must choose between "union" (default), "intersect", "setdiff"
(see |
Save.SubData |
|
We have the following three cases:
If Set.Operation="union"
then the rows extracted from
the different datasets included in SEresDE
are those such that the sum of the selected columns of
SummarizedExperiment::rowData(SEresDE)
by ColumnsCriteria
is >0.
For example, the selected genes can be those DE at least at t1 or t2
(versus the reference time t0).
If Set.Operation="intersect"
then the rows extracted from
the different datasets included in SEresDE
are those such that the product of the selected columns of
SummarizedExperiment::rowData(SEresDE)
by ColumnsCriteria
is >0.
For example, the selected genes can be those DE at times t1 and t2
(versus the reference time t0).
If Set.Operation="setdiff"
then the rows extracted from
the different datasets included in SEresDE
are those such that only one element of the selected columns of
SummarizedExperiment::rowData(SEresDE)
by ColumnsCriteria
is >0.
For example, the selected genes can be those DE at times t1 only and
at times t2 only (versus the reference time t0).
The function returns a SummarizeExperiment class object containing
sub data.frames of the different dataset included in SEresDE
containing only the rows specified by
ColumnsCriteria
and Set.Operation
.
the DE results saved in SEresDE
of genes selected by
ColumnsCriteria
and Set.Operation
.
The genes specified by ColumnsCriteria
and Set.Operation
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=1, Nb.per.GT=4, Nb.Gene=5) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## Transformation of resDATAprepSE into results of DEanalysisGlobal resultsExamples <- data.frame(Gene=paste0("Gene", seq_len(5)), DE1=c(0, 1, 0, 0, 1), DE2=c(0, 1, 0, 1, 0)) listPATHnameEx <- list(Path.result=NULL, Folder.result=NULL) SummarizedExperiment::rowData(resDATAprepSE) <- resultsExamples S4Vectors::metadata(resDATAprepSE)$DESeq2obj$pathNAME <- listPATHnameEx S4Vectors::metadata(resDATAprepSE)$DESeq2obj$SEidentification<-"SEresultsDE" ##------------------------------------------------------------------------## ## results of DEanalysisSubData resDEsub <- DEanalysisSubData(SEresDE=resDATAprepSE, ColumnsCriteria=c(2, 3), Set.Operation="union", Save.SubData=FALSE)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=1, Nb.per.GT=4, Nb.Gene=5) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## Transformation of resDATAprepSE into results of DEanalysisGlobal resultsExamples <- data.frame(Gene=paste0("Gene", seq_len(5)), DE1=c(0, 1, 0, 0, 1), DE2=c(0, 1, 0, 1, 0)) listPATHnameEx <- list(Path.result=NULL, Folder.result=NULL) SummarizedExperiment::rowData(resDATAprepSE) <- resultsExamples S4Vectors::metadata(resDATAprepSE)$DESeq2obj$pathNAME <- listPATHnameEx S4Vectors::metadata(resDATAprepSE)$DESeq2obj$SEidentification<-"SEresultsDE" ##------------------------------------------------------------------------## ## results of DEanalysisSubData resDEsub <- DEanalysisSubData(SEresDE=resDATAprepSE, ColumnsCriteria=c(2, 3), Set.Operation="union", Save.SubData=FALSE)
The function realizes from the
DESeq2::DESeq()
output the analysis of DE genes between each time versus
the reference time t0.
DEanalysisTime( DESeq.result, pval.min = 0.05, pval.vect.t = NULL, log.FC.min = 1, LRT.supp.info = FALSE, Plot.DE.graph = TRUE, path.result = NULL, SubFile.name = NULL )
DEanalysisTime( DESeq.result, pval.min = 0.05, pval.vect.t = NULL, log.FC.min = 1, LRT.supp.info = FALSE, Plot.DE.graph = TRUE, path.result = NULL, SubFile.name = NULL )
DESeq.result |
Output from the function
|
pval.min |
Numeric value between 0 and 1. A gene will be considered as
differentially expressed (DE) between two biological conditions if
its Benjamini-Hochberg adjusted p-value
(see |
pval.vect.t |
|
log.FC.min |
Non negative numeric value.
If the |
LRT.supp.info |
|
Plot.DE.graph |
|
path.result |
Character or |
SubFile.name |
Character or |
The function returns the same DESeqDataSet class object
DESeq.result
with the following results,
saved in the metadata DEresultsTime
of DESeq.result
:
a data.frame (output DEsummary
of DEresultsTime
)
which contains
gene names
pvalues, log2 fold change and DE genes between each time ti versus the reference time t0.
a binary column (1 and 0) where 1 means the gene is DE at least at between one time ti versus the reference time t0.
a column where each element is succession of 0 and 1. The positions of '1' indicate the set of times ti such that the gene is DE between ti and the reference time t0.
an alluvial graph of differentially expressed (DE) genes
(see DEplotAlluvial()
)
a graph showing the number of DE genes as a function of time for each
temporal group
(see DEplotAlluvial()
).
By temporal group, we mean the sets of genes which are first DE at
the same time.
a barplot which gives the number of DE genes per time
(see DEplotBarplotTime()
)
an UpSet plot (Venn diagram displayed as a barplot) which gives the number
of genes per temporal pattern
(see DEplotVennBarplotTime()
).
By temporal pattern, we mean the set of times ti such that the gene is
DE between ti and the reference time t0.
a similar UpSet plot where each bar is split in different colors corresponding to all possible numbers of DE times where genes are over expressed in a given temporal pattern.
The outputs of the function will be used by the main function
DEanalysisGlobal()
.
data(RawCounts_Leong2014_FISSIONsub500wt) ## We take only the first three time for the speed of the example RawCounts_Fission_3t<-RawCounts_Leong2014_FISSIONsub500wt[seq_len(200), seq_len(10)] ## Preprocessing step resDATAprepSEfission <- DATAprepSE(RawCounts=RawCounts_Fission_3t, Column.gene=1, Group.position=NULL, Time.position=2, Individual.position=3) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEfission)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE.T <- DESeq2::DESeq(DESeq2obj, quiet=TRUE, betaPrior=FALSE) ## res.T <- DEanalysisTime(DESeq.result=dds.DE.T, pval.min=0.05, pval.vect.t=c(0.01,0.05,0.05), log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, SubFile.name=NULL)
data(RawCounts_Leong2014_FISSIONsub500wt) ## We take only the first three time for the speed of the example RawCounts_Fission_3t<-RawCounts_Leong2014_FISSIONsub500wt[seq_len(200), seq_len(10)] ## Preprocessing step resDATAprepSEfission <- DATAprepSE(RawCounts=RawCounts_Fission_3t, Column.gene=1, Group.position=NULL, Time.position=2, Individual.position=3) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEfission)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE.T <- DESeq2::DESeq(DESeq2obj, quiet=TRUE, betaPrior=FALSE) ## res.T <- DEanalysisTime(DESeq.result=dds.DE.T, pval.min=0.05, pval.vect.t=c(0.01,0.05,0.05), log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, SubFile.name=NULL)
The function realizes from the
DESeq2::DESeq()
output the analysis of :
DE genes between all pairs of biological conditions for each fixed time.
DE genes between all times ti and the reference time t0for each biological condition.
DEanalysisTimeAndGroup( DESeq.result, LRT.supp.info = TRUE, log.FC.min, pval.min, pval.vect.t, Plot.DE.graph = TRUE, path.result, SubFile.name )
DEanalysisTimeAndGroup( DESeq.result, LRT.supp.info = TRUE, log.FC.min, pval.min, pval.vect.t, Plot.DE.graph = TRUE, path.result, SubFile.name )
DESeq.result |
Output from the function
|
LRT.supp.info |
|
log.FC.min |
Non negative numeric value.
If the log2 fold change between biological conditions or times has an
absolute value below the threshold |
pval.min |
Numeric value between 0 and 1. A gene will be considered as
differentially expressed (DE) between two biological conditions
if its Benjamini-Hochberg adjusted p-value
(see |
pval.vect.t |
|
Plot.DE.graph |
|
path.result |
Character or |
SubFile.name |
Character or |
The function returns the same DESeqDataSet class object
DESeq.result
with the following results,
saved in the metadata DEresultsTimeGroup
of DESeq.result
:
a data.frame (output DEsummary
of DEresultsTimeGroup
)
which contains
gene names
Results from the temporal statistical analysis
pvalues, log2 fold change and DE genes between each pairs of biological conditions for each fixed time.
binary columns (0 and 1), one per biological condition
(with
the number of biological conditions).
A 1 in one of these two columns means the gene is DE at least between
one time ti versus the reference time t0, for the biological condition
associated to the given column.
columns, one per biological condition, where each element
is succession of 0 and 1. The positions of 1 in one of these two columns,
indicate the set of times ti such that the gene is DE between ti and
the reference time t0, for the biological condition associated to
the given column.
Results from the statistical analysis by biological condition
pvalues, log2 fold change and DE genes between each time ti and the reference time t0 for each biological condition.
binary columns (0 and 1), one per time
(with
the number of time measurements).
A 1 in one of these columns, means the gene is DE between at least
one pair of biological conditions, for the fixed time associated
to the given column.
binary columns, which give the genes specific
for each biological condition at each time ti.
A 1 in one of these columns means the gene is specific to the biological
condition at a fixed time associated to the given column. 0 otherwise.
A gene is called specific to a given biological condition BC1 at a
time ti, if the gene is DE between BC1 and any other biological
conditions at time ti, but not DE between any pair of other biological
conditions at time ti.
columns filled with -1, 0 and 1.
A 1 in one of these columns means the gene is up-regulated
(or over-expressed) for the biological condition at a fixed time
associated to the given column. A gene is called up-regulated for a
given biological condition BC1 at time ti if the gene is specific to
the biological condition BC1 at time ti and expressions in BC1 at time
ti are higher than in the other biological conditions at time ti.
A -1 in one of these columns means the gene is down-regulated
(or under-expressed) for the biological condition at a fixed time
associated to the given column. A gene is called down-regulated for a
given biological condition at a time ti BC1 if the gene is specific to
the biological condition BC1 at time ti and expressions in BC1 at time
ti are lower than in the other biological conditions at time ti.
A 0 in one of these columns means the gene is not specific to the
biological condition at a fixed time associated to the given column.
binary columns (0 and 1). A 1 in one of these columns
means the gene is specific at least at one time ti, for the biological
condition associated to the given column. 0 otherwise.
Results from the combination of temporal and biological statistical analysis
binary columns, which give the signatures genes
for each biological condition at each time ti. A 1 in one of these
columns means the gene is signature gene to the biological condition at
a fixed time associated to the given column. 0 otherwise.
A gene is called signature of a biological condition BC1 at a given time
ti, if the gene is specific to the biological condition BC1 at time ti
and DE between ti versus the reference time t0 for the biological
condition BC1.
binary columns (0 and 1). A 1 in one of these columns
means the gene is signature at least at one time ti, for the biological
condition associated to the given column. 0 otherwise.
the following plots from the temporal statistical analysis
a barplot which gives the number of DE genes between ti and the
reference time t0, for each time ti (except the reference time t0) and
biological condition
(see DEplotBarplotFacetGrid()
).
alluvial graphs of DE genes
(see
DEplotAlluvial()
),
one per biological condition.
graphs showing the number of DE genes as a function of time
for each temporal group, one per biological condition. By temporal group,
we mean the sets of genes which are first DE at the same time.
UpSet plot showing the number of DE genes
belonging to each DE temporal pattern, for each biological condition.
By temporal pattern, we mean the set of times ti such that the gene is
DE between ti and the reference time t0 (see
DEplotVennBarplotTime()
).
an alluvial graph for DE genes which are DE at least one time for each group.
the following plots from the statistical analysis by biological condition
a barplot which gives the number of specific DE genes for each
biological condition and time (see DEplotBarplotFacetGrid()
).
UpSet plot which give the number of genes
for each possible intersection (set of pairs of biological conditions),
one per time (see
DEplotVennBarplotGroup()
).
an alluvial graph of genes which are specific at least one time
(see DEplotAlluvial()
).
the following plots from the combination of temporal and biological statistical analysis
a barplot which gives the number of signature genes for each biological
condition and time (see DEplotBarplotFacetGrid()
).
a barplot showing the number of genes which are DE at least at one time, specific at least at one time and signature at least at one time, for each biological condition.
an alluvial graph of genes which are signature at least one time
(see DEplotAlluvial()
).
The outputs of the function will be used by the main
function DEanalysisGlobal()
.
data(RawCounts_Schleiss2021_CLLsub500) ## We take only the first three times (both group) for the speed of ## the example Index3t<-c(2:4,11:13,20:22, 29:31,38:40,47:49) RawCounts_3t<-RawCounts_Schleiss2021_CLLsub500[seq_len(200), c(1,Index3t)] ## Preprocessing step resDATAprepSEleuk <- DATAprepSE(RawCounts=RawCounts_3t, Column.gene=1, Group.position=2, Time.position=4, Individual.position=3) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEleuk)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE <- DESeq2::DESeq(DESeq2obj) ## res.G.T <- DEanalysisTimeAndGroup(DESeq.result=dds.DE, LRT.supp.info=FALSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=0.1, Plot.DE.graph=TRUE, path.result=NULL, SubFile.name=NULL)
data(RawCounts_Schleiss2021_CLLsub500) ## We take only the first three times (both group) for the speed of ## the example Index3t<-c(2:4,11:13,20:22, 29:31,38:40,47:49) RawCounts_3t<-RawCounts_Schleiss2021_CLLsub500[seq_len(200), c(1,Index3t)] ## Preprocessing step resDATAprepSEleuk <- DATAprepSE(RawCounts=RawCounts_3t, Column.gene=1, Group.position=2, Time.position=4, Individual.position=3) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEleuk)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE <- DESeq2::DESeq(DESeq2obj) ## res.G.T <- DEanalysisTimeAndGroup(DESeq.result=dds.DE, LRT.supp.info=FALSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=0.1, Plot.DE.graph=TRUE, path.result=NULL, SubFile.name=NULL)
The function takes as input a binary table with lines
corresponding to genes and
if Temporal.Group=TRUE
: columns corresponding to times
(with
the number of time points).
A '1' in the n-th row and t-th column means that the n-th gene is
differentially expressed (DE) at time t, compared with
the reference time t0.
if Temporal.Group=FALSE
:
columns corresponding to the number of group.
A '1' in the
-th row and
-th column means
that the n-th gene is
DE at least one time ti, compared with the reference time t0,
for the group .
specific at least one time ti, compared with the reference time t0,
for the group (see
DEanalysisTimeAndGroup()
for the notion 'specific').
a signature gene at least one time ti, compared with the reference time
t0, for the group (see
DEanalysisTimeAndGroup()
for the notion 'signature').
The function plots
if Temporal.Group=TRUE
, two graphs: an alluvial graph and
a plot showing the time evolution of the number of DE genes within
each temporal group. By temporal group, we mean the sets of genes which
are first DE at the same time.
if Temporal.Group=FALSE
: an alluvial graph.
DEplotAlluvial( table.DE.time, Temporal.Group = TRUE, title.alluvial = NULL, title.evolution = NULL )
DEplotAlluvial( table.DE.time, Temporal.Group = TRUE, title.alluvial = NULL, title.evolution = NULL )
table.DE.time |
Binary matrix (table filled with 0 and 1) with
|
Temporal.Group |
|
title.alluvial |
String of characters or |
title.evolution |
String of characters or |
The names of the columns of the table will be the axis labels in the plots. If the table has no column names, the function will automatically create column names (t1,t2,...).
The function returns, as described in description
if Temporal.Group=TRUE
, two graphs: an alluvial graph and
a plot showing the time evolution of the number of DE genes within
each temporal group.
By temporal group, we mean the sets of genes which are first DE
at the same time.
if Temporal.Group=FALSE
: an alluvial graph.
The DEplotAlluvial()
function
is used by the following functions of our package : DEanalysisTime()
and DEanalysisTimeAndGroup()
.
calls the R package ggplot2 in order to plot the two graphs.
set.seed(1994) NbTime.vst0 <- 4 BinTable <- matrix(sample(c(0,1),replace=TRUE, size=NbTime.vst0*120,c(0.60,0.40)), ncol=NbTime.vst0) colnames(BinTable) <- paste0("t", 1:NbTime.vst0) ##------------------------------------------------------------------------## res.alluvial <- DEplotAlluvial(table.DE.time=BinTable) print(res.alluvial$g.alluvial) print(res.alluvial$g.alluvial.freq)
set.seed(1994) NbTime.vst0 <- 4 BinTable <- matrix(sample(c(0,1),replace=TRUE, size=NbTime.vst0*120,c(0.60,0.40)), ncol=NbTime.vst0) colnames(BinTable) <- paste0("t", 1:NbTime.vst0) ##------------------------------------------------------------------------## res.alluvial <- DEplotAlluvial(table.DE.time=BinTable) print(res.alluvial$g.alluvial) print(res.alluvial$g.alluvial.freq)
From a contingency table between two variables,
the function plots a barplot of the frequency distribution of one variable
against the other (see Details
).
DEplotBarplot(ContingencyTable, dodge = TRUE)
DEplotBarplot(ContingencyTable, dodge = TRUE)
ContingencyTable |
A numeric data.frame, corresponding to a contingency table, of dimension N1*N2, with N1 and N2, respectively the number of levels in the first and second variable (see examples and details). |
dodge |
|
A contingency table (or cross-tabulation) is a table that displays the
frequency distribution of two variables (each containing several levels),
i.e. the number of observation recorded per pair of levels.
The function plots a single barplot from ContingencyTable
.
This function is called by DEanalysisGroup()
and
DEanalysisTimeAndGroup()
.
These two functions produce several contingency tables,
giving information about specific and particular
DE genes, as described below.
First, we look for all genes that are DE between at least two biological conditions. A gene will be called specific to a given biological condition BC1, if the gene is DE between BC1 and any other biological conditions, but not DE between any pair of other biological conditions. Then each DE gene will be categorized as follow:
If a gene is not specific, the gene will be categorized as 'Other'. The category 'Other' does not exist when there are only two biological conditions.
If a gene is specific to a given biological condition BC1 and expressions in BC1 are higher than in the other biological conditions, the gene will be categorized as 'Upregulated'.
If a gene is specific to a given biological condition BC1 and expressions in BC1 are lower than in the other biological conditions, the gene will be categorized as 'Downregulated'.
The functions DEanalysisGroup()
and DEanalysisTimeAndGroup()
produce two
contingency table that allow to plot both
the number of genes categorized as 'Other', 'Upregulated' and 'Downregulated' (only when there are strictly more than two biological conditions).
the number of genes categorized 'Upregulated' and 'Downregulated'.
Second, we look for all genes that are DE between at least one time point (except t0) and t0 for each biological condition. A gene will be categorized as 'particular' to a given biological condition BC1 for a given time point ti (except t0), if the gene is DE between ti and t0 for the biological condition BC1, but not DE between ti and t0 for the other biological conditions. A gene will be categorized as 'common' to all biological conditions, if the gene is DE between ti and t0 for all biological conditions. Otherwise, a gene will categorized as 'Other'.
The function DEanalysisTimeAndGroup()
produces a contingency table that
allow to plot the number of 'specific', 'common' and 'other' genes for
each ti (except t0).
A barplot using ggplot2 (see details).
The DEplotBarplot()
function
is used by the following functions of our package: DEanalysisGroup()
and DEanalysisTimeAndGroup()
.
calls the R package ggplot2 in order to plot the barplot.
## Data simulation CrossTabulation <- matrix(c(75,30,10,5, 5,35,5,20, 220,235,285,275), ncol=4, byrow=TRUE) colnames(CrossTabulation) <- c("A", "B", "C", "D") row.names(CrossTabulation) <- c("Spe.Pos", "Spe.Neg", "Other") ##------------------------------------------------------------------------## res.dodgeTRUE <- DEplotBarplot(ContingencyTable=CrossTabulation,dodge=FALSE) res.dodgeTRUE res.dodgeFALSE <- DEplotBarplot(ContingencyTable=CrossTabulation,dodge=TRUE) res.dodgeFALSE
## Data simulation CrossTabulation <- matrix(c(75,30,10,5, 5,35,5,20, 220,235,285,275), ncol=4, byrow=TRUE) colnames(CrossTabulation) <- c("A", "B", "C", "D") row.names(CrossTabulation) <- c("Spe.Pos", "Spe.Neg", "Other") ##------------------------------------------------------------------------## res.dodgeTRUE <- DEplotBarplot(ContingencyTable=CrossTabulation,dodge=FALSE) res.dodgeTRUE res.dodgeFALSE <- DEplotBarplot(ContingencyTable=CrossTabulation,dodge=TRUE) res.dodgeFALSE
The function creates a faceted barplot from a data.frame containing two or three qualitative variables and one quantitative variable.
DEplotBarplotFacetGrid( Data, Abs.col, Legend.col, Facet.col, Value.col, Color.Legend = NULL, LabsPlot = c("", "") )
DEplotBarplotFacetGrid( Data, Abs.col, Legend.col, Facet.col, Value.col, Color.Legend = NULL, LabsPlot = c("", "") )
Data |
Data.frame containing three or four columns. One must contain quantitative variable and the other qualitative variables. |
Abs.col |
Integer indicating the column of |
Legend.col |
Integer indicating the column of |
Facet.col |
Integer indicating the column of |
Value.col |
Integer indicating the column of |
Color.Legend |
Data.frame or |
LabsPlot |
Vector of two characters indicating the x-axis label and
the y-axis label of the facet grid barplot.
By default, |
The function will plot a facet grid barplot.
The function is called by our function
DEanalysisTimeAndGroup()
in order to plot the number of specific (up- or down-regulated) DE genes
per biological condition for each time points.
The function
is called by the function
DEanalysisTimeAndGroup()
calls the R functions
ggplot2::facet_grid()
and
ggplot2::geom_bar()
.
Group.ex <- c('G1', 'G2', 'G3') Time.ex <- c('t1', 't2', 't3', 't4') Spe.sign.ex <- c("Pos", "Neg") GtimesT <- length(Group.ex)*length(Time.ex) Nb.Spe <- sample(3:60, GtimesT, replace=FALSE) Nb.Spe.sign <- sample(3:60, 2*GtimesT, replace=FALSE) ##------------------------------------------------------------------------## Melt.Dat.1 <- data.frame(Group=rep(Group.ex, times=length(Time.ex)), Time=rep(Time.ex, each=length(Group.ex)), Nb.Spe.DE=Nb.Spe) DEplotBarplotFacetGrid(Data=Melt.Dat.1, Abs.col=2, Legend.col=2, Facet.col=1, Value.col=3, Color.Legend=NULL) DEplotBarplotFacetGrid(Data=Melt.Dat.1, Abs.col=1, Legend.col=1, Facet.col=2, Value.col=3, Color.Legend=NULL) ##------------------------------------------------------------------------## Melt.Dat.2 <- data.frame(Group=rep(Group.ex, times=length(Time.ex)*2), Time=rep(Time.ex, each=length(Group.ex)*2), Spe.sign=rep(Spe.sign.ex, times=2*GtimesT), Nb.Spe.DE=Nb.Spe.sign) DEplotBarplotFacetGrid(Data=Melt.Dat.2, Abs.col=1, Legend.col=3, Facet.col=2, Value.col=4, Color.Legend=NULL)
Group.ex <- c('G1', 'G2', 'G3') Time.ex <- c('t1', 't2', 't3', 't4') Spe.sign.ex <- c("Pos", "Neg") GtimesT <- length(Group.ex)*length(Time.ex) Nb.Spe <- sample(3:60, GtimesT, replace=FALSE) Nb.Spe.sign <- sample(3:60, 2*GtimesT, replace=FALSE) ##------------------------------------------------------------------------## Melt.Dat.1 <- data.frame(Group=rep(Group.ex, times=length(Time.ex)), Time=rep(Time.ex, each=length(Group.ex)), Nb.Spe.DE=Nb.Spe) DEplotBarplotFacetGrid(Data=Melt.Dat.1, Abs.col=2, Legend.col=2, Facet.col=1, Value.col=3, Color.Legend=NULL) DEplotBarplotFacetGrid(Data=Melt.Dat.1, Abs.col=1, Legend.col=1, Facet.col=2, Value.col=3, Color.Legend=NULL) ##------------------------------------------------------------------------## Melt.Dat.2 <- data.frame(Group=rep(Group.ex, times=length(Time.ex)*2), Time=rep(Time.ex, each=length(Group.ex)*2), Spe.sign=rep(Spe.sign.ex, times=2*GtimesT), Nb.Spe.DE=Nb.Spe.sign) DEplotBarplotFacetGrid(Data=Melt.Dat.2, Abs.col=1, Legend.col=3, Facet.col=2, Value.col=4, Color.Legend=NULL)
The function takes as input two tables
a binary table with rows corresponding to genes and
columns corresponding to times
(with
the number of time points).
A '1' in the n-th row and i-th column means that the n-th gene is
differentially expressed (DE) at time ti,
compared with the reference time t0.
a numeric matrix with positive and negative values with
rows corresponding to genes and
columns corresponding
to times.
The element in n-th row and i-th column corresponds to the log2 fold change
between the time ti and the reference time t0 for the n-th gene.
If the gene is DE and the sign is positive, then the gene n will be
considered as over-expressed (up-regulated) at the time ti.
If the gene is DE and the sign is negative, then the gene n will be
considered as under-expressed (down-regulated) at the time ti.
The function plots two graphs: a barplot showing the number of DE genes per time and a barplot showing the number of under- and over-expressed genes per times.
DEplotBarplotTime(table.DE.time, Log2.FC.matrix)
DEplotBarplotTime(table.DE.time, Log2.FC.matrix)
table.DE.time |
Binary matrix (table filled with 0 and 1) with
|
Log2.FC.matrix |
Numeric matrix with positive and negative with
|
The function plots two graphs: a barplot showing the number of DE genes per time and a barplot showing the number of under and over expressed genes per times.
set.seed(1994) Dat1.FTP <- matrix(sample(c(0,1), replace=TRUE, size=120, prob=c(0.3,0.7)), ncol=3) Dat2.FTP <- matrix(round(rnorm(n=120, mean=0, sd=1),digits=2), ncol=3) colnames(Dat1.FTP) <- paste0("t", 1:3) colnames(Dat2.FTP) <- paste0("t", 1:3) ##-----------------------------------------------------------------------### res.DE.all.t <- DEplotBarplotTime(table.DE.time=Dat1.FTP, Log2.FC.matrix=Dat2.FTP) print(res.DE.all.t$g.nb.DEPerTime) print(res.DE.all.t$g.nb.DEPerTime.sign)
set.seed(1994) Dat1.FTP <- matrix(sample(c(0,1), replace=TRUE, size=120, prob=c(0.3,0.7)), ncol=3) Dat2.FTP <- matrix(round(rnorm(n=120, mean=0, sd=1),digits=2), ncol=3) colnames(Dat1.FTP) <- paste0("t", 1:3) colnames(Dat2.FTP) <- paste0("t", 1:3) ##-----------------------------------------------------------------------### res.DE.all.t <- DEplotBarplotTime(table.DE.time=Dat1.FTP, Log2.FC.matrix=Dat2.FTP) print(res.DE.all.t$g.nb.DEPerTime) print(res.DE.all.t$g.nb.DEPerTime.sign)
The function returns two heatmaps:
one heatmap of gene expressions between samples and selected genes and
a correlation heatmap between samples from the output of
DEanalysisGlobal()
.
DEplotHeatmaps( SEresDE, ColumnsCriteria = 2, Set.Operation = "union", NbGene.analysis = 20, Color.Group = NULL, SizeLabelRows = 5, SizeLabelCols = 5, Display.plots = TRUE, Save.plots = FALSE )
DEplotHeatmaps( SEresDE, ColumnsCriteria = 2, Set.Operation = "union", NbGene.analysis = 20, Color.Group = NULL, SizeLabelRows = 5, SizeLabelCols = 5, Display.plots = TRUE, Save.plots = FALSE )
SEresDE |
A SummarizedExperiment class object. Output from
|
ColumnsCriteria |
A vector of integers where each integer indicates
a column of |
Set.Operation |
A character.
The user must choose between "union" (default), "intersect", "setdiff"
(see |
NbGene.analysis |
An integer or |
Color.Group |
|
SizeLabelRows |
Numeric >0. Size of the labels for the genes in the heatmaps. |
SizeLabelCols |
Numeric >0. Size of the labels for the samples in the heatmaps. |
Display.plots |
|
Save.plots |
TRUE or FALSE or a Character.
If |
We have the following three cases:
If Set.Operation="union"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that the sum of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having at least one ’1’
in one of the selected columns.
If Set.Operation="intersect"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that the product of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having ’1’
in all of the selected columns.
If Set.Operation="setdiff"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that only one element of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having ’1’
in only one of the selected columns.
The function returns the same SummarizedExperiment class object
SEresDE
with two heatmaps
saved in the metadata Results[[2]][[4]]
of SEresDE
a correlation heatmap between samples (correlation heatmap)
a heatmap across samples and genes called Zscore heatmap, for a subset of genes that can be selected by the user.
The two heatmaps are plotted if Display.plots=TRUE
.
The second heatmap is build from the normalized
count data after being both centered and scaled (Zscore).
The function calls the function
ComplexHeatmap::Heatmap()
in order to plot the Heatmaps.
## data importation data("RawCounts_Antoszewski2022_MOUSEsub500") ## No time points. We take only two groups for the speed of the example dataT1wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing with Results of DEanalysisGlobal() resDATAprepSE <- DATAprepSE(RawCounts=dataT1wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=FALSE, path.result=NULL, Name.folder.DE=NULL) ##------------------------------------------------------------------------## resHeatmap <- DEplotHeatmaps(SEresDE=resDET1wt, ColumnsCriteria=3, ## Specific genes N1haT1ko Set.Operation="union", NbGene.analysis=20, Color.Group=NULL, SizeLabelRows=5, SizeLabelCols=5, Display.plots=TRUE, Save.plots=FALSE)
## data importation data("RawCounts_Antoszewski2022_MOUSEsub500") ## No time points. We take only two groups for the speed of the example dataT1wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing with Results of DEanalysisGlobal() resDATAprepSE <- DATAprepSE(RawCounts=dataT1wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=FALSE, path.result=NULL, Name.folder.DE=NULL) ##------------------------------------------------------------------------## resHeatmap <- DEplotHeatmaps(SEresDE=resDET1wt, ColumnsCriteria=3, ## Specific genes N1haT1ko Set.Operation="union", NbGene.analysis=20, Color.Group=NULL, SizeLabelRows=5, SizeLabelCols=5, Display.plots=TRUE, Save.plots=FALSE)
The function takes as input a binary matrix or data.frame with
rows and
columns with
the number of genes and
the number of biological conditions.
The number of 1 in the n-th row gives the number of pairs of
biological conditions where the gene
is DE.
We consider that a set of pairs of biological conditions forms
an intersection if there is at least one gene which is DE for each of
these pairs of biological conditions, but not for the others.
The function calls the UpSetR::upset()
function in order to plot
the number of genes for each possible intersection in an UpSet plot
(Venn diagram displayed as a barplot).
DEplotVennBarplotGroup(Mat.DE.pair.group)
DEplotVennBarplotGroup(Mat.DE.pair.group)
Mat.DE.pair.group |
Binary matrix or data.frame with |
The function plots the number of genes for each possible intersection in a UpSet plot.
The function
calls the function UpSetR::upset()
in order to plot the UpSet plot.
is called by the functions DEanalysisGroup()
and
DEanalysisTimeAndGroup()
.
set.seed(1994) ##------------------------------------------------------------------------## ## Binary matrix Bin.Table.G <- matrix(c(sample(c(0,1), replace=TRUE, size=240,c(0.75,0.35)), sample(c(0,1), replace=TRUE, size=240,c(0.3,0.7)), rep(0,18)), ncol=6, byrow=TRUE) colnames(Bin.Table.G) <- c(".A..B.",".A..C.",".A..D.", ".B..C.",".B..D.",".C..D.") ##------------------------------------------------------------------------## ## Results res.t.upset <- DEplotVennBarplotGroup(Mat.DE.pair.group=Bin.Table.G) print(res.t.upset$Upset.global) print(res.t.upset$Upset.threshold)
set.seed(1994) ##------------------------------------------------------------------------## ## Binary matrix Bin.Table.G <- matrix(c(sample(c(0,1), replace=TRUE, size=240,c(0.75,0.35)), sample(c(0,1), replace=TRUE, size=240,c(0.3,0.7)), rep(0,18)), ncol=6, byrow=TRUE) colnames(Bin.Table.G) <- c(".A..B.",".A..C.",".A..D.", ".B..C.",".B..D.",".C..D.") ##------------------------------------------------------------------------## ## Results res.t.upset <- DEplotVennBarplotGroup(Mat.DE.pair.group=Bin.Table.G) print(res.t.upset$Upset.global) print(res.t.upset$Upset.threshold)
The function takes as input two matrix or data.frame
a binary matrix or data.frame with rows corresponding to genes
and
columns corresponding to times
(with
the number of time points).
A '1' in the n-th row and i-th column means that the n-th gene is
differentially expressed (DE) at time ti, compared with
the reference time t0.
a numeric matrix or data.frame with rows corresponding to genes
and
columns corresponding to times.
The element in n-th row and i-th column corresponds to the
fold change between the time ti and the reference time t0 for the n-th gene.
If the gene is DE and the sign is positive, then the gene n will be
considered as over-expressed (up-regulated) at time ti.
If the gene is DE and the sign is negative, then the gene n will be
considered as under-expressed (down-regulated) at time ti.
DEplotVennBarplotTime(table.DE.time, Log2.FC.matrix)
DEplotVennBarplotTime(table.DE.time, Log2.FC.matrix)
table.DE.time |
Binary matrix or data.frame (table filled with 0 and 1)
with |
Log2.FC.matrix |
Numeric matrix or data.frame with |
The function plots
the number of genes per time patterns in an UpSet plot (Venn diagram
displayed as a barplot) with the R function UpSetR::upset()
.
By temporal pattern, we mean the set of times ti such that the gene is
DE between ti and the reference time t0.
a similar UpSet plot where each bar is split in different colors corresponding to all possible numbers of DE times where genes are over expressed in a given temporal pattern.
The function
calls the function UpSetR::upset()
in order to plot the UpSet plot.
is called by the functions DEanalysisTime()
and
DEanalysisTimeAndGroup()
.
set.seed(1994) Nb.Time <- 4 ## Number of time measurement ##------------------------------------------------------------------------## table.DE.time.ex <- matrix(sample(c(0,1), replace=TRUE, size=40*(Nb.Time-1), c(0.2, 0.8)), ncol=Nb.Time-1) colnames(table.DE.time.ex) <- paste0("t", 1:(Nb.Time-1)) ##------------------------------------------------------------------------## Log2FC.mat.ex <- matrix(round(rnorm(n=40*(Nb.Time-1), mean=0, sd=1), digits=2), ncol=(Nb.Time-1)) colnames(Log2FC.mat.ex) <- paste0("t", 1:(Nb.Time-1)) ##------------------------------------------------------------------------## res.test.VennBarplot <- DEplotVennBarplotTime(table.DE.time=table.DE.time.ex, Log2.FC.matrix=Log2FC.mat.ex) print(res.test.VennBarplot$Upset.graph) print(res.test.VennBarplot$Upset.graph.with.nb.over) res.test.VennBarplot$DE.pattern.t.01.sum
set.seed(1994) Nb.Time <- 4 ## Number of time measurement ##------------------------------------------------------------------------## table.DE.time.ex <- matrix(sample(c(0,1), replace=TRUE, size=40*(Nb.Time-1), c(0.2, 0.8)), ncol=Nb.Time-1) colnames(table.DE.time.ex) <- paste0("t", 1:(Nb.Time-1)) ##------------------------------------------------------------------------## Log2FC.mat.ex <- matrix(round(rnorm(n=40*(Nb.Time-1), mean=0, sd=1), digits=2), ncol=(Nb.Time-1)) colnames(Log2FC.mat.ex) <- paste0("t", 1:(Nb.Time-1)) ##------------------------------------------------------------------------## res.test.VennBarplot <- DEplotVennBarplotTime(table.DE.time=table.DE.time.ex, Log2.FC.matrix=Log2FC.mat.ex) print(res.test.VennBarplot$Upset.graph) print(res.test.VennBarplot$Upset.graph.with.nb.over) res.test.VennBarplot$DE.pattern.t.01.sum
The function returns Volcano plots and MA plots from
the results of our function
DEanalysisGlobal()
.
DEplotVolcanoMA( SEresDE, NbGene.plotted = 2, SizeLabel = 3, Display.plots = FALSE, Save.plots = FALSE )
DEplotVolcanoMA( SEresDE, NbGene.plotted = 2, SizeLabel = 3, Display.plots = FALSE, Save.plots = FALSE )
SEresDE |
A SummarizedExperiment class object. Output from
|
NbGene.plotted |
Non negative integer. The algorithm computes the sum
of all the absolute |
SizeLabel |
Numeric. Give the size of the names of plotted genes.
By default, |
Display.plots |
|
Save.plots |
If If |
If data belong to different time points only, the function returns
volcano and MA plots
(with
the number of time measurements), corresponding to the
fold change between each time ti and the reference time t0,
for all
.
If data belong to different biological conditions only,
the function returns volcano and MA plots
(with
the number of biological conditions),
corresponding to the
fold change between each pair of
biological condition.
If data belong to different biological conditions and time points, the function returns
volcano and MA plots,
corresponding to the
fold change between each time ti and
the reference time t0, for all biological condition.
volcano and MA plots,
corresponding to the
fold change between
each pair of biological conditions, for all fixed time point.
The function returns the same SummarizedExperiment class object
SEresDE
with Volcano plots and MA plots from the results of
our function DEanalysisGlobal()
,
all saved in the metadata Results[[2]][[3]]
of SEresDE
.
The function calls the output of
DEanalysisGlobal()
.
## data importation data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example dataT1wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing with Results of DEanalysisGlobal() resDATAprepSE <- DATAprepSE(RawCounts=dataT1wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=FALSE, path.result=NULL, Name.folder.DE=NULL) ##------------------------------------------------------------------------## ## Volcano MA resVolcanoMA <- DEplotVolcanoMA(SEresDE=resDET1wt, NbGene.plotted=5, Display.plots=TRUE, Save.plots=FALSE)
## data importation data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example dataT1wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing with Results of DEanalysisGlobal() resDATAprepSE <- DATAprepSE(RawCounts=dataT1wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=FALSE, path.result=NULL, Name.folder.DE=NULL) ##------------------------------------------------------------------------## ## Volcano MA resVolcanoMA <- DEplotVolcanoMA(SEresDE=resDET1wt, NbGene.plotted=5, Display.plots=TRUE, Save.plots=FALSE)
This function realizes the intermediary steps of the analysis
of the function
DEanalysisGroup()
.
DEresultGroup( DESeq.result, LRT.supp.info = TRUE, pval.min = 0.05, log.FC.min = 1 )
DEresultGroup( DESeq.result, LRT.supp.info = TRUE, pval.min = 0.05, log.FC.min = 1 )
DESeq.result |
Output from the function
|
LRT.supp.info |
|
pval.min |
Numeric value between 0 and 1. A gene will be considered as
differentially expressed (DE) between two biological conditions if
its Benjamini-Hochberg adjusted p-value
(see |
log.FC.min |
Non negative numeric value.
If the log2 fold change between biological conditions or times has
an absolute value below the threshold |
The function returns the same DESeqDataSet class object
DESeq.result
with the following results,
saved in the metadata DEresultsGroup
of DESeq.result
:
a data.frame (output DEsummary
of DEresultsGroup
)
which contains
gene names
pvalues, log2 fold change and DE genes between each pairs of biological conditions.
a binary column (1 and 0) where 1 means the gene is DE between at least one pair of biological conditions.
binary columns, where
is the number of
biological conditions, which gives the specific genes for each
biological condition.
A '1' in one of these columns means the gene is specific to the
biological condition associated to the given column. 0 otherwise.
A gene is called specific to a given biological condition BC1,
if the gene is DE between BC1 and any other biological conditions,
but not DE between any pair of other biological conditions.
columns filled with -1, 0 and 1, one per
biological condition.
A '1' in one of these columns means the gene is up-regulated
(or over-expressed) for the biological condition associated
to the given column.
A gene is called up-regulated for a given biological condition BC1 if
the gene is specific to the biological condition BC1 and expressions in
BC1 are higher than in the other biological conditions.
A '-1' in one of these columns means the gene is down-regulated
(or under-expressed) for the biological condition associated to the given
column.
A gene is called regulated for a given biological condition BC1 if
the gene is specific to the biological condition BC1 and expressions in
BC1 are lower than in the other biological conditions.
A '0' in one of these columns means the gene is not specific to
the biological condition associated to the given column.
a data.frame (output DE.per.pair.G
of DEresultsGroup
)
with rows and
columns
with
the number of genes
and
the number of biological conditions.
The number of 1 in the n-th row gives the number of pairs of
biological conditions where the gene
is DE.
The output
DE.per.pair.G
will be the input of the function
DEplotVennBarplotGroup()
.
a contingency matrix (output Contingence.per.group
of DEresultsGroup
) which gives for each biological condition
the number of genes categorized as
"Upregulated", "DownRugulated" and "Other".
A gene is categorized as 'Other', for a given biological condition BC1,
if the gene is not specific to the biological condition BC1.
The category 'Other' does not exist when there are only two
biological conditions.
The output Contingence.per.group
will be the input of the function
DEplotBarplot()
.
The output of the function are used by the main function
DEanalysisGroup()
.
## Data data("RawCounts_Antoszewski2022_MOUSEsub500") ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing step resDATAprepSEmus1 <- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEmus1)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE.G <- DESeq2::DESeq(DESeq2obj) res.sum.G <- DEresultGroup(DESeq.result=dds.DE.G, LRT.supp.info=FALSE, log.FC.min=1, pval.min=0.05)
## Data data("RawCounts_Antoszewski2022_MOUSEsub500") ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing step resDATAprepSEmus1 <- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEmus1)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE.G <- DESeq2::DESeq(DESeq2obj) res.sum.G <- DEresultGroup(DESeq.result=dds.DE.G, LRT.supp.info=FALSE, log.FC.min=1, pval.min=0.05)
This function realizes the intermediate steps of the analysis
of the function DEanalysisTimeAndGroup()
.
DEresultGroupPerTime( DESeq.result, LRT.supp.info = TRUE, pval.min = 0.05, log.FC.min = 1 )
DEresultGroupPerTime( DESeq.result, LRT.supp.info = TRUE, pval.min = 0.05, log.FC.min = 1 )
DESeq.result |
Output from the function
|
LRT.supp.info |
|
pval.min |
Numeric value between 0 and 1. A gene will be considered as
differentially expressed (DE) between two biological conditions if its
Benjamini-Hochberg adjusted p-value
(see |
log.FC.min |
Non negative numeric value.
If the log2 fold change between biological conditions or times has
an absolute value below the threshold |
The function returns the same DESeqDataSet class object
DESeq.result
with the following results,
saved in the metadata DEresultsTimeGroup
of DESeq.result
:
a data.frame (output DEsummary
of DEresultsTimeGroup
)
which contains
pvalues, log2 fold change and DE genes between each pairs of biological conditions for a fixed time ti (except the reference time t0).
DE specific genes per biological condition for a fixed time ti (except the reference time t0).
inputs for the functions :
DEplotBarplot()
,
DEplotBarplotTime()
,
DEplotVennBarplotGroup()
,
DEplotVennBarplotTime()
,
DEplotBarplotFacetGrid()
,
DEplotAlluvial()
.
The output of the function are used by the main function
DEanalysisTimeAndGroup()
.
data("RawCounts_Schleiss2021_CLLsub500") ## We take only the first three times (both group) for the speed of ## the example Index3t<-c(2:4,11:13,20:22, 29:31,38:40,47:49) RawCounts_3t<-RawCounts_Schleiss2021_CLLsub500[seq_len(200), c(1,Index3t)] ## Preprocessing step resDATAprepSEleuk <- DATAprepSE(RawCounts=RawCounts_3t, Column.gene=1, Group.position=2, Time.position=4, Individual.position=3) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEleuk)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE <- DESeq2::DESeq(DESeq2obj) ## res.G.T.2 <- DEresultGroupPerTime(DESeq.result=dds.DE, LRT.supp.info=FALSE, log.FC.min=1, pval.min=0.05)
data("RawCounts_Schleiss2021_CLLsub500") ## We take only the first three times (both group) for the speed of ## the example Index3t<-c(2:4,11:13,20:22, 29:31,38:40,47:49) RawCounts_3t<-RawCounts_Schleiss2021_CLLsub500[seq_len(200), c(1,Index3t)] ## Preprocessing step resDATAprepSEleuk <- DATAprepSE(RawCounts=RawCounts_3t, Column.gene=1, Group.position=2, Time.position=4, Individual.position=3) DESeq2preprocess <- S4Vectors::metadata(resDATAprepSEleuk)$DESeq2obj DESeq2obj <- DESeq2preprocess$DESeq2preproceesing ##------------------------------------------------------------------------## dds.DE <- DESeq2::DESeq(DESeq2obj) ## res.G.T.2 <- DEresultGroupPerTime(DESeq.result=dds.DE, LRT.supp.info=FALSE, log.FC.min=1, pval.min=0.05)
The function returns, from the output of
DEanalysisGlobal()
,
specific files designed to be used as input for several online online tools
and software given in the section Value
.
GSEApreprocessing( SEresDE, ColumnsCriteria, Set.Operation, Rnk.files = TRUE, Save.files = FALSE )
GSEApreprocessing( SEresDE, ColumnsCriteria, Set.Operation, Rnk.files = TRUE, Save.files = FALSE )
SEresDE |
A SummarizedExperiment class object. Output from
|
ColumnsCriteria |
A vector of integers where each integer indicates
a column of |
Set.Operation |
A character. The user must choose between "union"
(default), "intersect", "setdiff" (see |
Rnk.files |
|
Save.files |
|
We have the following three cases:
If Set.Operation="union"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that the sum of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having at least one ’1’
in one of the selected columns.
If Set.Operation="intersect"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that the product of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having ’1’
in all of the selected columns.
If Set.Operation="setdiff"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that only one element of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having ’1’
in only one of the selected columns.
The function returns
A vector of character containing gene names specified by
ColumnsCriteria
and Set.Operation
.
A vector of character containing all gene names
And, in case where Save.files=TRUE
and the path.result
of
DEanalysisGlobal()
is not NULL, specific files designed to be used
as input for the following online tools and software :
WebGestalt : http://www.webgestalt.org
gProfiler : https://biit.cs.ut.ee/gprofiler/gost
Panther : http://www.pantherdb.org
ShinyGO : http://bioinformatics.sdstate.edu/go/
Enrichr : https://maayanlab.cloud/Enrichr/
GOrilla : http://cbl-gorilla.cs.technion.ac.il.
data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt <- RawCounts_Antoszewski2022_MOUSEsub500[, seq_len(7)] ##------------------------------------------------------------------------## ## Preprocessing resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, Name.folder.DE=NULL) ##------------------------------------------------------------------------## resGp <- GSEApreprocessing(SEresDE=resDET1wt, ColumnsCriteria=2, Set.Operation="union", Rnk.files=TRUE, Save.files=FALSE)
data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example RawCounts_T1Wt <- RawCounts_Antoszewski2022_MOUSEsub500[, seq_len(7)] ##------------------------------------------------------------------------## ## Preprocessing resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_T1Wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## DE analysis resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, pval.min=0.05, pval.vect.t=NULL, log.FC.min=1, LRT.supp.info=FALSE, Plot.DE.graph=TRUE, path.result=NULL, Name.folder.DE=NULL) ##------------------------------------------------------------------------## resGp <- GSEApreprocessing(SEresDE=resDET1wt, ColumnsCriteria=2, Set.Operation="union", Rnk.files=TRUE, Save.files=FALSE)
The function realizes, from the outputs of
DEanalysisGlobal()
,
an enrichment analysis (GSEA) of a subset of genes with
the R package gprofiler2
.
GSEAQuickAnalysis( Internet.Connection = FALSE, SEresDE, ColumnsCriteria = 1, ColumnsLog2ordered = NULL, Set.Operation = "union", Organism = "hsapiens", MaxNumberGO = 20, Background = FALSE, Display.plots = TRUE, Save.plots = FALSE )
GSEAQuickAnalysis( Internet.Connection = FALSE, SEresDE, ColumnsCriteria = 1, ColumnsLog2ordered = NULL, Set.Operation = "union", Organism = "hsapiens", MaxNumberGO = 20, Background = FALSE, Display.plots = TRUE, Save.plots = FALSE )
Internet.Connection |
|
SEresDE |
A SummarizedExperiment class object. Output from
|
ColumnsCriteria |
A vector of integers where each integer indicates
a column of |
ColumnsLog2ordered |
|
Set.Operation |
A character. The user must choose between "union"
(default), "intersect", "setdiff" (see |
Organism |
A character indicating the organism
where data were taken from.
See vignette of the R package |
MaxNumberGO |
An integer.
The user can select the |
Background |
|
Display.plots |
|
Save.plots |
|
If ColumnsLog2ordered
is a vector of integers,
the rows of Res.DE.analysis$DE.results
(corresponding to genes)
will be decreasingly ordered according to the sum of absolute
fold change (the selected columns must contain
fold change
values) before the enrichment analysis.
The enrichment analysis will take into account the genes order as
the first genes will be considered to have the highest biological importance
and the last genes the lowest.
See the input
ordered_query
of
gprofiler2::gost()
and the vignette of gprofiler2
for more details.
We have the following three cases:
If Set.Operation="union"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that the sum of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having at least one ’1’
in one of the selected columns.
If Set.Operation="intersect"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that the product of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having ’1’
in all of the selected columns.
If Set.Operation="setdiff"
then the rows extracted from
the different datasets (raw counts, normalized data and
SummarizedExperiment::rowData(SEresDE)
)
included in the SummarizedExperiment class object SEresDE
are those such that only one element of the selected columns of
SummarizedExperiment::rowData(SEresDE)
given in ColumnsCriteria
is >0.
This means that the selected genes are those having ’1’
in only one of the selected columns.
The function returns the same SummarizedExperiment class object
SEresDE
with
a data.frame which contains the outputs of
gprofiler2::gost()
a Manhattan plot showing all GO names according to their pvalue
a lollipop graph showing the MaxNumberGO
most important GO.
saved in the metadata Results[[2]][[5]]
of SEresDE
.
The Manhattan plot and the lollipop graph are plotted if
Display.plots=TRUE
.
The function uses the R package gprofiler2
https://cran.r-project.org/web/packages/gprofiler2/vignettes/gprofiler2.html.
The R package gprofiler2
provides an R interface to the web toolset
g:Profiler https://biit.cs.ut.ee/gprofiler/gost.
## data importation data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example dataT1wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing with Results of DEanalysisGlobal() resDATAprepSE <- DATAprepSE(RawCounts=dataT1wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## Internet is needed in order to run the following lines of code because ## gprofileR2 needs an internet connection ## DE analysis # resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, # pval.min=0.05, # pval.vect.t=NULL, # log.FC.min=1, # LRT.supp.info=FALSE, # Plot.DE.graph=FALSE, # path.result=NULL, # Name.folder.DE=NULL) ######### # resGs <- GSEAQuickAnalysis(Internet.Connection=TRUE, # SEresDE=resDET1wt, # ColumnsCriteria=3, # ColumnsLog2ordered=NULL, # Set.Operation="union", # Organism="mmusculus", # MaxNumberGO=20, # Background=FALSE, # Display.plots=TRUE, # Save.plots=FALSE)
## data importation data(RawCounts_Antoszewski2022_MOUSEsub500) ## No time points. We take only two groups for the speed of the example dataT1wt <- RawCounts_Antoszewski2022_MOUSEsub500[seq_len(200), seq_len(7)] ## Preprocessing with Results of DEanalysisGlobal() resDATAprepSE <- DATAprepSE(RawCounts=dataT1wt, Column.gene=1, Group.position=1, Time.position=NULL, Individual.position=2) ##------------------------------------------------------------------------## ## Internet is needed in order to run the following lines of code because ## gprofileR2 needs an internet connection ## DE analysis # resDET1wt <- DEanalysisGlobal(SEres=resDATAprepSE, # pval.min=0.05, # pval.vect.t=NULL, # log.FC.min=1, # LRT.supp.info=FALSE, # Plot.DE.graph=FALSE, # path.result=NULL, # Name.folder.DE=NULL) ######### # resGs <- GSEAQuickAnalysis(Internet.Connection=TRUE, # SEresDE=resDET1wt, # ColumnsCriteria=3, # ColumnsLog2ordered=NULL, # Set.Operation="union", # Organism="mmusculus", # MaxNumberGO=20, # Background=FALSE, # Display.plots=TRUE, # Save.plots=FALSE)
The functions performs a hierarchical clustering on results
from a factor analysis with the R function
FactoMineR::HCPC()
.
HCPCanalysis( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Plot.HCPC = FALSE, Color.Group = NULL, Phi = 25, Theta = 140, epsilon = 0.2, Cex.point = 0.7, Cex.label = 0.7, motion3D = FALSE, path.result = NULL, Name.folder.hcpc = NULL )
HCPCanalysis( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Plot.HCPC = FALSE, Color.Group = NULL, Phi = 25, Theta = 140, epsilon = 0.2, Cex.point = 0.7, Cex.label = 0.7, motion3D = FALSE, path.result = NULL, Name.folder.hcpc = NULL )
SEresNorm |
Results of the function
|
DATAnorm |
|
gene.deletion |
|
sample.deletion |
|
Plot.HCPC |
|
Color.Group |
|
Phi |
Angle defining the colatitude direction for the 3D PCA plot
(see |
Theta |
Angle defining the azimuthal direction for the 3D PCA plot
(see |
epsilon |
Non negative numeric value giving the length between points
and their labels in all PCA plots which are not automatically plotted
by |
Cex.point |
Non negative numeric value giving the size of points
in all PCA plots which are not automatically plotted by
|
Cex.label |
Non negative numeric value giving the size of the labels
associated to each point of the all PCA graphs which are not automatically
plotted by |
motion3D |
|
path.result |
Character or |
Name.folder.hcpc |
Character or |
All results are built from the results of our function
DATAnormalization()
.
The number of clusters is automatically selected by
FactoMineR::HCPC()
and is described in the section Details
of
FactoMineR::HCPC()
.
The function returns the same SummarizedExperiment class object
SEresNorm
with the outputs from the function
FactoMineR::HCPC()
,
(saved in the metadata Results[[1]][[3]]
of SEresNorm
)
a dendrogram (also called hierarchical tree) using the function
factoextra::fviz_dend()
one 2D PCA and two 3D PCA produced by the function
PCAgraphics()
where samples are colored with different colors for different clusters.
The two 3D PCA graphs are identical but one of them will be opened
in a rgl window
(see plot3Drgl::plotrgl()
)
allowing to interactively rotate and zoom.
The interactive 3D graph will be plotted only if motion3D=TRUE
.
A graph indicating for each sample, its cluster and the time and/or biological condition associated to the sample.
the outputs of
FactoMineR::HCPC()
.
The function calls the functions
PCArealization()
and
FactoMineR::HCPC()
.
The function
FactoMineR::HCPC()
will take as input the output of
PCArealization()
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resHCPCanalysis <- HCPCanalysis(SEresNorm=resNorm, DATAnorm=TRUE, sample.deletion=NULL, gene.deletion=NULL, Plot.HCPC=TRUE, Color.Group=NULL, Phi=25, Theta=140, Cex.point=1, Cex.label=0.6, epsilon=0.4, motion3D=FALSE, path.result=NULL, Name.folder.hcpc=NULL)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resHCPCanalysis <- HCPCanalysis(SEresNorm=resNorm, DATAnorm=TRUE, sample.deletion=NULL, gene.deletion=NULL, Plot.HCPC=TRUE, Color.Group=NULL, Phi=25, Theta=140, Cex.point=1, Cex.label=0.6, epsilon=0.4, motion3D=FALSE, path.result=NULL, Name.folder.hcpc=NULL)
The function performs a soft clustering of temporal patterns
based on the fuzzy c-means algorithm using the R package Mfuzz
.
MFUZZanalysis( SEresNorm, DATAnorm = TRUE, DataNumberCluster = NULL, Method = "hcpc", Max.clust = 6, Membership = 0.5, Min.std = 0.1, Plot.Mfuzz = TRUE, path.result = NULL, Name.folder.mfuzz = NULL )
MFUZZanalysis( SEresNorm, DATAnorm = TRUE, DataNumberCluster = NULL, Method = "hcpc", Max.clust = 6, Membership = 0.5, Min.std = 0.1, Plot.Mfuzz = TRUE, path.result = NULL, Name.folder.mfuzz = NULL )
SEresNorm |
Results of the function
|
DATAnorm |
|
DataNumberCluster |
Data.frame or |
Method |
"kmeans" or "hcpc". The method used for selecting the number
of cluster to be used for the temporal cluster analysis
(see |
Max.clust |
Integer strictly superior to 1 indicating the maximum
number of clusters.
|
Membership |
Numeric value between 0 and 1.
For each cluster, genes with membership values below the threshold
|
Min.std |
Numeric positive value.
All genes where their standard deviations are smaller than the threshold
|
Plot.Mfuzz |
|
path.result |
Character or |
Name.folder.mfuzz |
Character or |
All results are built from the results of our function
DATAnormalization()
.
The Mfuzz
package works with datasets where rows correspond to genes
and columns correspond to times.
If RawCounts
(input of our function
DATAprepSE()
)
contains several replicates per time,
the algorithm computes the mean of replicates for each gene before using
Mfuzz::mfuzz()
.
When there are several biological conditions, the algorithm realizes the
Mfuzz::mfuzz()
analysis for each biological condition.
The function returns the same SummarizedExperiment class object
SEresNorm
with the different elements below
(saved in the metadata Results[[1]][[4]]
of SEresNorm
)
the final data used for the Mfuzz
analysis (see Details
).
the cluster associated to each gene.
plots generated by
MFUZZclustersNumber()
and
Mfuzz::mfuzz.plot2()
for each biological condition.
The function uses the function
MFUZZclustersNumber()
to compute the optimal number of cluster for each biological condition
with the kmeans method.
## Data simulation set.seed(33) DATAclustSIM <- matrix(rnorm(12*10*3, sd=0.2, mean=rep(c(rep(c(1, 6, 9, 4, 3, 1, 6.5, 0.7, 10), times=2), rep(c(2, 3.6, 3.7, 5, 7.9, 8, 7.5, 3.5, 3.4), times=2)), each=10)), nrow=30, ncol=12) DATAclustSIM <- floor(DATAclustSIM*100) ## colnames(DATAclustSIM) <- c("G1_t0_r1", "G1_t1_r1", "G1_t2_r1", "G1_t0_r2", "G1_t1_r2", "G1_t2_r2", "G2_t0_r3", "G2_t1_r3", "G2_t2_r3", "G2_t0_r4", "G2_t1_r4", "G2_t2_r4") ##------------------------------------------------------------------------## ## Plot the temporal expression of each individual graphics::matplot(t(rbind(DATAclustSIM[, 1:3], DATAclustSIM[, 4:6], DATAclustSIM[, 7:9], DATAclustSIM[, 10:12])), col=rep(c("black", "red"), each=6*10), xlab="Time", ylab="Gene expression", type=c("b"), pch=19) ##------------------------------------------------------------------------## ## Preprocessing step DATAclustSIM <- data.frame(DATAclustSIM) resDATAprepSE <- DATAprepSE(RawCounts=DATAclustSIM, Column.gene=NULL, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resMFUZZ <- MFUZZanalysis(SEresNorm=resNorm, DATAnorm=TRUE, DataNumberCluster=NULL, Membership=0.5, Min.std=0.1, Plot.Mfuzz=TRUE, path.result=NULL)
## Data simulation set.seed(33) DATAclustSIM <- matrix(rnorm(12*10*3, sd=0.2, mean=rep(c(rep(c(1, 6, 9, 4, 3, 1, 6.5, 0.7, 10), times=2), rep(c(2, 3.6, 3.7, 5, 7.9, 8, 7.5, 3.5, 3.4), times=2)), each=10)), nrow=30, ncol=12) DATAclustSIM <- floor(DATAclustSIM*100) ## colnames(DATAclustSIM) <- c("G1_t0_r1", "G1_t1_r1", "G1_t2_r1", "G1_t0_r2", "G1_t1_r2", "G1_t2_r2", "G2_t0_r3", "G2_t1_r3", "G2_t2_r3", "G2_t0_r4", "G2_t1_r4", "G2_t2_r4") ##------------------------------------------------------------------------## ## Plot the temporal expression of each individual graphics::matplot(t(rbind(DATAclustSIM[, 1:3], DATAclustSIM[, 4:6], DATAclustSIM[, 7:9], DATAclustSIM[, 10:12])), col=rep(c("black", "red"), each=6*10), xlab="Time", ylab="Gene expression", type=c("b"), pch=19) ##------------------------------------------------------------------------## ## Preprocessing step DATAclustSIM <- data.frame(DATAclustSIM) resDATAprepSE <- DATAprepSE(RawCounts=DATAclustSIM, Column.gene=NULL, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resMFUZZ <- MFUZZanalysis(SEresNorm=resNorm, DATAnorm=TRUE, DataNumberCluster=NULL, Membership=0.5, Min.std=0.1, Plot.Mfuzz=TRUE, path.result=NULL)
The function uses
stats::kmeans()
or
FactoMineR::HCPC()
in order to compute the number of cluster for the
Mfuzz::mfuzz()
analysis.
MFUZZclustersNumber( SEresNorm, DATAnorm = TRUE, Method = "hcpc", Max.clust = 3, Min.std = 0.1, Plot.Cluster = TRUE, path.result = NULL )
MFUZZclustersNumber( SEresNorm, DATAnorm = TRUE, Method = "hcpc", Max.clust = 3, Min.std = 0.1, Plot.Cluster = TRUE, path.result = NULL )
SEresNorm |
Results of the function
|
DATAnorm |
|
Method |
"kmeans" or "hcpc". The method used for selecting the number
of cluster to be used for the temporal cluster analysis (see |
Max.clust |
Integer strictly superior to 1 indicating
the maximum number of clusters. The default is |
Min.std |
Numeric positive value. All genes where their standard deviations are smaller than the threshold Min.std will be excluded. |
Plot.Cluster |
|
path.result |
Character or |
All results are built from the results of our function
DATAnormalization()
.
The Mfuzz
package works with datasets where rows correspond to genes
and columns correspond to times.
If RawCounts
(input of our function
DATAprepSE()
)
contains several replicates per time,
the algorithm computes the mean of replicates for each gene before using
Mfuzz::mfuzz()
.
When there are several biological conditions, the algorithm realizes
the Mfuzz::mfuzz()
analysis for each biological condition.
The kmeans method or the hierarchical clustering method,
respectively included in
stats::kmeans()
and
FactoMineR::HCPC()
,
is used in order to compute the optimal number of clusters.
If there are several biological conditions, the algorithm computes
one optimal number of clusters per biological condition.
The function returns the same SummarizedExperiment class object
SEresNorm
with the different elements below,
saved in the metadata Results[[1]][[4]]
of SEresNorm
,
the optimal number of clusters for each biological condition
(between 2 and Max.clust
).
a data.frame with () columns and
Max.clust
rows
with the number of biological conditions.
If Method="kmeans"
, the ith rows and the jth column correspond
to the within-cluster intertia (see tot.withinss
from
stats::kmeans()
)
dividing by the sum of the variance of each row of ExprData
of the (j-1)th biological condition computed by
stats::kmeans()
with i clusters.
When there is only one cluster, the within-cluster intertia
corresponds to the sum of the variance of each row of
ExprData
(see Details
).
The first column contains integers between 1 and Max.clust
which corresponds to the number of clusters selected for the
stats::kmeans()
analysis.
If Method="hcpc"
, the jth column correspond to the clustering
heights (see the output height
from
FactoMineR::HCPC()
)
dividing by the maximum value of height
.
The first column contains integers between 1 and Max.clust
which corresponds to the number of clusters selected for the
stats::kmeans()
analysis.
a plot which gives
If Method="kmeans"
, the evolution of the weighted
within-cluster intertia per number of clusters
(from 1 to Max.clust
) for each biological condition.
The optimal number of cluster for each biological condition
will be colored in blue.
If Method="hcpc"
, the evolution of the scaled height per
number of clusters (from 1 to Max.clust
)
for each biological condition.
The optimal number of cluster for each biological condition will be
colored in blue.
The function is called by
MFUZZanalysis()
.
## Data simulation set.seed(33) DATAclustSIM <- matrix(rnorm(12*10*3, sd=0.2, mean=rep(c(rep(c(1, 6, 9, 4, 3, 1, 6.5, 0.7, 10), times=2), rep(c(2, 3.6, 3.7, 5, 7.9, 8, 7.5, 3.5, 3.4), times=2)), each=10)), nrow=30, ncol=12) DATAclustSIM <- floor(DATAclustSIM*100) ## colnames(DATAclustSIM) <- c("G1_t0_r1", "G1_t1_r1", "G1_t2_r1", "G1_t0_r2", "G1_t1_r2", "G1_t2_r2", "G2_t0_r3", "G2_t1_r3", "G2_t2_r3", "G2_t0_r4", "G2_t1_r4", "G2_t2_r4") ##------------------------------------------------------------------------## ## Plot the temporal expression of each individual graphics::matplot(t(rbind(DATAclustSIM[, 1:3], DATAclustSIM[, 4:6], DATAclustSIM[, 7:9], DATAclustSIM[, 10:12])), col=rep(c("black", "red"), each=6*10), xlab="Time", ylab="Gene expression", type=c("b"), pch=19) ##------------------------------------------------------------------------## ## Preprocessing step DATAclustSIM <- data.frame(DATAclustSIM) resDATAprepSE <- DATAprepSE(RawCounts=DATAclustSIM, Column.gene=NULL, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resMFUZZcluster <- MFUZZclustersNumber(SEresNorm=resNorm, DATAnorm=FALSE, Method="hcpc", Max.clust=5, Plot.Cluster=TRUE, path.result=NULL)
## Data simulation set.seed(33) DATAclustSIM <- matrix(rnorm(12*10*3, sd=0.2, mean=rep(c(rep(c(1, 6, 9, 4, 3, 1, 6.5, 0.7, 10), times=2), rep(c(2, 3.6, 3.7, 5, 7.9, 8, 7.5, 3.5, 3.4), times=2)), each=10)), nrow=30, ncol=12) DATAclustSIM <- floor(DATAclustSIM*100) ## colnames(DATAclustSIM) <- c("G1_t0_r1", "G1_t1_r1", "G1_t2_r1", "G1_t0_r2", "G1_t1_r2", "G1_t2_r2", "G2_t0_r3", "G2_t1_r3", "G2_t2_r3", "G2_t0_r4", "G2_t1_r4", "G2_t2_r4") ##------------------------------------------------------------------------## ## Plot the temporal expression of each individual graphics::matplot(t(rbind(DATAclustSIM[, 1:3], DATAclustSIM[, 4:6], DATAclustSIM[, 7:9], DATAclustSIM[, 10:12])), col=rep(c("black", "red"), each=6*10), xlab="Time", ylab="Gene expression", type=c("b"), pch=19) ##------------------------------------------------------------------------## ## Preprocessing step DATAclustSIM <- data.frame(DATAclustSIM) resDATAprepSE <- DATAprepSE(RawCounts=DATAclustSIM, Column.gene=NULL, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resMFUZZcluster <- MFUZZclustersNumber(SEresNorm=resNorm, DATAnorm=FALSE, Method="hcpc", Max.clust=5, Plot.Cluster=TRUE, path.result=NULL)
The functions performs an automatic principal component analysis (PCA) from a gene expression dataset where samples can belong to different biological conditions and/or time points.
PCAanalysis( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Plot.PCA = TRUE, Mean.Accross.Time = FALSE, Color.Group = NULL, Phi = 25, Theta = 140, epsilon = 0.2, Cex.point = 0.7, Cex.label = 0.7, motion3D = FALSE, path.result = NULL, Name.folder.pca = NULL )
PCAanalysis( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Plot.PCA = TRUE, Mean.Accross.Time = FALSE, Color.Group = NULL, Phi = 25, Theta = 140, epsilon = 0.2, Cex.point = 0.7, Cex.label = 0.7, motion3D = FALSE, path.result = NULL, Name.folder.pca = NULL )
SEresNorm |
Results of the function
|
DATAnorm |
|
gene.deletion |
|
sample.deletion |
|
Plot.PCA |
|
Mean.Accross.Time |
|
Color.Group |
|
Phi |
Angle defining the colatitude direction for the 3D PCA plot
(see |
Theta |
Angle defining the azimuthal direction for the 3D PCA plot
(see |
epsilon |
Non negative numeric value giving the length between points
and their labels in all PCA plots which are not automatically plotted
by |
Cex.point |
Non negative numeric value giving the size of points
in all PCA plots which are not automatically plotted by
|
Cex.label |
Non negative numeric value giving the size of
the labels associated to each point of the all PCA graphs
which are not automatically plotted by
|
motion3D |
|
path.result |
Character or |
Name.folder.pca |
Character or |
All results are built from the results of our function
DATAnormalization()
.
The function returns the same SummarizedExperiment class object
SEresNorm
with the outputs from the function
FactoMineR::PCA()
,
and several 2D and 3D PCA graphs depending on the experimental design
(if Plot.PCA=TRUE
),
saved in the metadata Results[[1]][[2]]
of SEresNorm
,
When samples belong only to different biological conditions,
the function returns a 2D and two 3D PCA graphs.
In each graph, samples are colored with different colors for different
biological conditions. The two 3D PCA graphs are identical but one of them
will be opened in a rgl window
(see plot3Drgl::plotrgl()
)
and it allows to interactively rotate and zoom.
When samples belong only to different time points, the function returns
One 2D PCA graph, one 3D PCA graph and the same 3D PCA graph
in a rgl window where samples are colored with different colors
for different time points.
Furthermore, lines are drawn between each pair of consecutive points
for each sample (if Mean.Accross.Time=FALSE
,
otherwise it will be only between means).
The same graphs describe above but without lines.
When samples belong to different time points and different biological conditions, the function returns
One 2D PCA graph, one 3D PCA graph and the same 3D PCA graph
in a rgl window where samples are colored with different colors
for different time points.
Furthermore, lines are drawn between each pair of consecutive points
for each sample (if Mean.Accross.Time=FALSE
,
otherwise it will be only between means).
The same graphs describe above but without lines.
The same six following graphs for each biological condition
(one PCA analysis per biological condition).
One 2D PCA graph, one 3D PCA graph and the same 3D PCA graph
in a rgl window where samples belong to only one biological condition
and are colored with different colors for different time points.
Furthermore, lines are drawn between each pair of consecutive points
for each sample (if Mean.Accross.Time=FALSE
,
otherwise it will be only between means).
The three others graphs are identical to the three previous ones
but without lines.
The interactive 3D graphs will be plotted only if motion3D=TRUE
.
The function calls the R functions
PCAgraphics()
and
ColnamesToFactors()
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ## Color for each group GROUPcolor <- data.frame(Name=c("G1", "G2"), Col=c("black", "red")) ##------------------------------------------------------------------------## resPCAanalysis <- PCAanalysis(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c("Gene1", "Gene5"), sample.deletion=c(2, 6), Plot.PCA=TRUE, Mean.Accross.Time=FALSE, Color.Group=GROUPcolor, motion3D=FALSE, Phi=25, Theta=140, Cex.label=0.7, Cex.point=0.7, epsilon=0.2, path.result=NULL, Name.folder.pca=NULL)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ## Color for each group GROUPcolor <- data.frame(Name=c("G1", "G2"), Col=c("black", "red")) ##------------------------------------------------------------------------## resPCAanalysis <- PCAanalysis(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c("Gene1", "Gene5"), sample.deletion=c(2, 6), Plot.PCA=TRUE, Mean.Accross.Time=FALSE, Color.Group=GROUPcolor, motion3D=FALSE, Phi=25, Theta=140, Cex.label=0.7, Cex.point=0.7, epsilon=0.2, path.result=NULL, Name.folder.pca=NULL)
The function plots 2D and 3D PCA using the function
PCArealization()
which realizes a PCA analysis. This function is called repeatedly by the
function PCAanalysis()
if samples belong to different biological conditions and time points.
PCAgraphics( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Plot.PCA = TRUE, Mean.Accross.Time = FALSE, Color.Group = NULL, motion3D = FALSE, Phi = 25, Theta = 140, epsilon = 0.2, Cex.point = 0.7, Cex.label = 0.7, path.result = NULL, Name.file.pca = NULL )
PCAgraphics( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Plot.PCA = TRUE, Mean.Accross.Time = FALSE, Color.Group = NULL, motion3D = FALSE, Phi = 25, Theta = 140, epsilon = 0.2, Cex.point = 0.7, Cex.label = 0.7, path.result = NULL, Name.file.pca = NULL )
SEresNorm |
Results of the function
|
DATAnorm |
|
gene.deletion |
|
sample.deletion |
|
Plot.PCA |
|
Mean.Accross.Time |
|
Color.Group |
|
motion3D |
|
Phi |
Angle defining the colatitude direction for the 3D PCA plot
(see |
Theta |
Angle defining the azimuthal direction for the 3D PCA plot
(see |
epsilon |
Non negative numeric value giving the length between points
and their labels in all PCA plots which are not automatically plotted
by |
Cex.point |
Non negative numeric value giving the size of points
in all PCA plots which are not automatically plotted by
|
Cex.label |
Non negative numeric value giving the size of
the labels associated to each point of the all PCA graphs
which are not automatically plotted by
|
path.result |
Character or |
Name.file.pca |
Character or |
All results are built from the results of our function
DATAnormalization()
.
The function returns the same SummarizedExperiment class object
SEresNorm
with the outputs from the function
FactoMineR::PCA()
,
and plots several 2D and 3D PCA graphs depending
on the experimental design (if Plot.PCA=TRUE
),
saved in the metadata Results[[1]][[2]]
of SEresNorm
,
When samples belong only to different biological conditions,
the function returns a 2D and two 3D PCA graphs.
In each graph, samples are colored with different colors for different
biological conditions. The two 3D PCA graphs are identical but
one of them will be opened in a rgl window
(see plot3Drgl::plotrgl()
)
and it allows to interactively rotate and zoom.
When samples belong only to different time points, the function returns
One 2D PCA graph, one 3D PCA graph and the same 3D PCA graph
in a rgl window where samples are colored with different colors
for different time points.
Furthermore, lines are drawn between each pair of consecutive points
for each sample (if Mean.Accross.Time=FALSE
,
otherwise it will be only between means).
The same graphs describe above but without lines.
When samples belong to different time points and different biological conditions, the function returns
One 2D PCA graph, one 3D PCA graph and the same 3D PCA graph
in a rgl window where samples are colored with different colors
for different time points.
Furthermore, lines are drawn between each pair of consecutive points
for each sample (if Mean.Accross.Time=FALSE
,
otherwise it will be only between means).
The same graphs describe above but without lines.
The same six following graphs for each biological condition
(one PCA analysis per biological condition).
One 2D PCA graph, one 3D PCA graph and the same 3D PCA graph
in a rgl window where samples belong to only one biological condition
and are colored with different colors for different time points.
Furthermore, lines are drawn between each pair of consecutive points
for each sample (if Mean.Accross.Time=FALSE
,
otherwise it will be only between means).
The three others graphs are identical to the three previous ones
but without lines.
The interactive 3D graphs will be plotted only if motion3D=TRUE
.
This function is called by our function
PCAanalysis()
and calls our function
PCArealization()
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ## Color for each group GROUPcolor <- data.frame(Name=c("G1", "G2"), Col=c("black", "red")) ##------------------------------------------------------------------------## resPCAgraph <- PCAgraphics(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c("Gene1", "Gene5"), sample.deletion=c(2,6), Plot.PCA=TRUE, Mean.Accross.Time=FALSE, Color.Group=GROUPcolor, motion3D=FALSE, Phi=25, Theta=140, Cex.label=0.7, Cex.point=0.7, epsilon=0.2, path.result=NULL, Name.file.pca=NULL)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ## Color for each group GROUPcolor <- data.frame(Name=c("G1", "G2"), Col=c("black", "red")) ##------------------------------------------------------------------------## resPCAgraph <- PCAgraphics(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c("Gene1", "Gene5"), sample.deletion=c(2,6), Plot.PCA=TRUE, Mean.Accross.Time=FALSE, Color.Group=GROUPcolor, motion3D=FALSE, Phi=25, Theta=140, Cex.label=0.7, Cex.point=0.7, epsilon=0.2, path.result=NULL, Name.file.pca=NULL)
The function generates a SummarizedExperiment class object
containing the dataset reshaped from the original dataset,
to be used by the function
FactoMineR::PCA()
,
which performs the Principal Component Analysis (PCA).
This function is called by the function
PCArealization()
,
which also calls the function
FactoMineR::PCA()
.
PCApreprocessing(SEresNorm, DATAnorm = TRUE)
PCApreprocessing(SEresNorm, DATAnorm = TRUE)
SEresNorm |
Results of the function
|
DATAnorm |
|
All results are built from the results of our function
DATAnormalization()
.
The function returns the same SummarizedExperiment class object
SEresNorm
with the different elements below
information for the functions
PCArealization()
and PCAgraphics()
a reshape of the originally dataset for the PCA analysis
(realized by the function PCArealization()
)
saved in the metadata Results[[1]][[2]]
of SEresNorm
.
The reshaped dataset which corresponds to a data.frame with
() columns and
rows, where
is the number of genes,
is the number of samples and
if samples belong to different biological condition or
time points.
In that case, the first column will contain the biological condition
or the time point associated to each sample.
if samples belong to different biological condition
and time points.
In that case, the first column will contain the biological condition
and the second column the time point associated to each sample.
The other columns form a sub data.frame which is a transpose of
the data.frame composed of the
numeric columns of
ExprData
.
The function is called by our function
PCArealization()
and uses our function
DATAnormalization()
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resPCAdata <- PCApreprocessing(SEresNorm=resNorm, DATAnorm=TRUE)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resPCAdata <- PCApreprocessing(SEresNorm=resNorm, DATAnorm=TRUE)
From a gene expression dataset, the functions performs
the Principal Component Analysis (PCA) through the R function
FactoMineR::PCA()
.
PCArealization( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Supp.del.sample = FALSE )
PCArealization( SEresNorm, DATAnorm = TRUE, gene.deletion = NULL, sample.deletion = NULL, Supp.del.sample = FALSE )
SEresNorm |
Results of the function
|
DATAnorm |
|
gene.deletion |
|
sample.deletion |
|
Supp.del.sample |
|
All results are built from the results of our function
DATAnormalization()
.
The function returns the same SummarizedExperiment class object
SEresNorm
but with the output of the
FactoMineR::PCA()
function (see
FactoMineR::PCA()
)
saved in the metadata Results[[1]][[2]]
of SEresNorm
.
The PCArealization()
function
is used by the following functions of our package :
PCAanalysis()
and
HCPCanalysis()
.
calls the R function
PCApreprocessing()
for reshaping the data and
uses its output for performing a Principal Component (PCA)
with
FactoMineR::PCA()
.
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resPCAex <- PCArealization(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c(3, 5), sample.deletion=c("G1_t0_Ind2", "G1_t1_Ind3"), Supp.del.sample=FALSE) ##------------------------------------------------------------------------## resPCAex2 <- PCArealization(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c("Gene3", "Gene5"), sample.deletion=c(3, 8), Supp.del.sample=TRUE)
## Simulation raw counts resSIMcount <- RawCountsSimulation(Nb.Group=2, Nb.Time=3, Nb.per.GT=4, Nb.Gene=10) ## Preprocessing step resDATAprepSE <- DATAprepSE(RawCounts=resSIMcount$Sim.dat, Column.gene=1, Group.position=1, Time.position=2, Individual.position=3) ## Normalization resNorm <- DATAnormalization(SEres=resDATAprepSE, Normalization="rle", Plot.Boxplot=FALSE, Colored.By.Factors=FALSE) ##------------------------------------------------------------------------## resPCAex <- PCArealization(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c(3, 5), sample.deletion=c("G1_t0_Ind2", "G1_t1_Ind3"), Supp.del.sample=FALSE) ##------------------------------------------------------------------------## resPCAex2 <- PCArealization(SEresNorm=resNorm, DATAnorm=TRUE, gene.deletion=c("Gene3", "Gene5"), sample.deletion=c(3, 8), Supp.del.sample=TRUE)
There are 4 groups : samples with or without hyper activation of the gene NOTTCH1 (N1ha versus N1wt) and with or without knock out of the gene TCF1 (T1ko versus T1wt). The original dataset has 39017 genes but we kept only 500 genes in order to increase the speed of each function in our algorithm.
data(RawCounts_Antoszewski2022_MOUSEsub500)
data(RawCounts_Antoszewski2022_MOUSEsub500)
A data frame with 500 rows (genes) and 13 columns (samples). The column names are as follow
ENSEMBL gene names.
The sample is the first replica (r1) of the biological condition N1wt and T1wt.
The sample is the second replica (r2) of the biological condition N1wt and T1wt.
The sample is the third replica (r3) of the biological condition N1wt and T1wt.
The sample is the first replica (r4) of the biological condition N1ha and T1wt.
The sample is the second replica (r5) of the biological condition N1ha and T1wt.
The sample is the third replica (r6) of the biological condition N1ha and T1wt.
The sample is the first replica (r7) of the biological condition N1ha and T1ko.
The sample is the second replica (r8) of the biological condition N1ha and T1ko.
The sample is the third replica (r9) of the biological condition N1ha and T1ko.
The sample is the first replica (r10) of the biological condition N1wt and T1ko.
The sample is the second replica (r11) of the biological condition N1wt and T1ko.
The sample is the third replica (r12) of the biological condition N1wt and T1ko.
The following is quoted from the GEO series GSE169116 (link in source):
Summary : "NOTCH1 is a well-established lineage specifier for T cells and among the most frequently mutated genes throughout all subclasses of T cell acute lymphoblastic leukemia (T-ALL). How oncogenic NOTCH1 signaling launches a leukemia-prone chromatin landscape during T-ALL initiation is unknown. Here we demonstrate an essential role for the high-mobility-group transcription factor Tcf1 in orchestrating chromatin accessibility and topology allowing for aberrant Notch1 signaling to convey its oncogenic function. Although essential, Tcf1 is not sufficient to initiate leukemia. The formation of a leukemia-prone landscape at the distal Notch1-regulated Myc enhancer, which is fundamental to this disease, is Tcf1-dependent and occurs within the earliest progenitor stage even before cells adopt a T lymphocyte or leukemic fate. Moreover, we discovered an additional evolutionarily conserved Tcf1-regulated enhancer element, in the distal Myc-enhancer, which is important for the transition of pre-leukemic cells to full-blown disease."
Overall design: "Expression profile comparisons of sorted LSK derived from C57BL/6J; Sv/129 compound mice with Notch1 induced or Tcf1 knocked-down."
We kept 500 genes only in order to increase the speed for each example.
Mouse dataset with four biological conditions.
This dataset comes from Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE169116. The name of the samples was renamed in order to be used with our package.
Antoszewski M, Fournier N, Ruiz Buendía GA, Lourenco J et al. 'Tcf1 is essential for initiation of oncogenic Notch1-driven chromatin topology in T-ALL'. Blood 2022 Jan 12. PMID:35020836. GEO:GSE169116.
data(RawCounts_Antoszewski2022_MOUSEsub500)
data(RawCounts_Antoszewski2022_MOUSEsub500)
Raw counts data for fission yeast RNA-Seq experiment with two groups (wt and mut), 6 times (0, 15min, 30min, 60min, 120min, 180min) and 3 replicates for each group and time. The original dataset has 7039 genes but we kept only 500 genes in order to increase the speed of each function in our algorithm.
data(RawCounts_Leong2014_FISSIONsub500wt)
data(RawCounts_Leong2014_FISSIONsub500wt)
A data frame with 500 rows (genes) and 37 columns (samples). The column names are as follow
Gene name
The sample is the first replica (r1) of the biological condition control (wt) at time t0 (0 min)
The sample is the second replica (r2) of the biological condition control (wt) at time t0 (0 min)
The sample is the third replica (r3) of the biological condition control (wt) at time t0 (0 min)
The sample is the first replica (r1) of the biological condition control (wt) at time t1 (15 min)
The sample is the second replica (r2) of the biological condition control (wt) at time t1 (15 min)
The sample is the third replica (r3) of the biological condition control (wt) at time t1 (15 min)
The sample is the first replica (r1) of the biological condition control (wt) at time t2 (30 min)
The sample is the second replica (r2) of the biological condition control (wt) at time t2 (30 min)
The sample is the third replica (r3) of the biological condition control (wt) at time t2 (30 min)
The sample is the first replica (r1) of the biological condition control (wt) at time t3 (60 min)
The sample is the second replica (r2) of the biological condition control (wt) at time t3 (60 min)
The sample is the third replica (r3) of the biological condition control (wt) at time t3 (60 min)
The sample is the first replica (r1) of the biological condition control (wt) at time t4 (120 min)
The sample is the second replica (r2) of the biological condition control (wt) at time t4 (120 min)
The sample is the third replica (r3) of the biological condition control (wt) at time t4 (120 min)
The sample is the first replica (r1) of the biological condition control (wt) at time t5 (180 min)
The sample is the second replica (r2) of the biological condition control (wt) at time t5 (180 min)
The sample is the third replica (r3) of the biological condition control (wt) at time t5 (180 min)
The following is quoted from the GEO series GSE56761 (link in source):
Summary: "Mitogen Activated Protein Kinase (MAPK) signaling cascades transduce information arising from events external to the cell, such as environmental stresses, to a variety of downstream effectors and transcription factors. The fission yeast stress activated MAP kinase (SAPK) pathway is conserved with the p38 and JNK pathways in humans, and comprises the MAPKKKs Win1, Wis4, the MAPKK Wis1, and the MAPK, Sty1. Sty1 and its main downstream effector Atf1 regulate a large set of core environmental stress response genes. The fission yeast genome encodes three other ATF proteins: Atf21, Atf31 and Pcr1. Among these, atf21 is specifically induced under conditions of high osmolarity. We have therefore instigated a programme to investigate the role played by non coding RNAs (ncRNAs) in response to osmotic stress challenge in wild type and atf21Delta cells. By integrating global proteomics and RNA sequencing data, we identified a systematic program in which elevated antisense RNAs arising both from ncRNAs and from 3'-overlapping convergent gene-pairs is directly associated with substantial reductions in protein levels throughout the fission yeast genome. We also found an xtensive array of ncRNAs with trans associations that have the potential to influence different biological processes and stress responses in fission yeast, suggesting ncRNAs comprise additional components of the SAPK regulatory system".
Overall design: "Global transcription profiles of fission yeast wild type (WT) and atf21del strains over an osmotic stress time course following treatment with 1M sorbitol at 0, 15, 30, 60, 120 and 180 mins. Strand-specific single end sequencing of total RNA was performed in biological triplicates on the Applied Biosystems SOLiD 5500xl Genetic Analyzer System".
We kept 500 genes only in order to increase the speed for each example.
Yeast dataset with 6 time measurements.
This dataset can be found in the R Package fission. https://bioconductor.org/packages/release/data/experiment/html/fission.html Link of GEO series GSE56761: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE56761. The name of the samples was renamed in order to be used with our package.
Leong HS, Dawson K, Wirth C, Li Y et al. 'A global non-coding RNA system modulates fission yeast protein levels in response to stress'. Nat Commun 2014 May 23;5:3947. PMID:24853205. GEO:GSE56761.
data(RawCounts_Leong2014_FISSIONsub500wt)
data(RawCounts_Leong2014_FISSIONsub500wt)
This time series count data (read counts) represents the temporal transcriptional response of primary human chronic lymphocytic leukemia (CLL)-cells after B-cell receptor stimulation. There are 9 time points (before stimulation (0h) and at the time points 1h, 1h30, 3h30, 6h30, 12h, 24h, 48h and 96h after cell stimulation) and samples are divided in two groups : Proliferating (P) and Non Proliferating (NP). There are also 3 replicates for a time and biological condition. The original dataset has 25369 genes but we kept only 500 genes in order to increase the speed of each function in our algorithm.
data(RawCounts_Schleiss2021_CLLsub500)
data(RawCounts_Schleiss2021_CLLsub500)
A data frame with 500 rows (genes) and 55 columns (samples). The column names are as follow
Symbol gene name.
The sample is the first replica (r1) of the biological condition control (P) at time t0 (0h)
The sample is the first replica (r1) of the biological condition control (P) at time t1 (1h)
The sample is the first replica (r1) of the biological condition control (P) at time t2 (1h30)
The sample is the first replica (r1) of the biological condition control (P) at time t3 (3h30)
The sample is the first replica (r1) of the biological condition control (P) at time t4 (6h30)
The sample is the first replica (r1) of the biological condition control (P) at time t5 (12h)
The sample is the first replica (r1) of the biological condition control (P) at time t6 (24h)
The sample is the first replica (r1) of the biological condition control (P) at time t7 (48h)
The sample is the first replica (r1) of the biological condition control (P) at time t8 (96h)
The sample is the second replica (r2) of the biological condition control (P) at time t0 (0h)
The sample is the second replica (r2) of the biological condition control (P) at time t1 (1h)
The sample is the second replica (r2) of the biological condition control (P) at time t2 (1h30)
The sample is the second replica (r2) of the biological condition control (P) at time t3 (3h30)
The sample is the second replica (r2) of the biological condition control (P) at time t4 (6h30)
The sample is the second replica (r2) of the biological condition control (P) at time t5 (12h)
The sample is the second replica (r2) of the biological condition control (P) at time t6 (24h)
The sample is the second replica (r2) of the biological condition control (P) at time t7 (48h)
The sample is the second replica (r2) of the biological condition control (P) at time t8 (96h)
The sample is the third replica (r3) of the biological condition control (P) at time t0 (0h)
The sample is the third replica (r3) of the biological condition control (P) at time t1 (1h)
The sample is the third replica (r3) of the biological condition control (P) at time t2 (1h30)
The sample is the third replica (r3) of the biological condition control (P) at time t3 (3h30)
The sample is the third replica (r3) of the biological condition control (P) at time t4 (6h30)
The sample is the third replica (r3) of the biological condition control (P) at time t5 (12h)
The sample is the third replica (r3) of the biological condition control (P) at time t6 (24h)
The sample is the third replica (r3) of the biological condition control (P) at time t7 (48h)
The sample is the third replica (r3) of the biological condition control (P) at time t8 (96h)
The sample is the first replica (r4) of the biological condition control (NP) at time t0 (0h)
The sample is the first replica (r4) of the biological condition control (NP) at time t1 (1h)
The sample is the first replica (r4) of the biological condition control (NP) at time t2 (1h30)
The sample is the first replica (r4) of the biological condition control (NP) at time t3 (3h30)
The sample is the first replica (r4) of the biological condition control (NP) at time t4 (6h30)
The sample is the first replica (r4) of the biological condition control (NP) at time t5 (12h)
The sample is the first replica (r4) of the biological condition control (NP) at time t6 (24h)
The sample is the first replica (r4) of the biological condition control (NP) at time t7 (48h)
The sample is the first replica (r4) of the biological condition control (NP) at time t8 (96h)
The sample is the second replica (r5) of the biological condition control (NP) at time t0 (0h)
The sample is the second replica (r5) of the biological condition control (NP) at time t1 (1h)
The sample is the second replica (r5) of the biological condition control (NP) at time t2 (1h30)
The sample is the second replica (r5) of the biological condition control (NP) at time t3 (3h30)
The sample is the second replica (r5) of the biological condition control (NP) at time t4 (6h30)
The sample is the second replica (r5) of the biological condition control (NP) at time t5 (12h)
The sample is the second replica (r5) of the biological condition control (NP) at time t6 (24h)
The sample is the second replica (r5) of the biological condition control (NP) at time t7 (48h)
The sample is the second replica (r5) of the biological condition control (NP) at time t8 (96h)
The sample is the third replica (r6) of the biological condition control (NP) at time t0 (0h)
The sample is the third replica (r6) of the biological condition control (NP) at time t1 (1h)
The sample is the third replica (r6) of the biological condition control (NP) at time t2 (1h30)
The sample is the third replica (r6) of the biological condition control (NP) at time t3 (3h30)
The sample is the third replica (r6) of the biological condition control (NP) at time t4 (6h30)
The sample is the third replica (r6) of the biological condition control (NP) at time t5 (12h)
The sample is the third replica (r6) of the biological condition control (NP) at time t6 (24h)
The sample is the third replica (r6) of the biological condition control (NP) at time t7 (48h)
The sample is the third replica (r6) of the biological condition control (NP) at time t8 (96h)
The following is quoted from the GEO series GSE130385 (link in source):
Summary: "The B-cell receptor (BCR) signaling is crucial for the pathophysiology of most leukemias and lymphomas originated from mature B lymphocytes and has emerged as a new therapeutic target, especially for chronic lymphocytic leukemia (CLL). However, the precise mechanisms by which BCR signaling controls neoplastic B-cell proliferation are ill characterized. This work was performed using primary leukemic cells of untreated patients at initial stage of CLL (Binet stage A / Rai 0) presenting biological characteristics of aggressive form of the disease (unmutated IGHV genes and ZAP70 protein expression). In order to mimic the primary leukemogenic step occurring in vivo, this study focused on the BCR-dependent proliferation of CLL cells induced ex vivo using anti-IgM, together with mandatory co-stimulating factors (CD40L, IL-4 and IL-21) (Schleiss, Sci Rep, 2019). Cell proliferation was objectivized by the emergence of proliferative clusters and the presence of more than 25% of CLL cells that did undergo division within the cell culture at day 6. To capture the specific actors of the proliferative response in these samples, we also included non-proliferating control CLL samples. Gene expression was analyzed by RNA-seq before stimulation (T0) and at the time points 1h, 1h30, 3h30, 6h30, 12h, 24h, 48h and 96h after cell stimulation (n=54 data points), the latest time points corresponding to the emergence of the proliferation clusters."
Overall design: "Temporal transcriptional response (T0 + 8 time points) of primary chronic lymphocytic leukemia (CLL) cells after BCR engagement ex vivo (anti-IgM, IL-4, CD40Ligand and IL-21) of 3 Proliferating (P1, P2, P3) and 3 Non Proliferating samples (NP1, NP2, NP3)".
Human CCL times series dataset with two biological conditions and with 9 time measurements.
This dataset comes from Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130385. I rewrite the name of the sample in order to be used with my package.
Schleiss C, Carapito R, Fornecker LM, Muller L et al. 'Temporal multiomic modeling reveals a B-cell receptor proliferative program in chronic lymphocytic leukemia'. Leukemia 2021 May;35(5):1463-1474. PMID:33833385. GEO:GSE130385
data(RawCounts_Schleiss2021_CLLsub500)
data(RawCounts_Schleiss2021_CLLsub500)
This time series count data (read counts) represents the temporal transcriptional response (six time measurements across the course of a day) of Bmal1 wild-type (WT) and Cry1/2 WT, Bmal1 KO and Cry1/2 WT, Bmal1 (WT) and Cry1/2 KO, and Bmal1 KO and Cry1/2 KO mice under an ad libitum (AL) or night restricted feeding (RF) regimen. Therefore, there are eight biological conditions. As there are only two mice per biological condition, we will not consider the effect of the regimen. The original dataset has 40327 genes but we kept only 500 genes in order to increase the speed of each function in our algorithm.
data(RawCounts_Weger2021_MOUSEsub500)
data(RawCounts_Weger2021_MOUSEsub500)
A data frame with 500 rows (genes) and 97 columns (samples). The column names are as follow
ENSEMBL gene names.
The sample is the first replica (r1) of the biological condition Bmal1 and KO at time t0 (00h).
The sample is the first replica (r1) of the biological condition Bmal1 and KO at time t1 (04h).
The sample is the first replica (r1) of the biological condition Bmal1 and KO at time t2 (08h).
The sample is the first replica (r1) of the biological condition Bmal1 and KO at time t3 (12h).
The sample is the first replica (r1) of the biological condition Bmal1 and KO at time t4 (16h).
The sample is the first replica (r1) of the biological condition Bmal1 and KO at time t5 (20h).
The sample is the first replica (r2) of the biological condition Bmal1 and KO at time t0 (00h).
The sample is the first replica (r2) of the biological condition Bmal1 and KO at time t1 (04h).
The sample is the first replica (r2) of the biological condition Bmal1 and KO at time t2 (08h).
The sample is the first replica (r2) of the biological condition Bmal1 and KO at time t3 (12h).
The sample is the first replica (r2) of the biological condition Bmal1 and KO at time t4 (16h).
The sample is the first replica (r2) of the biological condition Bmal1 and KO at time t5 (20h).
The sample is the first replica (r3) of the biological condition Bmal1 and KO at time t0 (00h).
The sample is the first replica (r3) of the biological condition Bmal1 and KO at time t1 (04h).
The sample is the first replica (r3) of the biological condition Bmal1 and KO at time t2 (08h).
The sample is the first replica (r3) of the biological condition Bmal1 and KO at time t3 (12h).
The sample is the first replica (r3) of the biological condition Bmal1 and KO at time t4 (16h).
The sample is the first replica (r3) of the biological condition Bmal1 and KO at time t5 (20h).
The sample is the first replica (r4) of the biological condition Bmal1 and KO at time t0 (00h).
The sample is the first replica (r4) of the biological condition Bmal1 and KO at time t1 (04h).
The sample is the first replica (r4) of the biological condition Bmal1 and KO at time t2 (08h).
The sample is the first replica (r4) of the biological condition Bmal1 and KO at time t3 (12h).
The sample is the first replica (r4) of the biological condition Bmal1 and KO at time t4 (16h).
The sample is the first replica (r4) of the biological condition Bmal1 and KO at time t5 (20h).
The sample is the first replica (r5) of the biological condition Bmal1 and wild-type at time t0 (00h).
The sample is the first replica (r5) of the biological condition Bmal1 and wild-type at time t1 (04h).
The sample is the first replica (r5) of the biological condition Bmal1 and wild-type at time t2 (08h).
The sample is the first replica (r5) of he biological condition Bmal1 and wild-type at time t3 (12h).
The sample is the first replica (r5) of the biological condition Bmal1 and wild-type at time t4 (16h).
The sample is the first replica (r5) of the biological condition Bmal1 and wild-type at time t5 (20h).
The sample is the first replica (r6) of the biological condition Bmal1 and wild-type at time t0 (00h).
The sample is the first replica (r6) of the biological condition Bmal1 and wild-type at time t1 (04h).
The sample is the first replica (r6) of the biological condition Bmal1 and wild-type at time t2 (08h).
The sample is the first replica (r6) of the biological condition Bmal1 and wild-type at time t3 (12h).
The sample is the first replica (r6) of the biological condition Bmal1 and wild-type at time t4 (16h).
The sample is the first replica (r6) of the biological condition Bmal1 and wild-type at time t5 (20h).
The sample is the first replica (r7) of the biological condition Bmal1 and wild-type at time t0 (00h).
The sample is the first replica (r7) of the biological condition Bmal1 and wild-type at time t1 (04h).
The sample is the first replica (r7) of the biological condition Bmal1 and wild-type at time t2 (08h).
The sample is the first replica (r7) of the biological condition Bmal1 and wild-type at time t3 (12h).
The sample is the first replica (r7) of the biological condition Bmal1 and wild-type at time t4 (16h).
The sample is the first replica (r7) of the biological condition Bmal1 and wild-type at time t5 (20h).
The sample is the first replica (r8) of the biological condition Bmal1 and wild-type at time t0 (00h).
The sample is the first replica (r8) of the biological condition Bmal1 and wild-type at time t1 (04h).
The sample is the first replica (r8) of the biological condition Bmal1 and wild-type at time t2 (08h).
The sample is the first replica (r8) of the biological condition Bmal1 and wild-type at time t3 (12h).
The sample is the first replica (r8) of the biological condition Bmal1 and wild-type at time t4 (16h).
The sample is the first replica (r8) of the biological condition Bmal1 and wild-type at time t5 (20h).
The sample is the first replica (r9) of the biological condition Cry1/2 and KO at time t0 (00h).
The sample is the first replica (r9) of the biological condition Cry1/2 and KO at time t1 (04h).
The sample is the first replica (r9) of the biological condition Cry1/2 and KO at time t2 (08h).
The sample is the first replica (r9) of the biological condition Cry1/2 and KO at time t3 (12h).
The sample is the first replica (r9) of the biological condition Cry1/2 and KO at time t4 (16h).
The sample is the first replica (r9) of the biological condition Cry1/2 and KO at time t5 (20h).
The sample is the first replica (r10) of the biological condition Cry1/2 and KO at time t0 (00h).
The sample is the first replica (r10) of the biological condition Cry1/2 and KO at time t1 (04h).
The sample is the first replica (r10) of the biological condition Cry1/2 and KO at time t2 (08h).
The sample is the first replica (r10) of the biological condition Cry1/2 and KO at time t3 (12h).
The sample is the first replica (r10) of the biological condition Cry1/2 and KO at time t4 (16h).
The sample is the first replica (r10) of the biological condition Cry1/2 and KO at time t5 (20h).
The sample is the first replica (r11) of the biological condition Cry1/2 and KO at time t0 (00h).
The sample is the first replica (r11) of the biological condition Cry1/2 and KO at time t1 (04h).
The sample is the first replica (r11) of the biological condition Cry1/2 and KO at time t2 (08h).
The sample is the first replica (r11) of the biological condition Cry1/2 and KO at time t3 (12h).
The sample is the first replica (r11) of the biological condition Cry1/2 and KO at time t4 (16h).
The sample is the first replica (r11) of the biological condition Cry1/2 and KO at time t5 (20h).
The sample is the first replica (r12) of the biological condition Cry1/2 and KO at time t0 (00h).
The sample is the first replica (r12) of the biological condition Cry1/2 and KO at time t1 (04h).
The sample is the first replica (r12) of the biological condition Cry1/2 and KO at time t2 (08h).
The sample is the first replica (r12) of the biological condition Cry1/2 and KO at time t3 (12h).
The sample is the first replica (r12) of the biological condition Cry1/2 and KO at time t4 (16h).
The sample is the first replica (r12) of the biological condition Cry1/2 and KO at time t5 (20h).
The sample is the first replica (r13) of the biological condition Cry1/2 and wild-type at time t0 (00h).
The sample is the first replica (r13) of the biological condition Cry1/2 and wild-type at time t1 (04h).
The sample is the first replica (r13) of the biological condition Cry1/2 and wild-type at time t2 (08h).
The sample is the first replica (r13) of the biological condition Cry1/2 and wild-type at time t3 (12h).
The sample is the first replica (r13) of the biological condition Cry1/2 and wild-type at time t4 (16h).
The sample is the first replica (r13) of the biological condition Cry1/2 and wild-type at time t5 (20h).
The sample is the first replica (r14) of the biological condition Cry1/2 and wild-type at time t0 (00h).
The sample is the first replica (r14) of the biological condition Cry1/2 and wild-type at time t1 (04h).
The sample is the first replica (r14) of the biological condition Cry1/2 and wild-type at time t2 (08h).
The sample is the first replica (r14) of the biological condition Cry1/2 and wild-type at time t3 (12h).
The sample is the first replica (r14) of the biological condition Cry1/2 and wild-type at time t4 (16h).
The sample is the first replica (r14) of the biological condition Cry1/2 and wild-type at time t5 (20h).
The sample is the first replica (r15) of the biological condition Cry1/2 and wild-type at time t0 (00h).
The sample is the first replica (r15) of the biological condition Cry1/2 and wild-type at time t1 (04h).
The sample is the first replica (r15) of the biological condition Cry1/2 and wild-type at time t2 (08h).
The sample is the first replica (r15) of the biological condition Cry1/2 and wild-type at time t3 (12h).
The sample is the first replica (r15) of the biological condition Cry1/2 and wild-type at time t4 (16h).
The sample is the first replica (r15) of the biological condition Cry1/2 and wild-type at time t5 (20h).
The sample is the first replica (r16) of the biological condition Cry1/2 and wild-type at time t0 (00h).
The sample is the first replica (r16) of the biological condition Cry1/2 and wild-type at time t1 (04h).
The sample is the first replica (r16) of the biological condition Cry1/2 and wild-type at time t2 (08h).
The sample is the first replica (r16) of the biological condition Cry1/2 and wild-type at time t3 (12h).
The sample is the first replica (r16) of the biological condition Cry1/2 and wild-type at time t4 (16h).
The sample is the first replica (r16) of the biological condition Cry1/2 and wild-type at time t5 (20h).
The data is used in order to describe our algorithm in the case where samples belong to different time points.
We kept 500 genes only in order to increase the speed for each example.
Mouse times series dataset with four biological conditions and with 6 time measurements.
This dataset comes from Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE135898. The name of the samples was renamed in order to be used with our package.
Weger BD, Gobet C, David FPA, Atger F et al. 'Systematic analysis of differential rhythmic liver gene expression mediated by the circadian clock and feeding rhythms'. Proc Natl Acad Sci USA 2021 Jan 19;118(3). PMID:33452134. GEO:GSE135898.
data(RawCounts_Weger2021_MOUSEsub500)
data(RawCounts_Weger2021_MOUSEsub500)
The function simulates an in silico RNA-seq raw counts data
inspired from the model used in the DESeq2
package.
It is used in some examples of other functions.
RawCountsSimulation(Nb.Group, Nb.Time, Nb.per.GT, Nb.Gene)
RawCountsSimulation(Nb.Group, Nb.Time, Nb.per.GT, Nb.Gene)
Nb.Group |
Non negative integer. Number of biological condition (minimum 1). |
Nb.Time |
Non negative integer. Number of time points (minimum 1). |
Nb.per.GT |
Non negative integer. Number of sample for each condition and time (minimum 1). |
Nb.Gene |
Non negative integer. Number of genes (minimum 1) |
A simulated RNA-seq raw counts data.
RawCountsSimulation(Nb.Group=3, Nb.Time=5, Nb.per.GT=7, Nb.Gene=50) ## RawCountsSimulation(Nb.Group=1, Nb.Time=5, Nb.per.GT=7, Nb.Gene=50) ## RawCountsSimulation(Nb.Group=3, Nb.Time=1, Nb.per.GT=7, Nb.Gene=50)
RawCountsSimulation(Nb.Group=3, Nb.Time=5, Nb.per.GT=7, Nb.Gene=50) ## RawCountsSimulation(Nb.Group=1, Nb.Time=5, Nb.per.GT=7, Nb.Gene=50) ## RawCountsSimulation(Nb.Group=3, Nb.Time=1, Nb.per.GT=7, Nb.Gene=50)
The list Results_DEanalysis_sub500 contains the results of
DEanalysisGlobal()
for each of the following raw counts :
RawCounts_Weger2021_MOUSEsub500,
RawCounts_Leong2014_FISSIONsub500wt and
RawCounts_Schleiss2021_CLLsub500
data(Results_DEanalysis_sub500)
data(Results_DEanalysis_sub500)
A list of 3 SummarizedExperiment class object
Each list in Results_DEanalysis_sub500 contains only the necessary outputs
of DEanalysisGlobal()
,
needed for the functions:
DEplotVolcanoMA()
,
DEplotHeatmaps()
,
GSEApreprocessing()
,
and
GSEAQuickAnalysis()
,
for each of the following raw counts :
RawCounts_Weger2021_MOUSEsub500,
RawCounts_Leong2014_FISSIONsub500wt and
RawCounts_Schleiss2021_CLLsub500
Results_DEanalysis_sub500
contains the outputs of
DEanalysisGlobal()
of:
RawCounts_Weger2021_MOUSEsub500,
RawCounts_Leong2014_FISSIONsub500wt and
RawCounts_Schleiss2021_CLLsub500
data(Results_DEanalysis_sub500)
data(Results_DEanalysis_sub500)
The database is a data.frame which contains transcript length of homo sapiens genes (40452 genes).
data(Transcript_HomoSapiens_Database)
data(Transcript_HomoSapiens_Database)
A data frame with 500 rows (genes) and 13 columns (samples). The column names are as follow
ENSEMBL gene names.
The sample is the first replica (r1) of the biological condition N1wt and T1wt.
The first column contains genes symbol of the homo sapiens organism and the second column contains the median of transcript length for each gene of the first column.
Mouse dataset with four biological conditions.
HGNC, ENSEMBL and NCBI database.
data(Transcript_HomoSapiens_Database)
data(Transcript_HomoSapiens_Database)