Title: | Transcription factor Inference through Gaussian process Reconstruction of Expression |
---|---|
Description: | The tigre package implements our methodology of Gaussian process differential equation models for analysis of gene expression time series from single input motif networks. The package can be used for inferring unobserved transcription factor (TF) protein concentrations from expression measurements of known target genes, or for ranking candidate targets of a TF. |
Authors: | Antti Honkela, Pei Gao, Jonatan Ropponen, Miika-Petteri Matikainen, Magnus Rattray, Neil D. Lawrence |
Maintainer: | Antti Honkela <[email protected]> |
License: | AGPL-3 |
Version: | 1.61.0 |
Built: | 2024-12-26 06:05:00 UTC |
Source: | https://github.com/bioc/tigre |
This package implements the method of Gao et al. (2008) and Honkela et al. (2010) for Gaussian process modelling single input motif regulatory systems with time-series expression data. The method can be used to rank potential targets of transcription factors based on such data.
Package: | tigre |
Type: | Package |
Version: | 1.12.0 |
Date: | 2012-10-02 |
License: | A-GPL Version 3 |
For details of using the package please refer to the Vignette.
Antti Honkela, Pei Gao, Jonatan Ropponen, Miika-Petteri Matikainen, Magnus Rattray, Neil D. Lawrence
Maintainer: Antti Honkela <[email protected]>
A.~Honkela, P.~Gao, J.~Ropponen, M.~Rattray, and N.~D.~Lawrence. tigre: Transcription factor Inference through Gaussian process Reconstruction of Expression for Bioconductor. Bioinformatics 27(7):1026-1027, 2011. DOI: 10.1093/bioinformatics/btr057.
P.~Gao, A.~Honkela, M.~Rattray, and N.~D.~Lawrence. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. Bioinformatics 24(16):i70–i75, 2008. DOI: 10.1093/bioinformatics/btn278.
A.~Honkela, C.~Girardot, E.~H. Gustafson, Y.-H. Liu, E.~E.~M. Furlong, N.~D. Lawrence, and M.~Rattray. Model-based method for transcription factor target identification with limited data. Proc Natl Acad Sci USA 107(17):7793-7798, 2010. DOI: 10.1073/pnas.0914285107.
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbe <- get('FBgn0035257', env=fbgnMapping) # Learn the model model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE) # Plot it GPPlot(model, nameMapping=getAnnMap("FLYBASE", annotation(drosophila_gpsim_fragment))) ## End(Not run)
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbe <- get('FBgn0035257', env=fbgnMapping) # Learn the model model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE) # Plot it GPPlot(model, nameMapping=getAnnMap("FLYBASE", annotation(drosophila_gpsim_fragment))) ## End(Not run)
Four genes from the 12 time point Drosophila embryonic development Affymetrix microarray gene expression data set by Tomancak et al. (2002).
The data has been processed using mmgmos
from puma
package and processData
.
data(drosophila_gpsim_fragment)
data(drosophila_gpsim_fragment)
An ExpressionTimeSeries
object with
3 repeats of the 12 time points for 4 probes.
ftp://ftp.fruitfly.org/pub/embryo_tc_array_data/
Tomancak, P et al. Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol 3:RESEARCH0088, 2002.
Four genes from the 12 time point Drosophila embryonic development Affymetrix microarray gene expression data set by Tomancak et al. (2002).
The data has been processed using mmgmos
from the
puma
package.
data(drosophila_mmgmos_fragment)
data(drosophila_mmgmos_fragment)
A puma
package exprReslt
object with
3 repeats of the 12 time points for 4 probes.
ftp://ftp.fruitfly.org/pub/embryo_tc_array_data/
Tomancak, P et al. Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol 3:RESEARCH0088, 2002.
Exports the results to an SQLite database which can then be browsed with a result browser. The function will export log likelihoods, z-scores, model figures and gene aliases.
export.scores(scores, datasetName='', experimentSet='', databaseFile='database.sqlite', preprocData=NULL, models=NULL, figpath=NULL, aliasTypes=c("SYMBOL", "GENENAME", "ENTREZID"), datasetSource='', datasetDescription='', datasetSaveLocation='', datasetFigureFilename='', experimentTimestamp=as.character(Sys.Date()), figureDesc='', figurePrio=0, regulator=NULL)
export.scores(scores, datasetName='', experimentSet='', databaseFile='database.sqlite', preprocData=NULL, models=NULL, figpath=NULL, aliasTypes=c("SYMBOL", "GENENAME", "ENTREZID"), datasetSource='', datasetDescription='', datasetSaveLocation='', datasetFigureFilename='', experimentTimestamp=as.character(Sys.Date()), figureDesc='', figurePrio=0, regulator=NULL)
scores |
The scoreList to export. |
datasetName |
Name of the dataset in the database. |
experimentSet |
Name of the experiment set in the database. |
databaseFile |
Filename of the database. New database is created if the file does not exist. |
preprocData |
Preprocessed data. This is required in order to generate models and figures and to calculate z-scores. Also, inserting aliases requires preprocessed data. |
models |
Learned models. If not given, the function will generate models if preprocessed data is available. |
figpath |
Figure path. If this is defined, the function will generate figures to the given path instead of inserting them to the database. |
aliasTypes |
Types of aliases that are inserted to the database. |
datasetSource |
Additional information that is inserted to the database if defined. |
datasetDescription |
Additional information that is inserted to the database if defined. |
datasetSaveLocation |
Additional information that is inserted to the database if defined. |
datasetFigureFilename |
Additional information that is inserted to the database if defined. |
experimentTimestamp |
Timestamp that is inserted to the database. The default value is current date in ISO-8601 format. |
figureDesc |
Additional information that is inserted to the database if defined. |
figurePrio |
Additional information that is inserted to the database if defined. |
regulator |
If defined, override the regulator name from scoreList. |
Miika-Petteri Matikainen, Antti Honkela
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # FBgn names of target genes targets <- c('FBgn0003486', 'FBgn0033188', 'FBgn0035257') # Load gene annotations library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) # Get the probe identifier for TF 'twi' twi <- get('twi', env=aliasMapping) # Load alternative gene annotations fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) # Get the probe identifiers for target genes targetProbes <- mget(targets, env=fbgnMapping) # Rank the targets, filtering weakly expressed genes with average # expression z-score below 1.8 scores <- GPRankTargets(drosophila_gpsim_fragment, TF=twi, testTargets=targetProbes, options=list(quiet=TRUE), filterLimit=1.8) # Export data from scoreList and preprocessed data to a database export.scores(scores, datasetName='Drosophila', experimentSet='GPSIM/GPDISIM', database='database.sqlite', preprocData=drosophila_gpsim_fragment, aliasTypes=c('SYMBOL', 'GENENAME', 'FLYBASE', 'ENTREZID')) ## End(Not run)
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # FBgn names of target genes targets <- c('FBgn0003486', 'FBgn0033188', 'FBgn0035257') # Load gene annotations library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) # Get the probe identifier for TF 'twi' twi <- get('twi', env=aliasMapping) # Load alternative gene annotations fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) # Get the probe identifiers for target genes targetProbes <- mget(targets, env=fbgnMapping) # Rank the targets, filtering weakly expressed genes with average # expression z-score below 1.8 scores <- GPRankTargets(drosophila_gpsim_fragment, TF=twi, testTargets=targetProbes, options=list(quiet=TRUE), filterLimit=1.8) # Export data from scoreList and preprocessed data to a database export.scores(scores, datasetName='Drosophila', experimentSet='GPSIM/GPDISIM', database='database.sqlite', preprocData=drosophila_gpsim_fragment, aliasTypes=c('SYMBOL', 'GENENAME', 'FLYBASE', 'ENTREZID')) ## End(Not run)
Container for time series expression assays and
experimental metadata. ExpressionTimeSeries
class is derived
from ExpressionSet
, and requires fields
experiments
and modeltime
in phenoData
.
Directly extends class ExpressionSet
.
new("ExpressionTimeSeries")
new("ExpressionTimeSeries",
phenoData = new("AnnotatedDataFrame"),
featureData = new("AnnotatedDataFrame"),
experimentData = new("MIAME"),
annotation = character(0),
protocolData = phenoData[,integer(0)],
exprs = new("matrix"),
var.exprs = new("matrix"))
This creates an ExpressionTimeSeries
with assayData
implicitly
created to contain exprs
and var.exprs
.
new("ExpressionTimeSeries",
assayData = assayDataNew(exprs=new("matrix")),
phenoData = new("AnnotatedDataFrame"),
featureData = new("AnnotatedDataFrame"),
experimentData = new("MIAME"),
annotation = character(0),
protocolData = phenoData[,integer(0)])
This creates an ExpressionTimeSeries
with assayData
provided
explicitly. In this form, the only required named argument is
assayData
.
ExpressionTimeSeries
instances are usually created through
new("ExpressionTimeSeries", ...)
. Usually the arguments to new
include exprs
(a matrix of expression data, with features
corresponding to rows and samples to columns), var.exprs
,
phenoData
,
featureData
, experimentData
, annotation
, and
protocolData
.
phenoData
, featureData
, experimentData
,
annotation
, and protocolData
can be missing, in which case
they are assigned default values.
assayData
:Inherited from
ExpressionSet
.
The models in gpsim
package assume that exprs
contains absolute (i.e. non-logarithmic) expression values.
The member var.exprs
may contain variances of the values.
phenoData
:Inherited from
ExpressionSet
.
The following fields are required: experiments
which
contains integers from 1 to N with measurements from the same
biological assay having the same number; modeltime
which contains observation times in model units.
featureData
:Inherited from ExpressionSet
.
experimentData
:Inherited from ExpressionSet
.
annotation
:Inherited from ExpressionSet
.
protocolData
:Inherited from ExpressionSet
.
.__classVersion__
:Inherited from ExpressionSet
.
See also methods for ExpressionSet
.
var.exprs(object)
, var.exprs(object)<- value
Access and set var.exprs
initialize("ExpressionTimeSeries")
Object instantiation, used by new; not to be called directly by the user.
Antti Honkela, Jonatan Ropponen
showClass("ExpressionTimeSeries")
showClass("ExpressionTimeSeries")
contains commands to constrain parameters to be positive via exponentiation or within a fixed interval via the sigmoid function.
expTransform(x, transform) sigmoidTransform(x, transform) boundedTransform(x, transform, bounds)
expTransform(x, transform) sigmoidTransform(x, transform) boundedTransform(x, transform, bounds)
x |
input argument. |
transform |
type of transform, 'atox' maps a value into the transformed space (i.e. makes it positive). 'xtoa' maps the parameter back from transformed space to the original space. 'gradfact' gives the factor needed to correct gradients with respect to the transformed parameter. |
bounds |
a 2-vector of bounds of allowed values in boundedTransform |
Return value as selected by tranform
# Transform unconstrained parameter -4 to a positive value expTransform(-4, 'atox') # Transform a bounded parameter in (1,3) to an unconstrained one boundedTransform(2, 'xtoa', c(1, 3))
# Transform unconstrained parameter -4 to a positive value expTransform(-4, 'atox') # Transform a bounded parameter in (1,3) to an unconstrained one boundedTransform(2, 'xtoa', c(1, 3))
'generateModels' recreates models based on the parameters stored in a scoreList.
generateModels(preprocData, scores)
generateModels(preprocData, scores)
preprocData |
The preprocessed data to be used. |
scores |
A scoreList object containing data of the models to be generated. |
'generateModels' returns a list of the generated models.
Antti Honkela, Jonatan Ropponen
GPLearn, GPRankTargets, GPRankTFs, scoreList
.
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names targets <- c('FBgn0003486', 'FBgn0033188', 'FBgn0035257') library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbes <- mget(targets, env=fbgnMapping) scores <- GPRankTargets(drosophila_gpsim_fragment, TF=twi, testTargets=targetProbes, options=list(quiet=TRUE), filterLimit=1.8) models <- generateModels(drosophila_gpsim_fragment, scores) ## End(Not run)
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names targets <- c('FBgn0003486', 'FBgn0033188', 'FBgn0035257') library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbes <- mget(targets, env=fbgnMapping) scores <- GPRankTargets(drosophila_gpsim_fragment, TF=twi, testTargets=targetProbes, options=list(quiet=TRUE), filterLimit=1.8) models <- generateModels(drosophila_gpsim_fragment, scores) ## End(Not run)
Forms an optimized model of the desired genes. The function can form a model with GPsim or GPdisim and it's also possible to use initial parameters or fix parameters for future use. The genes can also be filtered based on ratios calculated from the expression values. The given data can also be searched for the data of specific genes.
GPLearn(preprocData, TF = NULL, targets = NULL, useGpdisim = !is.null(TF), randomize = FALSE, addPriors = FALSE, fixedParams = FALSE, initParams = NULL, initialZero = TRUE, fixComps = NULL, dontOptimise = FALSE, allowNegativeSensitivities = FALSE, quiet = FALSE, gpsimOptions = NULL, allArgs = NULL)
GPLearn(preprocData, TF = NULL, targets = NULL, useGpdisim = !is.null(TF), randomize = FALSE, addPriors = FALSE, fixedParams = FALSE, initParams = NULL, initialZero = TRUE, fixComps = NULL, dontOptimise = FALSE, allowNegativeSensitivities = FALSE, quiet = FALSE, gpsimOptions = NULL, allArgs = NULL)
preprocData |
The preprocessed data to be used. |
TF |
The probe corresponding to the transcription factor (TF) mRNA if TF protein translation model is used, or NULL (default) if the translation model is not used. |
targets |
The target genes of the model. |
useGpdisim |
A logical value determining whether a model of translation is included. By default TRUE if TF is set, FALSE if TF is unset. |
randomize |
A logical value determining whether the parameters of the model are randomized before optimization. |
addPriors |
A logical value determining whether priors are added to the model. |
fixedParams |
A logical value determining whether the initial parameters are fixed. |
initParams |
The initial parameters for the model. In combination with fixedParams a value NA denotes parameters to learn. |
initialZero |
Assume a zero initial TF protein concentration, default = TRUE. |
fixComps |
The blocks of the kernel the parameters of which are to be fixed. To be used together with fixedParams and initParams. |
dontOptimise |
Just create the model, do not run optimisation. |
allowNegativeSensitivities |
Allow sensitivities to go negative. This is an experimental feature, and the negative values have no physical interpretation. |
quiet |
Suppress optimiser output. |
gpsimOptions |
Internal: additional options to pass to gp[di]simCreate. |
allArgs |
A list of arguments that can be used to override ones with the same name. |
Returns the optimized model.
Antti Honkela, Pei Gao, Jonatan Ropponen, Magnus Rattray, Neil D. Lawrence
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbe <- get('FBgn0035257', env=fbgnMapping) # Create the model but do not optimise (rarely needed...) model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) ## Not run: # Create and learn the model model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE) ## End(Not run)
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbe <- get('FBgn0035257', env=fbgnMapping) # Create the model but do not optimise (rarely needed...) model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) ## Not run: # Create and learn the model model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE) ## End(Not run)
The class is a container for the internal representation
of models used by the gpsim
package.
Objects can be created by calls of the form new("GPModel", model)
.
model
:A model object used internally by the code of
the gpsim
package
type
:Type of the model object
modelStruct(object)
, modelStruct(object)<-
value
Access and set the internal model
structure
modelType(object)
Access the internal type
values
show(object)
Informatively display object contents.
is.GPModel(object)
Check if object is a GPModel.
initialize("GPModel")
Object instantiation, used by new; not to be called directly by the user.
Antti Honkela, Jonatan Ropponen
GPLearn
, GPRankTargets
, GPRankTFs
,
generateModels
, modelExtractParam
,
modelLogLikelihood
.
showClass("GPModel")
showClass("GPModel")
Plots GP(DI)SIM models.
GPPlot(data, savepath = '', nameMapping = NULL, predt = NULL, fileOutput=FALSE, plotTime=NULL)
GPPlot(data, savepath = '', nameMapping = NULL, predt = NULL, fileOutput=FALSE, plotTime=NULL)
data |
The model to plot as returned by GPLearn. |
savepath |
The location in the file system where the images are saved. |
nameMapping |
The annotation used for mapping the names of the genes for the figures. |
predt |
The set of time points to use in plotting (default: the time interval covering the data). |
fileOutput |
Is the plot being saved to a file? If yes, do not open new interactive devices for each plot. |
plotTime |
The times of observations to use in the plot. Should usually not be changed! |
The function plots the fitted expression level of the transcription factor (if applicable), the inferred activity of the transcription factor, and the fitted expression level of the target(s).
Antti Honkela
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbe <- get('FBgn0035257', env=fbgnMapping) # Learn the model model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE) # Plot it GPPlot(model, nameMapping=getAnnMap("FLYBASE", annotation(drosophila_gpsim_fragment))) ## End(Not run)
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbe <- get('FBgn0035257', env=fbgnMapping) # Learn the model model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE) # Plot it GPPlot(model, nameMapping=getAnnMap("FLYBASE", annotation(drosophila_gpsim_fragment))) ## End(Not run)
GPRankTargets
ranks possible target genes by forming
optimized models with a fixed
transcription factor, a set of known target genes and targets to be
tested. The transcription factor and the known targets are always
included in the models while the tested targets are tested by
including them in the models one at a time. The function determines
itself whether to use GPSIM or GPDISIM based on the input
arguments.
GPRankTargets(preprocData, TF = NULL, knownTargets = NULL, testTargets = NULL, filterLimit = 1.8, returnModels = FALSE, options = NULL, scoreSaveFile = NULL, datasetName = "", experimentSet = "") GPRankTFs(preprocData, TFs, targets, filterLimit = 1.8, returnModels = FALSE, options = NULL, scoreSaveFile = NULL, datasetName = "", experimentSet = "")
GPRankTargets(preprocData, TF = NULL, knownTargets = NULL, testTargets = NULL, filterLimit = 1.8, returnModels = FALSE, options = NULL, scoreSaveFile = NULL, datasetName = "", experimentSet = "") GPRankTFs(preprocData, TFs, targets, filterLimit = 1.8, returnModels = FALSE, options = NULL, scoreSaveFile = NULL, datasetName = "", experimentSet = "")
preprocData |
The preprocessed data to be used. |
TF |
The transcription factor (TF) probe present in all models when TF protein translation model is used. Set to NULL (default) when translation model is not used. |
knownTargets |
The target genes present in all models. |
testTargets |
Target genes that are tested by including them in the models one at a time. Can be names of genes, or a set of indices in preprocData. |
filterLimit |
Genes with an average expression z-score above this figure are accepted after filtering. If this value is 0, all genes will be accepted. |
returnModels |
A logical value determining whether the function returns the calculated models. |
options |
A list of additional arguments to pass to GPLearn. |
scoreSaveFile |
Name of file to save the scores to after processing each gene. |
TFs |
The transcription factors that are tested by including them in the models one at a time. |
targets |
The target genes present in all models. |
datasetName |
For exporting the scores using
|
experimentSet |
For exporting the scores using
|
The models are formed by calling GPLearn
.
If there is no value given to the transcription
factor, a model without protein translation is used.
Without protein translation model, some
known targets are needed. If known
targets are given, a model is first created with only the transcription
factor and the known targets. The parameters extracted from this model
are used as initial parameters of the models with test targets.
GPRankTFs
is very similar to GPRankTargets
, except
it loops over candidate regulators, not candidate targets.
The function returns a scoreList containing the genes, parameters and log-likelihoods of the models If returnModels is true, the function returns a list of the calculated models.
Antti Honkela, Jonatan Ropponen, Magnus Rattray, Neil D. Lawrence
GPLearn, scoreList, generateModels,
export.scores
.
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names targets <- c('FBgn0003486', 'FBgn0033188', 'FBgn0035257') library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbes <- mget(targets, env=fbgnMapping) scores <- GPRankTargets(drosophila_gpsim_fragment, TF=twi, testTargets=targetProbes, options=list(quiet=TRUE), filterLimit=1.8) ## End(Not run)
## Not run: # Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Get the target probe names targets <- c('FBgn0003486', 'FBgn0033188', 'FBgn0035257') library(annotate) aliasMapping <- getAnnMap("ALIAS2PROBE", annotation(drosophila_gpsim_fragment)) twi <- get('twi', env=aliasMapping) fbgnMapping <- getAnnMap("FLYBASE2PROBE", annotation(drosophila_gpsim_fragment)) targetProbes <- mget(targets, env=fbgnMapping) scores <- GPRankTargets(drosophila_gpsim_fragment, TF=twi, testTargets=targetProbes, options=list(quiet=TRUE), filterLimit=1.8) ## End(Not run)
creates a model for single input motifs with Gaussian processes.
gpsimCreate(Ngenes, Ntf, times, y, yvar, options, genes=NULL, annotation=NULL) gpdisimCreate(Ngenes, Ntf, times, y, yvar, options, genes=NULL, annotation=NULL)
gpsimCreate(Ngenes, Ntf, times, y, yvar, options, genes=NULL, annotation=NULL) gpdisimCreate(Ngenes, Ntf, times, y, yvar, options, genes=NULL, annotation=NULL)
Ngenes |
number of genes to be modelled in the system. |
Ntf |
number of proteins to be modelled in the system. |
times |
the time points where the data is to be modelled. |
y |
the values of each gene at the different time points. |
yvar |
the variances of each gene at the different time points. |
options |
options structure (optional). |
genes |
names of the probes the model is for |
annotation |
(optional) annotation for the probe names |
These functions are meant to be used through GPLearn
.
model |
model structure containing default parameterisation. |
modelExtractParam, modelOptimise, GPLearn
.
## missing, see GPLearn
## missing, see GPLearn
Compute the kernel given the parameters and X.
kernCompute(kern, x, x2) kernDiagCompute(kern, x)
kernCompute(kern, x, x2) kernDiagCompute(kern, x)
kern |
kernel structure to be computed. |
x |
first or only input data matrix (rows are data points) to the kernel computation. |
x2 |
(optional) second input matrix to the kernel computation (forms the columns of the kernel). |
K <- kernCompute(kern, X)
computes a kernel matrix for the given kernel type given an input data matrix.
K <- kernCompute(kern, X1, X2)
computes a kernel matrix for the given kernel type given two input data matrices, one for the rows and one for the columns.
K <- kernDiagCompute(kern, X)
computes the diagonal of a kernel matrix for the given kernel.
K <- *X*kernCompute(kern1, kern2, X)
K <- *X*kernCompute(kern1, kern2, X1, X2)
same as above, but for cross combinations of two kernels, kern1
and kern2
.
K |
computed elements of the kernel structure. |
Kd |
vector containing computed diagonal elements of the kernel structure. |
kern <- kernCreate(1, 'rbf') K <- kernCompute(kern, as.matrix(3:8))
kern <- kernCreate(1, 'rbf') K <- kernCompute(kern, as.matrix(3:8))
Initialise a kernel structure.
kernCreate(x, kernType, kernOptions=NULL)
kernCreate(x, kernType, kernOptions=NULL)
x |
If list, array or matrix: input data values (from which kernel will later be computed). If scalar: input dimension of the design matrix (i.e. number of features in the design matrix). |
kernType |
Type of kernel to be created, some standard types are
'rbf', 'white', 'sim' and 'disim'. If a list of the form
|
kernOptions |
(optional) list of kernel options |
kern <- kernCreate(X, type)
input points and a kernel type.
kern <- kernCreate(dim, type)
creates a kernel matrix structure given the dimensions of the design matrix and the kernel type.
The *KernParamInit functions perform initialisation specific to different types of kernels. They should not be called directly.
kern |
The kernel structure. |
# Create a multi kernel with two rbf blocks with bounded inverse widths invWidthBounds <- c(0.5, 2) kernType <- list(type="multi", comp=list()) for (i in 1:2) kernType$comp[[i]] <- list(type="parametric", realType="rbf", options=list(isNormalised=TRUE, inverseWidthBounds=invWidthBounds)) kern <- kernCreate(1, kernType) # Tie the inverse with parameters of the component RBF kernels kern <- modelTieParam(kern, list(tieWidth="inverseWidth")) kernDisplay(kern)
# Create a multi kernel with two rbf blocks with bounded inverse widths invWidthBounds <- c(0.5, 2) kernType <- list(type="multi", comp=list()) for (i in 1:2) kernType$comp[[i]] <- list(type="parametric", realType="rbf", options=list(isNormalised=TRUE, inverseWidthBounds=invWidthBounds)) kern <- kernCreate(1, kernType) # Tie the inverse with parameters of the component RBF kernels kern <- modelTieParam(kern, list(tieWidth="inverseWidth")) kernDisplay(kern)
computes the gradient of the (diagonal of the) kernel matrix with respect to the elements of the design matrix given in X.
kernDiagGradX(kern, x) kernGradX(kern, x, x2)
kernDiagGradX(kern, x) kernGradX(kern, x, x2)
kern |
the kernel structure for which gradients are being computed. |
x |
if only argument: the input data in the form of a design matrix, if two arguments: row locations against which gradients are being computed. |
x2 |
(optional) column locations against which gradients are being computed. |
gX |
the gradients of the diagonal with respect to each element of X. The returned matrix has the same dimensions as X. |
gX2 |
the returned gradients. The gradients are returned in a matrix which is numData x numInputs x numData. Where numData is the number of data points and numInputs is the number of input dimensions in X. |
kern <- kernCreate(1, 'mlp') g <- kernDiagGradX(kern, as.matrix(3:8))
kern <- kernCreate(1, 'mlp') g <- kernDiagGradX(kern, as.matrix(3:8))
Compute the gradient wrt the kernel parameters.
kernGradient(kern, x, ...)
kernGradient(kern, x, ...)
kern |
the kernel structure for which the gradients are being computed. |
x |
the input locations for which the gradients are being computed, specifically those associated with the rows of the kernel matrix if there are two arguments of input locations. |
... |
optional arguments including potentially: the input locations associated with the columns of the kernel matrix; matrix of partial derivatives of the function of interest with respect to the kernel matrix. With single input, the argument takes the form of a square matrix of dimension numData, where numData is the number of rows in x, with two input arguments the matrix should have the same number of rows as the first and the same number of columns as the second has rows. |
g <- kernGradient(kern, x, partial)
g <- *kernGradient(kern, x, partial)
computes the gradient of functions with respect to the kernel parameters. As well as the kernel structure and the input positions, the user provides a matrix PARTIAL which gives the partial derivatives of the function with respect to the relevant elements of the kernel matrix.
g <- kernGradient(kern, x1, x2, partial_)
g <- *kernGradient(kern, x1, x2, partial_)
computes the derivatives as above, but input locations are now provided in two matrices associated with rows and columns of the kernel matrix.
g <- *X*kernGradient(kern1, kern2, x, partial)
g <- *X*kernGradient(kern1, kern2, x1, x2, partial_)
same as above, but for cross combinations of two kernels, kern1
and kern2
.
g |
gradients of the function of interest with respect to the kernel parameters. The ordering of the vector should match that provided by the function kernExtractParam. |
kernCompute
, kernExtractParam
.
kern <- kernCreate(1, 'rbf') g <- kernGradient(kern, as.matrix(c(1, 4)), array(1, c(2, 2)))
kern <- kernCreate(1, 'rbf') g <- kernGradient(kern, as.matrix(c(1, 4)), array(1, c(2, 2)))
Helper function for computing the log of difference
lnDiffErfs(x1, x2)
lnDiffErfs(x1, x2)
x1 |
argument of the positive erf |
x2 |
argument of the negative erf |
v <- lnDiffErfs(x1, x2)
computes the log of the difference of two erfs in a numerically stable manner.
v |
list(c(log(abs(erf(x1) - erf(x2))), sign(erf(x1) - erf(x2)))) |
lnDiffErfs(100, 10)
lnDiffErfs(100, 10)
displays the parameters of the model/kernel and the model/kernel type to the console.
modelDisplay(model, ...)
modelDisplay(model, ...)
model |
the model/kernel structure to be displayed. |
... |
(optional) indent level for the display. |
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # The probe identifier for TF 'twi' twi <- "143396_at" # The probe identifier for the target gene targetProbe <- "152715_at" # Create the model, but do not optimise model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) # Display the initial model modelDisplay(model)
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # The probe identifier for TF 'twi' twi <- "143396_at" # The probe identifier for the target gene targetProbe <- "152715_at" # Create the model, but do not optimise model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) # Display the initial model modelDisplay(model)
Update a model structure or component with new parameters, or update the posterior processes.
modelExpandParam(model, params) modelUpdateProcesses(model, predt=NULL)
modelExpandParam(model, params) modelUpdateProcesses(model, predt=NULL)
model |
the model structure to be updated. |
params |
vector of parameters. |
predt |
(optional) a vector of times to infer the posterior at. By default this is 100 points spanning the time range of the data. |
model <- modelExpandParam(model, param)
returns a model structure filled with the parameters in the given
vector. This is used as a helper function to enable parameters to be
optimised in, for example, the optimisation functions.
model <- modelUpdateProcesses(model)
updates posterior processes of the given model.
model |
updated model structure. |
## Not run: # Learn the model model <- GPLearn(...) params <- modelExtractParam(model, only.values=TRUE) params[1] <- 0 new_model <- modelExpandParam(model, params) new_model <- modelUpdateProcesses(new_model) ## End(Not run)
## Not run: # Learn the model model <- GPLearn(...) params <- modelExtractParam(model, only.values=TRUE) params[1] <- 0 new_model <- modelExpandParam(model, params) new_model <- modelUpdateProcesses(new_model) ## End(Not run)
Extract parameters from the model into a vector of parameters for optimisation.
modelExtractParam(model, only.values=TRUE, untransformed.values=FALSE)
modelExtractParam(model, only.values=TRUE, untransformed.values=FALSE)
model |
the model structure containing the parameters to be extracted. |
only.values |
include parameter names in the returned vector. |
untransformed.values |
return actual values, not transformed values used by the optimisers. |
param |
vector of parameters extracted from the model. |
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # The probe identifier for TF 'twi' twi <- "143396_at" # The probe identifier for the target gene targetProbe <- "152715_at" # Create the model, but do not optimise model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) # Get the initial parameter values params <- modelExtractParam(model, only.values=FALSE)
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # The probe identifier for TF 'twi' twi <- "143396_at" # The probe identifier for the target gene targetProbe <- "152715_at" # Create the model, but do not optimise model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) # Get the initial parameter values params <- modelExtractParam(model, only.values=FALSE)
modeGradient
gives the gradient of the objective function for a
model. By default the objective function (modelObjective
) is
a negative log likelihood (modelLogLikelihood
).
modelObjective(params, model, ...) modelLogLikelihood(model) modelGradient(params, model, ...)
modelObjective(params, model, ...) modelLogLikelihood(model) modelGradient(params, model, ...)
params |
parameter vector to evaluate at. |
model |
model structure. |
... |
optional additional arguments. |
g |
the gradient of the error function to be minimised. |
v |
the objective function value (lower is better). |
ll |
the log-likelihood value. |
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # The probe identifier for TF 'twi' twi <- "143396_at" # The probe identifier for the target gene targetProbe <- "152715_at" # Create the model but do not optimise model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) params <- modelExtractParam(model, only.values=FALSE) ll <- modelLogLikelihood(model) paramValues <- modelExtractParam(model, only.values=TRUE) modelGradient(paramValues, model)
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # The probe identifier for TF 'twi' twi <- "143396_at" # The probe identifier for the target gene targetProbe <- "152715_at" # Create the model but do not optimise model <- GPLearn(drosophila_gpsim_fragment, TF=twi, targets=targetProbe, useGpdisim=TRUE, quiet=TRUE, dontOptimise=TRUE) params <- modelExtractParam(model, only.values=FALSE) ll <- modelLogLikelihood(model) paramValues <- modelExtractParam(model, only.values=TRUE) modelGradient(paramValues, model)
groups of parameters of a model to be seen as one parameter during optimisation of the model.
modelTieParam(model, paramsList)
modelTieParam(model, paramsList)
model |
the model for which parameters are being tied together. |
paramsList |
indices of parameteres to group together. The indices are provided in a list. Each element in the list contains a vector of indices of parameters that should be considered as one parameter. Each group of parameters in each cell should obviously be mutually exclusive. Alternatively, the specification may consist of strings, which are
interpreted as regular expressions that are matched against the
parameter names returned by |
model |
the model with the parameters grouped together. |
modelExtractParam, modelExpandParam, modelGradient
.
# Create a multi kernel with two rbf blocks with bounded inverse widths invWidthBounds <- c(0.5, 2) kernType <- list(type="multi", comp=list()) for (i in 1:2) kernType$comp[[i]] <- list(type="parametric", realType="rbf", options=list(isNormalised=TRUE, inverseWidthBounds=invWidthBounds)) kern <- kernCreate(1, kernType) # Tie the inverse with parameters of the component RBF kernels kern <- modelTieParam(kern, list(tieWidth="inverseWidth")) kernDisplay(kern)
# Create a multi kernel with two rbf blocks with bounded inverse widths invWidthBounds <- c(0.5, 2) kernType <- list(type="multi", comp=list()) for (i in 1:2) kernType$comp[[i]] <- list(type="parametric", realType="rbf", options=list(isNormalised=TRUE, inverseWidthBounds=invWidthBounds)) kern <- kernCreate(1, kernType) # Tie the inverse with parameters of the component RBF kernels kern <- modelTieParam(kern, list(tieWidth="inverseWidth")) kernDisplay(kern)
returns the current default function for constraining a parameter.
optimiDefaultConstraint(constraint)
optimiDefaultConstraint(constraint)
constraint |
the type of constraint you want to place on the parameter, options include 'positive' (gives an 'exp' constraint) and 'zeroone' (gives a 'sigmoid' constraint). |
val |
a list with two components: 'func' for the name of function used to apply the constraint, and 'hasArgs' for a boolean flag if the function requires additional arguments. |
expTransform, sigmoidTransform
.
optimiDefaultConstraint('positive') optimiDefaultConstraint('bounded')
optimiDefaultConstraint('positive') optimiDefaultConstraint('bounded')
Plots ExpressionTimeSeries data.
plotTimeseries(data, nameMapping = NULL)
plotTimeseries(data, nameMapping = NULL)
data |
An ExpressionTimeSeries object. |
nameMapping |
The annotation used for mapping the names of the genes for the figures. By default, the SYMBOL annotation for the array is used, if available. |
The function plots the expression levels from an ExpressionTimeSeries object and the associated standard deviations. If the object includes multiple time series, they will be plotted in the same figure, but slightly shifted.
Antti Honkela
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Plot the first two genes plotTimeseries(drosophila_gpsim_fragment[1:2,])
# Load a mmgmos preprocessed fragment of the Drosophila developmental # time series data(drosophila_gpsim_fragment) # Plot the first two genes plotTimeseries(drosophila_gpsim_fragment[1:2,])
processData
further processes time series data preprocessed
by puma
or lumi
.
processRawData
performs similar processing for other data.
Both functions return ExpressionTimeSeries
objects that can be used as input for the functions
GPLearn
and GPRankTargets
.
processData(data, times = NULL, experiments = NULL, do.normalisation = TRUE) processRawData(rawData, times, experiments = NULL, is.logged = TRUE, do.normalisation = ifelse(is.logged, TRUE, FALSE))
processData(data, times = NULL, experiments = NULL, do.normalisation = TRUE) processRawData(rawData, times, experiments = NULL, is.logged = TRUE, do.normalisation = ifelse(is.logged, TRUE, FALSE))
data |
The preprocessed data from |
rawData |
Raw data matrix to be used. Each row corresponds to a gene and each column to a data point. |
times |
Observation times of each data point.
If unspecified or NULL, |
experiments |
The replicate structure of the data indicating which expression data points arise from which experiments. This should be an array in integers from 1 to N with length equal to the number of data points. By default all the data points are assumed to be from same replicate. |
is.logged |
Indicates whether the expression values are on log scale or not. Normalisation of non-logged data is unsupported. |
do.normalisation |
Indicates whether to perform the normalisation. |
The expression data (and percentiles, if available) are normalized
by equalising the mean of log-expression in each time points.
In processData
, a normal
distribution is then fitted into the data with distfit.
An ExpressionTimeSeries
object containing all provided information.
Antti Honkela, Jonatan Ropponen
## Load a mmgmos preprocessed fragment of the Drosophila developmental ## time series data(drosophila_mmgmos_fragment) ## Process the data (3 experiments containing 12 time points each) drosophila_gpsim_fragment <- processData(drosophila_mmgmos_fragment, experiments=rep(1:3, each=12))
## Load a mmgmos preprocessed fragment of the Drosophila developmental ## time series data(drosophila_mmgmos_fragment) ## Process the data (3 experiments containing 12 time points each) drosophila_gpsim_fragment <- processData(drosophila_mmgmos_fragment, experiments=rep(1:3, each=12))
Optimise the given function using (scaled) conjugate gradients.
optimiDefaultOptions() SCGoptim(x, fn, grad, options, ...) CGoptim(x, fn, grad, options, ...) modelOptimise(model, options, ...)
optimiDefaultOptions() SCGoptim(x, fn, grad, options, ...) CGoptim(x, fn, grad, options, ...) modelOptimise(model, options, ...)
model |
the model to be optimised. |
x |
initial parameter values. |
fn |
objective function to minimise |
grad |
gradient function of the objective |
options |
options structure like one returned by
|
... |
extra arguments to pass to fn and grad |
options |
an options structure |
newParams |
optimised parameter values |
model |
the optimised model. |
## Not run to speed up package checks # model <- GPLearn(..., dontOptimise=TRUE) # options <- optimiDefaultOptions() # model <- modelOptimise(model, options)
## Not run to speed up package checks # model <- GPLearn(..., dontOptimise=TRUE) # options <- optimiDefaultOptions() # model <- modelOptimise(model, options)
'scoreList' is an object which contain the genes, parameters, log-likelihoods and arguments of models. With the data in a scoreList item and the original data used for creating the models, the models can be reconstructed with the function 'generateModels'.
Objects can be created by calls of the form scoreList(params, loglikelihoods,
genes, modelArgs, knownTargets, TF, sharedModel)
.
params
:The parameters of the models.
loglikelihoods
:The log-likelihoods of the models.
baseloglikelihoods
:The log-likelihoods of corresponding null models.
genes
:The genes used in the models.
modelArgs
:A list of arguments used to generate the models.
knownTargets
:The list of known targets used in the ranking.
TF
:The TF used in the ranking.
sharedModel
:Shared model for known targets.
datasetName
:Dataset name, used when exporting scores to a database.
experimentSet
:Experiment set name, used when exporting scores to a database.
Class-specific methods:
write.scores(object, ...)
Writes the log-likelihoods
and null log-likelihoods. Accepts any options write.table
does.
genes(object)
, genes(object)<- value
Access and set genes
knownTargets(object)
, knownTargets(object)<- value
Access and set knownTargets
loglikelihoods(object)
, loglikelihoods(object)<- value
Access and set loglikelihoods
baseloglikelihoods(object)
, baseloglikelihoods(object)<- value
Access and set baseloglikelihoods
modelArgs(object)
, modelArgs(object)<- value
Access and set modelArgs
params(object)
, params(object)<- value
Access and set params
sharedModel(object)
, sharedModel(object)<- value
Access and set sharedModel
TF(object)
, TF(object)<- value
Access and set TF
datasetName(object)
, datasetName(object)<- value
Access and set datasetName
experimentSet(object)
, experimentSet(object)<- value
Access and set experimentSet
Standard generic methods:
object[(index)
Conducts subsetting of the scoreList.
c(object, ...)
Concatenates scoreLists.
length(object)
Returns the length of the list.
show(object)
Informatively display object contents.
sort(object, decreasing=FALSE)
Sort the list according to log-likelihood
Antti Honkela, Jonatan Ropponen
GPRankTargets
, GPRankTFs
,
generateModels
, write.table
.
showClass("scoreList")
showClass("scoreList")