Title: | A unifying bioinformatics framework for spatial proteomics |
---|---|
Description: | The pRoloc package implements machine learning and visualisation methods for the analysis and interrogation of quantitative mass spectrometry data to reliably infer protein sub-cellular localisation. |
Authors: | Laurent Gatto, Oliver Crook and Lisa M. Breckels with contributions from Thomas Burger and Samuel Wieczorek |
Maintainer: | Laurent Gatto <[email protected]> |
License: | GPL-2 |
Version: | 1.47.0 |
Built: | 2024-11-18 04:18:05 UTC |
Source: | https://github.com/bioc/pRoloc |
Adds GO annotations to the feature data
addGoAnnotations( object, params, evidence, useID = FALSE, fcol = "GOAnnotations", ... )
object |
An instance of class MSnSet. |
params |
An instance of class AnnotationParams. |
evidence |
GO evidence filtering. |
useID |
Logical. Should GO term names or identifiers be used?
If |
fcol |
Character. Name of the matrix of annotations to be added to the
|
... |
Other arguments passed to |
An updated MSnSet with a new feature data column called GOAnnotations containing a matrix of GO annotations.
Lisa M Breckels
library(pRolocdata)
data(dunkley2006)
par <- setAnnotationParams(inputs = c("Arabidopsis thaliana genes",
                                      "Gene stable ID"))
## add protein sets/annotation information
xx <- addGoAnnotations(dunkley2006, par)
dim(fData(xx)$GOAnnotations)
## filter sets
xx <- filterMinMarkers(xx, n = 50)
dim(fData(xx)$GOAnnotations)
xx <- filterMaxMarkers(xx, p = .25)
dim(fData(xx)$GOAnnotations)
## Subset for specific protein sets
sub <- subsetMarkers(xx, keep = c("vacuole"))
## Order protein sets
res <- orderGoAnnotations(xx, k = 1:3, p = 1/3, verbose = FALSE)
if (interactive()) {
  pRolocVis(res, fcol = "GOAnnotations")
}
Adds a legend to a plot2D figure.
addLegend( object, fcol = "markers", where = c("bottomleft", "bottom", "bottomright", "left", "topleft", "top", "topright", "right", "center", "other"), col, bty = "n", ... )
object |
An instance of class |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different
colours. Default is |
where |
One of |
col |
A |
bty |
Box type, as in |
... |
Additional parameters passed to |
The function has been updated in version 1.3.6 to recycle the default colours when more organelle classes are provided. See plot2D for details.
Invisibly returns NULL.
Laurent Gatto
The function adds a 'markers' feature variable. These markers are read from a comma separated values (csv) spreadsheet file. This markers file is expected to have 2 columns (others are ignored) where the first is the name of the marker features and the second the group label. Alternatively, a named vector of markers, as provided by the pRolocmarkers function, can also be used.
addMarkers(object, markers, mcol = "markers", fcol, verbose = TRUE)
object |
An instance of class |
markers |
A |
mcol |
A |
fcol |
An optional feature variable to be used to match against the markers. If missing, the feature names are used. |
verbose |
A |
It is essential to ensure that featureNames(object) (or fcol, see above) and marker names (first column) match, i.e. the same feature identifiers and case fold are used.
A new instance of class MSnSet with an additional markers feature variable.
Laurent Gatto
See pRolocmarkers for a list of spatial markers and markers for details about markers encoding.
library("pRolocdata")
data(dunkley2006)
atha <- pRolocmarkers("atha")
try(addMarkers(dunkley2006, atha)) ## markers already exists
fData(dunkley2006)$markers.org <- fData(dunkley2006)$markers
fData(dunkley2006)$markers <- NULL
marked <- addMarkers(dunkley2006, atha)
fvarLabels(marked)
## if 'markers' already exists
marked <- addMarkers(marked, atha, mcol = "markers2")
fvarLabels(marked)
stopifnot(all.equal(fData(marked)$markers, fData(marked)$markers2))
plot2D(marked)
addLegend(marked, where = "topleft", cex = .7)
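The name matching that addMarkers relies on can be illustrated in base R. This is only a sketch with made-up identifiers, not the pRoloc implementation. In particular, labelling unmatched features "unknown" is an assumption, based on how unknown features are referred to elsewhere in this manual.

```r
## Hypothetical feature names and a named marker vector
## (name = feature identifier, value = group label)
feature_names <- c("AT1G01", "AT1G02", "AT1G03")
mrk <- c(AT1G03 = "ER", AT1G01 = "Golgi", AT9G99 = "PM")

## Look up each feature by name; features without a marker come back as NA
markers <- unname(mrk[feature_names])
## Assumption: unmatched features are labelled "unknown"
markers[is.na(markers)] <- "unknown"
markers

## Identifiers must match exactly, including case
"at1g01" %in% names(mrk)
```

The last line shows why the identifiers and their case must agree: a lower-case identifier does not match any marker name.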
"AnnotationParams"
Class to store annotation parameters to automatically query a Biomart server and retrieve relevant annotation for a set of features of interest using, for example, getGOFromFeatures and makeGoSet.
Objects can be created and set with the setAnnotationParams function. Calling setAnnotationParams() without any arguments opens an interactive interface. Depending on the value of the "menu.graphics" option, a graphical or a text-based menu will open (the text interface can be forced by setting the graphics argument to FALSE: setAnnotationParams(graphics = FALSE)). The menu lets the user select the species of interest first and the type of features (ENSEMBL gene identifier, Entrez id, ...) second.
The species that are available are those for which ENSEMBL data is available in Biomart and have a set of attributes of interest available. The compatible identifiers for downstream queries are then automatically filtered and displayed for user selection.
It is also possible to pass a parameter inputs, a character vector of length 2 containing a pattern uniquely matching the species of interest (in position 1) and a pattern uniquely matching the feature types (in position 2). If the matches are not unique, an error will be thrown.
A new instance of the AnnotationParams class will be created to enable easy and automatic query of the Mart instance. The instance is invisibly returned and stored in a global variable in the pRoloc package's private environment for automatic retrieval. If a variable containing an AnnotationParams instance is already available, it can be set globally by passing it as argument to the setAnnotationParams function. Globally set AnnotationParams instances can be accessed with the getAnnotationParams function.
See the pRoloc-theta vignette for details.
mart: Object of class "Mart" from the biomaRt package.
martname: Object of class "character" with the name of the mart instance.
dataset: Object of class "character" with the data set of the mart instance.
filter: Object of class "character" with the filter to be used when querying the mart instance.
date: Object of class "character" indicating when the current instance was created.
biomaRtVersion: Object of class "character" with the biomaRt version used to create the AnnotationParams instance.
.__classVersion__: Object of class "Versions" with the version of the AnnotationParams class of the current instance.
signature(object = "AnnotationParams"): to display objects.
Laurent Gatto <[email protected]>
getGOFromFeatures, makeGoSet and the pRoloc-theta vignette.
data(andy2011params)
andy2011params
data(dunkley2006params)
dunkley2006params
try(setAnnotationParams(inputs = c("nomatch1", "nomatch2")))
setAnnotationParams(inputs = c("Human genes", "UniProtKB/Swiss-Prot ID"))
getAnnotationParams()
Checks the marker and unknown feature overlap of two MSnSet instances.
checkFeatureNamesOverlap(x, y, fcolx = "markers", fcoly, verbose = TRUE)
x |
An |
y |
An |
fcolx |
The feature variable to separate unknown
( |
fcoly |
As |
verbose |
If |
Invisibly returns a named list of common markers, unique x markers, unique y markers, common unknowns, unique x unknowns and unique y unknowns.
Laurent Gatto
library("pRolocdata")
data(andy2011)
data(andy2011goCC)
checkFeatureNamesOverlap(andy2011, andy2011goCC)
featureNames(andy2011goCC)[1] <- "ABC"
res <- checkFeatureNamesOverlap(andy2011, andy2011goCC)
res$markersX
res$markersY
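The six groups in the returned list can be pictured with plain set operations. The sketch below uses two made-up named marker vectors rather than MSnSet instances. Apart from markersX and markersY, which appear in the example above, the element names are hypothetical.

```r
## Hypothetical marker vectors for two data sets (name = feature identifier)
mx <- c(P1 = "ER", P2 = "Golgi", P3 = "unknown", P4 = "unknown")
my <- c(P1 = "ER", P3 = "unknown", P4 = "PM", P5 = "unknown")

## Split feature names into markers and unknowns
split_names <- function(m) {
  list(markers  = names(m)[m != "unknown"],
       unknowns = names(m)[m == "unknown"])
}
sx <- split_names(mx)
sy <- split_names(my)

res <- list(
  markersXY  = intersect(sx$markers, sy$markers),    # common markers
  markersX   = setdiff(sx$markers, sy$markers),      # markers only in x
  markersY   = setdiff(sy$markers, sx$markers),      # markers only in y
  unknownsXY = intersect(sx$unknowns, sy$unknowns),  # common unknowns
  unknownsX  = setdiff(sx$unknowns, sy$unknowns),    # unknowns only in x
  unknownsY  = setdiff(sy$unknowns, sx$unknowns)     # unknowns only in y
)
str(res)
```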
Extracts qualitative feature variables from two MSnSet instances and compares them with a contingency table.
checkFvarOverlap(x, y, fcolx = "markers", fcoly, verbose = TRUE)
x |
An |
y |
An |
fcolx |
The feature variable to separate unknown
( |
fcoly |
As |
verbose |
If |
Invisibly returns a named list with the values of the diagonal, upper and lower triangles of the contingency table.
Laurent Gatto
library("pRolocdata")
data(dunkley2006)
res <- checkFvarOverlap(dunkley2006, dunkley2006, "markers", "markers.orig")
str(res)
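A contingency table and its diagonal and triangles can be computed in base R as follows. This is a sketch on two made-up label vectors, not the package code, and the list element names are assumptions.

```r
## Hypothetical labels for the same five features in two feature variables
x <- c("ER", "ER", "Golgi", "PM", "unknown")
y <- c("ER", "Golgi", "Golgi", "PM", "PM")

## Use shared levels so that the table is square
lv <- sort(unique(c(x, y)))
tab <- table(factor(x, levels = lv), factor(y, levels = lv))
tab

overlap <- list(diag  = diag(tab),             # same label in x and y
                upper = tab[upper.tri(tab)],   # disagreements above the diagonal
                lower = tab[lower.tri(tab)])   # disagreements below the diagonal
str(overlap)
```

Counts on the diagonal are features with the same label in both variables; non-zero counts in either triangle point to disagreements.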
In the original protein correlation profiling (PCP), Andersen et al. used the peptide normalised profiles along gradient fractions and compared them with the reference profiles (or set of profiles) by computing $\chi^2$ values, $\chi^2 = \sum_i (x_i - x_p)^2 / x_p$, where $x_i$ is the normalised value of the peptide in fraction i and $x_p$ is the value of the marker (from Wiese et al., 2007). The protein $\chi^2$ is then computed as the median of the peptide $\chi^2$ values. Peptides and proteins with similar profiles to the markers will have small $\chi^2$ values.
The chi2 methods implement this idea and compute such $\chi^2$ values for sets of proteins.
signature(x = "matrix", y = "matrix", method = "character", fun = "NULL", na.rm = "logical"): Computes nrow(x) times nrow(y) $\chi^2$ values, one for each x, y feature pair. Method is one of "Andersen2003" or "Wiese2007"; the former (default) computes the $\chi^2$ as sum(y-x)^2/length(x), while the latter uses sum((y-x)^2/x). na.rm defines if missing values (NA and NaN) should be removed prior to summation. fun defines how to summarise the $\chi^2$ values; the default, NULL, does not combine the $\chi^2$ values.
signature(x = "matrix", y = "numeric", method = "character", na.rm = "logical"): Computes nrow(x) $\chi^2$ values, one for each pair of a row of x and y. See above for the other arguments.
signature(x = "numeric", y = "matrix", method = "character", na.rm = "logical"): Computes nrow(y) $\chi^2$ values, one for each pair of x and a row of y. See above for the other arguments.
signature(x = "numeric", y = "numeric", method = "character", na.rm = "logical"): Computes the $\chi^2$ value for the x, y pair. See above for the other arguments.
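The two scoring rules can be written out for a single pair of profiles in base R. This is a minimal sketch, not the pRoloc implementation: chi2_sketch is a hypothetical helper, and the "Andersen2003" branch assumes that the formula above means the sum of squared differences divided by the number of fractions.

```r
## Hypothetical helper: x is the marker profile, y a peptide/protein profile
chi2_sketch <- function(x, y, method = c("Andersen2003", "Wiese2007"),
                        na.rm = FALSE) {
  method <- match.arg(method)
  stopifnot(length(x) == length(y))
  if (method == "Andersen2003") {
    ## assumed reading: sum of squared differences over the number of fractions
    sum((y - x)^2, na.rm = na.rm) / length(x)
  } else {
    ## squared differences scaled by the marker value in each fraction
    sum((y - x)^2 / x, na.rm = na.rm)
  }
}

mrk  <- c(0.1, 0.2, 0.4, 0.2, 0.1)
prot <- c(0.1, 0.3, 0.3, 0.2, 0.1)
chi2_sketch(mrk, prot)                        # Andersen2003 (default)
chi2_sketch(mrk, prot, method = "Wiese2007")
## one value per row of a matrix of profiles
apply(rbind(prot, mrk), 1, function(p) chi2_sketch(mrk, p))
```

A profile identical to the marker scores 0 under both rules, which is why small values indicate similarity.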
Laurent Gatto <[email protected]>
Andersen, J. S., Wilkinson, C. J., Mayor, T., Mortensen, P. et al., Proteomic characterization of the human centrosome by protein correlation profiling. Nature 2003, 426, 570 - 574.
Wiese, S., Gronemeyer, T., Ofman, R., Kunze, M. et al., Proteomics characterization of mouse kidney peroxisomes by tandem mass spectrometry and protein correlation profiling. Mol. Cell. Proteomics 2007, 6, 2045 - 2057.
mrk <- rnorm(6)
prot <- matrix(rnorm(60), ncol = 6)
chi2(mrk, prot, method = "Andersen2003")
chi2(mrk, prot, method = "Wiese2007")
pepmark <- matrix(rnorm(18), ncol = 6)
pepprot <- matrix(rnorm(60), ncol = 6)
chi2(pepmark, pepprot)
chi2(pepmark, pepprot, fun = sum)
Calculates class weights to be used for parameter optimisation and classification such as svmOptimisation or svmClassification - see the pRoloc tutorial vignette for an example. The weights are calculated for all non-unknown classes as the inverse of the number of observations.
classWeights(object, fcol = "markers")
object |
An instance of class |
fcol |
The name of the features to be weighted |
A table of class weights.
Laurent Gatto
library("pRolocdata")
data(hyperLOPIT2015)
classWeights(hyperLOPIT2015)
data(dunkley2006)
classWeights(dunkley2006)
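The weighting rule described for classWeights, the inverse of the number of observations per labelled class, can be reproduced on a plain character vector. The marker labels below are made up for illustration.

```r
## Hypothetical marker vector; "unknown" denotes unlabelled features
markers <- c("ER", "ER", "ER", "Golgi", "unknown", "Golgi", "PM", "unknown")

## Count the observations in each labelled class ...
n <- table(markers[markers != "unknown"])
## ... and weight each class by the inverse of its size
w <- 1 / n
w
```

Smaller classes receive larger weights, which counteracts class imbalance during optimisation and classification.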
This function computes the mean (normalised) pairwise distances for pre-defined sets of proteins.
clustDist(object, k = 1:5, fcol = "GOAnnotations", n = 5, verbose = TRUE, seed)
object |
An instance of class MSnSet. |
k |
The number of clusters to try fitting to the protein set.
Default is |
fcol |
The feature meta-data containing matrix of protein sets/
marker definitions. Default is |
n |
The minimum number of proteins per set. If protein sets
contain less than |
verbose |
A logical defining whether a progress bar is displayed. |
seed |
An optional seed for the random number generator. |
The input to the function is a MSnSet dataset containing a matrix appended to the feature data slot identifying the membership of protein instances to a pre-defined set(s), e.g. a specific Gene Ontology term. For each protein set, the clustDist function (i) extracts all instances belonging to the set, (ii) using the kmeans algorithm fits and tests k = c(1:5) (default) cluster components to each set, (iii) calculates the mean pairwise distance for each k tested.
Note: currently distances are calculated in Euclidean space, but other distance metrics will be supported in the future.
The output is a list of ClustDist objects, one per information cluster. The ClustDist class summarises the algorithm information such as the number of k's tested for the kmeans, and mean and normalised pairwise Euclidean distances per number of component clusters tested. See ?ClustDist for more details.
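Steps (ii) and (iii) can be sketched for a single protein set using only the stats package. This is one plausible reading of the description, not the clustDist implementation: the profiles are simulated, mean_pairwise is a hypothetical helper, and averaging the within-cluster distances over clusters is an assumption.

```r
## Simulated quantitative profiles for one hypothetical protein set
## (20 proteins, 4 fractions, two well separated groups)
set.seed(1)
profiles <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),
                  matrix(rnorm(40, mean = 3), ncol = 4))

## Mean pairwise Euclidean distance within each cluster, averaged over clusters
mean_pairwise <- function(x, k) {
  cl <- kmeans(x, centers = k, nstart = 10)$cluster
  per_cluster <- vapply(split(seq_len(nrow(x)), cl), function(i) {
    if (length(i) < 2) return(NA_real_)   # a singleton cluster has no pairs
    mean(dist(x[i, , drop = FALSE]))
  }, numeric(1))
  mean(per_cluster, na.rm = TRUE)
}

ks <- 1:5
d <- vapply(ks, function(k) mean_pairwise(profiles, k), numeric(1))
names(d) <- paste0("k", ks)
d
```

With two separated groups, the mean distance drops sharply between k = 1 and k = 2, which is the kind of pattern that makes a set look like it has two components.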
An instance of "ClustDistList" containing a "ClustDist" instance for every protein set, which summarises the algorithm information such as the number of k's tested for the kmeans, and mean and normalised pairwise Euclidean distances per number of component clusters tested.
Lisa Breckels
For class definitions see "ClustDistList" and "ClustDist".
library(pRolocdata)
data(dunkley2006)
par <- setAnnotationParams(inputs = c("Arabidopsis thaliana genes",
                                      "Gene stable ID"))
## add protein sets/annotation information
xx <- addGoAnnotations(dunkley2006, par)
## filter
xx <- filterMinMarkers(xx, n = 50)
xx <- filterMaxMarkers(xx, p = .25)
## get distances for protein sets
dd <- clustDist(xx)
## plot clusters for first 'ClustDist' object
## in the 'ClustDistList'
plot(dd[[1]], xx)
## plot distances for all protein sets
plot(dd)
## Extract normalised distances
## Normalise by n^1/3
minDist <- getNormDist(dd, p = 1/3)
## Get new order according to lowest distance
o <- order(minDist)
## Re-order GOAnnotations
fData(xx)$GOAnnotations <- fData(xx)$GOAnnotations[, o]
if (interactive()) {
  pRolocVis(xx, fcol = "GOAnnotations")
}
"ClustDist"
The ClustDist class summarises algorithm information, from running the clustDist function, such as the number of k's tested for the kmeans, and mean and normalised pairwise (Euclidean) distances per number of component clusters tested.
Objects of this class are created with the clustDist function.
k: Object of class "numeric" storing the number of k clusters tested.
dist: Object of class "list" storing the list of distance matrices.
term: Object of class "character" describing the GO term name.
id: Object of class "character" describing the GO term ID.
nrow: Object of class "numeric" showing the number of instances in the set.
clustsz: Object of class "list" describing the number of instances for each cluster for each k tested.
components: Object of class "vector" storing the class membership of each protein for each k tested.
fcol: Object of class "character" showing the feature column name in the corresponding MSnSet where the protein set information is stored.
Plots the kmeans clustering results.
Shows the object.
Lisa M Breckels <[email protected]>
showClass("ClustDist")
library('pRolocdata')
data(dunkley2006)
par <- setAnnotationParams(inputs = c("Arabidopsis thaliana genes",
                                      "Gene stable ID"))
## add protein set/annotation information
xx <- addGoAnnotations(dunkley2006, par)
## filter
xx <- filterMinMarkers(xx, n = 50)
xx <- filterMaxMarkers(xx, p = .25)
## get distances for protein sets
dd <- clustDist(xx)
## plot clusters for first 'ClustDist' object
## in the 'ClustDistList'
plot(dd[[1]], xx)
## plot distances for all protein sets
plot(dd)
A class for storing lists of ClustDist instances.
Objects of this class are created with the clustDist function.
x: Object of class list containing valid ClustDist instances.
log: Object of class list containing an object creation log, containing among other elements the call that generated the object.
.__classVersion__: The version of the instance. For development purposes only.
"[[": Extracts a single ClustDist at the given position.
"[": Extracts one or more ClustDists as a ClustDistList.
length: Returns the number of ClustDists.
names: Returns the names of ClustDists, if available. The replacement method is also available.
show: Displays the object by printing a short summary.
lapply(x, FUN, ...): Applies function FUN to each element of the input x. If the application of FUN returns a ClustDist, then the return value is a ClustDistList, otherwise a list.
plot: Plots a boxplot of the distance results per protein set.
Lisa M Breckels <[email protected]>
library('pRolocdata')
data(dunkley2006)
par <- setAnnotationParams(inputs = c("Arabidopsis thaliana genes",
                                      "Gene stable ID"))
## add protein set/annotation information
xx <- addGoAnnotations(dunkley2006, par)
## filter
xx <- filterMinMarkers(xx, n = 50)
xx <- filterMaxMarkers(xx, p = .25)
## get distances for protein sets
dd <- clustDist(xx)
## plot distances for all protein sets
plot(dd)
names(dd)
## Extract a sub-list of ClustDist objects
dd[1]
## Extract 1st ClustDist object
dd[[1]]
Empirical p-values for $\chi^2$ protein correlations.
Andersen et al. (2003) used a fixed $\chi^2$ threshold of 0.05 to identify organelle-specific candidates. This function computes empirical p-values by permuting the markers' relative intensities and computing null $\chi^2$ values.
empPvalues(marker, corMatrix, n = 100, ...)
marker |
A |
corMatrix |
A |
n |
The number of iterations. |
... |
Additional parameters to be passed to |
A numeric of length nrow(corMatrix).
Laurent Gatto <[email protected]>
Andersen, J. S., Wilkinson, C. J., Mayor, T., Mortensen, P. et al., Proteomic characterization of the human centrosome by protein correlation profiling. Nature 2003, 426, 570 - 574.
chi2 for $\chi^2$ calculation.
set.seed(1)
mrk <- rnorm(6, 5, 1)
prot <- rbind(matrix(rnorm(120, 5, 1), ncol = 6),
              mrk + rnorm(6))
mrk <- mrk/sum(mrk)
prot <- prot/rowSums(prot)
empPvalues(mrk, prot)
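The permutation idea can be sketched in base R. The code below is an illustration under stated assumptions, not the empPvalues implementation: chi2_and is a hypothetical Andersen-style score, the profiles are made up, and the p-value is taken to be the fraction of permutations whose null score is at least as small as the observed one.

```r
set.seed(1)
## Hypothetical score: mean squared difference between marker and profile
chi2_and <- function(x, y) sum((y - x)^2) / length(x)

marker <- c(0.05, 0.10, 0.40, 0.30, 0.10, 0.05)
prots  <- rbind(close = marker + c(0.01, -0.01, 0, 0, 0.01, -0.01),
                far   = rev(sort(marker)))

n <- 200   # number of permutations
obs <- apply(prots, 1, function(p) chi2_and(marker, p))
null <- replicate(n, {
  perm <- sample(marker)   # shuffle the marker intensities across fractions
  apply(prots, 1, function(p) chi2_and(perm, p))
})

## Fraction of permutations scoring at least as well (as small) as observed
pv <- rowMeans(null <= obs)
pv
```

A profile that tracks the marker closely is rarely matched by a shuffled marker, so it obtains a small empirical p-value, whereas an unrelated profile does not.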
This function replaces a string or regular expression in a feature variable using the sub function.
fDataToUnknown(object, fcol = "markers", from = "^$", to = "unknown", ...)
object |
An instance of class |
fcol |
Feature variable to be modified. Default is
|
from |
A |
to |
A replacement for matched pattern. Default is
|
... |
Additional arguments passed to |
An updated MSnSet.
Laurent Gatto
library("pRolocdata")
data(dunkley2006)
getMarkers(dunkley2006, "markers")
dunkley2006 <- fDataToUnknown(dunkley2006, from = "unknown", to = "unassigned")
getMarkers(dunkley2006, "markers")
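The default arguments of fDataToUnknown, from = "^$" and to = "unknown", correspond to the following call to sub on a plain character vector. The labels are made up for illustration.

```r
## Hypothetical feature variable with empty strings for unlabelled features
markers <- c("ER", "", "Golgi", "")

## "^$" matches empty strings only, which are replaced by "unknown"
relabelled <- sub("^$", "unknown", markers)
relabelled
```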
Removes columns or rows that have a certain proportion or absolute number of 0 values.
filterBinMSnSet(object, MARGIN = 2, t, q, verbose = TRUE)
object |
An |
MARGIN |
1 or 2. Default is 2. |
t |
Rows/columns that have |
q |
If a row has a higher quantile than defined by |
verbose |
A |
A filtered MSnSet.
Laurent Gatto
zerosInBinMSnSet, filterZeroCols, filterZeroRows.
set.seed(1)
m <- matrix(sample(0:1, 25, replace=TRUE), 5)
m[1, ] <- 0
m[, 1] <- 0
rownames(m) <- colnames(m) <- letters[1:5]
fd <- data.frame(row.names = letters[1:5])
x <- MSnSet(exprs = m, fData = fd, pData = fd)
exprs(x)
## Remove columns with no 1s
exprs(filterBinMSnSet(x, MARGIN = 2, t = 0))
## Remove columns with one 1 or less
exprs(filterBinMSnSet(x, MARGIN = 2, t = 1))
## Remove columns with two 1s or less
exprs(filterBinMSnSet(x, MARGIN = 2, t = 2))
## Remove columns with three 1s or less
exprs(filterBinMSnSet(x, MARGIN = 2, t = 3))
## Remove columns that have half or less of 1s
exprs(filterBinMSnSet(x, MARGIN = 2, q = 0.5))
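The effect of the t threshold on columns can be mimicked on a plain binary matrix. This base R sketch assumes, following the comments in the example above, that a column with t or fewer 1s is removed; it does not cover the q argument, and the name thr stands in for t.

```r
## Hypothetical binary matrix: rows are features, columns are annotation terms
m <- matrix(c(0, 0, 0,
              1, 0, 0,
              1, 1, 0,
              1, 1, 1),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("A", "B", "C", "D")))
ones <- colSums(m)   # number of 1s per column

## Assumed rule: a column with thr or fewer 1s is dropped
thr <- 1
kept <- m[, ones > thr, drop = FALSE]
kept
```

With thr = 0 the same expression removes only the all-zero columns.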
Removes annotation information that contains more than a certain number/percentage of proteins.
filterMaxMarkers(object, n, p = 0.2, fcol = "GOAnnotations", verbose = TRUE)
object |
An instance of class |
n |
Maximum number of proteins allowed per class/information term. |
p |
Maximum percentage of proteins per column. Default is 0.2, i.e. remove columns that have information for greater than 20% of the total number of proteins in the dataset (note: this is useful, for example, if information is GO terms, for removing very general and uninformative terms). |
fcol |
The name of the matrix of marker information. Default is
|
verbose |
Number of marker candidates retained after filtering. |
An updated MSnSet.
addGoAnnotations and example therein.
Removes annotation information that contains fewer than a certain number/percentage of proteins.
filterMinMarkers(object, n = 10, p, fcol = "GOAnnotations", verbose = TRUE)
object |
An instance of class |
n |
Minimum number of proteins allowed per column. Default is 10. |
p |
Minimum percentage of proteins per column. |
fcol |
The name of the matrix of marker information. Default is
|
verbose |
Number of marker candidates retained after filtering. |
An updated MSnSet.
Lisa M Breckels
addGoAnnotations and example therein.
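The size-based filtering performed by filterMaxMarkers and filterMinMarkers can be pictured on a plain binary annotation matrix. This is a base R sketch with made-up data, not the package code; whether the thresholds are inclusive is an assumption.

```r
## Hypothetical annotation matrix: 10 proteins (rows) by 3 protein sets (columns)
go <- matrix(0, nrow = 10, ncol = 3,
             dimnames = list(paste0("prot", 1:10),
                             c("broad", "medium", "rare")))
go[1:9, "broad"]  <- 1   # very general term covering 90% of the proteins
go[1:4, "medium"] <- 1
go[1, "rare"]     <- 1

## Drop sets covering more than a proportion p of all proteins
## (cf. filterMaxMarkers)
p <- 0.5
go_max <- go[, colSums(go) / nrow(go) <= p, drop = FALSE]

## Drop sets with fewer than n member proteins (cf. filterMinMarkers)
n <- 2
go_min <- go_max[, colSums(go_max) >= n, drop = FALSE]
colnames(go_min)
```

Applying both filters keeps the sets that are neither too general nor too small to be informative.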
Removes all assay data columns/rows that are composed of only 0, i.e. have a colSum/rowSum of 0.
filterZeroCols(object, verbose = TRUE) filterZeroRows(object, verbose = TRUE)
object |
A |
verbose |
Print a message with the number of filtered out columns/row (if any). |
An MSnSet.
Laurent Gatto
library("pRolocdata")
data(andy2011goCC)
any(colSums(exprs(andy2011goCC)) == 0)
exprs(andy2011goCC)[, 1:5] <- 0
ncol(andy2011goCC)
ncol(filterZeroCols(andy2011goCC))
"GenRegRes" and "ThetaRegRes"
Regularisation framework containers.
Objects of these classes are created with the respective regularisation function: knnOptimisation, svmOptimisation, plsdaOptimisation, knntlOptimisation, ...
algorithm: Object of class "character" storing the machine learning algorithm name.
hyperparameters: Object of class "list" with the respective algorithm hyper-parameters tested.
design: Object of class "numeric" describing the cross-validation design, the test data size and the number of replications.
log: Object of class "list" with warnings thrown during the hyper-parameters regularisation.
seed: Object of class "integer" with the random number generation seed.
results: Object of class "matrix" of dimensions times (see design) by number of hyperparameters + 1 storing the macro F1 values for the respective best hyper-parameters for each replication.
f1Matrices: Object of class "list" with respective times cross-validation F1 matrices.
cmMatrices: Object of class "list" with respective times contingency matrices.
testPartitions: Object of class "list" with respective times test partitions.
datasize: Object of class "list" with details about the respective inner and outer training and testing data sizes.
Only in ThetaRegRes:
predictions: A list of predictions for the optimisation iterations.
otherWeights: Alternative best theta weights: a vector per iteration, NULL if no other best weights were found.
Returns a matrix of F1 scores for the optimisation parameters.
signature(object = "GenRegRes", t = "numeric") and signature(object = "ThetaRegRes", t = "numeric"): Constructs a table of all possible parameter combinations and counts how many have an F1 score greater than or equal to t. When t is missing (default), the best F1 score is used. This method is useful in conjunction with plot.
Returns the best parameters. It is however strongly recommended to inspect the optimisation results. For a ThetaRegRes optimisation result, the method to choose the best parameters can be "median" (default) or "mean" (the median or mean of the best weights is chosen), "max" (the first weights with the highest macro-F1 score, considering that multiple max scoring combinations are possible) or "count" (the observed weight that gets the maximum number of observations, see f1Count). The favourP argument can be used to prioritise weights that favour the primary data (i.e. high weights). See favourPrimary below.
Returns the seed used for the optimisation run.
signature(object = "GenRegRes"): Returns a vector of recorded warnings.
signature(object = "GenRegRes"): Plots a heatmap of the optimisation results. Only for "GenRegRes" instances.
Plots the optimisation results.
Shows the object.
Only for ThetaRegRes:
combineThetaRegRes(object): Takes a list of ThetaRegRes instances to be combined and returns a new ThetaRegRes instance.
favourPrimary(primary, auxiliary, object, verbose = TRUE): Takes the primary and auxiliary data sources (two MSnSet instances) and a ThetaRegRes object and returns an updated ThetaRegRes instance containing best parameters/weights (see the getParams function) favouring the primary data when multiple best theta weights are available.
Laurent Gatto <[email protected]>
showClass("GenRegRes")
showClass("ThetaRegRes")
The function pulls the gene ontology (GO) terms for a set of feature names.
getGOFromFeatures( id, namespace = "cellular_component", evidence = NULL, params = NULL, verbose = FALSE, nmax = 500 )
id |
An |
namespace |
The GO namespace. One of
|
evidence |
The GO evidence code. See
|
params |
An instance of class
|
verbose |
A |
nmax |
As described in
https://support.bioconductor.org/p/86358/, the Biomart
result can be unreliable for large queries. This argument
splits the input in chunks of length |
A data.frame with relevant GO terms.
Laurent Gatto
library(pRolocdata)
data(dunkley2006)
data(dunkley2006params)
dunkley2006params
fn <- featureNames(dunkley2006)[1:5]
getGOFromFeatures(fn, params = dunkley2006params)
Convenience accessor to the organelle classes in an 'MSnSet'.
This function returns the organelle classes of an MSnSet instance. As a side effect, it prints out the classes.
getMarkerClasses(object, fcol = "markers", ...)
object |
An instance of class |
fcol |
The name of the markers column in the |
... |
Additional parameters passed to |
A character vector of the organelle classes in the data.
Lisa Breckels and Laurent Gatto
getMarkers to extract the marker proteins. See markers for details about spatial markers storage and encoding.
library("pRolocdata")
data(dunkley2006)
organelles <- getMarkerClasses(dunkley2006)
## same if markers encoded as a matrix
dunkley2006 <- mrkVecToMat(dunkley2006, mfcol = "Markers")
organelles2 <- getMarkerClasses(dunkley2006, fcol = "Markers")
stopifnot(all.equal(organelles, organelles2))
MSnSet
Convenience accessor to the organelle markers in an MSnSet.
This function returns the organelle markers of an MSnSet instance. As a side effect, it prints out a marker table.
getMarkers(object, fcol = "markers", names = TRUE, verbose = TRUE)
object |
An instance of class |
fcol |
The name of the markers column in the |
names |
A |
verbose |
If |
A character
(matrix
) of length (ncol)
ncol(object)
, depending on the vector or matrix encoding of
the markers.
Laurent Gatto
See getMarkerClasses
to get the classes
only. See markers
for details about spatial markers
storage and encoding.
library("pRolocdata") data(dunkley2006) ## marker vectors myVmarkers <- getMarkers(dunkley2006) head(myVmarkers) ## marker matrix dunkley2006 <- mrkVecToMat(dunkley2006, mfcol = "Markers") myMmarkers <- getMarkers(dunkley2006, fcol = "Markers") head(myMmarkers)
"ClustDistList"
object
This function computes and outputs normalised distances from a "ClustDistList" object.
getNormDist(object, p = 1/3)
object |
An instance of class |
p |
The normalisation factor. Default is 1/3. |
A numeric vector of normalised distances, one per protein set in the ClustDistList.
Lisa Breckels
"ClustDistList"
, "ClustDist"
,
and examples in clustDist
.
Convenience accessor to the predicted feature localisation in an 'MSnSet'.
This function returns the predictions of an
MSnSet
instance. As a side effect, it prints out a prediction table.
getPredictions(object, fcol, scol, mcol = "markers", t = 0, verbose = TRUE)
object |
An instance of class |
fcol |
The name of the prediction column in the
|
scol |
The name of the prediction score column in the
|
mcol |
The feature meta data column containing the labelled training data. |
t |
The score threshold. Predictions with score < t are set
to 'unknown'. Default is 0. It is also possible to define
thresholds for each prediction class, in which case, |
verbose |
If |
An instance of class "MSnSet" with an fcol.pred feature variable storing the prediction results according to the chosen threshold.
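The thresholding rule described for t can be sketched in base R on plain vectors. The function threshold_predictions is a hypothetical illustration and not the pRoloc implementation, which works on an MSnSet and its feature data; the sketch assumes that class-specific thresholds are supplied as a numeric vector named by class.

```r
## Hypothetical helper (an assumption, not pRoloc code): predictions whose
## score is below the threshold are relabelled "unknown".
threshold_predictions <- function(pred, score, t = 0) {
  pred <- as.character(pred)
  if (length(t) > 1) {
    ## class-specific thresholds: one named value per predicted class
    stopifnot(!is.null(names(t)), all(unique(pred) %in% names(t)))
    t <- t[pred]
  }
  pred[score < t] <- "unknown"
  pred
}

pred  <- c("ER", "Golgi", "ER", "PM")
score <- c(0.95, 0.40, 0.60, 0.80)
single   <- threshold_predictions(pred, score, t = 0.75)
perclass <- threshold_predictions(pred, score,
                                  t = c(ER = 0.5, Golgi = 0.5, PM = 0.9))
```

With a single threshold of 0.75 the second and third predictions fall below the cut-off; with the per-class thresholds the Golgi and PM predictions do.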
Laurent Gatto and Lisa Breckels
orgQuants
for calculating organelle-specific
thresholds.
library("pRolocdata") data(dunkley2006) res <- svmClassification(dunkley2006, fcol = "pd.markers", sigma = 0.1, cost = 0.5) fData(res)$svm[500:510] fData(res)$svm.scores[500:510] getPredictions(res, fcol = "svm", t = 0) ## all predictions getPredictions(res, fcol = "svm", t = .9) ## single threshold ## 50% top predictions per class ts <- orgQuants(res, fcol = "svm", t = .5) getPredictions(res, fcol = "svm", t = ts)
Converts GO identifiers to/from GO terms, either explicitly or by checking if (any items in) the input contains "GO:".
goIdToTerm(x, names = TRUE, keepNA = TRUE) goTermToId(x, names = TRUE, keepNA = TRUE) flipGoTermId(x, names = TRUE, keepNA = TRUE) prettyGoTermId(x)
x |
A |
names |
Should a named character be returned? Default is
|
keepNA |
Should any GO term/id names that are missing or obsolete
be replaced with a |
A character of GO terms (ids) if x were ids (terms).
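The direction detection mentioned in the description (checking whether any item contains "GO:") can be sketched with a toy lookup table. Both go_map and flip_go are hypothetical names used for illustration only; the real functions query the GO annotation and are not reproduced here. The single mapping used is the one that appears in the examples of this page.

```r
## Toy lookup table (an assumption for illustration): one id/term pair.
go_map <- c("GO:0000001" = "mitochondrion inheritance")

## Hypothetical helper: if any input contains "GO:", treat the input as ids
## and return terms; otherwise treat it as terms and return ids.
flip_go <- function(x, map = go_map) {
  if (any(grepl("GO:", x, fixed = TRUE))) {
    unname(map[x])                        # ids -> terms
  } else {
    unname(setNames(names(map), map)[x])  # terms -> ids
  }
}

flip_go("GO:0000001")
flip_go("mitochondrion inheritance")
flip_go(c("GO:0000001", "notvalid"))      # unmatched items become NA
```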
Laurent Gatto
goIdToTerm("GO:0000001") goIdToTerm("GO:0000001", names = FALSE) goIdToTerm(c("GO:0000001", "novalid")) goIdToTerm(c("GO:0000001", "GO:0000002", "notvalid")) goTermToId("mitochondrion inheritance") goTermToId("mitochondrion inheritance", name = FALSE) goTermToId(c("mitochondrion inheritance", "notvalid")) prettyGoTermId("mitochondrion inheritance") prettyGoTermId("GO:0000001") flipGoTermId("mitochondrion inheritance") flipGoTermId("GO:0000001") flipGoTermId("GO:0000001", names = FALSE)
Highlights a set of features of interest given as a FeaturesOfInterest instance on a PCA plot produced by plot2D or plot3D. If none of the features of interest are found in the MSnSet's featureNames, a warning is thrown.
highlightOnPlot(object, foi, labels, args = list(), ...) highlightOnPlot3D(object, foi, labels, args = list(), radius = 0.1 * 3, ...)
object |
The main dataset described as an |
foi |
An instance of |
labels |
A |
args |
A named list of arguments to be passed to
|
... |
Additional parameters passed to |
radius |
Radius of the spheres to be added to the
visualisation produced by |
NULL; used for its side effects.
Laurent Gatto
library("pRolocdata") data("tan2009r1") x <- FeaturesOfInterest(description = "A test set of features of interest", fnames = featureNames(tan2009r1)[1:10], object = tan2009r1) ## using FeaturesOfInterest or feature names par(mfrow = c(2, 1)) plot2D(tan2009r1) highlightOnPlot(tan2009r1, x) plot2D(tan2009r1) highlightOnPlot(tan2009r1, featureNames(tan2009r1)[1:10]) .pca <- plot2D(tan2009r1) head(.pca) highlightOnPlot(.pca, x, col = "red") highlightOnPlot(tan2009r1, x, col = "red", cex = 1.5) highlightOnPlot(tan2009r1, x, labels = TRUE) .pca <- plot2D(tan2009r1, dims = c(1, 3)) highlightOnPlot(.pca, x, pch = "+", dims = c(1, 3)) highlightOnPlot(tan2009r1, x, args = list(dims = c(1, 3))) .pca2 <- plot2D(tan2009r1, mirrorX = TRUE, dims = c(1, 3)) ## previous pca matrix, need to mirror X axis highlightOnPlot(.pca, x, pch = "+", args = list(mirrorX = TRUE)) ## new pca matrix, with X mirrors (and 1st and 3rd PCs) highlightOnPlot(.pca2, x, col = "red") plot2D(tan2009r1) highlightOnPlot(tan2009r1, x) highlightOnPlot(tan2009r1, x, labels = TRUE, pos = 3) highlightOnPlot(tan2009r1, x, labels = "Flybase.Symbol", pos = 1) ## in 3 dimensions if (interactive()) { plot3D(tan2009r1, radius1 = 0.05) highlightOnPlot3D(tan2009r1, x, labels = TRUE) highlightOnPlot3D(tan2009r1, x) }
Classification using the k-nearest neighbours algorithm.
knnClassification( object, assessRes, scores = c("prediction", "all", "none"), k, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
k |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to |
An instance of class "MSnSet"
with
knn
and knn.scores
feature variables storing the
classification results and scores respectively.
Laurent Gatto
library(pRolocdata) data(dunkley2006) ## reducing parameter search space and iterations params <- knnOptimisation(dunkley2006, k = c(3, 10), times = 3) params plot(params) f1Count(params) levelPlot(params) getParams(params) res <- knnClassification(dunkley2006, params) getPredictions(res, fcol = "knn") getPredictions(res, fcol = "knn", t = 0.75) plot2D(res, fcol = "knn")
Classification parameter optimisation for the k-nearest neighbours algorithm.
knnOptimisation( object, fcol = "markers", k = seq(3, 15, 2), times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
k |
The hyper-parameter. Default values are |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when the performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class with an NA precision or recall would yield an NA F1 score and, eventually, an NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
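The effect of replacing NA scores by 0 can be seen in a small base R sketch. The function macro_f1 is a hypothetical illustration written for this note and is not the code used inside the optimisation functions.

```r
## Hypothetical helper (an assumption, not pRoloc code): macro F1 where
## undefined precision, recall or F1 values are replaced by 0.
macro_f1 <- function(truth, pred) {
  classes <- sort(unique(truth))
  f1 <- vapply(classes, function(cl) {
    tp <- sum(pred == cl & truth == cl)
    precision <- tp / sum(pred == cl)    # NaN if the class is never predicted
    recall    <- tp / sum(truth == cl)
    if (is.na(precision)) precision <- 0
    if (is.na(recall)) recall <- 0
    f <- 2 * precision * recall / (precision + recall)
    if (is.na(f)) 0 else f               # 0/0 when both scores are 0
  }, numeric(1))
  mean(f1)
}

truth <- c("A", "A", "B", "B", "C")
pred  <- c("A", "A", "B", "A", "A")
macro_f1(truth, pred)
```

Class C is never predicted here, so its precision is undefined; it contributes an F1 of 0 and the macro F1 stays defined instead of becoming NA.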
An instance of class "GenRegRes"
.
Laurent Gatto
knnClassification
and example therein.
Classification using a variation of the KNN implementation of Wu and Dietterich's transfer learning schema
knntlClassification( primary, auxiliary, fcol = "markers", bestTheta, k, scores = c("prediction", "all", "none"), seed )
primary |
An instance of class |
auxiliary |
An instance of class
|
fcol |
The feature meta-data containing marker definitions.
Default is |
bestTheta |
Best theta vector as output from
|
k |
Numeric vector of length 2, containing the best |
scores |
One of |
seed |
The optional random number generator seed. |
A character vector of the classifications for the unknowns
Lisa Breckels
library(pRolocdata) data(andy2011) data(andy2011goCC) ## reducing calculation time of k by pre-running knnOptimisation x <- c(andy2011, andy2011goCC) k <- lapply(x, function(z) knnOptimisation(z, times=5, fcol = "markers.orig", verbose = FALSE)) k <- sapply(k, function(z) getParams(z)) k ## reducing parameter search with theta = 1, ## weights of only 1 or 0 will be considered opt <- knntlOptimisation(andy2011, andy2011goCC, fcol = "markers.orig", times = 2, by = 1, k = k) opt th <- getParams(opt) plot(opt) res <- knntlClassification(andy2011, andy2011goCC, fcol = "markers.orig", th, k) res
Classification parameter optimisation for the KNN implementation of Wu and Dietterich's transfer learning schema
knntlOptimisation( primary, auxiliary, fcol = "markers", k, times = 50, test.size = 0.2, xval = 5, by = 0.5, length.out, th, xfolds, BPPARAM = BiocParallel::bpparam(), method = "Breckels", log = FALSE, seed )
primary |
An instance of class |
auxiliary |
An instance of class
|
fcol |
The feature meta-data containing marker definitions.
Default is |
k |
Numeric vector of length 2, containing the best |
times |
The number of times cross-validation is performed. Default is 50. |
test.size |
The size of test (validation) data. Default is 0.2 (20 percent). |
xval |
The number of rounds of cross-validation to perform. |
by |
The increment for theta, must be one of |
length.out |
Alternative to using |
th |
A matrix of theta values to test for each class as
generated from the function |
xfolds |
Option to pass specific folds for the cross validation. |
BPPARAM |
Required for parallelisation. If not specified
selects a default |
method |
The k-NN transfer learning method to use. The default is 'Breckels', as described in Breckels et al. (2016). If 'Wu' is specified, the original method of Wu and Dietterich (2004) is used. |
log |
A |
seed |
The optional random number generator seed. |
knntlOptimisation implements a variation of Wu and Dietterich's transfer learning schema: P. Wu and T. G. Dietterich. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-First International Conference on Machine Learning, pages 871-878. Morgan Kaufmann, 2004. A grid search for the best theta is performed.
A list containing the theta combinations tested and the associated macro F1 score and accuracy for each combination over each round (specified by times).
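To give an idea of what a grid of theta combinations looks like, here is a base R sketch. The helper theta_grid is hypothetical and is not the function used by the package to generate theta matrices; it assumes that each class weight ranges from 0 to 1 in steps of by, which is consistent with the example comment stating that only weights of 0 or 1 are considered when by = 1.

```r
## Hypothetical helper (an assumption, not pRoloc code): all combinations
## of per-class weights between 0 and 1, in steps of `by`.
theta_grid <- function(classes, by = 0.5) {
  w <- seq(0, 1, by = by)
  grid <- expand.grid(rep(list(w), length(classes)))
  colnames(grid) <- classes
  as.matrix(grid)
}

th <- theta_grid(c("ER", "Golgi", "PM"), by = 0.5)
dim(th)     # 3 weights for each of 3 classes: 27 rows, 3 columns
head(th)
```

The number of combinations grows as (1/by + 1) to the power of the number of classes, which is why the examples on this page reduce the search with by = 1.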
Lisa Breckels
Breckels LM, Holden S, Wojnar D, Mulvey CM, Christoforou A, Groen AJ, Kohlbacher O, Lilley KS, Gatto L. Learning from heterogeneous data sources: an application in spatial proteomics. bioRxiv. doi: http://dx.doi.org/10.1101/022152
Wu P, Dietterich TG. Improving SVM Accuracy by Training on Auxiliary Data Sources. Proceedings of the 21st International Conference on Machine Learning (ICML); 2004.
knntlClassification
and example therein.
Classification using the support vector machine algorithm.
ksvmClassification( object, assessRes, scores = c("prediction", "all", "none"), cost, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
cost |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to |
An instance of class "MSnSet"
with
ksvm
and ksvm.scores
feature variables storing
the classification results and scores respectively.
Laurent Gatto
library(pRolocdata) data(dunkley2006) ## reducing parameter search space and iterations params <- ksvmOptimisation(dunkley2006, cost = 2^seq(-1,4,5), times = 3) params plot(params) f1Count(params) levelPlot(params) getParams(params) res <- ksvmClassification(dunkley2006, params) getPredictions(res, fcol = "ksvm") getPredictions(res, fcol = "ksvm", t = 0.75) plot2D(res, fcol = "ksvm")
Classification parameter optimisation for the support vector machine algorithm.
ksvmOptimisation( object, fcol = "markers", cost = 2^(-4:4), times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
cost |
The hyper-parameter. Default values are |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when the performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class with an NA precision or recall would yield an NA F1 score and, eventually, an NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
An instance of class "GenRegRes"
.
Laurent Gatto
ksvmClassification
and example therein.
MSnSet
Creates a new "MSnSet" instance populated with a GO term binary matrix based on an original object.
makeGoSet(object, params, namespace = "cellular_component", evidence = NULL)
object |
An instance of class |
params |
An instance of class |
namespace |
The ontology name space. One or several of
|
evidence |
GO evidence filtering. |
A new "MSnSet" with the GO terms for the respective features in the original object.
Laurent Gatto
library("pRolocdata") data(dunkley2006) data(dunkley2006params) goset <- makeGoSet(dunkley2006[1:10, ], dunkley2006params) goset exprs(goset) image(goset)
These functions implement the T augmented Gaussian mixture (TAGM) model for mass spectrometry-based spatial proteomics datasets using the maximum a posteriori (MAP) optimisation routine.
## S4 method for signature 'MAPParams' show(object) logPosteriors(x) tagmMapTrain( object, fcol = "markers", method = "MAP", numIter = 100, mu0 = NULL, lambda0 = 0.01, nu0 = NULL, S0 = NULL, beta0 = NULL, u = 2, v = 10, seed = NULL ) tagmMapPredict( object, params, fcol = "markers", probJoint = FALSE, probOutlier = TRUE )
object |
An |
x |
An object of class 'MAPParams'. |
fcol |
The feature meta-data containing marker definitions.
Default is |
method |
A |
numIter |
The number of iterations of the expectation-maximisation algorithm. Default is 100. |
mu0 |
The prior mean. Default is |
lambda0 |
The prior shrinkage. Default is 0.01. |
nu0 |
The prior degrees of freedom. Default is
|
S0 |
The prior inverse-Wishart scale matrix. Empirical prior used by default. |
beta0 |
The prior Dirichlet distribution concentration. Default is 1 for each class. |
u |
The prior shape parameter for Beta(u, v). Default is 2. |
v |
The prior shape parameter for Beta(u, v). Default is 10. |
seed |
The optional random number generator seed. |
params |
An instance of class |
probJoint |
A |
probOutlier |
A |
The tagmMapTrain function generates the MAP parameters (object of class MAPParams) based on an annotated quantitative spatial proteomics dataset (object of class MSnbase::MSnSet). Both are then passed to the tagmMapPredict function to predict the sub-cellular localisation of proteins of unknown localisation. See the pRoloc-bayesian vignette for details and examples. In this implementation, if numerical instability is detected in the covariance matrix of the data, a small multiple of the identity is added. A message is printed if this conditioning step is performed.
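The conditioning step mentioned above can be sketched in base R. The function condition_cov, the eigenvalue-based instability check and the values of eps and tol are all assumptions made for illustration; the source does not state how instability is detected or how large the added multiple is.

```r
## Hypothetical helper (an assumption, not the pRoloc implementation):
## add eps * I to a covariance matrix that is numerically singular.
condition_cov <- function(S, eps = 1e-6, tol = 1e-8) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) < tol * max(ev)) {
    message("Covariance matrix is ill-conditioned; adding ", eps, " * I")
    S <- S + eps * diag(nrow(S))
  }
  S
}

## Two perfectly correlated columns give a singular covariance matrix.
x  <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8))
S  <- cov(x)
S2 <- condition_cov(S)
chol(S2)   # the conditioned matrix can be factorised
```

A well-conditioned matrix such as diag(2) is returned unchanged and no message is printed.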
tagmMapTrain returns an instance of class MAPParams().
tagmMapPredict returns an instance of class MSnbase::MSnSet containing the localisation predictions as a new tagm.map.allocation feature variable.
method
A character()
storing the TAGM method name.
priors
A list()
with the priors for the parameters
seed
An integer()
with the random number generation seed.
posteriors
A list()
with the updated posterior parameters
and log-posterior of the model.
datasize
A list()
with details about size of data
Laurent Gatto
Oliver M. Crook
A Bayesian Mixture Modelling Approach For Spatial Proteomics Oliver M Crook, Claire M Mulvey, Paul D. W. Kirk, Kathryn S Lilley, Laurent Gatto bioRxiv 282269; doi: https://doi.org/10.1101/282269
The plotEllipse() function can be used to visualise TAGM models on PCA plots with ellipses. See the tagmMapTrain() function to use the TAGM MAP method.
These functions extract the marker or unknown proteins into a new MSnSet.
markerMSnSet(object, fcol = "markers") unknownMSnSet(object, fcol = "markers")
object |
An instance of class |
fcol |
The name of the feature data column, that will be used
to separate the markers from the proteins of unknown
localisation. When the markers are encoded as vectors, features of
unknown localisation are defined as |
A new MSnSet with marker/unknown proteins only.
Laurent Gatto
sampleMSnSet, testMSnSet and markers for markers encoding.
library("pRolocdata") data(dunkley2006) mrk <- markerMSnSet(dunkley2006) unk <- unknownMSnSet(dunkley2006) dim(dunkley2006) dim(mrk) dim(unk) table(fData(dunkley2006)$markers) table(fData(mrk)$markers) table(fData(unk)$markers) ## matrix-encoded markers dunkley2006 <- mrkVecToMat(dunkley2006) dim(markerMSnSet(dunkley2006, "Markers")) stopifnot(all.equal(featureNames(markerMSnSet(dunkley2006, "Markers")), featureNames(markerMSnSet(dunkley2006, "markers")))) dim(unknownMSnSet(dunkley2006, "Markers")) stopifnot(all.equal(featureNames(unknownMSnSet(dunkley2006, "Markers")), featureNames(unknownMSnSet(dunkley2006, "markers"))))
"MartInstance"
Internal infrastructure to query/handle several individual mart instances. See MartInterface.R for details.
Laurent Gatto <[email protected]>
Helper function to get the number of outliers at each MCMC iteration.
Helper function to get mean component allocation at each MCMC iteration.
Helper function to get mean probability of belonging to the outlier component at each iteration.
Wrapper for the Geweke diagnostics from the coda package that also returns p-values.
Helper function to pool chains together after processing.
Helper function to burn n iterations from the front of the chains.
Helper function to subsample the chains, known informally as thinning.
Produces a violin plot with the protein posterior probabilities distributions for all organelles.
mcmc_get_outliers(x) mcmc_get_meanComponent(x) mcmc_get_meanoutliersProb(x) geweke_test(k) mcmc_pool_chains(param) mcmc_burn_chains(x, n = 50) mcmc_thin_chains(x, freq = 5) ## S4 method for signature 'MCMCParams,character' plot(x, y, ...)
x |
Object of class |
k |
A |
param |
An object of class |
n |
|
freq |
Thinning frequency. The function retains every 'freq'th iteration and is an 'integer(1)'. The default thinning frequency is '5'. |
y |
A 'character(1)' with a protein name. |
... |
Currently ignored. |
A list of length length(x).
A list of length length(x).
A list of length length(x).
A matrix with the test z- and p-values for each chain.
A pooled MCMCParams object.
An updated MCMCParams object.
A thinned 'MCMCParams' object.
A ggplot2 object.
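Burn-in and thinning can be illustrated on a plain matrix with one row per protein and one column per iteration. The helpers burn_chain and thin_chain are hypothetical and operate on a matrix rather than on an MCMCParams object; in particular, which iterations are retained when thinning (here iterations freq, 2 * freq, and so on) is an assumption.

```r
## Hypothetical helpers (assumptions, not the pRoloc implementation).
## Drop the first `n` iterations (columns) of a chain.
burn_chain <- function(m, n = 50) {
  m[, seq_len(ncol(m)) > n, drop = FALSE]
}
## Keep every `freq`-th iteration of a chain.
thin_chain <- function(m, freq = 5) {
  m[, seq_len(ncol(m)) %% freq == 0, drop = FALSE]
}

## Toy chain: 3 proteins, 100 iterations.
chain   <- matrix(seq_len(3 * 100), nrow = 3)
burned  <- burn_chain(chain, n = 50)
thinned <- thin_chain(burned, freq = 5)
dim(burned)
dim(thinned)
```

Burning before thinning, as done here, means that the retained iterations are counted from the end of the burn-in period.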
Laurent Gatto
The MCMCParams
infrastructure is used to store and process
Markov chain Monte Carlo results for the T-Augmented Gaussian
Mixture model (TAGM) from Crook et al. (2018).
chains(object) ## S4 method for signature 'MCMCParams' show(object) ## S4 method for signature 'ComponentParam' show(object) ## S4 method for signature 'MCMCChain' show(object) ## S4 method for signature 'MCMCChains' length(x) ## S4 method for signature 'MCMCParams' length(x) ## S4 method for signature 'MCMCChains,ANY,ANY' x[[i, j = "missing", drop = "missing"]] ## S4 method for signature 'MCMCParams,ANY,ANY' x[[i, j = "missing", drop = "missing"]] ## S4 method for signature 'MCMCChains,ANY,ANY,ANY' x[i, j = "missing", drop = "missing"] ## S4 method for signature 'MCMCParams,ANY,ANY,ANY' x[i, j = "missing", drop = "missing"] ## S4 method for signature 'MCMCChains' show(object)
object |
An instance of appropriate class. |
x |
Object to be subset. |
i |
An |
j |
Missing. |
drop |
Missing. |
Objects of the MCMCParams
class are created with the
tagmMcmcTrain()
function. These objects store the priors of
the generative TAGM model and the results of the MCMC chains,
which themselves are stored as an instance of class MCMCChains
and can be accessed with the chains()
function. A summary of the
MCMC chains (of class MCMCSummary
) can be further computed with
the tagmMcmcProcess()
function.
See the pRoloc-bayesian vignette for examples.
chains
list()
containing the individual full MCMC chain
results in an MCMCChains
instance. Each element must be a
valid MCMCChain
instance.
posteriorEstimates
A data.frame
documenting the posterior
priors in an MCMCSummary
instance. It contains N rows and
columns tagm.allocation
, tagm.probability
, tagm.outlier
,
tagm.probability.lowerquantile
,
tagm.probability.upperquantile
and tagm.mean.shannon
.
diagnostics
A matrix
of dimensions 1 by 2 containing the
MCMCSummary
diagnostics.
tagm.joint
A matrix
of dimensions N by K storing the joint
probability in an MCMCSummary
instance.
method
character(1)
describing the method in the
MCMCParams
object.
chains
Object of class MCMCChains
containing the full MCMC
chain results stored in the MCMCParams
object.
priors
list()
summary
Object of class MCMCSummary
the summarised MCMC
results available in the MCMCParams
instance.
n
integer(1)
indicating the number of MCMC iterations.
Stored in an MCMCChain
instance.
K
integer(1)
indicating the number of components. Stored
in an MCMCChain
instance.
N
integer(1)
indicating the number of proteins. Stored in
an MCMCChain
instance.
Component
matrix(N, n)
component allocation results of an
MCMCChain
instance.
ComponentProb
matrix(N, n, K)
component allocation
probabilities of an MCMCChain
instance.
Outlier
matrix(N, n)
outlier allocation results.
OutlierProb
matrix(N, n, 2)
outlier allocation
probabilities of an MCMCChain
instance.
The function tagmMcmcTrain() to construct objects of this class.
This function updates an MSnSet instance and sets the markers class to unknown if there are fewer than n instances.
minMarkers(object, n = 10, fcol = "markers")
object |
An instance of class |
n |
Minimum number of marker instances per class. |
fcol |
The name of the markers column in the |
An instance of class "MSnSet" with a new feature variable, named after the original fcol variable and the n value.
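The relabelling rule can be sketched on a plain character vector of marker labels. The function min_markers is a hypothetical base R illustration and not the pRoloc implementation, which stores its result in a new feature variable of the MSnSet (for example markers20 in the example on this page).

```r
## Hypothetical helper (an assumption, not pRoloc code): classes with fewer
## than `n` marker instances are relabelled "unknown".
min_markers <- function(markers, n = 10) {
  tab <- table(markers)
  small <- setdiff(names(tab)[tab < n], "unknown")
  markers[markers %in% small] <- "unknown"
  markers
}

m <- c(rep("ER", 12), rep("Golgi", 3), rep("unknown", 5))
res <- min_markers(m, n = 10)
table(res)
```

In this toy input the three Golgi markers are below the minimum of 10 and join the unlabelled features, while the ER class is kept.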
Laurent Gatto
getPredictions
to filter based on
classification scores.
library(pRolocdata) data(dunkley2006) d2 <- minMarkers(dunkley2006, 20) getMarkers(dunkley2006) getMarkers(d2, fcol = "markers20")
Model calibration plots with posterior z-scores and posterior shrinkage.
mixing_posterior_check(object, params, priors, fcol = "markers")
object |
A valid object of class |
params |
A valid object of class |
priors |
The priors that were used in the model. |
fcol |
The columns of the feature data which contain the marker data. |
Used for its side effect of producing a plot. Invisibly returns a ggplot object that can be further manipulated.
Oliver M. Crook <[email protected]>
## Not run: library("pRoloc") data("tan2009r1") tanres <- tagmMcmcTrain(object = tan2009r1) tanres <- tagmMcmcProcess(tanres) tan2009r1 <- tagmMcmcPredict(object = tan2009r1, params = tanres, probJoint = TRUE) myparams <- chains(e14Tagm_converged_pooled)[[1]] myparams2 <- chains(mcmc_pool_chains(tanres))[[1]] priors <- tanres@priors pRoloc:::mixing_posterior_check(object = tan2009r1, params = myparams2, priors = priors) ## End(Not run)
MLearn interface for machine learning
This method implements MLInterfaces' MLearn method for instances of the class "MSnSet".
signature(formula = "formula", data = "MSnSet", .method
= "learnerSchema", trainInd = "numeric")
The learning problem is stated with the formula
and applies
the .method
schema on the MSnSet
data
input
using the trainInd
numeric indices as train data.
signature(formula = "formula", data = "MSnSet", .method
= "learnerSchema", trainInd = "xvalSpec")
In this case, an instance of xvalSpec
is used for
cross-validation.
signature(formula = "formula", data = "MSnSet", .method
= "clusteringSchema", trainInd = "missing")
Hierarchical (hclustI
), k-means (kmeansI
) and
partitioning around medoids (pamI
) clustering algorithms
using MLInterface
's MLearn
interface.
The MLInterfaces
package documentation, in particular MLearn
.
Given two MSnSet
instances of one MSnSetList
with at
least two items, this function produces an animation that shows
the transition from the first data to the second.
move2Ds(object, pcol, fcol = "markers", n = 25, hl)
object |
An |
pcol |
If |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different colours. Default
is |
n |
Number of frames. Default is 25. |
hl |
An optional instance of class
|
Used for its side effect of producing a short animation.
Laurent Gatto
plot2Ds
for a single figure with the two
datasets.
library("pRolocdata")
data(dunkley2006)
## Create a relevant MSnSetList using the dunkley2006 data
xx <- split(dunkley2006, "replicate")
xx1 <- xx[[1]]
xx2 <- xx[[2]]
fData(xx1)$markers[374] <- "Golgi"
fData(xx2)$markers[412] <- "unknown"
xx@x[[1]] <- xx1
xx@x[[2]] <- xx2
## The features we want to track
foi <- FeaturesOfInterest(description = "test",
                          fnames = featureNames(xx[[1]])[c(374, 412)])
## (1) visualise each experiment separately
par(mfrow = c(2, 1))
plot2D(xx[[1]], main = "condition A")
highlightOnPlot(xx[[1]], foi)
plot2D(xx[[2]], mirrorY = TRUE, main = "condition B")
highlightOnPlot(xx[[2]], foi, args = list(mirrorY = TRUE))
## (2) plot both data on the same plot
par(mfrow = c(1, 1))
tmp <- plot2Ds(xx)
highlightOnPlot(data1(tmp), foi, lwd = 2)
highlightOnPlot(data2(tmp), foi, pch = 5, lwd = 2)
## (3) create an animation
move2Ds(xx, pcol = "replicate")
move2Ds(xx, pcol = "replicate", hl = foi)
A function to calculate average marker profiles.
mrkConsProfiles(object, fcol = "markers", method = mean)
object |
An instance of class |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different
colours. Default is |
method |
A |
A matrix
of dimensions number of clusters
(excluding unknowns) by number of fractions.
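The averaging step can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the toy matrix `profiles`, the label vector `markers` and the helper `consensus_profiles` are hypothetical, and unlabelled proteins are assumed to carry the label "unknown".

```r
## Toy data: 6 proteins quantified over 3 fractions (hypothetical values).
profiles <- matrix(c(0.8, 0.1, 0.1,
                     0.6, 0.3, 0.1,
                     0.1, 0.7, 0.2,
                     0.1, 0.5, 0.4,
                     0.3, 0.3, 0.4,
                     0.2, 0.2, 0.6),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(paste0("prot", 1:6), paste0("frac", 1:3)))
markers <- c("ER", "ER", "Golgi", "Golgi", "unknown", "unknown")

## Hypothetical helper: summarise each marker class fraction by fraction,
## ignoring rows labelled as unknown.
consensus_profiles <- function(x, labels, method = mean, unknown = "unknown") {
  classes <- sort(unique(labels[labels != unknown]))
  t(sapply(classes, function(cl)
    apply(x[labels == cl, , drop = FALSE], 2, method)))
}

## One row per marker class, one column per fraction.
mm <- consensus_profiles(profiles, markers)
mm
```

Passing `method = median` to the helper swaps the summary function, which mirrors the role of the `method` argument described above.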
Laurent Gatto and Lisa M. Breckels
The mrkHClust
function to produce a
hierarchical cluster.
library("pRolocdata")
data(dunkley2006)
mrkConsProfiles(dunkley2006)
mrkConsProfiles(dunkley2006, method = median)
mm <- mrkConsProfiles(dunkley2006)
## Reorder fractions
o <- order(dunkley2006$fraction)
## Plot mean organelle profiles using the
## default pRoloc colour palette.
matplot(t(mm[, o]), type = "l",
        xlab = "Fractions", ylab = "Relative intensity",
        main = "Mean organelle profiles",
        col = getStockcol(), lwd = 2, lty = 1)
## Add a legend
addLegend(markerMSnSet(dunkley2006), where = "topleft")
This function calculates an average protein profile for each
marker class (proteins of unknown localisation are ignored) and
then generates a dendrogram representing the relation between
marker classes. The colours used for the dendrogram labels are
taken from the default colours (see getStockcol
) so
as to match the colours with other spatial proteomics
visualisations such as plot2D
.
mrkHClust( object, fcol = "markers", distargs, hclustargs, method = mean, plot = TRUE, ... )
object |
An instance of class |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different
colours. Default is |
distargs |
A |
hclustargs |
A |
method |
A |
plot |
A |
... |
Additional parameters passed when plotting the
|
Invisibly returns a dendrogram object, containing the
hierarchical cluster as computed by hclust
.
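The two stages described above (average profiles, then a dendrogram of the classes) can be sketched with base R's `dist`, `hclust` and `as.dendrogram`. The matrix `mm` of class profiles below is hypothetical, and the default distance and linkage are used here as an assumption; the package may use other settings.

```r
## Hypothetical mean profiles for 4 marker classes over 3 fractions.
mm <- matrix(c(0.7, 0.2, 0.1,
               0.6, 0.3, 0.1,
               0.1, 0.6, 0.3,
               0.1, 0.2, 0.7),
             ncol = 3, byrow = TRUE,
             dimnames = list(c("ER", "Golgi", "PM", "Mitochondrion"),
                             paste0("frac", 1:3)))

## Distances between class profiles, then hierarchical clustering.
d <- dist(mm)              # Euclidean distance by default
hc <- hclust(d)            # complete linkage by default
dend <- as.dendrogram(hc)

## Only draw the tree in an interactive session.
if (interactive()) plot(dend)
```

Classes with similar average profiles (here the toy "ER" and "Golgi" rows) are merged first and therefore sit next to each other in the tree.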
Laurent Gatto
library("pRolocdata")
data(dunkley2006)
mrkHClust(dunkley2006)
Functions producing a new vector (matrix) marker set from an existing matrix (vector) marker set.
mrkVecToMat(object, vfcol = "markers", mfcol = "Markers")
mrkMatToVec(object, mfcol = "Markers", vfcol = "markers")
mrkMatAndVec(object, vfcol = "markers", mfcol = "Markers")
showMrkMat(object, mfcol = "Markers")
isMrkMat(object, fcol = "Markers")
isMrkVec(object, fcol = "markers")
mrkEncoding(object, fcol = "markers")
object |
An |
vfcol |
The name of the vector marker feature
variable. Default is |
mfcol |
The name of the matrix marker feature
variable. Default is |
fcol |
A marker feature variable name. |
Sub-cellular markers can be encoded in two different ways. Sets of
spatial markers can be represented as character vectors
(character
or factor
, to be accurate), stored as
feature metadata, and proteins of unknown or uncertain
localisation (unlabelled, to be classified) are marked with the
"unknown"
character. While very handy, this encoding
suffers from some drawbacks, in particular the difficulty of labelling
proteins that reside in multiple (possible or actual)
localisations. The markers vector feature data is typically named
markers
. A new matrix encoding is also
supported. Each spatial compartment is defined in a column in a
binary markers matrix and the resident proteins are encoded with
1s. The markers matrix feature data is typically named
Markers
. If proteins are assigned unique localisations only
(i.e. no multi-localisation) or their localisation is unknown
(unlabelled), then both encodings are equivalent. When the markers
are encoded as vectors, features of unknown localisation are
defined as fData(object)[, fcol] == "unknown"
. For
matrix-encoded markers, unlabelled proteins are defined as
rowSums(fData(object)[, fcol]) == 0
.
The mrkMatToVec
and mrkVecToMat
functions enable the
conversion from matrix (vector) to vector (matrix). The
mrkMatAndVec
function generates the missing encoding from
the existing one. If the destination encoding already exists, or,
more accurately, if the feature variable of the destination
encoding exists, an error is thrown. During the conversion from
matrix to vector, if multiple possible labels exist, they are
dropped, i.e. they are converted to "unknown"
. Functions
isMrkVec
and isMrkMat
can be used to test if a
marker set is encoded as a vector or a matrix. mrkEncoding
returns either "vector"
or "matrix"
depending on the
nature of the markers.
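The relation between the two encodings can be shown on a toy example in base R. This is only a sketch of the idea: the vector `mvec`, the matrix `mmat` and the helper `mat_to_vec` are hypothetical and work on plain R objects, not on the feature data of an MSnSet.

```r
## Vector encoding: one label per protein, "unknown" for unlabelled ones.
mvec <- c(p1 = "ER", p2 = "Golgi", p3 = "unknown", p4 = "ER")

## Vector to matrix: one binary column per compartment, no "unknown" column.
classes <- setdiff(sort(unique(mvec)), "unknown")
mmat <- sapply(classes, function(cl) as.integer(mvec == cl))
rownames(mmat) <- names(mvec)
mmat

## Unlabelled proteins are the all-zero rows.
rowSums(mmat) == 0

## Hypothetical helper, matrix to vector: rows with exactly one 1 get that
## column's name; rows with zero or several 1s become "unknown".
mat_to_vec <- function(m) {
  out <- rep("unknown", nrow(m))
  single <- rowSums(m) == 1
  out[single] <- colnames(m)[max.col(m[single, , drop = FALSE],
                                     ties.method = "first")]
  names(out) <- rownames(m)
  out
}

## With unique localisations only, the round trip recovers the vector.
roundtrip <- mat_to_vec(mmat)

## A protein with two possible localisations is dropped to "unknown".
mmat["p4", "Golgi"] <- 1L
back <- mat_to_vec(mmat)
back
```

The last step illustrates why the matrix encoding carries more information: the dual label of `p4` cannot be expressed in a single character vector.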
An updated MSnSet
with a new vector (matrix) marker
set.
Laurent Gatto and Lisa Breckels
Other functions that operate on markers are
getMarkers
, getMarkerClasses
and
markerMSnSet
. To add markers to an existing
MSnSet
, see the addMarkers
function and
pRolocmarkers
, for a list of suggested markers.
library("pRolocdata")
data(dunkley2006)
dunk <- mrkVecToMat(dunkley2006)
head(fData(dunk)$Markers)
fData(dunk)$markers <- NULL
dunk <- mrkMatToVec(dunk)
stopifnot(all.equal(fData(dunkley2006)$markers, fData(dunk)$markers))
Classification using the naive Bayes algorithm.
nbClassification( object, assessRes, scores = c("prediction", "all", "none"), laplace, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
laplace |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to
|
An instance of class "MSnSet"
with
naiveBayes
and naiveBayes.scores
feature variables storing the
classification results and scores respectively.
Laurent Gatto
library(pRolocdata)
data(dunkley2006)
## reducing parameter search space and iterations
params <- nbOptimisation(dunkley2006, laplace = c(0, 5), times = 3)
params
plot(params)
f1Count(params)
levelPlot(params)
getParams(params)
res <- nbClassification(dunkley2006, params)
getPredictions(res, fcol = "naiveBayes")
getPredictions(res, fcol = "naiveBayes", t = 1)
plot2D(res, fcol = "naiveBayes")
Classification parameter optimisation for the naive Bayes algorithm.
nbOptimisation( object, fcol = "markers", laplace = seq(0, 5, 0.5), times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
laplace |
The hyper-parameter. Default values are |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class that would have either a NA precision or recall would result in an NA F1 score and, eventually, a NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
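The effect of replacing NA values by 0 can be seen on a small confusion matrix. The sketch below is base R only; the matrix `cm` and the helper `macro_f1` are hypothetical, and the replacement is applied at the F1 level as a simplification of the rule stated above.

```r
## Hypothetical confusion matrix: rows are true classes, columns predictions.
## Class "C" is never predicted, so its precision is 0/0 = NaN.
cm <- matrix(c(5, 1, 0,
               2, 4, 0,
               1, 1, 0),
             ncol = 3, byrow = TRUE,
             dimnames = list(truth = c("A", "B", "C"),
                             predicted = c("A", "B", "C")))

precision <- diag(cm) / colSums(cm)   # NaN for class "C"
recall    <- diag(cm) / rowSums(cm)

## Hypothetical helper: per-class F1, optionally with NA/NaN set to 0,
## averaged into a macro F1.
macro_f1 <- function(p, r, na_to_zero = TRUE) {
  f1 <- 2 * p * r / (p + r)
  if (na_to_zero) f1[is.na(f1)] <- 0
  mean(f1)
}

undefined <- macro_f1(precision, recall, na_to_zero = FALSE)  # not defined
penalised <- macro_f1(precision, recall)                      # defined, reduced
c(undefined = undefined, penalised = penalised)
```

A single undefined class would otherwise make the whole macro F1 undefined; with the replacement, that class simply contributes an F1 of 0 to the mean.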
An instance of class "GenRegRes"
.
Laurent Gatto
nbClassification
and example therein.
Produces a PCA plot with the uncertainty in organelle means projected onto it as contours.
nicheMeans2D( object, params, priors, dims = c(1, 2), fcol = "markers", aspect = 0.5 )
object |
A valid object of class |
params |
A valid object of class |
priors |
The priors that were used in the model |
dims |
The PCA dimensions in which to project the data, default is
|
fcol |
The columns of the feature data which contain the marker data. |
aspect |
An argument to change the plotting aspect of the PCA |
Used for its side effect of producing a plot. Invisibly returns a ggplot object that can be further manipulated.
Oliver M. Crook <[email protected]>
## Not run:
library("pRolocdata")
data("tan2009r1")
tanres <- tagmMcmcTrain(object = tan2009r1)
tanres <- tagmMcmcProcess(tanres)
tan2009r1 <- tagmMcmcPredict(object = tan2009r1, params = tanres, probJoint = TRUE)
myparams <- chains(e14Tagm_converged_pooled)[[1]]
myparams2 <- chains(mcmc_pool_chains(tanres))[[1]]
priors <- tanres@priors
pRoloc:::nicheMeans2D(object = tan2009r1, params = myparams2, priors = priors)
## End(Not run)
Methods computing the nearest neighbour indices and distances for
matrix
and MSnSet
instances.
signature(object = "matrix", k = "numeric", dist =
"character", ...)
Calculates indices and distances to the
k
(default is 3) nearest neighbours of each feature (row)
in the input matrix object
. The distance dist
can
be either of "euclidean"
or
"mahalanobis"
. Additional parameters can be passed to the
internal function FNN::get.knn
. Output is a matrix with
2 * k
columns and nrow(object)
rows.
signature(object = "MSnSet", k = "numeric", dist =
"character", ...)
As above, but for an MSnSet
input. The indices and distances to the k
nearest
neighbours are added to the object's feature metadata.
signature(object = "matrix", query = "matrix", k =
"numeric", ...)
If two matrix
instances are provided as
input, the k
(default is 3) indices and distances of the
nearest neighbours of query
in object
are returned
as a matrix of dimensions 2 * k
by
nrow(query)
. Additional parameters are passed to
FNN::get.knnx
. Only euclidean distance is available.
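To make the shape of the result concrete, the following base R sketch computes the indices and Euclidean distances of the k nearest rows of a small matrix. The helper `knn_sketch`, its column names and the column order are hypothetical; it does not call `FNN` and is not meant to reproduce the package output exactly.

```r
## Toy matrix: 5 features (rows) in 2 dimensions (hypothetical values).
m <- matrix(c(0, 0,
              1, 0,
              0, 2,
              5, 5,
              5, 6),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("f", 1:5), c("x", "y")))

## Hypothetical helper: indices and distances of the k nearest rows.
knn_sketch <- function(x, k = 3) {
  d <- as.matrix(dist(x))              # pairwise Euclidean distances
  diag(d) <- Inf                       # a feature is not its own neighbour
  idx <- t(apply(d, 1, order))[, seq_len(k), drop = FALSE]
  dst <- t(apply(d, 1, sort))[, seq_len(k), drop = FALSE]
  colnames(idx) <- paste0("index", seq_len(k))
  colnames(dst) <- paste0("dist", seq_len(k))
  cbind(idx, dst)                      # nrow(x) rows and 2 * k columns
}

nn <- knn_sketch(m, k = 2)
nn
```

Isolated features show up as rows with large distances, which is what makes such a table useful for spotting sparse regions of the data.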
library("pRolocdata")
data(dunkley2006)
## Using a matrix as input
m <- exprs(dunkley2006)
m[1:4, 1:3]
head(nndist(m, k = 5))
tail(nndist(m[1:100, ], k = 2, dist = "mahalanobis"))
## Same as above for MSnSet
d <- nndist(dunkley2006, k = 5)
head(fData(d))
d <- nndist(dunkley2006[1:100, ], k = 2, dist = "mahalanobis")
tail(fData(d))
## Using a query
nndist(m[1:100, ], m[101:110, ], k = 2)
Classification using the artificial neural network algorithm.
nnetClassification( object, assessRes, scores = c("prediction", "all", "none"), decay, size, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
decay |
If |
size |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to |
An instance of class "MSnSet"
with
nnet
and nnet.scores
feature variables storing
the classification results and scores respectively.
Laurent Gatto
library(pRolocdata)
data(dunkley2006)
## reducing parameter search space and iterations
params <- nnetOptimisation(dunkley2006, decay = 10^(c(-1, -5)), size = c(5, 10), times = 3)
params
plot(params)
f1Count(params)
levelPlot(params)
getParams(params)
res <- nnetClassification(dunkley2006, params)
getPredictions(res, fcol = "nnet")
getPredictions(res, fcol = "nnet", t = 0.75)
plot2D(res, fcol = "nnet")
Classification parameter optimisation for artificial neural network algorithm.
nnetOptimisation( object, fcol = "markers", decay = c(0, 10^(-1:-5)), size = seq(1, 10, 2), times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
decay |
The hyper-parameter. Default values are |
size |
The hyper-parameter. Default values are |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class that would have either a NA precision or recall would result in an NA F1 score and, eventually, a NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
An instance of class "GenRegRes"
.
Laurent Gatto
nnetClassification
and example therein.
For a given matrix of annotation information, this function returns the information ordered according to the best fit with the data.
orderGoAnnotations( object, fcol = "GOAnnotations", k = 1:5, n = 5, p = 1/3, verbose = TRUE, seed )
object |
An instance of class |
fcol |
The name of the annotations matrix. Default is
|
k |
The number of clusters to test. Default is |
n |
The minimum number of proteins per component cluster. |
p |
The normalisation factor, per |
verbose |
A |
seed |
An optional random number generation seed. |
As there are typically many protein/annotation sets that may fit the data we order protein sets by best fit i.e. cluster tightness, by computing the mean normalised Euclidean distance for all instances per protein set.
For each protein set, i.e. proteins that have been labelled with a specified term/information criteria, we find the best k cluster components for the set (the default is to test k = 1:5) according to the minimum mean normalised pairwise Euclidean distance over all component clusters. (Note: when testing k, if any components are found to have fewer than n proteins, these components are not included and k is reduced by 1.)
Each component cluster is normalised by N^p (where N is the total number of proteins per component, and p is the power). Heuristically, p = 1/3, i.e. normalising by N^(1/3), has been found to be the optimum normalisation factor.
Candidates in the matrix are ordered according to lowest mean normalised pairwise Euclidean distance as we expect high density, tight clusters to have the smallest mean normalised distance.
This function is a wrapper for running clustDist
,
getNormDist
, see the "Annotating spatial proteomics data"
vignette for more details.
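The ranking criterion can be sketched in base R for the simplest case of one component cluster per protein set. This is one reading of the description above, not the package's code: the helper `norm_dist` is hypothetical, the data are simulated, and the search over k and the handling of small components are not reproduced.

```r
## Hypothetical helper: mean pairwise Euclidean distance of one component
## cluster, divided by N^p, where N is the number of proteins in the cluster.
norm_dist <- function(x, p = 1/3) {
  N <- nrow(x)
  mean(dist(x)) / N^p
}

set.seed(1)
## Two toy protein sets of 30 proteins over 4 fractions (simulated values).
tight <- matrix(rnorm(30 * 4, sd = 0.05), ncol = 4)
loose <- matrix(rnorm(30 * 4, sd = 0.50), ncol = 4)

scores <- c(tight = norm_dist(tight), loose = norm_dist(loose))

## Lowest mean normalised distance first, i.e. the tightest set first.
ranked <- sort(scores)
ranked
```

Dividing by N^p with p below 1 is a compromise: it rewards larger sets for the same spread without letting set size dominate the ranking.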
An updated MSnSet
containing the newly ordered
fcol
matrix.
Lisa M Breckels
addGoAnnotations
and example therein.
This function produces organelle-specific quantiles corresponding to the given classification scores.
orgQuants(object, fcol, scol, mcol = "markers", t, verbose = TRUE)
object |
An instance of class |
fcol |
The name of the prediction column in the
|
scol |
The name of the prediction score column in the
|
mcol |
The name of the column containing the training data in the
|
t |
The quantile threshold. |
verbose |
If |
A named vector
of organelle thresholds.
Lisa Breckels
getPredictions
to get organelle predictions based
on calculated thresholds.
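The idea of class-specific score thresholds can be sketched with `tapply` and `quantile` in base R. The vectors `pred` and `score` are hypothetical, and which features enter the quantile computation in the package is not reproduced here; the sketch only shows how one threshold per class differs from a single global cutoff.

```r
## Hypothetical classification output: predicted class and score per protein.
pred  <- c("ER", "ER", "ER", "ER", "Golgi", "Golgi", "Golgi", "Golgi")
score <- c(0.2, 0.4, 0.6, 0.8, 0.5, 0.7, 0.9, 1.0)

## One score quantile per predicted class, for a quantile threshold of 0.5.
t_quant <- 0.5
ts <- tapply(score, pred, quantile, probs = t_quant, names = FALSE)
ts

## Keep a prediction only if its score reaches its own class threshold.
kept <- ifelse(score >= ts[pred], pred, "unknown")
kept
```

Because the toy "Golgi" scores are higher overall, its threshold is higher too, so each class retains its own top half of predictions.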
library("pRolocdata")
data(dunkley2006)
res <- svmClassification(dunkley2006, fcol = "pd.markers", sigma = 0.1, cost = 0.5)
## 50% top predictions per class
ts <- orgQuants(res, fcol = "svm", t = .5)
getPredictions(res, fcol = "svm", t = ts)
Classification using the PerTurbo algorithm.
perTurboClassification( object, assessRes, scores = c("prediction", "all", "none"), pRegul, sigma, inv, reg, fcol = "markers" )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
pRegul |
If |
sigma |
If |
inv |
The type of algorithm used to invert the matrix.
Values are : "Inversion Cholesky" ( |
reg |
The type of regularisation of matrix. Values are
"none", "trunc" or "tikhonov". Default value is
|
fcol |
The feature meta-data containing marker definitions.
Default is |
An instance of class "MSnSet"
with
perTurbo
and perTurbo.scores
feature variables
storing the classification results and scores respectively.
Thomas Burger and Samuel Wieczorek
N. Courty, T. Burger, J. Laurent. "PerTurbo: a new classification algorithm based on the spectrum perturbations of the Laplace-Beltrami operator", The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011), D. Gunopulos et al. (Eds.): ECML PKDD 2011, Part I, LNAI 6911, pp. 359 - 374, Athens, Greece, September 2011.
library(pRolocdata)
data(dunkley2006)
## reducing parameter search space
params <- perTurboOptimisation(dunkley2006,
                               pRegul = 2^seq(-2, 2, 2),
                               sigma = 10^seq(-1, 1, 1),
                               inv = "Inversion Cholesky",
                               reg = "tikhonov",
                               times = 3)
params
plot(params)
f1Count(params)
levelPlot(params)
getParams(params)
res <- perTurboClassification(dunkley2006, params)
getPredictions(res, fcol = "perTurbo")
getPredictions(res, fcol = "perTurbo", t = 0.75)
plot2D(res, fcol = "perTurbo")
Classification parameter optimisation for the PerTurbo algorithm
perTurboOptimisation( object, fcol = "markers", pRegul = 10^(seq(from = -1, to = 0, by = 0.2)), sigma = 10^(seq(from = -1, to = 1, by = 0.5)), inv = c("Inversion Cholesky", "Moore Penrose", "solve", "svd"), reg = c("tikhonov", "none", "trunc"), times = 1, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
pRegul |
The hyper-parameter for the regularisation (values are in ]0,1]). If reg == "trunc", pRegul is the percentage of eigenvalues in the matrix. If reg == "tikhonov", then pRegul is the parameter for the Tikhonov regularisation. Available configurations are: "Inversion Cholesky" - ("tikhonov" / "none"), "Moore Penrose" - ("tikhonov" / "none"), "solve" - ("tikhonov" / "none"), "svd" - ("tikhonov" / "none" / "trunc"). |
sigma |
The hyper-parameter. |
inv |
The type of algorithm used to invert the matrix.
Values are :
"Inversion Cholesky" ( |
reg |
The type of regularisation of matrix.
Values are "none", "trunc" or "tikhonov".
Default value is |
times |
The number of times internal cross-validation is performed. Default is 1. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
Note that when performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class that would have either a NA precision or recall would result in an NA F1 score and, eventually, a NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
An instance of class "GenRegRes"
.
Thomas Burger and Samuel Wieczorek
perTurboClassification
and example therein.
phenoDisco algorithm.
phenoDisco is a semi-supervised iterative approach to detect new protein clusters.
phenoDisco( object, fcol = "markers", times = 100, GS = 10, allIter = FALSE, p = 0.05, ndims = 2, modelNames = mclust.options("emModelNames"), G = 1:9, BPPARAM, tmpfile, seed, verbose = TRUE, dimred = c("PCA", "t-SNE"), ... )
object |
An instance of class |
fcol |
A |
times |
Number of runs of tracking. Default is 100. |
GS |
Group size, i.e. how many proteins make a group. Default is 10 (the minimum group size is 4). |
allIter |
|
p |
Significance level for outlier detection. Default is 0.05. |
ndims |
Number of principal components to use as input for the discovery analysis. Default is 2. Added in version 1.3.9. |
modelNames |
A vector of characters indicating the models to
be fitted in the EM phase of clustering using
|
G |
An integer vector specifying the numbers of mixture
components (clusters) for which the BIC is to be
calculated. The default is |
BPPARAM |
Support for parallel processing using the
|
tmpfile |
An optional |
seed |
An optional |
verbose |
Logical, indicating if messages are to be printed out during execution of the algorithm. |
dimred |
A |
... |
Additional arguments passed to the dimensionality
reduction method. For both PCA and t-SNE, the data is scaled
and centred by default, and these parameters ( |
The algorithm performs a phenotype discovery analysis as described in Breckels et al. Using this approach, one can identify putative subcellular groupings in organelle proteomics experiments for more comprehensive validation in an unbiased fashion. The method is based on the work of Yin et al. and uses iterated rounds of Gaussian Mixture Modelling using the Expectation Maximisation algorithm combined with a non-parametric outlier detection test to identify new phenotype clusters.
The algorithm requires 2 or more classes to be labelled in the data and a
minimum of 6 markers per class. The
function will check and remove features with missing values using
the filterNA
method.
A parallel implementation, relying on the BiocParallel
package, has been added in version 1.3.9. See the BPPARAM
argument for details.
Important: Prior to version 1.1.2 the row order in the output was different from the row order in the input. This has now been fixed and row ordering is now the same in both input and output objects.
An instance of class MSnSet
containing the
phenoDisco
predictions.
Lisa M. Breckels <[email protected]>
Yin Z, Zhou X, Bakal C, Li F, Sun Y, Perrimon N, Wong ST. Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens. BMC Bioinformatics. 2008 Jun 5;9:264. PubMed PMID: 18534020.
Breckels LM, Gatto L, Christoforou A, Groen AJ, Lilley KS and Trotter MWB. The Effect of Organelle Discovery upon Sub-Cellular Protein Localisation. J Proteomics. 2013 Aug 2;88:129-40. doi: 10.1016/j.jprot.2013.02.019. Epub 2013 Mar 21. PubMed PMID: 23523639.
## Not run:
library(pRolocdata)
data(tan2009r1)
pdres <- phenoDisco(tan2009r1, fcol = "PLSDA")
getPredictions(pdres, fcol = "pd", scol = NULL)
plot2D(pdres, fcol = "pd")
## to pre-process the data with t-SNE instead of PCA
pdres <- phenoDisco(tan2009r1, fcol = "PLSDA", dimred = "t-SNE")
## End(Not run)
Generate 2 or 3 dimensional feature distribution plots to
illustrate localisation clusters. Rows/features containing
NA
values are removed prior to dimension reduction except
for the "nipals"
method. For this method, it is advised to
set the method argument 'ncomp' to a low number of dimensions to
avoid computing all components when analysing large datasets.
plot2D(object, fcol = "markers", fpch, unknown = "unknown", dims = 1:2, score = 1, method = "PCA", methargs, axsSwitch = FALSE, mirrorX = FALSE, mirrorY = FALSE, col, pch, cex, index = FALSE, idx.cex = 0.75, addLegend, identify = FALSE, plot = TRUE, grid = TRUE, ...)
## S4 method for signature 'MSnSet'
plot3D(object, fcol = "markers", dims = c(1, 2, 3), radius1 = 0.1, radius2 = radius1 * 2, plot = TRUE, ...)
object |
An instance of class |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different
colours. Default is |
fpch |
Feature meta-data label (fData column name) defining the groups to be differentiated using different point symbols. |
unknown |
A |
dims |
A |
score |
A numeric specifying the minimum organelle assignment score to consider features to be assigned an organelle. (not yet implemented). |
method |
A
If none is used, the data is plotted as is, i.e. without any
transformation. In this case, Available methods are listed in |
methargs |
A |
axsSwitch |
A |
mirrorX |
A |
mirrorY |
A |
col |
A |
pch |
A |
cex |
Character expansion. |
index |
A |
idx.cex |
A |
addLegend |
A character indicating where to add the
legend. See |
identify |
A logical (default is |
plot |
A |
grid |
A |
... |
Additional parameters passed to |
radius1 |
A |
radius2 |
A |
plot3D
relies on the rgl
package, which will be
loaded automatically.
Note that plot2D
has been updated in version 1.3.6 to
support more organelle classes than colours defined in
getStockcol
. In such cases, the default
colours are recycled using the default plotting characters
defined in getStockpch
. See the example for
an illustration. The alpha
argument is also
deprecated in version 1.3.6. Use setStockcol
to set
colours with transparency instead. See example below.
Version 1.11.3: to plot data as is, i.e. without any
transformation, method
can be set to "none" (as
opposed to passing pre-computed values to method
as a
matrix
, in previous versions). If object
is an
MSnSet
, the untransformed values in the assay data
will be plotted. If object
is a matrix
with
coordinates, then a matching MSnSet
must be passed to
methargs
.
Used for its side effects of generating a plot. Invisibly returns the 2 or 3 dimensions that are plotted.
Laurent Gatto
addLegend
to add a legend to plot2D
figures (the legend is added by default on plot3D
) and
plotDist
for alternative graphical
representation of quantitative organelle proteomics
data. plot2Ds
to overlay 2 data sets on the same
PCA plot. The plotEllipse
function can be used
to visualise TAGM models on PCA plots with ellipses.
library("pRolocdata")
data(dunkley2006)
plot2D(dunkley2006, fcol = NULL)
plot2D(dunkley2006, fcol = NULL, col = "black")
plot2D(dunkley2006, fcol = "markers")
addLegend(dunkley2006, fcol = "markers", where = "topright", cex = 0.5, bty = "n", ncol = 3)
title(main = "plot2D example")
## available methods
plot2Dmethods
plot2D(dunkley2006, fcol = NULL, method = "kpca", col = "black")
plot2D(dunkley2006, fcol = NULL, method = "kpca", col = "black",
       methargs = list(kpar = list(sigma = 1)))
plot2D(dunkley2006, method = "lda")
plot2D(dunkley2006, method = "hexbin")
## Using transparent colours
setStockcol(paste0(getStockcol(), "80"))
plot2D(dunkley2006, fcol = "markers")
## New behaviour in 1.3.6 when not enough colours
setStockcol(c("blue", "red", "green"))
getStockcol() ## only 3 colours to be recycled
getMarkers(dunkley2006)
plot2D(dunkley2006)
## reset colours
setStockcol(NULL)
plot2D(dunkley2006, method = "none") ## plotting along 2 first fractions
plot2D(dunkley2006, dims = c(3, 5), method = "none") ## plotting along fractions 3 and 5
## pre-calculate PC1 and PC2 coordinates
pca <- plot2D(dunkley2006, plot = FALSE)
head(pca)
plot2D(pca, method = "none", methargs = list(dunkley2006))
## plotting in 3 dimensions
plot3D(dunkley2006)
plot3D(dunkley2006, radius2 = 0.3)
plot3D(dunkley2006, dims = c(2, 4, 6))
Takes 2 MSnSet instances as input to plot the two data sets on the
same PCA plot. The data points of the second data set are projected
on the PC1 and PC2 dimensions calculated for the first data set.
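The projection described above can be sketched in base R with prcomp and predict. This is an illustration on hypothetical random data, not the plot2Ds implementation; in particular, whether the data are scaled before the PCA is an assumption of this sketch.

```r
set.seed(1)
## Two hypothetical data sets sharing the same 5 fractions (columns)
fr <- paste0("F", 1:5)
x1 <- matrix(rnorm(100 * 5), ncol = 5, dimnames = list(NULL, fr))
x2 <- matrix(rnorm(40 * 5), ncol = 5, dimnames = list(NULL, fr))

## PCA fitted on the first data set only
pca <- prcomp(x1, center = TRUE, scale. = TRUE)

## Coordinates of the first data set along PC1 and PC2
pc1 <- pca$x[, 1:2]

## Second data set projected using the centring, scaling and
## rotation estimated from the first data set
pc2 <- predict(pca, newdata = x2)[, 1:2]

if (interactive()) {
  plot(pc1, pch = 21, xlab = "PC1", ylab = "PC2")
  points(pc2, pch = 23, col = "red")
}
```

Because the second data set does not contribute to the rotation, its points are only meaningful relative to the structure of the first data set.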
plot2Ds( object, pcol, fcol = "markers", cex.x = 1, cex.y = 1, pch.x = 21, pch.y = 23, col, mirrorX = FALSE, mirrorY = FALSE, plot = TRUE, ... )
object |
An |
pcol |
If |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different
colours. Default is |
cex.x |
Character expansion for the first data set. Default is 1. |
cex.y |
Character expansion for the second data set. Default is 1. |
pch.x |
Plotting character for the first data set. Default is 21. |
pch.y |
Plotting character for the second data set. Default is 23. |
col |
A vector of colours to highlight the different classes
defined by |
mirrorX |
A |
mirrorY |
A |
plot |
If |
... |
Additional parameters passed to |
Used for its side effects of producing a plot. Invisibly
returns an object of class plot2Ds
, which is a list
with the PCA analyses results (see prcomp
) of
the first data set and the new coordinates of the second data
sets, as used to produce the plot and the respective point
colours. Each of these elements can be accessed with
data1
, data2
, col1
and col2
respectively.
Laurent Gatto
See plot2D
to plot a single data set and
move2Ds
for an animation.
library("pRolocdata")
data(tan2009r1)
data(tan2009r2)
msnl <- MSnSetList(list(tan2009r1, tan2009r2))
plot2Ds(msnl)
## tweaking the parameters
plot2Ds(list(tan2009r1, tan2009r2), fcol = NULL, cex.x = 1.5)
## input is 1 MSnSet containing 2 data sets
data(dunkley2006)
plot2Ds(dunkley2006, pcol = "replicate")
## no plot, just the data
res <- plot2Ds(dunkley2006, pcol = "replicate", plot = FALSE)
res
head(data1(res))
head(col1(res))
The function plots marker consensus profiles obtained from mrkConsProfiles.
plotConsProfiles(object, order = NULL, plot = TRUE)
object |
A |
order |
Order for markers (optional). |
plot |
A |
Invisibly returns a ggplot2
object.
Tom Smith
library("pRolocdata")
data(E14TG2aS1)
hc <- mrkHClust(E14TG2aS1, plot = FALSE)
mm <- getMarkerClasses(E14TG2aS1)
ord <- levels(factor(mm))[order.dendrogram(hc)]
fmat <- mrkConsProfiles(E14TG2aS1)
plotConsProfiles(fmat, order = ord)
Produces a line plot showing the feature abundances across the fractions.
plotDist( object, markers, fcol = NULL, mcol = "steelblue", pcol = getUnknowncol(), alpha = 0.3, type = "b", lty = 1, fractions = sampleNames(object), ylab = "Intensity", xlab = "Fractions", ylim, ... )
object |
An instance of class |
markers |
A |
fcol |
Feature meta-data label (fData column name) defining
the groups to be differentiated using different colours. If
|
mcol |
A |
pcol |
A |
alpha |
A numeric defining the alpha channel (transparency)
of the points, where |
type |
Character string defining the type of lines. For
example |
lty |
Vector of line types for the marker profiles. Default
is 1 (solid). See |
fractions |
A |
ylab |
y-axis label. Default is "Intensity". |
xlab |
x-axis label. Default is "Fractions". |
ylim |
A numeric vector of length 2, giving the y coordinates range. |
... |
Additional parameters passed to |
Used for its side effect of producing a feature distribution plot. Invisibly returns the data matrix.
Laurent Gatto
library("pRolocdata")
data(tan2009r1)
j <- which(fData(tan2009r1)$markers == "mitochondrion")
i <- which(fData(tan2009r1)$PLSDA == "mitochondrion")
plotDist(tan2009r1[i, ], markers = featureNames(tan2009r1)[j])
plotDist(tan2009r1[i, ], markers = featureNames(tan2009r1)[j], fractions = "Fractions")
## plot and colour all marker profiles
tanmrk <- markerMSnSet(tan2009r1)
plotDist(tanmrk, fcol = "markers")
Note that when running PCA, this function does not scale the data (centring is performed), as opposed to [plot2D()]. Only marker proteins are displayed; the proteins of unknown location, which are not used to estimate the MAP parameters, are filtered out.
plotEllipse(object, params, dims = c(1, 2), method = "MAP", ...)
object |
An ['MSnbase::MSnSet'] containing quantitative spatial proteomics data. |
params |
An ['MAPParams'] with the TAGM-MAP parameters, as generated by 'tagmMapTrain'. |
dims |
A 'numeric(2)' with the principal components along which to project the data. Default is 'c(1, 2)'. |
method |
The method used. Currently '"MAP"' only. |
... |
Additional parameters passed to [plot2D()]. |
A PCA plot of the marker data with probability ellipses. The outer ellipse contains 99% of the probability, whilst the middle and inner ellipses contain 95% and 90% respectively. The centres of the clusters are represented by a black circumpunct (circled dot).
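To illustrate how nested probability ellipses of this kind can be constructed, the base R sketch below assumes that a cluster is summarised by a bivariate Gaussian in the plotted plane. The mean, covariance and helper name `ellipse_points` are hypothetical, and this is not the plotEllipse implementation. For a bivariate Gaussian, the ellipse containing a given probability mass has a squared Mahalanobis radius equal to the chi-squared quantile with 2 degrees of freedom.

```r
## Hypothetical cluster mean and covariance in the PC1/PC2 plane
mu <- c(0.5, -1)
sigma <- matrix(c(1.0, 0.4,
                  0.4, 0.5), nrow = 2)

## Points on the ellipse enclosing probability `level` of a
## bivariate Gaussian with mean `mu` and covariance `sigma`
ellipse_points <- function(mu, sigma, level, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))  # unit circle, 2 x n
  r <- sqrt(qchisq(level, df = 2))         # Mahalanobis radius
  ## t(chol(sigma)) is the lower Cholesky factor L, with L L' = sigma
  t(mu + r * t(chol(sigma)) %*% circle)
}

probs <- c(0.99, 0.95, 0.90)
ell <- lapply(probs, ellipse_points, mu = mu, sigma = sigma)

if (interactive()) {
  plot(ell[[1]], type = "l", xlab = "PC1", ylab = "PC2")
  lines(ell[[2]])
  lines(ell[[3]])
  points(mu[1], mu[2], pch = 19)
}
```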
[plot2D()] to visualise spatial proteomics data using various dimensionality reduction methods. For details about TAGM models, see [tagmPredict()] and the *pRoloc-bayesian* vignette.
Classification using the partial least squares discriminant analysis algorithm.
plsdaClassification( object, assessRes, scores = c("prediction", "all", "none"), ncomp, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
ncomp |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to |
An instance of class "MSnSet"
with
plsda
and plsda.scores
feature variables storing
the classification results and scores respectively.
Laurent Gatto
## not running this one for time considerations
library(pRolocdata)
data(dunkley2006)
## reducing parameter search space and iterations
params <- plsdaOptimisation(dunkley2006, ncomp = c(3, 10), times = 2)
params
plot(params)
f1Count(params)
levelPlot(params)
getParams(params)
res <- plsdaClassification(dunkley2006, params)
getPredictions(res, fcol = "plsda")
getPredictions(res, fcol = "plsda", t = 0.9)
plot2D(res, fcol = "plsda")
Classification parameter optimisation for the partial least squares discriminant analysis algorithm.
plsdaOptimisation( object, fcol = "markers", ncomp = 2:6, times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
ncomp |
The hyper-parameter. Default values are |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class that would have either a NA precision or recall would result in an NA F1 score and, eventually, a NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
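The effect of replacing NA values by 0 can be sketched with a small base R example. The precision and recall values and the helper names `f1` and `na_to_zero` are hypothetical; the snippet only illustrates the arithmetic described above, not the package's internal code.

```r
## Precision and recall for three hypothetical classes; class "C" was
## never predicted, so its precision is 0/0 and therefore undefined
precision <- c(A = 0.9, B = 0.8, C = NaN)
recall    <- c(A = 0.6, B = 0.8, C = 0)

## F1 is the harmonic mean of precision and recall
f1 <- function(p, r) 2 * p * r / (p + r)

## Without replacement, one undefined class makes the macro F1 undefined
macro_f1_raw <- mean(f1(precision, recall))

## Replace NA (and NaN) values by 0
na_to_zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

## After replacement, class C contributes an F1 of 0 and the
## macro F1 is reduced but defined
f1_0 <- na_to_zero(f1(na_to_zero(precision), na_to_zero(recall)))
macro_f1 <- mean(f1_0)
```

Here the macro F1 averages 0.72, 0.8 and 0, so a class that is never predicted lowers the score instead of invalidating it.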
An instance of class "GenRegRes"
.
Laurent Gatto
plsdaClassification
and example therein.
This function retrieves a list of organelle markers or, if no species
is provided, prints a description of available marker sets. The markers can
be added to an MSnSet using the addMarkers function. Several marker versions
are provided (see Details for additional information).
pRolocmarkers(species, version = "2")
species |
|
version |
|
Version 1 of the markers has been contributed by various members of the
Cambridge Centre for Proteomics, in particular Dr Dan Nightingale for yeast,
Dr Andy Christoforou and Dr Claire Mulvey for human, Dr Arnoud Groen for
Arabidopsis and Dr Claire Mulvey for mouse. In addition, original (curated)
markers from the pRolocdata datasets have been extracted (see pRolocdata
for details and references). Curation involved verification of publicly
available subcellular localisation annotation based on the curators'
knowledge of the organelles/proteins considered and tracing the original
statement in the literature.
Version 2 of the markers (current default) have been updated by Charlotte Hutchings from the Cambridge Centre for Proteomics. Reference species marker sets are the same as those in version 1 with minor corrections and an updated naming system. Version 2 also contains additional marker sets from spatial proteomics publications. References for the source publications are provided below:
Geladaki, A., Britovsek, N.K., Breckels, L.M., Smith, T.S., Vennard, O.L., Mulvey, C.M., Crook, O.M., Gatto, L. and Lilley, K.S. (2019) Combining LOPIT with differential ultracentrifugation for high-resolution spatial proteomics. Nature Communications. 10 (1). doi:10.1038/s41467-018-08191-w
Christopher, J.A., Breckels, L.M., Crook, O.M., Vazquez–Chantada, M., Barratt, D. and Lilley, K.S. (2024) Global proteomics indicates subcellular-specific anti-ferroptotic responses to ionizing radiation. p.2024.09.12.611851. doi:10.1101/2024.09.12.611851
Itzhak, D.N., Tyanova, S., Cox, J. and Borner, G.H. (2016) Global, quantitative and dynamic mapping of protein subcellular localization. eLife. 5. doi:10.7554/elife.16950
Villanueva, E., Smith, T., Pizzinga, M., Elzek, M., Queiroz, R.M.L., Harvey, R.F., Breckels, L.M., Crook, O.M., Monti, M., Dezi, V., Willis, A.E. and Lilley, K.S. (2023) System-wide analysis of RNA and protein subcellular localization dynamics. Nature Methods. 1-12. doi:10.1038/s41592-023-02101-9
Christoforou, A., Mulvey, C.M., Breckels, L.M., Geladaki, A., Hurrell, T., Hayward, P.C., Naake, T., Gatto, L., Viner, R., Arias, A.M. and Lilley, K.S. (2016) A draft map of the mouse pluripotent stem cell spatial proteome. Nature Communications. 7 (1). doi:10.1038/ncomms9992
Barylyuk, K., Koreny, L., Ke, H., Butterworth, S., Crook, O.M., Lassadi, I., Gupta, V., Tromer, E., Mourier, T., Stevens, T.J., Breckels, L.M., Pain, A., Lilley, K.S. and Waller, R.F. (2020) A Comprehensive Subcellular Atlas of the Toxoplasma Proteome via hyperLOPIT Provides Spatial Context for Protein Functions. Cell Host and Microbe. 28 (5), 752-766.e9. doi:10.1016/j.chom.2020.09.011
Moloney, N.M., Barylyuk, K., Tromer, E., Crook, O.M., Breckels, L.M., Lilley, K.S., Waller, R.F. and MacGregor, P. (2023) Mapping diversity in African trypanosomes using high resolution spatial proteomics. Nature Communications. 14 (1), 4401. doi:10.1038/s41467-023-40125-z
Note: These markers are provided as a starting point to generate reliable sets of organelle markers but still need to be verified against any new data in the light of the quantitative data and the study conditions.
Prints a description of the available marker lists if species
is missing, or returns a named character with organelle markers.
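As a rough sketch of how a named character vector of markers relates to the features of an experiment, the base R snippet below matches hypothetical protein identifiers by name and labels the remaining proteins as "unknown". The identifiers and organelle labels are made up for illustration, and the actual matching performed by addMarkers may differ.

```r
## Hypothetical marker vector: names are protein identifiers,
## values are organelle labels
mrk <- c(P001 = "Mitochondrion", P002 = "ER", P003 = "ER", P999 = "Nucleus")

## Hypothetical feature names of an experiment
features <- c("P001", "P002", "P003", "P004", "P005")

## Look up each feature by name; features absent from the marker
## vector yield NA and are labelled "unknown"
annot <- unname(mrk[features])
annot[is.na(annot)] <- "unknown"
names(annot) <- features

table(annot)
```

Markers that are not measured in the experiment (here P999) are simply not used.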
Laurent Gatto
addMarkers
to add markers to an MSnSet
and
markers
for more information about marker encoding.
pRolocmarkers()
pRolocmarkers("hsap")
table(pRolocmarkers("hsap"))
## Old markers
pRolocmarkers("hsap", version = "2")["Q9BPW9"]
pRolocmarkers("hsap", version = "1")["Q9BPW9"]
The QSep infrastructure provides a way to quantify the
resolution of a spatial proteomics experiment, i.e. to quantify how
well annotated sub-cellular clusters are separated from each other.
The QSep
function calculates all between and within cluster
average distances. These distances are then divided column-wise by the
respective within cluster average distance. For example, for a dataset
with only 2 spatial clusters, we would obtain
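| | $c_1$ | $c_2$ |
|---|---|---|
| $c_1$ | $d_{11}$ | $d_{12}$ |
| $c_2$ | $d_{21}$ | $d_{22}$ |

where $d_{ij}$ is the average distance between clusters $c_i$ and $c_j$, and $d_{ii}$ is the average distance within cluster $c_i$.
Normalised distances represent the ratio of between to within average
distances, i.e. how much bigger the average distance between clusters
$c_i$ and $c_j$ is compared to the average distance within cluster $c_j$.

| | $c_1$ | $c_2$ |
|---|---|---|
| $c_1$ | 1 | $d_{12}/d_{22}$ |
| $c_2$ | $d_{21}/d_{11}$ | 1 |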
Note that the normalised distance matrix is not symmetric anymore and the normalised distance ratios are proportional to the tightness of the reference cluster (along the columns).
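The column-wise normalisation can be sketched in base R on hypothetical data with two clusters. This is an illustration under stated assumptions, not the QSep implementation: it uses Euclidean distances and averages within-cluster distances over distinct protein pairs only, which may differ from the package's internals.

```r
set.seed(42)
## Hypothetical profiles: 2 clusters of 10 proteins over 4 fractions,
## the first cluster being tighter than the second
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 4),
           matrix(rnorm(40, mean = 1, sd = 0.3), ncol = 4))
cl <- rep(c("c1", "c2"), each = 10)

## All pairwise Euclidean distances between proteins
d <- as.matrix(dist(x))

## Average distance for every pair of clusters
k <- unique(cl)
avg <- matrix(NA_real_, length(k), length(k), dimnames = list(k, k))
for (i in k) {
  for (j in k) {
    block <- d[cl == i, cl == j]
    ## within a cluster, only distinct pairs are averaged
    ## (an assumption of this sketch)
    avg[i, j] <- if (i == j) mean(block[upper.tri(block)]) else mean(block)
  }
}

## Divide each column by its within-cluster average distance
avg_norm <- sweep(avg, 2, diag(avg), "/")
avg_norm
```

The diagonal of the normalised matrix is 1 by construction, and the column of the tighter cluster holds the larger ratio, which is why the matrix is no longer symmetric.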
Missing values only affect the fractions containing the NA
when
the distance is computed (see the example below) and further used when
calculating mean distances. Few missing values are expected to have
negligible effect, but data with a high proportion of missing data
will produce skewed distances. In QSep
, we take a
conservative approach, using the data as provided by the user, and
expect that the data missingness is handled before proceeding with this
or any other analysis.
Objects can be created by calls using the constructor
QSep
(see below).
x
:Object of class "matrix"
containing the
pairwise distance matrix, accessible with qsep(., norm =
FALSE)
.
xnorm
:Object of class "matrix"
containing the
normalised pairwise distance matrix, accessible with qsep(.,
norm = TRUE)
or qsep(.)
.
object
:Object of class "character"
with the
variable name of MSnSet
object that was used
to generate the QSep
object.
.__classVersion__
:Object of class "Versions"
storing the class version of the object.
Class "Versioned"
, directly.
signature(object = "MSnSet", fcol = "character")
:
constructor for QSep
objects. The fcol
argument
defines the name of the feature variable that annotates the
sub-cellular clusters. Non-marker proteins, that are marked as
"unknown"
are automatically removed prior to distance
calculation.
signature(object = "QSep", norm = "logical")
:
accessor for the normalised (when norm
is TRUE
,
which is default) and raw (when norm
is FALSE
)
pairwise distance matrices.
signature(object = "QSep")
: method to retrieve
the names of the sub-cellular clusters originally defined in
QSep
's fcol
argument. A replacement method
names(.) <-
is also available.
signature(object = "QSep", ..., verbose =
"logical")
: Invisibly returns all between cluster average
distances and prints (when verbose
is TRUE
,
default) a summary of those.
signature(object = "QSep", norm = "logical",
...)
: plots an annotated heatmap of all normalised pairwise
distances. norm
(default is TRUE
) defines whether
normalised distances should be plotted. Additional arguments
...
are passed to the levelplot
.
signature(object = "QSep", norm = "logical", ...)
:
produces a boxplot of all normalised pairwise distances. The red
points represent the within average distance and black points
between average distances. norm
(default is TRUE
)
defines whether normalised distances should be plotted.
Laurent Gatto
Gatto, L., Breckels, L.M. and Lilley, K.S. Assessing sub-cellular resolution in spatial proteomics experiments. bioRxiv 377630. doi:10.1101/377630
## Test data from Christoforou et al. 2016
library("pRolocdata")
data(hyperLOPIT2015)
## Create the object and get a summary
hlq <- QSep(hyperLOPIT2015)
hlq
summary(hlq)
## mean distance matrix
qsep(hlq, norm = FALSE)
## normalised average distance matrix
qsep(hlq)
## Update the organelle cluster names for better
## rendering on the plots
names(hlq) <- sub("/", "\n", names(hlq))
names(hlq) <- sub(" - ", "\n", names(hlq))
names(hlq)
## Heatmap of the normalised distances
levelPlot(hlq)
## Boxplot of the normalised distances
par(mar = c(3, 10, 2, 1))
plot(hlq)
## Boxplot of all between cluster average distances
x <- summary(hlq, verbose = FALSE)
boxplot(x)
## Missing data example, for 4 proteins and 3 fractions
x <- rbind(c(1.1, 1.2, 1.3), rep(1, 3), c(NA, 1, 1), c(1, 1, NA))
rownames(x) <- paste0("P", 1:4)
colnames(x) <- paste0("F", 1:3)
## P1 is the reference, against which we will calculate distances. P2
## has a complete profile, producing the *real* distance. P3 and P4 have
## missing values in the first and last fraction respectively.
x
## If we drop F1 in P3, which represents a small difference of 0.1, the
## distance only considers F2 and F3, and increases. If we drop F3 in
## P4, which represents a large distance of 0.3, the distance only
## considers F1 and F2, and decreases.
dist(x)
Classification using the random forest algorithm.
rfClassification( object, assessRes, scores = c("prediction", "all", "none"), mtry, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
mtry |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to
|
An instance of class "MSnSet"
with
rf
and rf.scores
feature variables storing the
classification results and scores respectively.
Laurent Gatto
library(pRolocdata)
data(dunkley2006)
## reducing parameter search space and iterations
params <- rfOptimisation(dunkley2006, mtry = c(2, 5, 10), times = 3)
params
plot(params)
f1Count(params)
levelPlot(params)
getParams(params)
res <- rfClassification(dunkley2006, params)
getPredictions(res, fcol = "rf")
getPredictions(res, fcol = "rf", t = 0.75)
plot2D(res, fcol = "rf")
Classification parameter optimisation for the random forest algorithm.
rfOptimisation( object, fcol = "markers", mtry = NULL, times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
mtry |
The hyper-parameter. Default value is |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class that would have either a NA precision or recall would result in an NA F1 score and, eventually, a NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
An instance of class "GenRegRes"
.
Laurent Gatto
rfClassification
and example therein.
MSnSet
This function extracts a stratified sample of an MSnSet
.
sampleMSnSet(object, fcol = "markers", size = 0.2, seed)
object |
An instance of class |
fcol |
The feature meta-data column name containing the
marker (vector or matrix) definitions on which the MSnSet will be
stratified. Default is |
size |
The size of the stratified sample to be extracted. Default is 0.2 (20 percent). |
seed |
The optional random number generator seed. |
A stratified sample (according to the defined fcol
)
which is an instance of class "MSnSet"
.
Lisa Breckels
testMSnSet, unknownMSnSet and markerMSnSet. See markers for details
about marker encoding.
library(pRolocdata)
data(tan2009r1)
dim(tan2009r1)
smp <- sampleMSnSet(tan2009r1, fcol = "markers")
dim(smp)
getMarkers(tan2009r1)
getMarkers(smp)
These functions get and set the colours and point characters that are used when plotting organelle clusters and unknown features. These values are parametrised at the session level. Two palettes are available: the default palette (previously Lisa's colours) containing 30 colours and the old (original) palette, containing 13 colours.
setLisacol()
getLisacol()
getOldcol()
setOldcol()
getStockcol()
setStockcol(cols)
getStockpch()
setStockpch(pchs)
getUnknowncol()
setUnknowncol(col)
getUnknownpch()
setUnknownpch(pch)
cols |
A vector of colour |
pchs |
A vector of |
col |
A colour |
pch |
A |
The set functions set (and invisibly return) colours. The get
functions return a character vector of colours. The pch functions
work with numerics rather than characters.
Laurent Gatto
## defaults for clusters
getStockcol()
getStockpch()
## unknown features
getUnknownpch()
getUnknowncol()
## an example
library(pRolocdata)
data(dunkley2006)
par(mfrow = c(2, 1))
plot2D(dunkley2006, fcol = "markers", main = 'Default colours')
setUnknowncol("black")
plot2D(dunkley2006, fcol = "markers", main = 'setUnknowncol("black")')
getUnknowncol()
setUnknowncol(NULL)
getUnknowncol()
getStockcol()
getOldcol()
This function prints a textual description of the Gene Ontology evidence codes.
showGOEvidenceCodes()
getGOEvidenceCodes()
These functions are used for their side effects of printing evidence codes and their description.
Laurent Gatto
showGOEvidenceCodes()
getGOEvidenceCodes()
Produces a PCA plot with spatial variation in localisation probabilities.
spatial2D( object, dims = c(1, 2), cov.function = fields::wendland.cov, theta = 1, derivative = 2, k = 1, breaks = c(0.99, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7), aspect = 0.5 )
object |
A valid object of class |
dims |
The PCA dimensions on which to project the data. Default is
|
cov.function |
The covariance function used. Default is
wendland.cov. See |
theta |
A hyperparameter to the covariance function. See |
derivative |
The number of derivatives of the Wendland kernel. See
|
k |
A hyperparameter to the covariance function. See |
breaks |
Probability values at which to draw the contour bands. Default
is |
aspect |
An argument to change the plotting aspect of the PCA |
Used for its side effect of producing a plot. Invisibly returns a ggplot object that can be further manipulated.
Oliver M. Crook
## Not run:
library("pRolocdata")
data("tan2009r1")
tanres <- tagmMcmcTrain(object = tan2009r1)
tanres <- tagmMcmcProcess(tanres)
tan2009r1 <- tagmMcmcPredict(object = tan2009r1, params = tanres, probJoint = TRUE)
spatial2D(object = tan2009r1)
## End(Not run)
SpatProtVis
A class for spatial proteomics visualisation, that upon instantiation,
pre-computes all defined visualisations. Objects can be created with
the SpatProtVis
constructor and visualised with the plot
method.
The class is essentially a wrapper around several calls to
plot2D
that stores the dimensionality reduction
outputs, and is likely to be updated in the future.
SpatProtVis(x, methods, dims, methargs, ...)
x |
An instance of class |
methods |
Dimensionality reduction methods to be used to
visualise the data. Must be contained in |
dims |
A list of numerics defining dimensions used for
plotting. Default are |
methargs |
A list of additional arguments to be passed for each
visualisation method. If provided, the length of this list must be
identical to the length of |
... |
Additional arguments. Currently ignored. |
vismats
:A "list"
of matrices containing the
feature projections in 2 dimensions.
data
:The original spatial proteomics data stored as
an "MSnSet"
.
methargs
:A "list"
of additional plotting
arguments.
objname
:A "character"
defining how to name the
dataset. By default, this is set using the variable name used at
object creation.
plot
:Generates the figures for the respective
methods
and additional arguments defined in the
constructor. If used in an interactive session, the user is
prompted to press 'Return' before new figures are displayed.
show
:A simple textual summary of the object.
Laurent Gatto
The data for the individual visualisations is created by
plot2D
.
library("pRolocdata") data(dunkley2006) ## Default parameters for a set of methods ## (in the interest of time, don't use t-SNE) m <- c("PCA", "MDS", "kpca") vis <- SpatProtVis(dunkley2006, methods = m) vis plot(vis) plot(vis, legend = "topleft") ## Setting method arguments margs <- c(list(kpar = list(sigma = 0.1)), list(kpar = list(sigma = 1.0)), list(kpar = list(sigma = 10)), list(kpar = list(sigma = 100))) vis <- SpatProtVis(dunkley2006, methods = rep("kpca", 4), methargs = margs) par(mfrow = c(2, 2)) plot(vis) ## Multiple PCA plots but different PCs dims <- list(c(1, 2), c(3, 4)) vis <- SpatProtVis(dunkley2006, methods = c("PCA", "PCA"), dims = dims) plot(vis)
Subsets a matrix of markers by specific terms
subsetMarkers(object, fcol = "GOAnnotations", keep)
object |
An instance of class |
fcol |
The name of the markers matrix. Default is
|
keep |
Integer or character vector specifying the columns to keep
in the markers matrix, as defined by |
An updated MSnSet
Lisa M Breckels
addGoAnnotations
and example therein.
Classification using the support vector machine algorithm.
svmClassification( object, assessRes, scores = c("prediction", "all", "none"), cost, sigma, fcol = "markers", ... )
object |
An instance of class |
assessRes |
An instance of class
|
scores |
One of |
cost |
If |
sigma |
If |
fcol |
The feature meta-data containing marker definitions.
Default is |
... |
Additional parameters passed to |
An instance of class "MSnSet"
with
svm
and svm.scores
feature variables storing the
classification results and scores respectively.
Laurent Gatto
library(pRolocdata)
data(dunkley2006)
## reducing parameter search space and iterations
params <- svmOptimisation(dunkley2006,
                          cost = 2^seq(-2, 2, 2),
                          sigma = 10^seq(-1, 1, 1),
                          times = 3)
params
plot(params)
f1Count(params)
levelPlot(params)
getParams(params)
res <- svmClassification(dunkley2006, params)
getPredictions(res, fcol = "svm")
getPredictions(res, fcol = "svm", t = 0.75)
plot2D(res, fcol = "svm")
Classification parameter optimisation for the support vector machine algorithm.
svmOptimisation( object, fcol = "markers", cost = 2^(-4:4), sigma = 10^(-3:2), times = 100, test.size = 0.2, xval = 5, fun = mean, seed, verbose = TRUE, ... )
object |
An instance of class |
fcol |
The feature meta-data containing marker definitions.
Default is |
cost |
The hyper-parameter. Default values are |
sigma |
The hyper-parameter. Default values are |
times |
The number of times internal cross-validation is performed. Default is 100. |
test.size |
The size of test data. Default is 0.2 (20 percent). |
xval |
The |
fun |
The function used to summarise the |
seed |
The optional random number generator seed. |
verbose |
A |
... |
Additional parameters passed to |
Note that when the performance scores precision, recall and (macro) F1 are calculated, any NA values are replaced by 0. This decision is motivated by the fact that any class with either an NA precision or an NA recall would result in an NA F1 score and, eventually, an NA macro F1 (i.e. mean(F1)). Replacing NAs by 0s leads to F1 values of 0 and a reduced yet defined final macro F1 score.
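The effect of this replacement can be illustrated with a minimal base R sketch. The confusion matrix below is made up for illustration, and the code is not taken from the package source; it only shows why an undefined precision would otherwise propagate to the macro F1.

```r
## Hypothetical confusion matrix: rows = true class, columns = predicted class.
## Class "C" is never predicted, so its precision is 0/0 = NaN.
conf <- matrix(c(5, 1, 0,
                 2, 4, 0,
                 1, 1, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(truth = c("A", "B", "C"),
                               predicted = c("A", "B", "C")))

tp <- diag(conf)
precision <- tp / colSums(conf)   # NaN for class C
recall    <- tp / rowSums(conf)

## Replace undefined scores by 0, as described in the note above
precision[is.na(precision)] <- 0
recall[is.na(recall)] <- 0

f1 <- 2 * precision * recall / (precision + recall)
f1[is.na(f1)] <- 0                # 0/0 when precision and recall are both 0

macro_f1 <- mean(f1)              # defined, but pulled down by class C
macro_f1
```

Without the replacement steps, `mean(f1)` would be `NaN` and no parameter pair could be ranked on this split.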
An instance of class "GenRegRes"
.
Laurent Gatto
svmClassification
and example therein.
These functions implement the T augmented Gaussian mixture (TAGM) model for mass spectrometry-based spatial proteomics datasets using Markov-chain Monte-Carlo (MCMC) for inference.
tagmMcmcTrain(object, fcol = "markers", method = "MCMC", numIter = 1000L,
              burnin = 100L, thin = 5L, mu0 = NULL, lambda0 = 0.01,
              nu0 = NULL, S0 = NULL, beta0 = NULL, u = 2, v = 10,
              numChains = 4L, BPPARAM = BiocParallel::bpparam())
tagmMcmcPredict(object, params, fcol = "markers", probJoint = FALSE,
                probOutlier = TRUE)
tagmPredict(object, params, fcol = "markers", probJoint = FALSE,
            probOutlier = TRUE)
tagmMcmcProcess(params)
object |
An |
fcol |
The feature meta-data containing marker definitions.
Default is |
method |
A |
numIter |
The number of iterations of the MCMC algorithm. Default is 1000. |
burnin |
The number of samples to be discarded from the beginning of the chain. Default is 100. |
thin |
The thinning frequency to be applied to the MCMC chain. Default is 5. |
mu0 |
The prior mean. Default is |
lambda0 |
The prior shrinkage. Default is 0.01. |
nu0 |
The prior degrees of freedom. Default is
|
S0 |
The prior inverse-Wishart scale matrix. Empirical prior used by default. |
beta0 |
The prior Dirichlet distribution concentration. Default is 1 for each class. |
u |
The prior shape parameter for Beta(u, v). Default is 2. |
v |
The prior shape parameter for Beta(u, v). Default is 10. |
numChains |
The number of parallel chains to be run. Default is 4. |
BPPARAM |
Support for parallel processing using the
|
params |
An instance of class |
probJoint |
A |
probOutlier |
A |
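The numIter, burnin and thin arguments jointly determine how many posterior samples are kept per chain. The base R sketch below uses the usual MCMC convention (drop the first burnin iterations, then keep every thin-th one); this convention is an assumption for illustration and is not taken from the package source.

```r
numIter   <- 1000L
burnin    <- 100L
thin      <- 5L
numChains <- 4L

## Indices of the iterations kept per chain under the usual convention:
## discard the first `burnin` iterations, then keep every `thin`-th one.
## (Assumption: illustrative only, not taken from the package source.)
retained <- seq.int(from = burnin + 1L, to = numIter, by = thin)

length(retained)               # samples kept per chain
numChains * length(retained)   # samples kept across all chains
```

Under this convention, increasing thin reduces autocorrelation between retained samples at the cost of fewer samples for the same numIter.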
The tagmMcmcTrain
function generates the samples from the
posterior distributions (object of class MCMCParams
) based on an
annotated quantitative spatial proteomics dataset (object of class
MSnbase::MSnSet
). Both are then passed to the tagmPredict
function to predict the sub-cellular localisation of proteins of
unknown localisation. See the pRoloc-bayesian vignette for
details and examples. In this implementation, if numerical instability
is detected in the covariance matrix of the data, a small multiple of
the identity is added. A message is printed if this conditioning step
is performed.
tagmMcmcTrain
returns an instance of class
MCMCParams
.
tagmMcmcPredict
returns an instance of class
MSnbase::MSnSet
containing the localisation predictions as
a new tagm.mcmc.allocation
feature variable. The allocation
probability is encoded as tagm.mcmc.probability
(corresponding to the mean of the probability
distribution). In addition, the upper and lower quantiles of
the allocation probability distribution are available as
tagm.mcmc.probability.lowerquantile
and
tagm.mcmc.probability.upperquantile
feature variables. The
Shannon entropy is available in the tagm.mcmc.mean.shannon
feature variable, measuring the uncertainty in the allocations
(a high value representing high uncertainty; the highest value
is the natural logarithm of the number of classes).
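The bound on the Shannon entropy mentioned above can be checked with a short base R sketch. The `shannon` helper and the probability vectors are hypothetical and operate on a single vector of allocation probabilities; how the package averages the entropy over MCMC samples is not shown here.

```r
## Shannon entropy (natural log) of a vector of allocation probabilities.
## Zero probabilities contribute 0, by the convention 0 * log(0) = 0.
shannon <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
}

K <- 4                                   # hypothetical number of classes
confident <- c(0.97, 0.01, 0.01, 0.01)   # protein allocated with confidence
uniform   <- rep(1 / K, K)               # protein equally likely everywhere

shannon(confident)                       # close to 0: little uncertainty
shannon(uniform)                         # the maximum for K classes
all.equal(shannon(uniform), log(K))
```

A protein whose probability mass is spread evenly over all classes therefore reaches the upper bound log(K), while a confidently allocated protein has an entropy near 0.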
tagmMcmcProcess
returns an instance of class
MCMCParams
with its summary slot populated.
Crook OM, Mulvey CM, Kirk PDW, Lilley KS, Gatto L. A Bayesian Mixture Modelling Approach For Spatial Proteomics. bioRxiv 282269; doi: https://doi.org/10.1101/282269
The plotEllipse()
function can be used to visualise
TAGM models on PCA plots with ellipses.
Tests if the marker class sizes are large enough for the parameter
optimisation scheme, i.e. the size is greater than xval + n
,
where the default xval
is 5 and n
is 2. If the test
is unsuccessful, a warning is thrown.
testMarkers(object, xval = 5, n = 2, fcol = "markers", error = FALSE)
object |
An instance of class |
xval |
The number of cross-validation partitions. See the
|
n |
Number of additional examples. |
fcol |
The name of the prediction column in the
|
error |
A |
In case the test indicates that a class contains too few examples,
it is advised to either add some or, if not possible, to remove
the class altogether (see minMarkers
)
as the parameter optimisation is likely to fail or, at least,
produce unreliable results for that class.
If successful, the test invisibly returns NULL
. Else,
it invisibly returns the names of the classes that have too few examples.
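The size check can be sketched in base R on a plain vector of labels. The labels are made up, the treatment of "unknown" as unlabelled is an assumption, and the comparison follows the wording of the description literally (a class passes when its size is greater than xval + n); the package's exact boundary condition may differ.

```r
## Hypothetical marker labels; "unknown" denotes unlabelled proteins
markers <- c(rep("ER", 12), rep("Golgi", 8), rep("PM", 5),
             rep("unknown", 40))

xval <- 5
n <- 2

## Class sizes, ignoring unlabelled proteins
sizes <- table(markers[markers != "unknown"])

## Assumption: a class passes when its size is greater than xval + n,
## so classes with at most xval + n members are reported as too small
too_small <- names(sizes)[sizes <= xval + n]
too_small
```

Here only the class with 5 members is reported, since 5 does not exceed xval + n = 7.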
Laurent Gatto
getMarkers
and minMarkers
library("pRolocdata")
data(dunkley2006)
getMarkers(dunkley2006)
testMarkers(dunkley2006)
toosmall <- testMarkers(dunkley2006, xval = 15)
toosmall
try(testMarkers(dunkley2006, xval = 15, error = TRUE))
MSnSet
This function creates a stratified 'test' MSnSet
which can be used
for algorithmic development. An "MSnSet"
containing only
the marker proteins, as defined in fcol
, is returned with a new
feature data column appended called test
in which a stratified subset
of these markers has been relabelled as 'unknowns'.
testMSnSet(object, fcol = "markers", size = 0.2, seed)
object |
An instance of class |
fcol |
The feature meta-data column name containing the
marker definitions on which the data will be stratified. Default
is |
size |
The size of the data set to be extracted. Default is 0.2 (20 percent). |
seed |
The optional random number generator seed. |
An instance of class "MSnSet"
which
contains only the proteins that have a labelled localisation
i.e. the marker proteins, as defined in fcol
and a new
column in the feature data slot called test
which has part
of the labels relabelled as "unknown" (the number of
proteins relabelled as "unknown" is determined by the size parameter).
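The stratified relabelling can be illustrated in base R on a plain character vector rather than an MSnSet. The labels below are made up, and the rounding of per-class counts to the nearest integer is an assumption; the package may choose the counts differently.

```r
set.seed(1)                         # for a reproducible sketch

## Hypothetical marker labels for 30 proteins in three classes
markers <- c(rep("ER", 15), rep("Golgi", 10), rep("PM", 5))
size <- 0.2

## Within each class, pick a fraction `size` of the proteins.
## Assumption: counts are rounded to the nearest integer.
idx <- unlist(lapply(split(seq_along(markers), markers), function(i) {
    i[sample.int(length(i), round(size * length(i)))]
}))

## New label vector in which the selected markers become "unknown"
test <- markers
test[idx] <- "unknown"

table(markers, test)
```

Because sampling happens within each class, every class contributes to the "unknown" set in proportion to its size, which keeps the class balance of the remaining markers intact.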
Lisa Breckels
sampleMSnSet
unknownMSnSet
markerMSnSet
library(pRolocdata)
data(tan2009r1)
sample <- testMSnSet(tan2009r1)
getMarkers(sample, "test")
all(dim(sample) == dim(markerMSnSet(tan2009r1)))
The possible weights to be considered form a sequence from 0 (favour
auxiliary data) to 1 (favour primary data). Each possible
combination of weights for nclass
classes must be
tested. The thetas
function produces a weight matrix
for nclass
columns (one for each class) with all possible
weight combinations (number of rows).
thetas(nclass, by = 0.5, length.out, verbose = TRUE)
nclass |
Number of marker classes |
by |
The increment of the weights. One of |
length.out |
The desired length of the weight sequence. |
verbose |
A |
A matrix with all possible theta weight combinations.
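The size of such a weight matrix can be reasoned about with a base R sketch that enumerates every combination using expand.grid. This is an illustration of the idea only, not the package's implementation, and the row order may differ from what thetas returns.

```r
nclass <- 4
by <- 0.5

## Candidate weights for a single class: 0, 0.5, 1
w <- seq(0, 1, by = by)

## All combinations across classes: one column per class and one row
## per combination, i.e. length(w)^nclass rows in total
grid <- as.matrix(expand.grid(rep(list(w), nclass)))

dim(grid)
```

The number of rows grows as length(w)^nclass, so a finer increment or an additional class quickly enlarges the search space that has to be tested.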
Lisa Breckels
dim(thetas(4, by = 0.5))
dim(thetas(4, by = 0.2))
dim(thetas(5, by = 0.2))
dim(thetas(5, length.out = 5))
dim(thetas(6, by = 0.2))
This is just a dummy entry for methods from unexported classes that generate warnings during package checking.
Laurent Gatto <[email protected]>
The function assumes that its input is a binary MSnSet
and
computes, for each marker class, the number of non-zero expression
profiles. The function is meant to be used to produce heatmaps
(see the example) and visualise binary (such as GO) MSnSet
objects and assess their utility: all zero features/classes will
not be informative at all (and can be filtered out with
filterBinMSnSet
) while features/classes with many
annotations (GO terms) are likely not to be informative either.
zerosInBinMSnSet(object, fcol = "markers", as.matrix = TRUE, percent = TRUE)
object |
An instance of class |
fcol |
A |
as.matrix |
If |
percent |
If |
A matrix
or a list
indicating the number of
non-zero values per marker class.
Laurent Gatto
library(pRolocdata)
data(hyperLOPIT2015goCC)
zerosInBinMSnSet(hyperLOPIT2015goCC)
zerosInBinMSnSet(hyperLOPIT2015goCC, percent = FALSE)
pal <- colorRampPalette(c("white", "blue"))
library(lattice)
levelplot(zerosInBinMSnSet(hyperLOPIT2015goCC),
          xlab = "Number of non-0s",
          ylab = "Marker class",
          col.regions = pal(140))