Title: | Phenotypes Identification Using Mapper from topological data Analysis |
---|---|
Description: | The PIUMA package offers a tidy pipeline of Topological Data Analysis frameworks to identify and characterize communities in high and heterogeneous dimensional data. |
Authors: | Mattia Chiesa [aut, cre] , Arianna Dagliati [aut] , Alessia Gerbasi [aut] , Giuseppe Albi [aut], Laura Ballarini [aut], Luca Piacentini [aut] |
Maintainer: | Mattia Chiesa <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.3.0 |
Built: | 2024-11-19 06:13:11 UTC |
Source: | https://github.com/bioc/PIUMA |
This function computes the average of the entropies for each node of a network.
checkNetEntropy(outcome_vect)
outcome_vect |
A vector containing the average outcome values for each node of a network. |
The average of the entropies is related to the amount of information stored in the network.
The network entropy using each node of a network.
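As a rough illustration of the idea (not PIUMA's internal code, whose exact formula is not given here), the sketch below assumes that each node carries an average binary outcome in [0, 1], computes a Shannon entropy per node, and averages the results. The helper `binary_entropy` and the input values are hypothetical.

```r
# Hypothetical helper, not part of PIUMA: binary Shannon entropy (in bits)
# of a proportion p, using the convention 0 * log2(0) = 0.
binary_entropy <- function(p) {
  h <- -(p * log2(p) + (1 - p) * log2(1 - p))
  h[p == 0 | p == 1] <- 0   # pure nodes carry no uncertainty
  h
}

# Assumed input: one average outcome value in [0, 1] per node.
node_means <- c(0, 1, 0.5, 0.25)

# Average of the per-node entropies.
mean_entropy <- mean(binary_entropy(node_means))
```

Under this reading, a network made only of pure nodes (all 0 or all 1) has an average entropy of 0, while mixed nodes push the value towards 1.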
Mattia Chiesa, Laura Ballarini, Luca Piacentini
makeTDAobj, dfToDistance, dfToProjection, mapperCore, jaccardMatrix, tdaDfEnrichment
# use example data:
set.seed(1)
entropy <- checkNetEntropy(round(runif(10),0))
This function assesses the fitting to a scale-free net model.
checkScaleFreeModel(x, showPlot = FALSE)
x |
A TDAobj object, processed by the |
showPlot |
Whether the plot has to be generated. Default: FALSE |
Scale-free networks show a high negative correlation between k and p(k).
A list containing:
The estimated gamma value.
The correlation between the k and the degree distribution p(k).
The p-value of the correlation between the k and the degree distribution p(k).
The correlation between the logarithm (base 10) of k and the logarithm (base 10) of the degree distribution p(k).
The p-value of the correlation between the logarithm (base 10) of k and the logarithm (base 10) of the degree distribution p(k).
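The quantities listed above can be sketched in base R. This is only an illustration on made-up node degrees: how PIUMA derives the degrees and estimates gamma is not stated here, so taking gamma as minus the slope of the log-log fit is an assumption.

```r
# Hypothetical node degrees of a small network (not PIUMA output).
degrees <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4)

# Empirical degree distribution p(k).
tab <- table(degrees)
k   <- as.numeric(names(tab))
p_k <- as.numeric(tab) / length(degrees)

# Correlation (with p-value) between k and p(k),
# on the linear scale and on the log10 scale.
cor_lin <- cor.test(k, p_k)
cor_log <- cor.test(log10(k), log10(p_k))

# Assumption: gamma is minus the slope of the log-log linear fit.
fit   <- lm(log10(p_k) ~ log10(k))
gamma <- -unname(coef(fit)[2])
```

For a network close to the scale-free model, both correlations are strongly negative and gamma is positive.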
Mattia Chiesa, Laura Ballarini, Luca Piacentini
makeTDAobj, dfToDistance, dfToProjection, mapperCore, jaccardMatrix
## use example data:
data(tda_test_data)
#netModel <- checkScaleFreeModel(tda_test_data)
A dataset to test the dfToProjection and dfToDistance functions of the PIUMA package.
df_test_proj
A data.frame containing 15 rows (cells) and 15 columns (genes)
An example dataset for the PIUMA package
This function returns the distance matrix computed by using the Pearson's, Euclidean or Gower distance methods. The distances are computed between the rows of a data.frame in the classical form n x m, where n (rows) are observations and m (columns) are features.
dfToDistance(x, distMethod = c("euclidean", "gower", "pearson"))
x |
A TDAobj object, generated by makeTDAobj. Rows (n) and columns (m) should be, respectively, observations and features. |
distMethod |
The method used to calculate the distance matrix. "euclidean", "gower" and "pearson" values are allowed. Default: "euclidean". |
The starting TDAobj object, in which the computed distance matrix has been added (slot: 'dist_mat')
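To make the row-wise convention concrete, here is a small base R sketch on toy data. It is not PIUMA's implementation: the Pearson distance is assumed to be 1 minus the correlation between rows, and the Gower distance is left out because it needs an additional package.

```r
# Toy data: 4 observations (rows) x 3 features (columns).
x <- matrix(c(1, 2, 3,
              2, 4, 6,
              3, 2, 1,
              0, 5, 1), nrow = 4, byrow = TRUE)

# Euclidean distance between rows.
d_euc <- as.matrix(dist(x, method = "euclidean"))

# Assumed Pearson distance: 1 - correlation between rows.
# cor() works on columns, hence the transpose.
d_pea <- 1 - cor(t(x))
```

Both results are n x n matrices, one row and one column per observation. Rows 1 and 2 are perfectly correlated, so their Pearson distance is 0 even though their Euclidean distance is not.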
Mattia Chiesa, Laura Ballarini, Luca Piacentini
## use example data:
data(tda_test_data)
dfDist <- dfToDistance(tda_test_data, "euclidean")
This function transforms data from a high-dimensional space into a low-dimensional space, wrapping six well-known reduction methods: PCA, KPCA, t-SNE, UMAP, MDS, and Isomap. In topological data analysis, the identified components are commonly used as lenses.
dfToProjection( x, method = c("PCA", "UMAP", "TSNE", "MDS", "KPCA", "ISOMAP"), nComp = 2, centerPCA = FALSE, scalePCA = FALSE, umapNNeigh = 15, umapMinDist = 0.1, tsnePerpl = 30, tsneMaxIter = 300, kpcaKernel = c("rbfdot", "laplacedot", "polydot", "tanhdot", "besseldot", "anovadot", "vanilladot", "splinedot"), kpcaSigma = 0.1, kpcaDegree = 1, isomNNeigh = 5, showPlot = FALSE, vectColor = NULL )
x |
A TDAobj object, generated by makeTDAobj |
method |
Name of the dimensionality reduction method to use. "PCA", "UMAP", "TSNE", "MDS", "KPCA" and "ISOMAP" values are allowed. Default: "PCA". |
nComp |
The number of components to be computed. Default: 2 |
centerPCA |
Whether the data should be centered before PCA. Default: FALSE |
scalePCA |
Whether the data should be scaled before PCA. Default: FALSE |
umapNNeigh |
The number of neighbors for UMAP. Default: 15 |
umapMinDist |
The minimum distance between points for UMAP. Default: 0.1 |
tsnePerpl |
Perplexity argument of t-SNE. Default: 30 |
tsneMaxIter |
The maximum number of iterations for t-SNE. Default: 300 |
kpcaKernel |
The type of kernel for kPCA. "rbfdot", "laplacedot", "polydot", "tanhdot", "besseldot", "anovadot", "vanilladot" and "splinedot" are allowed. Default: "polydot". |
kpcaSigma |
The 'sigma' argument for kPCA. Default: 0.1. |
kpcaDegree |
The 'degree' argument for kPCA. Default: 1. |
isomNNeigh |
The number of neighbors for Isomap. Default: 5. |
showPlot |
Whether the scatter plot of the first two principal components should be shown. Default: FALSE. |
vectColor |
Vector containing the variable used to color the scatter plot. Default: NULL. |
The starting TDAobj object, in which the principal components of projected data have been added (slot: 'comp')
Mattia Chiesa, Laura Ballarini, Luca Piacentini
# use example data:
data(tda_test_data)
set.seed(1)
cmp <- dfToProjection(tda_test_data, "PCA", nComp=2)
The method to get data from the comp slot
getComp(x)
## S4 method for signature 'TDAobj'
getComp(x)
x |
a |
a data.frame with the comp data
Mattia Chiesa
data(tda_test_data)
The method to get data from the dfMapper slot
getDfMapper(x)
## S4 method for signature 'TDAobj'
getDfMapper(x)
x |
a |
a data.frame with the dfMapper data
Mattia Chiesa
data(tda_test_data)
ex_out <- getDfMapper(tda_test_data)
The method to get data from the dist_mat slot
getDistMat(x)
## S4 method for signature 'TDAobj'
getDistMat(x)
x |
a |
a data.frame with the dist_mat data
Mattia Chiesa
data(tda_test_data)
ex_out <- getDistMat(tda_test_data)
The method to get data from the jacc slot
getJacc(x)
## S4 method for signature 'TDAobj'
getJacc(x)
x |
a |
a matrix with the jacc data
Mattia Chiesa
data(tda_test_data)
ex_out <- getJacc(tda_test_data)
The method to get data from the node_data_mat slot
getNodeDataMat(x)
## S4 method for signature 'TDAobj'
getNodeDataMat(x)
x |
a |
a data.frame with the node_data_mat data
Mattia Chiesa
data(tda_test_data)
ex_out <- getNodeDataMat(tda_test_data)
The method to get data from the orig_data slot
getOrigData(x)
## S4 method for signature 'TDAobj'
getOrigData(x)
x |
a |
a data.frame with the original data
Mattia Chiesa
data(tda_test_data)
ex_out <- getOrigData(tda_test_data)
The method to get data from the outcome slot
getOutcome(x)
## S4 method for signature 'TDAobj'
getOutcome(x)
x |
a |
a data.frame with the outcome data
Mattia Chiesa
data(tda_test_data)
ex_out <- getOutcome(tda_test_data)
The method to get data from the outcomeFact slot
getOutcomeFact(x)
## S4 method for signature 'TDAobj'
getOutcomeFact(x)
x |
a |
a data.frame with the outcomeFact data
Mattia Chiesa
data(tda_test_data)
ex_out <- getOutcomeFact(tda_test_data)
The method to get data from the scaled_data slot
getScaledData(x)
## S4 method for signature 'TDAobj'
getScaledData(x)
x |
a |
a data.frame with the scaled data
Mattia Chiesa
data(tda_test_data)
ex_out <- getScaledData(tda_test_data)
This function computes the Jaccard index for each pair of nodes contained in TDAobj, generated by the mapperCore function. The resulting data.frame can be used to represent data as a network, for instance, in Cytoscape.
jaccardMatrix(x)
x |
A TDAobj object, processed by the |
The Jaccard index measures the similarity of two nodes A and B. It ranges from 0 to 1. If A and B share no members, their Jaccard index is 0 (reported as NA). If A and B share all members, their Jaccard index is 1. Hence, the higher the index, the more similar the two nodes. If the Jaccard index between A and B is different from NA, an edge exists between A and B. The output matrix of Jaccard indexes can be used as an adjacency matrix. The resulting data.frame can be used to represent data as a network, for instance, in Cytoscape.
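The following base R sketch illustrates the index on three made-up nodes. It is not the package's code: the helper `jaccard_index`, the node contents, and leaving the diagonal as NA are assumptions made for the example.

```r
# Hypothetical helper: Jaccard index of two sets of observation IDs.
jaccard_index <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up nodes, each holding the IDs of its member observations.
nodes <- list(
  node_1 = c("s1", "s2", "s3"),
  node_2 = c("s3", "s4"),
  node_3 = c("s5")
)

n <- length(nodes)
jacc <- matrix(NA_real_, n, n, dimnames = list(names(nodes), names(nodes)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      val <- jaccard_index(nodes[[i]], nodes[[j]])
      # Convention described in the text: no shared members -> NA (no edge).
      if (val > 0) jacc[i, j] <- val
    }
  }
}
```

Here node_1 and node_2 share one member out of four distinct ones, so their index is 0.25 and they are connected; node_3 shares nothing with the others and stays isolated.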
The starting TDAobj object, in which the matrix of Jaccard indexes, calculated comparing each node of the 'dfMapper' slot, has been added (slot: 'jacc')
Mattia Chiesa, Laura Ballarini, Luca Piacentini
makeTDAobj, dfToDistance, dfToProjection, mapperCore
## use example data:
data(tda_test_data)
jacc_mat <- jaccardMatrix(tda_test_data)
This function imports a data.frame and creates the object that stores all data needed for TDA analysis. In addition, some preliminary preprocessing steps are performed; specifically, the outcome variables are separated from the rest of the dataset. The remaining dataset is also re-scaled (0-1).
makeTDAobj(df, outcomes)
df |
A data.frame representing a dataset in the classical n x m form. Rows (n) and columns (m) should be, respectively, observations and features. |
outcomes |
A string or vector of strings containing the names of the variables to be considered 'outcomes' |
A TDA object containing:
orig_data A data.frame of original data (without outcomes)
scaled_data A data.frame of re-scaled data (without outcomes)
outcomeFact A data.frame of original outcomes
outcome A data.frame of original outcomes converted as numeric
comp A data.frame containing the components of projected data
dist_mat A data.frame containing the computed distance matrix
dfMapper A data.frame containing the nodes, with their elements, identified by TDA
jacc A matrix of Jaccard indexes between each pair of dfMapper nodes
node_data_mat A data.frame with the node size and the average value
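The preprocessing described above can be pictured with a short base R sketch. It is an illustration under assumptions, not the package's code: the toy data are invented, and the 0-1 re-scaling is assumed to be a per-column min-max transformation.

```r
# Toy dataset: one outcome column ('zone') and two features.
df <- data.frame(
  zone  = c("LV", "RV", "LV", "RA"),
  geneA = c(2, 4, 6, 10),
  geneB = c(0.5, 0.5, 1.5, 2.5)
)
outcomes <- "zone"

# Separate the outcome variables from the rest of the dataset.
outcome_fact <- df[, outcomes, drop = FALSE]
orig_data    <- df[, setdiff(names(df), outcomes), drop = FALSE]

# Outcomes converted to numeric codes (via factor levels).
outcome_num <- data.frame(lapply(outcome_fact,
                                 function(v) as.numeric(as.factor(v))))

# Assumed re-scaling: per-column min-max to the [0, 1] range.
rescale01   <- function(v) (v - min(v)) / (max(v) - min(v))
scaled_data <- as.data.frame(lapply(orig_data, rescale01))
```

The four objects correspond, in spirit, to the outcomeFact, orig_data, outcome and scaled_data slots listed above; the remaining slots are filled by later steps of the pipeline.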
Mattia Chiesa, Laura Ballarini, Luca Piacentini
## use example data:
data("vascEC_meta")
data("vascEC_norm")
df <- cbind(vascEC_meta,vascEC_norm)
res <- makeTDAobj(df, "zone")
This function imports a SummarizedExperiment object and creates the object that stores all data needed for TDA analysis. In addition, some preliminary preprocessing steps are performed; specifically, the outcome variables are separated from the rest of the dataset. The remaining dataset is also re-scaled (0-1).
makeTDAobjFromSE(SE, outcomes)
SE |
A |
outcomes |
A string or vector of strings containing the names of the variables to be considered 'outcomes' |
A TDA object containing:
orig_data A data.frame of original data (without outcomes)
scaled_data A data.frame of re-scaled data (without outcomes)
outcomeFact A data.frame of original outcomes
outcome A data.frame of original outcomes converted as numeric
comp A data.frame containing the components of projected data
dist_mat A data.frame containing the computed distance matrix
dfMapper A data.frame containing the nodes, with their elements, identified by TDA
jacc A matrix of Jaccard indexes between each pair of dfMapper nodes
node_data_mat A data.frame with the node size and the average value
Mattia Chiesa, Laura Ballarini, Luca Piacentini
## use example data:
data("vascEC_meta")
data("vascEC_norm")
suppressMessages(library(SummarizedExperiment))
dataSE <- SummarizedExperiment(assays=as.matrix(t(vascEC_norm)),
                               colData=as.data.frame(vascEC_meta))
res <- makeTDAobjFromSE(dataSE, "zone")
This comprehensive function performs the core TDA Mapper algorithm with 2D lenses. It allows several types of clustering methods to be set.
mapperCore( x, nBins = 15, overlap = 0.4, mClustNode = 2, remEmptyNode = TRUE, clustMeth = c("kmeans", "HR", "DBSCAN", "OPTICS"), HRMethod = c("average", "complete") )
x |
A TDAobj object, processed by the |
nBins |
The number of bins (i.e. the resolution of the cover). Default: 15. |
overlap |
The overlap between bins (i.e. the gain of the cover). Default: 0.4. |
mClustNode |
The number of clusters in each overlapping bin. Default: 2 |
remEmptyNode |
A logical value to remove or not the empty nodes from the resulting data.frame. Default: TRUE. |
clustMeth |
The clustering algorithm. "HR", "kmeans", "DBSCAN", and "OPTICS" are allowed. Default: "kmeans". |
HRMethod |
The name of the linkage criterion (when clustMeth="HR"). "average" and "complete" values are allowed. Default: "average". |
The starting TDAobj object, in which the result of mapper algorithm (inferred nodes with their elements) has been added (slot: 'dfMapper')
A data.frame containing the clusters, with their elements, identified by TDA.
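To give an intuition for the resolution (nBins) and gain (overlap) of a cover, the sketch below builds overlapping intervals over a single lens in base R. It is a simplified, hypothetical construction: the function works with 2D lenses and then clusters the points inside each bin, and its exact binning rule is not described here.

```r
# Hypothetical helper: split the range of one lens into n_bins intervals
# of equal length, each overlapping its neighbour by the given fraction.
make_cover <- function(lens, n_bins, overlap) {
  rng    <- range(lens)
  len    <- diff(rng) / (n_bins - (n_bins - 1) * overlap)
  step   <- len * (1 - overlap)
  starts <- rng[1] + step * (seq_len(n_bins) - 1)
  data.frame(start = starts, end = starts + len)
}

# Made-up lens values (e.g. one projected component).
lens  <- 0:10
cover <- make_cover(lens, n_bins = 3, overlap = 0.5)

# Indices of the observations falling in each overlapping bin.
members <- lapply(seq_len(nrow(cover)), function(i) {
  which(lens >= cover$start[i] & lens <= cover$end[i])
})
```

Because the bins overlap, an observation can belong to more than one bin; in Mapper, such shared observations are what create edges between the resulting nodes.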
Mattia Chiesa, Laura Ballarini, Luca Piacentini
makeTDAobj, dfToDistance, dfToProjection
# use example data:
data(tda_test_data)
set.seed(1)
dfMapper <- mapperCore(tda_test_data, nBins=5, overlap=0.5,
                       mClustNode=2, clustMeth="kmeans")
The application of unsupervised learning methodologies can help to identify specific phenotypes in large heterogeneous cohorts, such as clinical or -omics data. Among them, Topological Data Analysis (TDA) is a rapidly growing field that combines concepts from algebraic topology and computational geometry to analyze and extract meaningful information from complex and high-dimensional data sets. Moreover, TDA is a robust and effective methodology, able to preserve the intrinsic characteristics of data and the mutual relations among observations, depicting complex data in a graph-based representation. Indeed, by building topological models as networks, TDA allows complex diseases to be inspected in a continuous space, where subjects can fluctuate over the graph, sharing, at the same time, more than one adjacent node of the network. Overall, TDA offers a powerful set of tools to capture the underlying topological features of data, revealing essential patterns and relationships that might be hidden from traditional statistical techniques. The PIUMA package (Phenotypes Identification Using Mapper from topological data Analysis) implements all the main steps of a Topological Data Analysis. PIUMA is the Italian word meaning 'feather'.
See the package vignette, by typing vignette("PIUMA"), to discover all the functions.
Mattia Chiesa, Laura Ballarini, Luca Piacentini
The method to set the comp slot
setComp(x, y)
## S4 method for signature 'TDAobj'
setComp(x, y)
x |
a |
y |
a data.frame with the comp data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the dfMapper slot
setDfMapper(x, y)
## S4 method for signature 'TDAobj'
setDfMapper(x, y)
x |
a |
y |
a data.frame with the dfMapper data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the dist_mat slot
setDistMat(x, y)
## S4 method for signature 'TDAobj'
setDistMat(x, y)
x |
a |
y |
a data.frame with the dist_mat data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the jacc slot
setJacc(x, y)
## S4 method for signature 'TDAobj'
setJacc(x, y)
x |
a |
y |
a matrix with the jacc data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the node_data_mat slot
setNodeDataMat(x, y)
## S4 method for signature 'TDAobj'
setNodeDataMat(x, y)
x |
a |
y |
a data.frame with the node_data_mat data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the orig_data slot
setOrigData(x, y)
## S4 method for signature 'TDAobj'
setOrigData(x, y)
x |
a |
y |
a data.frame with the original data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the outcome slot
setOutcome(x, y)
## S4 method for signature 'TDAobj'
setOutcome(x, y)
x |
a |
y |
a data.frame with the outcome data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the outcomeFact slot
setOutcomeFact(x, y)
## S4 method for signature 'TDAobj'
setOutcomeFact(x, y)
x |
a |
y |
a data.frame with the outcomeFact data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
The method to set the scaled_data slot
setScaledData(x, y)
## S4 method for signature 'TDAobj'
setScaledData(x, y)
x |
a |
y |
a data.frame with the scaled data |
a TDAobj object
Mattia Chiesa
data(tda_test_data)
A TDAobj to test the PIUMA package.
tda_test_data
A TDAobj with data in all slots
An example dataset for the PIUMA package
This function computes the average value of additional features provided by the user and calculates the size of each node of the 'dfMapper' slot.
tdaDfEnrichment(x, df)
x |
A TDAobj object, processed by the |
df |
A data.frame with scaled values in the classical n x m form: rows (n) and columns (m) must be observations and features, respectively. |
The starting TDAobj object, in which a data.frame with additional information for each node has been added (slot: 'node_data_mat')
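The kind of per-node summary described here can be sketched in base R as follows. The feature table and node memberships are invented for the example, and the layout of the result (one row per node, feature means followed by a size column) is an assumption rather than the actual structure of the 'node_data_mat' slot.

```r
# Made-up scaled features: 4 observations (rows) x 2 features (columns).
feat <- data.frame(f1 = c(0.1, 0.3, 0.5, 0.9),
                   f2 = c(1, 0, 1, 0),
                   row.names = c("s1", "s2", "s3", "s4"))

# Made-up nodes, each holding the IDs of its member observations.
nodes <- list(node_1 = c("s1", "s2"),
              node_2 = c("s2", "s3", "s4"))

# For each node: mean of every feature over its members, plus node size.
node_summary <- do.call(rbind, lapply(nodes, function(ids) {
  c(colMeans(feat[ids, , drop = FALSE]), size = length(ids))
}))
```

Each row of `node_summary` describes one node, which is convenient for coloring or sizing nodes when the network is drawn.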
Mattia Chiesa, Laura Ballarini, Luca Piacentini
makeTDAobj, dfToDistance, dfToProjection, mapperCore, jaccardMatrix
## use example data:
data(tda_test_data)
data(df_test_proj)
enrich_mat_tda <- tdaDfEnrichment(tda_test_data, df_test_proj)
The TDA object for storing TDA data
TDAobj class showClass("TDAobj")
orig_data
A data.frame of original data (without outcomes)
scaled_data
A data.frame of re-scaled data (without outcomes)
outcomeFact
A data.frame of original outcomes
outcome
A data.frame of original outcomes converted as numeric
comp
A data.frame containing the components of projected data
dist_mat
A data.frame containing the computed distance matrix
dfMapper
A data.frame containing the nodes, with their elements, identified by TDA
jacc
A matrix of Jaccard indexes between each pair of dfMapper nodes
node_data_mat
A data.frame with the node size and the average value of each feature
We tested PIUMA on a subset of the single-cell RNA sequencing dataset (GEO: GSE193346) generated and published by Feng et al. (2022) in Nature Communications to demonstrate that distinct transcriptional profiles are present in specific cell types of each heart chamber, which were attributed roles in cardiac development. In this tutorial, our aim is to use PIUMA to identify sub-populations of vascular endothelial cells that can be associated with specific heart developmental stages. The original dataset consisted of three layers of heterogeneity: cell type, stage and zone (i.e., heart chamber). Our testing dataset was obtained by subsetting vascular endothelial cells (cell type) from the Seurat object and extracting raw counts and metadata. We then filtered out low-expressed genes and normalized the data with DaMiRseq.
vascEC_meta
A dataframe containing 1180 rows (cells) and 2 columns (outcomes)
An example dataset for the PIUMA package
We tested PIUMA on a subset of the single-cell RNA sequencing dataset (GEO: GSE193346) generated and published by Feng et al. (2022) in Nature Communications to demonstrate that distinct transcriptional profiles are present in specific cell types of each heart chamber, which were attributed roles in cardiac development. In this tutorial, our aim is to use PIUMA to identify sub-populations of vascular endothelial cells that can be associated with specific heart developmental stages. The original dataset consisted of three layers of heterogeneity: cell type, stage and zone (i.e., heart chamber). Our testing dataset was obtained by subsetting vascular endothelial cells (cell type) from the Seurat object and extracting raw counts and metadata. We then filtered out low-expressed genes and normalized the data with DaMiRseq.
vascEC_norm
A matrix containing 1180 rows (cells) and 838 columns (genes)
An example dataset for the PIUMA package