Title: | An Evolutionary Framework for the Identification and Study of Prognostic Gene Expression Signatures in Cancer |
---|---|
Description: | A multi-objective optimization algorithm for disease sub-type discovery based on a non-dominated sorting genetic algorithm. The 'Galgo' framework combines the advantages of clustering algorithms for grouping heterogeneous 'omics' data and the searching properties of genetic algorithms for feature selection. The algorithm search for the optimal number of clusters determination considering the features that maximize the survival difference between sub-types while keeping cluster consistency high. |
Authors: | Martin Guerrero [aut], Carlos Catania [cre] |
Maintainer: | Carlos Catania <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.17.0 |
Built: | 2024-12-29 05:24:04 UTC |
Source: | https://github.com/bioc/GSgalgoR |
This package was developed to provide a simple to use set of functions to use the galgo algorithm. A multi-objective optimization algorithm for disease subtype discovery based on a non-dominated sorting genetic algorithm.
Different statistical and machine learning approaches have long been used to identify gene expression/molecular signatures with prognostic potential in different cancer types. Nonetheless, the molecular classification of tumors is a difficult task and the results obtained via the current statistical methods are highly dependent on the features analyzed, the number of possible tumor subtypes under consideration, and the underlying assumptions made about the data. In addition, some cancer types are still lacking prognostic signatures and/or of subtype-specific predictors which are continually needed to further dissect tumor biology. In order to identify specific molecular phenotypes to develop precision medicine strategies we present Galgo: A multi-objective optimization process based on a non-dominated sorting genetic algorithm that combines the advantages of clustering methods for grouping heterogeneous omics data and the exploratory properties of genetic algorithms (GA) in order to find features that maximize the survival difference between subtypes while keeping high cluster consistency.
Package: | GSgalgoR |
Type: | Package |
Version: | 1.0.0 |
Date: | 2020-05-06 |
License: | GPL-3 |
Copyright: | (c) 2020 Martin E. Guerrero-Gimenez. |
URL: | https://www.github.com/harpomaxx/galgo |
LazyLoad: | yes |
Martin E. Guerrero-Gimenez [email protected]
Maintainer: Carlos A. Catania [email protected]
Functions to calculate distance matrices using cpu computing
calculate_distance_pearson_cpu(x) calculate_distance_spearman_cpu(x) calculate_distance_uncentered_cpu(x) calculate_distance_euclidean_cpu(x) select_distance(distancetype = "pearson")
calculate_distance_pearson_cpu(x) calculate_distance_spearman_cpu(x) calculate_distance_uncentered_cpu(x) calculate_distance_euclidean_cpu(x) select_distance(distancetype = "pearson")
x |
an expression matrix with features as rows and samples as columns |
distancetype |
a |
select_distance(distancetype)
assigns global function
calculate_distance according to the parameters specified
calculate_distance_pearson_cpu(x)
returns columnwise pearson
distance calculated using the CPU
calculate_distance_uncentered_cpu(x)
returns columnwise
uncentered pearson distance calculated using the CPU
calculate_distance_spearman_cpu(x)
returns columnwise
spearman distance calculated using the CPU
calculate_distance_euclidean_cpu(x)
returns columnwise
euclidean distance calculated using the CPU
Martin E Guerrero-Gimenez, [email protected]
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) calculate_distance <- select_distance(distancetype = "pearson") Dist <- calculate_distance(train.Exp) k <- 4 Pam <- cluster_algorithm(Dist, k) table(Pam$cluster)
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) calculate_distance <- select_distance(distancetype = "pearson") Dist <- calculate_distance(train.Exp) k <- 4 Pam <- cluster_algorithm(Dist, k) table(Pam$cluster)
Print basic info per generation
callback_base_report (userdir, generation, pop_pool, pareto, prob_matrix, current_time)
callback_base_report (userdir, generation, pop_pool, pareto, prob_matrix, current_time)
userdir |
the default directory used by 'galgo()' to store files |
generation |
a number indicating the number of iterations of the galgo algorithm |
pop_pool |
a |
pareto |
the solutions found by Galgo across all generations in the solution space |
prob_matrix |
a |
current_time |
an |
Nothing.
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with base_report_callback assigned to the report_callback # hook-point GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, report_callback = callback_base_report )
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with base_report_callback assigned to the report_callback # hook-point GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, report_callback = callback_base_report )
A base callback function that returns a galgo.Obj
callback_base_return_pop (userdir = "",generation, pop_pool, pareto, prob_matrix, current_time)
callback_base_return_pop (userdir = "",generation, pop_pool, pareto, prob_matrix, current_time)
userdir |
the default directory used by 'galgo()' to store files |
generation |
a number indicating the number of iterations of the galgo algorithm |
pop_pool |
a |
pareto |
the solutions found by Galgo across all generations in the solution space |
prob_matrix |
a |
current_time |
an |
an object of class galgo
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with base_return_pop_callback assigned to the end_galgo_callback # hook-point # By using this callback galgo() return a `galgo,Obj` object. output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, end_galgo_callback = callback_base_return_pop )
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with base_return_pop_callback assigned to the end_galgo_callback # hook-point # By using this callback galgo() return a `galgo,Obj` object. output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, end_galgo_callback = callback_base_return_pop )
A default call_back function that does nothing.
callback_default (userdir = "",generation, pop_pool, pareto, prob_matrix, current_time)
callback_default (userdir = "",generation, pop_pool, pareto, prob_matrix, current_time)
userdir |
the default directory used by |
generation |
a number indicating the number of iterations of the galgo algorithm |
pop_pool |
a |
pareto |
the solutions found by Galgo across all generations in the solution space |
prob_matrix |
a |
current_time |
an |
Nothing
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with default_callback assigned to all the hook-points GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, start_galgo_callback = callback_default,# When Galgo is about to start. end_galgo_callback = callback_default, # When Galgo is about to finish. start_gen_callback = callback_default, # At the beginning of each iteration. end_gen_callback = callback_default, # At the end of each iteration. )
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with default_callback assigned to all the hook-points GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, start_galgo_callback = callback_default,# When Galgo is about to start. end_galgo_callback = callback_default, # When Galgo is about to finish. start_gen_callback = callback_default, # At the beginning of each iteration. end_gen_callback = callback_default, # At the end of each iteration. )
The main idea behind this callback function is to provide some feedback to the user about galgo execution. No other relevant information is shown
callback_no_report (userdir = "",generation, pop_pool, pareto, prob_matrix, current_time)
callback_no_report (userdir = "",generation, pop_pool, pareto, prob_matrix, current_time)
userdir |
the default directory used by 'galgo()' to store files |
generation |
a number indicating the number of iterations of the galgo algorithm |
pop_pool |
a |
pareto |
the solutions found by Galgo across all generations in the solution space |
prob_matrix |
a |
current_time |
an |
Nothing.
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with no_report_callback assigned to the report_callback # hook-point GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, report_callback = callback_no_report )
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo with no_report_callback assigned to the report_callback # hook-point GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS, report_callback = callback_no_report )
Classify samples from multiple centroids
classify_multiple(prob_matrix, centroid_list, distancetype = "pearson")
classify_multiple(prob_matrix, centroid_list, distancetype = "pearson")
prob_matrix |
a |
centroid_list |
a |
distancetype |
a |
Returns a data.frame
with the classes assigned to
each sample in each signature, were samples are a rows and signatures
in columns
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output) RESULTS <- non_dominated_summary( output = output, OS = OS, prob_matrix = expression, distancetype = "pearson" ) CentroidsList <- create_centroids(output, RESULTS$solution, trainset = expression) classes <- classify_multiple(prob_matrix = expression, centroid_list = CentroidsList)
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output) RESULTS <- non_dominated_summary( output = output, OS = OS, prob_matrix = expression, distancetype = "pearson" ) CentroidsList <- create_centroids(output, RESULTS$solution, trainset = expression) classes <- classify_multiple(prob_matrix = expression, centroid_list = CentroidsList)
In GSgalgoR
, the partition around medioids (PAM) algorithm is the
default clustering process used under the evolutionary process.
cluster_algorithm(c, k)
cluster_algorithm(c, k)
c |
a dissimilarity matrix object of type |
k |
positive integer specifying the number of clusters, less than the number of observations |
The function runs the pam
function of
the 'cluster'
package
with options cluster.only =TRUE
,
diss = TRUE
, do.swap=TRUE
,
keep.diss=FALSE
, keep.data = FALSE
,
pamonce= 2
Returns a 'list'
with the value '$cluster'
which
contains the cluster assignment of each of the samples evaluated
Reynolds, A., Richards, G., de la Iglesia, B. and Rayward-Smith, V. (1992) Clustering rules: A comparison of partitioning and hierarchical clustering algorithms; Journal of Mathematical Modelling and Algorithms 5, 475–504. 10.1007/s10852-005-9022-1.
Erich Schubert and Peter J. Rousseeuw (2019) Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms; Preprint, (https://arxiv.org/abs/1810.05691).
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) calculate_distance <- select_distance(distancetype = "pearson") Dist <- calculate_distance(train.Exp) k <- 4 Pam <- cluster_algorithm(Dist, k) table(Pam$cluster)
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) calculate_distance <- select_distance(distancetype = "pearson") Dist <- calculate_distance(train.Exp) k <- 4 Pam <- cluster_algorithm(Dist, k) table(Pam$cluster)
Given an matrix of centroids, where
are the prototypic
centroids with
features, classify new samples according to the
distance to the centroids.
cluster_classify(data, centroid, method = "pearson")
cluster_classify(data, centroid, method = "pearson")
data |
a |
centroid |
a |
method |
Character string indicating which method to use to calculate
distance to centroid. Options are |
Returns a numeric vector of length with the class assigned
to each sample according to the shortest distance to centroid
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) data(IntClustMemb) TrainData <- list(x = train.Exp, y = IntClustMemb) # Create prototypic centroids pam <- pamr.train(TrainData) centroids <- pam$centroids Class <- cluster_classify(train.Exp, centroids) table(Class, IntClustMemb)
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) data(IntClustMemb) TrainData <- list(x = train.Exp, y = IntClustMemb) # Create prototypic centroids pam <- pamr.train(TrainData) centroids <- pam$centroids Class <- cluster_classify(train.Exp, centroids) table(Class, IntClustMemb)
Cosine similarity is a metric of similarity between two non-zero vectors
of an inner product space that measures the cosine of the angle between them.
Two vectors with the same orientation have a cosine similarity of 1, if
they are perpendicular they have a similarity of 0, and if they have
opposing directions the cosine similarity is -1, independent of their
magnitude.
One advantage of cosine similarity is its low-complexity, especially for
sparse vectors where only the non-zero dimensions need to be considered,
which is a common case in GSgalgoR
.
Other names of cosine similarity are Otuska-Orchini similarity when it is
applied to binary data, which is the case for GSgalgoR
, where
individual solutions represented as strings of 0 and 1 are compared with t
his metric.
cosine_similarity(a, b)
cosine_similarity(a, b)
a , b
|
A string of numbers with equal length. It can also be two binary strings of 0's and 1's |
In practice, the function can return numeric values from -1 to 1 according the vector orientations, where a cosine similarity of 1 implies same orientation of the vectors while -1 imply vector of opposing directions. In the binary application, values range from 0 to 1, where 0 are totally discordant vectors while 1 are identical binary vectors.
solution1 <- c(1, 0, 0, 1, 0, 0, 1) solution2 <- solution1 r <- cosine_similarity(solution1, solution2) # the cosine similarity (r) equals 1 solution2 <- abs(solution1 - 1) r2 <- cosine_similarity(solution1, solution2) # the cosine similarity (r2) equals 0
solution1 <- c(1, 0, 0, 1, 0, 0, 1) solution2 <- solution1 r <- cosine_similarity(solution1, solution2) # the cosine similarity (r) equals 1 solution2 <- abs(solution1 - 1) r2 <- cosine_similarity(solution1, solution2) # the cosine similarity (r2) equals 0
This functions create the signature centroids estimated from the GalgoR output and the expression matrix of the training sets.
create_centroids (output, solution_names, trainset, distancetype = "pearson")
create_centroids (output, solution_names, trainset, distancetype = "pearson")
output |
@param output An object of class |
solution_names |
A |
trainset |
a |
distancetype |
a |
Returns a list with the centroid matrix for each of the solutions
in solution_names
, where each column represents the prototypic
centroid of a subtype and each row the constituents features of the
solution signature
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output) RESULTS <- non_dominated_summary( output = output, OS = OS, prob_matrix = expression, distancetype = "pearson" ) CentroidsList <- create_centroids(output, RESULTS$solution, trainset = expression)
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output) RESULTS <- non_dominated_summary( output = output, OS = OS, prob_matrix = expression, distancetype = "pearson" ) CentroidsList <- create_centroids(output, RESULTS$solution, trainset = expression)
galgo
accepts an expression matrix and a
survival object to find robust gene expression signatures related to a
given outcome
galgo (population = 30, generations = 2, nCV = 5, distancetype = "pearson", TournamentSize = 2, period = 1825, OS, prob_matrix, res_dir = "", start_galgo_callback = callback_default, end_galgo_callback = callback_base_return_pop, report_callback = callback_base_report, start_gen_callback = callback_default, end_gen_callback = callback_default, verbose = 2)
galgo (population = 30, generations = 2, nCV = 5, distancetype = "pearson", TournamentSize = 2, period = 1825, OS, prob_matrix, res_dir = "", start_galgo_callback = callback_default, end_galgo_callback = callback_base_return_pop, report_callback = callback_base_report, start_gen_callback = callback_default, end_gen_callback = callback_default, verbose = 2)
population |
a number indicating the number of solutions in the population of solutions that will be evolved |
generations |
a number indicating the number of iterations of the galgo algorithm |
nCV |
number of cross-validation sets |
distancetype |
character, it can be
|
TournamentSize |
a number indicating the size of the tournaments for the selection procedure |
period |
a number indicating the outcome period to evaluate the RMST |
OS |
a |
prob_matrix |
a |
res_dir |
a |
start_galgo_callback |
optional callback function for the start of the galgo execution |
end_galgo_callback |
optional callback function for the end of the galgo execution |
report_callback |
optional callback function |
start_gen_callback |
optional callback function for the beginning of the run |
end_gen_callback |
optional callback function for the end of the run |
verbose |
select the level of information printed during galgo execution |
an object of type 'galgo.Obj'
that corresponds to a list
with the elements $Solutions
and $ParetoFront
.
$Solutions
is a matrix where
is the number
of features evaluated and
is the number of solutions obtained.
The submatrix
is a binary matrix where each row represents
the chromosome of an evolved solution from the solution population, where
each feature can be present (1) or absent (0) in the solution.
Column
represent the
number of clusters for each
solutions. Column
to
shows the SC Fitness and
Survival Fitness values, the solution rank, and the crowding distance of
the solution in the final pareto front respectively.
For easier interpretation of the
'galgo.Obj'
, the output can be
reshaped using the to_list
and
to_dataframe
functions
Martin E Guerrero-Gimenez, [email protected]
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(seq_len(nrow(expression)), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output)
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(seq_len(nrow(expression)), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output)
This function calculates the mean value for each feature of each class to calculate the prototypic centroids of the different groups
k_centroids(data, class)
k_centroids(data, class)
data |
a scaled gene expression |
class |
a vector with the samples classes |
returns a data.frame
with the estimated prototypic centroids
for each class with the features names as rownames
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) calculate_distance <- select_distance(distancetype = "pearson") Dist <- calculate_distance(train.Exp) k <- 4 Pam <- cluster_algorithm(Dist, k) table(Pam$cluster) centroids <- k_centroids(train.Exp, Pam)
# load example dataset require(iC10TrainingData) require(pamr) data(train.Exp) calculate_distance <- select_distance(distancetype = "pearson") Dist <- calculate_distance(train.Exp) k <- 4 Pam <- cluster_algorithm(Dist, k) table(Pam$cluster) centroids <- k_centroids(train.Exp, Pam)
The function uses a 'galgo.Obj'
as input an the training dataset to
evaluate the non-dominated solutions found by GalgoR
non_dominated_summary (output, prob_matrix, OS, distancetype = "pearson")
non_dominated_summary (output, prob_matrix, OS, distancetype = "pearson")
output |
An object of class |
prob_matrix |
a |
OS |
a |
distancetype |
a |
Returns a data.frame
with 5 columns and a number of rows
equals to the non-dominated solutions found by GalgoR.
The first column has the name of the non-dominated solutions, the second
the number of partitions found for each solution (k)
, the third,
the number of genes, the fourth the mean silhouette coefficient of the
solution and the last columns has the estimated C.Index for each one.
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) non_dominated_summary( output = output, OS = OS, prob_matrix = expression, distancetype = "pearson" )
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) non_dominated_summary( output = output, OS = OS, prob_matrix = expression, distancetype = "pearson" )
Plot pareto front from an galgo.Obj
plot_pareto(output)
plot_pareto(output)
output |
An object of class |
This function returns a scatterplot showing the solutions found by Galgo accross all generations in the solution space, where the Silhouette Fitness is in the x-axis and the survival fitness in the y-axis. A line is drawn over all non-dominated solutions showing the estimated Pareto front
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) plot_pareto(output)
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) plot_pareto(output)
Survival fitness function using the Restricted Mean Survival Time (RMST) of each group as proposed by Dehbi & Royston et al. (2017).
surv_fitness(OS, clustclass, period)
surv_fitness(OS, clustclass, period)
OS |
a |
clustclass |
a numeric vector with the group label for each patient |
period |
a number representing the period of time to evaluate in the RMST calculation |
The function computes the Harmonic mean of the differences between Restricted Mean Survival Time (RMST) of consecutive survival curves multiplied by the number of comparisons.
Martin E Guerrero-Gimenez, [email protected]
Dehbi Hakim-Moulay, Royston Patrick, Hackshaw Allan. Life expectancy difference and life expectancy ratio: two measures of treatment effects in randomized trials with non-proportional hazards BMJ 2017; 357 :j2250 https://www.bmj.com/content/357/bmj.j2250
# load example dataset library(breastCancerTRANSBIG) library(Biobase) data(transbig) Train <- transbig rm(transbig) clinical <- pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) surv_fitness(OS, clustclass = clinical$grade, period = 3650)
# load example dataset library(breastCancerTRANSBIG) library(Biobase) data(transbig) Train <- transbig rm(transbig) clinical <- pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) surv_fitness(OS, clustclass = clinical$grade, period = 3650)
The current function transforms a galgo.Obj
to a data.frame
to_dataframe(output)
to_dataframe(output)
output |
An object of class |
The current function restructurates a galgo.Obj
to a more
easy to understand an use data.frame
. The output data.frame
has dimensions, were the rownames (
) are the solutions
obtained by the
galgo
algorithm.
The columns has the following structure:
Genes: The features included in each solution in form
of a list
k: The number of partitions found in that solution
SC.Fit: The average silhouette coefficient of the partitions found
Surv.Fit: The survival fitness value
Rank: The solution rank
CrowD: The solution crowding distance related to the rest of the solutions
Martin E Guerrero-Gimenez, [email protected]
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output)
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output)
The current function transforms a galgo.Obj
to a list
to_list(output)
to_list(output)
output |
An object of class |
The current function restructurates a galgo.Obj
to a more
easy to understand an use list
. This output is particularly useful
if one wants to select a given solution and use its outputs in a new
classifier. The output of type list
has a length equals to the
number of solutions obtained by the galgo
algorithm.
Basically this output is a list of lists, where each element of the output
is named after the solution's name (solution.n
, where n
is
the number assigned to that solution), and inside of it, it has all the
constituents for that given solution with the following structure:
output$solution.n$Genes: A vector of the features included in the solution
output$solution.n$k: The number of partitions found in that solution
output$solution.n$SC.Fit: The average silhouette coefficient of the partitions found
output$solution.n$Surv.Fit: The survival fitness value
output$solution.n$Rank: The solution rank
CrowD: The solution crowding distance related to the rest of the solutions
Martin E Guerrero-Gimenez, [email protected]
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output)
# load example dataset library(breastCancerTRANSBIG) data(transbig) Train <- transbig rm(transbig) expression <- Biobase::exprs(Train) clinical <- Biobase::pData(Train) OS <- survival::Surv(time = clinical$t.rfs, event = clinical$e.rfs) # We will use a reduced dataset for the example expression <- expression[sample(1:nrow(expression), 100), ] # Now we scale the expression matrix expression <- t(scale(t(expression))) # Run galgo output <- GSgalgoR::galgo(generations = 5, population = 15, prob_matrix = expression, OS = OS) outputDF <- to_dataframe(output) outputList <- to_list(output)