Title: | GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets |
---|---|
Description: | Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performance. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the usage of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets. |
Authors: | Mattia Chiesa <[email protected]>, Luca Piacentini <[email protected]> |
Maintainer: | Mattia Chiesa <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.27.0 |
Built: | 2024-11-19 04:45:36 UTC |
Source: | https://github.com/bioc/GARS |
The AllPop slot contains the list of populations
AllPop(x)
## S4 method for signature 'GarsSelectedFeatures'
AllPop(x)
x |
a GarsSelectedFeatures object |
a list containing all the populations
Mattia Chiesa, Luca Piacentini
data(GARS_res_GA)
ex_pop <- AllPop(GARS_res_GA)
The FitScore slot contains the fitness values over the generations
FitScore(x)
## S4 method for signature 'GarsSelectedFeatures'
FitScore(x)
x |
a GarsSelectedFeatures object |
a vector containing the fitness scores
Mattia Chiesa, Luca Piacentini
data(GARS_res_GA)
ex_pop <- FitScore(GARS_res_GA)
The main function of GARS is GARS_GA, which implements a clustering-based Genetic Algorithm to select Robust Subsets of features in high-dimensional datasets.
The user can extract the results of GARS_GA through the accessor methods MatrixFeatures, LastPop, AllPop and FitScore.
See the package vignette, by typing vignette("GARS"), to discover all the GARS functions.
Mattia Chiesa, Giada Maioli, Luca Piacentini
The class labels of the sample dataset
GARS_classes
A vector of type "factor" with 58 elements: 29 labelled as "N" and 29 labelled as "T".
Example data for testing the GARS package
This function creates the initial random population of chromosomes
GARS_create_rnd_population(data, chr.len, chr.num = 1000)
data |
A
' |
chr.len |
The length of chromosomes. This value corresponds to the desired length of the feature set. |
chr.num |
The number of chromosomes to generate. Default is 1000 |
A matrix representing the chromosome population: each column is a chromosome and each element corresponds to the feature position in 'data'
Mattia Chiesa, Luca Piacentini
# use example data:
data(GARS_data_norm)
GARS_create_rnd_population(GARS_data_norm, chr.len=10, chr.num=100)
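To make the layout of the returned matrix concrete, the following base R sketch builds a random population in which each column holds `chr.len` distinct feature positions. The helper name `sketch_rnd_population`, its `n.feat` argument and the use of `sample.int()` are assumptions made for illustration only; this is not the package's own code.

```r
# Hypothetical helper (not part of GARS): build a random population in which
# each column is a chromosome holding 'chr.len' distinct feature positions.
sketch_rnd_population <- function(n.feat, chr.len, chr.num = 1000) {
  stopifnot(chr.len >= 2, chr.len <= n.feat)
  # vapply stacks the chromosomes column-wise: chr.len rows x chr.num columns
  vapply(seq_len(chr.num),
         function(i) sample.int(n.feat, chr.len),
         integer(chr.len))
}

set.seed(1)
pop <- sketch_rnd_population(n.feat = 157, chr.len = 10, chr.num = 100)
dim(pop)  # rows are positions within a chromosome, columns are chromosomes
```

Sampling without replacement inside each chromosome is a design choice of this sketch: it guarantees that a feature set never contains the same feature twice.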
This function implements the one-point and the two-point cross-over.
GARS_Crossover(chr.pop, co.rate = 0.8, type = c("one.p", "two.p"), one.p.quart = c("I.quart", "II.quart", "III.quart"))
chr.pop |
A matrix or a data.frame representing the chromosomes population: each column is a chromosome and each element corresponds to the feature position in the data matrix |
co.rate |
The probability of each random couple of chromosomes to swap some parts. It must be between 0 and 1. Default is 0.8 |
type |
The type of crossover method; one-point ("one.p") and two-point ("two.p") are allowed. Default is "one.p" |
one.p.quart |
The position of the chromosome at which the crossover is performed, if "one.p" is selected. The first quartile ("I.quart"), the second quartile ("II.quart", i.e. the median) and the third quartile ("III.quart") are allowed. Default is "I.quart" |
A matrix representing the "crossed" population. The dimensions of this matrix are the same as those of 'chr.pop'
Mattia Chiesa, Luca Piacentini
GARS_Mutation, GARS_Selection, GARS_Elitism
data(GARS_popul)
crossed_pop <- GARS_Crossover(GARS_popul, co.rate=0.9)
crossed_pop <- GARS_Crossover(GARS_popul, type="two.p")
crossed_pop <- GARS_Crossover(GARS_popul, type="one.p", one.p.quart= "II.quart")
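The one-point scheme described above can be sketched in base R as follows. This is only an illustration under stated assumptions: chromosomes are paired by a random permutation, the cut point is taken as a fraction (`quart`) of the chromosome length, and the function name `sketch_one_point_co` is hypothetical. The actual GARS_Crossover may pair and cut differently.

```r
# Hypothetical sketch (not the GARS implementation) of one-point crossover.
sketch_one_point_co <- function(chr.pop, co.rate = 0.8, quart = 0.25) {
  chr.len <- nrow(chr.pop)
  stopifnot(chr.len >= 2)
  # Cut position derived from the requested quartile, kept inside the chromosome
  cut <- max(1, min(chr.len - 1, round(chr.len * quart)))
  idx <- sample(ncol(chr.pop))          # random pairing of chromosomes
  for (k in seq_len(length(idx) %/% 2)) {
    a <- idx[2 * k - 1]
    b <- idx[2 * k]
    if (runif(1) < co.rate) {           # the couple swaps with probability co.rate
      tail.rows <- (cut + 1):chr.len
      tmp <- chr.pop[tail.rows, a]
      chr.pop[tail.rows, a] <- chr.pop[tail.rows, b]
      chr.pop[tail.rows, b] <- tmp
    }
  }
  chr.pop
}

set.seed(2)
pop <- matrix(sample.int(157, 20 * 50, replace = TRUE), nrow = 20)
crossed <- sketch_one_point_co(pop, co.rate = 0.9, quart = 0.5)
```

Because only the tails after the cut are exchanged, the rows above the cut are left untouched and the matrix keeps its dimensions.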
A normalized RNA-seq matrix to test several GARS functions; this dataset was obtained using the DaMiRseq package to normalize the raw count matrix available in the MLSeq package.
GARS_data_norm
A matrix of 157 genes (columns) and 58 samples (rows)
Example data for testing the GARS package
This function splits the chromosome population in two parts allowing the best chromosomes to be preserved from the "evolutionary" steps: Selection, Crossover and Mutation.
GARS_Elitism(chr.pop, fitn.values, n.elit = 10)
chr.pop |
A matrix or a data.frame representing the chromosomes population: each column is a chromosome and each element corresponds to the feature position in the data matrix |
fitn.values |
A numeric vector where each element corresponds to the fitness score of each chromosome in 'chr.pop' |
n.elit |
The number of best chromosomes to be selected by elitism. This number must be even. Default is 10 |
A list containing:
The population of best chromosomes selected by elitism.
The population of chromosomes not selected by elitism.
The fitness values of best chromosomes selected by elitism.
The fitness values of chromosomes not selected by elitism.
Mattia Chiesa, Luca Piacentini
GARS_Mutation, GARS_Selection, GARS_Crossover, GARS_FitFun
data(GARS_popul)
data(GARS_Fitness_score)
pop_list <- GARS_Elitism(GARS_popul, GARS_Fitness_score)
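As a rough illustration of the split described above, the base R sketch below ranks the chromosomes by fitness and separates the `n.elit` best from the rest. The function name and the names of the list elements (`elite.pop`, `rest.pop`, `elite.fit`, `rest.fit`) are assumptions; the element names used by GARS_Elitism are not stated in this manual.

```r
# Hypothetical sketch (not the GARS implementation) of an elitism split.
sketch_elitism <- function(chr.pop, fitn.values, n.elit = 10) {
  stopifnot(n.elit > 0, n.elit %% 2 == 0, n.elit < ncol(chr.pop))
  # Column indices of the n.elit chromosomes with the highest fitness
  best <- order(fitn.values, decreasing = TRUE)[seq_len(n.elit)]
  list(elite.pop = chr.pop[, best, drop = FALSE],
       rest.pop  = chr.pop[, -best, drop = FALSE],
       elite.fit = fitn.values[best],
       rest.fit  = fitn.values[-best])
}

set.seed(5)
pop <- matrix(sample.int(157, 20 * 50, replace = TRUE), nrow = 20)
fit <- runif(50)
res <- sketch_elitism(pop, fit, n.elit = 10)
```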
A numeric vector with the maximum fitness score for each iteration
GARS_fit_list
A numeric vector with 100 fitness scores
Example data for testing the GARS package
In GARS, the fitness function computes the averaged Silhouette Index after a Multi-Dimensional Scaling.
GARS_FitFun(data, classes, chr.pop)
data |
A
' |
classes |
A vector of type "factor" with |
chr.pop |
A matrix or a data.frame representing the chromosomes population: each column is a chromosome and each element corresponds to the feature position in the expression data matrix |
A numeric vector where each element corresponds to the fitness score of each chromosome in 'chr.pop'
Mattia Chiesa, Luca Piacentini
# use example data:
data(GARS_data_norm)
data(GARS_classes)
data(GARS_popul)
fitness_scores <- GARS_FitFun(GARS_data_norm, GARS_classes, GARS_popul)
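One plausible reading of this fitness function can be sketched in base R: restrict the data to the features of a chromosome, project the samples with classical MDS (`cmdscale`), and average the silhouette widths of the class labels. The two-dimensional projection, the Euclidean distance and the helper names `avg_silhouette` and `sketch_fitness` are assumptions of this sketch, not details confirmed by the manual.

```r
# Hypothetical sketch (not the GARS implementation): average silhouette width
# of the class labels on a 2-D classical MDS of the selected features.
avg_silhouette <- function(d, classes) {
  d <- as.matrix(d)
  cl <- as.integer(factor(classes))
  s <- vapply(seq_along(cl), function(i) {
    same <- cl == cl[i]
    same[i] <- FALSE                       # exclude the sample itself
    if (!any(same)) return(0)              # singleton class: width set to 0
    other <- cl != cl[i]
    a <- mean(d[i, same])                  # mean distance to own class
    b <- min(tapply(d[i, other], cl[other], mean))  # nearest other class
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

sketch_fitness <- function(data, classes, chromosome) {
  # Samples are rows; 'chromosome' holds the column positions of the features
  coords <- cmdscale(dist(data[, chromosome, drop = FALSE]), k = 2)
  avg_silhouette(dist(coords), classes)
}

set.seed(3)
classes <- factor(rep(c("N", "T"), each = 10))
toy <- matrix(rnorm(20 * 6), nrow = 20)
toy[classes == "T", 1:2] <- toy[classes == "T", 1:2] + 5  # informative features
good <- sketch_fitness(toy, classes, c(1, 2))
bad  <- sketch_fitness(toy, classes, c(5, 6))
```

In this toy setting the chromosome made of the two shifted features separates the classes and therefore scores higher than the one made of pure noise.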
A numeric vector with the fitness scores for each chromosome in a single generation
GARS_Fitness_score
A numeric vector with 50 fitness scores
Example data for testing the GARS package
This function allows users to run all GARS functions at once. This is the easiest and recommended way to use GARS.
GARS_GA(data, classes, chr.num = 1000, chr.len, generation = 500,
  co.rate = 0.8, mut.rate = 0.01, n.elit = 10,
  type.sel = c("RW", "TS"), type.co = c("one.p", "two.p"),
  type.one.p.co = c("I.quart", "II.quart", "III.quart"),
  n.gen.conv = 80, plots = c("yes", "no"), n.Feat_plot = 10,
  verbose = c("yes", "no"))
data |
A
' |
classes |
The class vector |
chr.num |
The number of chromosomes to generate. Default is 1000 |
chr.len |
The length of chromosomes. This value corresponds to the desired length of the feature set |
generation |
The maximum number of generations. Default is 500 |
co.rate |
The probability of each random couple of chromosomes to swap some parts. It must be between 0 and 1. Default is 0.8 |
mut.rate |
The probability to apply a random mutation to each element. It must be between 0 and 1. Default is 0.01 |
n.elit |
The number of best chromosomes to be selected by elitism. This number must be even. Default is 10 |
type.sel |
The type of selection method; Roulette Wheel ("RW") and Tournament Selection ("TS") are allowed. Default is "RW" |
type.co |
The type of crossover method; one-point ("one.p") and two-point ("two.p") are allowed. Default is "one.p" |
type.one.p.co |
The position of the chromosome at which the crossover is performed, if "one.p" is selected. The first quartile ("I.quart"), the second quartile ("II.quart", i.e. the median) and the third quartile ("III.quart") are allowed. Default is "I.quart" |
n.gen.conv |
The number of consecutive generations with the same maximum fitness score. Default is 80 |
plots |
If graphs have to be plotted; "yes" or "no" are allowed. Default is "yes" |
n.Feat_plot |
The number of features to be plotted. Default is 10 |
verbose |
If statistics have to be printed; "yes" or "no" are allowed. Default is "yes" |
A GarsSelectedFeatures object, containing:
a matrix of selected features
a matrix containing the last chromosome population
a list containing all the populations produced over the generations
a numeric vector containing the maximum fitness scores, computed in each generation
Mattia Chiesa, Luca Piacentini
# use example data:
data(GARS_data_norm)
data(GARS_classes)
res_ex <- GARS_GA(GARS_data_norm, GARS_classes, chr.num = 100, chr.len = 10,
  generation = 5, co.rate = 0.8, mut.rate = 0.1, n.elit = 10,
  type.sel = "RW", type.co = "one.p", type.one.p.co = "II.quart",
  n.gen.conv = 80, plots = "no", verbose = "no")
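The `n.gen.conv` argument counts consecutive generations with the same maximum fitness score. Assuming it acts as a stopping rule alongside `generation` (the manual does not spell this out), such a check could look like the base R sketch below; the helper name `sketch_converged` is hypothetical.

```r
# Hypothetical helper (not part of GARS): TRUE when the last 'n.gen.conv'
# maximum fitness scores are all identical.
sketch_converged <- function(max.fit, n.gen.conv = 80) {
  n <- length(max.fit)
  if (n < n.gen.conv) return(FALSE)      # not enough generations yet
  last <- max.fit[(n - n.gen.conv + 1):n]
  all(last == last[1])
}

plateau  <- sketch_converged(c(0.2, 0.5, 0.7, 0.7, 0.7), n.gen.conv = 3)
climbing <- sketch_converged(c(0.2, 0.5, 0.7, 0.7, 0.8), n.gen.conv = 3)
```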
This function implements the mutation step in the GA. First, it checks for and replaces duplicate features in each chromosome; then, random mutations are applied to the entire population.
GARS_Mutation(chr.pop, mut.rate = 0.01, totFeats)
chr.pop |
A matrix or a data.frame representing the chromosomes population: each column is a chromosome and each element corresponds to the feature position in the data matrix |
mut.rate |
The probability to apply a random mutation to each element. It must be between 0 and 1. Default is 0.01 |
totFeats |
The total number of features. Often, it corresponds to the number of columns of the data matrix |
A matrix representing the "mutated" population. The dimensions of this matrix are the same as those of 'chr.pop'
Mattia Chiesa, Luca Piacentini
GARS_Elitism, GARS_Selection, GARS_Crossover
# use example data:
data(GARS_popul)
data(GARS_data_norm)
mutated_pop <- GARS_Mutation(GARS_popul, mut.rate=0.1, dim(GARS_data_norm)[2])
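The two steps described above (duplicate replacement, then random mutation) can be sketched in base R. This illustration assumes that replacement values are drawn from the features not yet present in the chromosome, so that chromosomes stay duplicate-free; the real GARS_Mutation may draw them differently. The function name `sketch_mutation` is hypothetical.

```r
# Hypothetical sketch (not the GARS implementation) of the two mutation steps.
sketch_mutation <- function(chr.pop, mut.rate = 0.01, totFeats) {
  chr.len <- nrow(chr.pop)
  stopifnot(totFeats >= 2 * chr.len)     # enough unused features to draw from
  for (j in seq_len(ncol(chr.pop))) {
    chr <- chr.pop[, j]
    # Step 1: replace duplicated positions with features not in the chromosome
    dup <- duplicated(chr)
    if (any(dup)) {
      unused <- setdiff(seq_len(totFeats), chr)
      chr[dup] <- unused[sample.int(length(unused), sum(dup))]
    }
    # Step 2: mutate each element independently with probability mut.rate
    hit <- runif(chr.len) < mut.rate
    if (any(hit)) {
      unused <- setdiff(seq_len(totFeats), chr)
      chr[hit] <- unused[sample.int(length(unused), sum(hit))]
    }
    chr.pop[, j] <- chr
  }
  chr.pop
}

set.seed(4)
pop <- matrix(sample.int(157, 20 * 50, replace = TRUE), nrow = 20)
mutated <- sketch_mutation(pop, mut.rate = 0.1, totFeats = 157)
```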
This function allows assessing visually how many times a feature is selected across the generations. In principle, a highly recurring feature is more likely to be important.
GARS_PlotFeaturesUsage(popul.list, allFeat, nFeat = length(allFeat))
popul.list |
A list of chromosome populations, such as the example dataset GARS_pop_list |
allFeat |
A character vector containing the names of all the features. Often, it corresponds to the column names of the data matrix. |
nFeat |
The number of features to be plotted. Default is 'length(allFeat)' |
A bubble chart where each plotted feature is represented by a colored circle. A feature is important (i.e. conserved) if its circle is large and its color tends to red; the smaller the circle and the lighter the color, the less informative the feature.
Mattia Chiesa, Luca Piacentini
# use example data:
data(GARS_data_norm)
data(GARS_pop_list)
allfeat_names <- colnames(GARS_data_norm)
GARS_PlotFeaturesUsage(GARS_pop_list, allfeat_names, nFeat = 10)
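The quantity behind such a chart is how often each feature position occurs across the populations. A minimal base R sketch of that count is given below; it does not reproduce the bubble chart, and the helper name `sketch_feature_usage` is an assumption introduced only for illustration.

```r
# Hypothetical helper (not part of GARS): count how often each feature position
# occurs across a list of chromosome populations (matrices of positions).
sketch_feature_usage <- function(popul.list, allFeat, nFeat = length(allFeat)) {
  positions <- unlist(lapply(popul.list, as.vector))
  counts <- tabulate(positions, nbins = length(allFeat))
  names(counts) <- allFeat
  # Most frequently selected features first
  head(sort(counts, decreasing = TRUE), nFeat)
}

pops <- list(matrix(c(1, 2, 3, 1, 2, 4), nrow = 3),
             matrix(c(1, 5, 6, 1, 2, 6), nrow = 3))
feats <- paste0("gene", 1:6)
usage <- sketch_feature_usage(pops, feats, nFeat = 3)
```

Here "gene1" appears in all four chromosomes, "gene2" in three and "gene6" in two, so these are the three features returned.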
This function plots the maximum fitness scores for each generation
GARS_PlotFitnessEvolution(fitness.scores)
fitness.scores |
A numeric vector where each element corresponds to the fitness score |
A plot which represents the evolution of the fitness score across the generations
Mattia Chiesa, Luca Piacentini
# use example data:
data(GARS_fit_list)
GARS_PlotFitnessEvolution(GARS_fit_list)
A list containing 100 consecutive chromosome populations
GARS_pop_list
A list with 100 consecutive chromosome populations
Example data for testing the GARS package
A matrix to test several GARS functions, representing a chromosome population
GARS_popul
A matrix of 20 rows (features) and 50 columns (chromosomes)
Example data for testing the GARS package
An object representing the output of GARS_GA
GARS_res_GA
A GarsSelectedFeatures object
Example data for testing the GARS package
This function implements two kinds of GA selection step: the "Roulette Wheel" and the "Tournament" selection.
GARS_Selection(chr.pop, type = c("RW", "TS"), fitn.values)
chr.pop |
A matrix or a data.frame representing the chromosomes population: each column is a chromosome and each element corresponds to the feature position in the data matrix |
type |
The type of selection method; Roulette Wheel ("RW") and Tournament Selection ("TS") are allowed. Default is "RW" |
fitn.values |
A numeric vector where each element corresponds to the fitness score of each chromosome in 'chr.pop' |
A matrix representing the "selected" population. The dimensions of this matrix are the same as those of 'chr.pop'.
Mattia Chiesa, Luca Piacentini
GARS_Mutation, GARS_Crossover, GARS_Elitism
# use example data:
data(GARS_popul)
data(GARS_Fitness_score)
selected_pop <- GARS_Selection(GARS_popul, "RW", GARS_Fitness_score)
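The two selection schemes can be sketched in base R as below. Several details are assumptions of this illustration rather than documented behavior: fitness values are shifted to be positive before being used as roulette weights, tournaments are played between two randomly drawn chromosomes, and the function name `sketch_selection` is hypothetical.

```r
# Hypothetical sketch (not the GARS implementation) of the two selection schemes.
sketch_selection <- function(chr.pop, type = c("RW", "TS"), fitn.values) {
  type <- match.arg(type)
  n <- ncol(chr.pop)
  if (type == "RW") {
    # Roulette Wheel: sampling probability proportional to the shifted fitness
    w <- fitn.values - min(fitn.values) + 1e-8
    keep <- sample.int(n, n, replace = TRUE, prob = w)
  } else {
    # Tournament: draw two chromosomes at random and keep the fitter one
    a <- sample.int(n, n, replace = TRUE)
    b <- sample.int(n, n, replace = TRUE)
    keep <- ifelse(fitn.values[a] >= fitn.values[b], a, b)
  }
  chr.pop[, keep, drop = FALSE]
}

set.seed(6)
pop <- matrix(sample.int(157, 20 * 50, replace = TRUE), nrow = 20)
fit <- runif(50, min = -0.2, max = 0.8)
sel_rw <- sketch_selection(pop, "RW", fit)
sel_ts <- sketch_selection(pop, "TS", fit)
```

Both schemes sample with replacement, so the selected population has the same dimensions as the input while fitter chromosomes tend to appear more than once.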
The output class for GARS_GA function
data_red
a matrix containing the expression values for the selected features
last_pop
a matrix containing the chromosome population of the last generation
pop_list
a list containing all the populations produced over the generations
fit_list
a vector containing the maximum fitness scores
showClass("GarsSelectedFeatures")
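For readers unfamiliar with S4, the sketch below shows how a class with these four slots and a slot accessor could be declared with the `methods` package. The class name `SketchSelectedFeatures`, the accessor `SketchFitScore` and the slot types are assumptions chosen to mirror the descriptions above; they are not the package's definitions.

```r
library(methods)

# Hypothetical S4 class mirroring the four documented slots (types are assumed).
setClass("SketchSelectedFeatures",
         representation(data_red = "matrix", last_pop = "matrix",
                        pop_list = "list", fit_list = "numeric"))

# Accessor in the style of FitScore(): a generic plus a method reading the slot.
setGeneric("SketchFitScore", function(x) standardGeneric("SketchFitScore"))
setMethod("SketchFitScore", "SketchSelectedFeatures", function(x) x@fit_list)

obj <- new("SketchSelectedFeatures",
           data_red = matrix(0, 2, 2), last_pop = matrix(1L, 2, 2),
           pop_list = list(matrix(1L, 2, 2)), fit_list = c(0.4, 0.6))
scores <- SketchFitScore(obj)
```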
The LastPop slot contains the last chromosome population
LastPop(x)
## S4 method for signature 'GarsSelectedFeatures'
LastPop(x)
x |
a GarsSelectedFeatures object |
a matrix containing the last population
Mattia Chiesa, Luca Piacentini
data(GARS_res_GA)
ex_pop <- LastPop(GARS_res_GA)
The MatrixFeatures slot contains the reduced dataset
MatrixFeatures(x)
## S4 method for signature 'GarsSelectedFeatures'
MatrixFeatures(x)
x |
a GarsSelectedFeatures object |
a matrix with the reduced dataset
Mattia Chiesa, Luca Piacentini
data(GARS_res_GA)
ex_matrix <- MatrixFeatures(GARS_res_GA)