Title: | Gene Selection |
---|---|
Description: | This package provides a supervised technique able to identify differentially expressed genes, based on the construction of Fuzzy Patterns (FPs). The Fuzzy Patterns are built by means of applying 3 Membership Functions to discretized gene expression values. |
Authors: | R. Alvarez-Gonzalez, D. Glez-Pena, F. Diaz, F. Fdez-Riverola |
Maintainer: | Rodrigo Alvarez-Glez <[email protected]> |
License: | GPL-2 |
Version: | 1.65.0 |
Built: | 2024-11-13 06:14:16 UTC |
Source: | https://github.com/bioc/DFP |
This package provides a supervised technique able to identify differentially expressed genes, based on the construction of Fuzzy Patterns (FPs). The Fuzzy Patterns are built by means of applying 3 Membership Functions to discretized gene expression values.
Package: | DFP |
Type: | Package |
Version: | 1.0 |
Date: | 2008-07-03 |
License: | GPL-2 |
The main functionality of the package is provided by the discriminantFuzzyPattern function, which works in a 4-step process:
1. Calculates the Membership Functions. These functions are used in the next step to discretize gene expression data.
2. Discretizes the gene expression data (float values) into ‘Low’, ‘Medium’ or ‘High’ labels.
3. Calculates a Fuzzy Pattern for each category. To do this, a given percentage of the samples belonging to a category must have the same label (‘Low’, ‘Medium’ or ‘High’).
4. Calculates the Discriminant Fuzzy Pattern (DFP) that includes those genes present in two or more FPs with different assigned labels.
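The four steps correspond to the step functions documented on this page. The following is a minimal sketch, assuming that DFP is installed and that chaining the step functions with their documented defaults mirrors what discriminantFuzzyPattern does internally. The wrapper name runDfpSteps is hypothetical and not part of the package.

```r
# Hypothetical wrapper (not part of DFP): runs the four documented steps
# one after another and collects the intermediate results.
runDfpSteps <- function(eset, skipFactor = 3, zeta = 0.5,
                        overlapping = 2, piVal = 0.9) {
  # library() attaches DFP and the packages it depends on;
  # it stops with an error if DFP is not installed.
  library(DFP)

  # Step 1: membership functions for each gene.
  mfs <- calculateMembershipFunctions(eset, skipFactor = skipFactor)

  # Step 2: discretize the expression values into linguistic labels.
  dvs <- discretizeExpressionValues(eset, mfs, zeta = zeta,
                                    overlapping = overlapping)

  # Step 3: one Fuzzy Pattern per category.
  fps <- calculateFuzzyPatterns(eset, dvs, piVal = piVal,
                                overlapping = overlapping)

  # Step 4: genes present in two or more FPs with different labels.
  dfp <- calculateDiscriminantFuzzyPattern(eset, fps)

  list(membership.functions = mfs,
       discrete.values = dvs,
       fuzzy.patterns = fps,
       discriminant.fuzzy.pattern = dfp)
}
```

With the package installed, a call such as runDfpSteps(rmadataset) after data(rmadataset) would be the intended use. Running the steps separately is mainly useful for inspecting an intermediate result or re-running a later step with a different parameter.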
Additional data classes: ExpressionSet, AnnotatedDataFrame.
Rodrigo Alvarez-Gonzalez
Daniel Glez-Pena
Fernando Diaz
Florentino Fdez-Riverola
Maintainer: Rodrigo Alvarez-Gonzalez <[email protected]>
F. Diaz; F. Fdez-Riverola; D. Glez-Pena; J.M. Corchado. Using Fuzzy Patterns for Gene Selection and Data Reduction on Microarray Data. 7th International Conference on Intelligent Data Engineering and Automated Learning: IDEAL 2006, (2006) pp. 1095-1102
#########################################
############ Get sample data ############
#########################################
library(DFP)
data(rmadataset)
#########################################
# Filter the most representative genes #
#########################################
res <- discriminantFuzzyPattern(rmadataset)
#########################################
###### Different result displays ########
#########################################
plotMembershipFunctions(rmadataset, res$membership.functions, featureNames(rmadataset)[1:2])
showDiscreteValues(res$discrete.values, featureNames(rmadataset)[1:10], c("healthy", "AML-inv"))
showFuzzyPatterns(res$fuzzy.patterns, "healthy")[21:50]
plotDiscriminantFuzzyPattern(res$discriminant.fuzzy.pattern)
Calculates the Discriminant Fuzzy Pattern (DFP) that includes those genes present in two or more FPs with different assigned labels.
calculateDiscriminantFuzzyPattern(rmadataset, fps)
rmadataset | |
fps | Genes belonging to each Fuzzy Pattern. There is one FP for each class. |
Genes belonging to the final DFP. Includes an attribute ifs with the Impact Factor for each category.
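To make the selection rule concrete, here is a base-R sketch on toy data. It assumes the Fuzzy Patterns are held in a character matrix (genes in rows, classes in columns) with NA where a gene is absent from a class's pattern. That layout, the class name "other" and the helper toyDiscriminantPattern are assumptions for illustration, not the package's internal representation.

```r
# Toy Fuzzy Patterns: NA means the gene is not part of that class's FP.
fps <- matrix(
  c("Low",  "High", NA,
    "High", "High", NA,
    NA,     "Low",  NA,
    "Low",  NA,     "Medium"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("healthy", "AML-inv", "other"))
)

# Hypothetical helper: keep genes that appear in two or more FPs
# and carry at least two different labels across those FPs.
toyDiscriminantPattern <- function(fps) {
  keep <- apply(fps, 1, function(labels) {
    present <- labels[!is.na(labels)]
    length(present) >= 2 && length(unique(present)) >= 2
  })
  fps[keep, , drop = FALSE]
}

dfp <- toyDiscriminantPattern(fps)
rownames(dfp)
```

In this toy input gene2 is dropped because both of its patterns share the label ‘High’, and gene3 is dropped because it appears in only one pattern.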
Calculates a Fuzzy Pattern for each category. To do this, a given percentage of the samples belonging to a category must have the same label (‘Low’, ‘Medium’ or ‘High’).
calculateFuzzyPatterns(rmadataset, dvs, piVal = 0.9, overlapping = 2)
rmadataset | |
dvs | Matrix of discrete values obtained by discretizing the gene expression values according to the overlapping parameter. |
piVal | Controls how strict the criterion is for selecting a gene as a member of a Fuzzy Pattern. |
overlapping | Modifies the number of membership functions used in the discretization process. |
Genes belonging to each Fuzzy Pattern. There is one FP for each class. Includes an attribute ifs with the Impact Factor for each category.
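The rule "a given percentage of the samples in a category must share the same label" can be sketched in base R as follows. This sketch assumes that the Impact Factor of a gene in a class is the fraction of that class's samples carrying the most frequent label, and that the gene enters the pattern when this fraction is higher than piVal. The helper toyFuzzyPatterns and the toy data are hypothetical, and the package's own computation may differ.

```r
# Toy discrete values: genes in rows, samples in columns.
dvs <- matrix(
  c("Low", "Low",  "Low",    "Low", "High", "High", "High", "High",
    "Low", "High", "Medium", "Low", "High", "High", "High", "Medium"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("s", 1:8))
)
# Class of each sample (illustrative class names).
types <- rep(c("healthy", "AML-inv"), each = 4)

# Hypothetical helper: per class, keep the dominant label of a gene
# only if its share of the class's samples exceeds piVal.
toyFuzzyPatterns <- function(dvs, types, piVal = 0.9) {
  classes <- unique(types)
  fps <- matrix(NA_character_, nrow(dvs), length(classes),
                dimnames = list(rownames(dvs), classes))
  ifs <- matrix(NA_real_, nrow(dvs), length(classes),
                dimnames = dimnames(fps))
  for (cl in classes) {
    for (g in rownames(dvs)) {
      counts <- table(dvs[g, types == cl])
      top <- which.max(counts)
      ifs[g, cl] <- counts[[top]] / sum(counts)   # assumed Impact Factor
      if (ifs[g, cl] > piVal) fps[g, cl] <- names(counts)[top]
    }
  }
  attr(fps, "ifs") <- ifs
  fps
}

fps <- toyFuzzyPatterns(dvs, types, piVal = 0.7)
fps
```

A lower piVal admits genes whose samples agree less strongly, which is why geneB enters the "AML-inv" pattern here (3 of 4 samples are ‘High’) but not the "healthy" one.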
Calculates the Membership Functions. These functions are used in the next step (discretizeExpressionValues) to discretize gene expression data.
calculateMembershipFunctions(rmadataset, skipFactor = 3)
rmadataset | |
skipFactor | Numeric value used to omit atypical (outlying) values (a form of normalization). |
Membership functions used to determine the discrete value (linguistic label) corresponding to a given gene expression level.
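As an illustration of what a set of three membership functions looks like, the sketch below uses a Gaussian-shaped curve for ‘Medium’ and logistic curves for ‘Low’ and ‘High’, each driven by a center and a width. These curve shapes, the way the parameters are derived from the data, and all helper names are assumptions for illustration. The package's actual curves and its handling of skipFactor are not reproduced here.

```r
# Hypothetical curve shapes, each parameterised by a center and a width.
lowMembership    <- function(x, center, width) 1 / (1 + exp((x - center) / width))
highMembership   <- function(x, center, width) 1 / (1 + exp(-(x - center) / width))
mediumMembership <- function(x, center, width) exp(-((x - center)^2) / (2 * width^2))

# Toy expression values of a single gene across samples.
expr <- c(4.1, 4.8, 5.0, 5.3, 6.2, 7.9, 8.4)
ctr  <- mean(expr)  # assumed center of the 'Medium' curve
wid  <- sd(expr)    # assumed width

# Evaluate the three curves at the lowest, central and highest value.
x <- c(min(expr), ctr, max(expr))
memberships <- data.frame(
  x      = x,
  Low    = lowMembership(x, ctr - wid, wid / 2),
  Medium = mediumMembership(x, ctr, wid),
  High   = highMembership(x, ctr + wid, wid / 2)
)
memberships
```

Each row gives the degree of membership, between 0 and 1, of one expression value to each of the three labels.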
Discretizes the gene expression data (float values) into ‘Low’, ‘Medium’ or ‘High’ labels.
discretizeExpressionValues(rmadataset, mfs, zeta = 0.5, overlapping = 2)
rmadataset | |
mfs | Membership functions used to determine the discrete value (linguistic label) corresponding to a given gene expression level. |
zeta | Threshold value which controls the activation of a linguistic label ('Low', 'Medium' or 'High'). |
overlapping | Modifies the number of membership functions used in the discretization process. |
Matrix of discrete values obtained by discretizing the gene expression values according to the overlapping parameter. Includes an attribute types which determines the category of each sample.
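The role of the zeta threshold can be sketched in base R. The example assumes that a label is activated when its membership value exceeds zeta and that several activated labels are joined into one compound label. The helper toyDiscretize and the label format are hypothetical, and the effect of the overlapping parameter is deliberately left out.

```r
# Hypothetical helper: turn membership values into a linguistic label.
# memberships: numeric matrix, one row per expression value,
# with columns named Low, Medium and High.
toyDiscretize <- function(memberships, zeta = 0.5) {
  apply(memberships, 1, function(m) {
    active <- names(m)[m > zeta]          # labels activated by the threshold
    if (length(active) == 0) NA_character_ else paste(active, collapse = "-")
  })
}

m <- rbind(
  c(Low = 0.9, Medium = 0.2, High = 0.0),
  c(Low = 0.6, Medium = 0.7, High = 0.0),
  c(Low = 0.0, Medium = 0.3, High = 0.8),
  c(Low = 0.4, Medium = 0.4, High = 0.1)
)
labels <- toyDiscretize(m, zeta = 0.5)
labels
```

Raising zeta makes compound labels rarer, because fewer membership values clear the threshold at the same time. In this sketch the last row activates no label at all and is returned as NA.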
discriminantFuzzyPattern discovers significant genes based on the construction of Fuzzy Patterns (FPs). The Fuzzy Patterns are built by means of applying 3 Membership Functions to the gene expression values in the matrix rmadataset.
discriminantFuzzyPattern(rmadataset, skipFactor = 3, zeta = 0.5, overlapping = 2, piVal = 0.9)
rmadataset | |
skipFactor | Numeric value used to omit atypical (outlying) values (a form of normalization). |
zeta | Threshold value which controls the activation of a linguistic label ('Low', 'Medium' or 'High'). |
overlapping | Modifies the number of membership functions used in the discretization process. |
piVal | Controls how strict the criterion is for selecting a gene as a member of a Fuzzy Pattern. |
The discriminantFuzzyPattern function works in a 4-step process:
1. Calculates the Membership Functions. These functions are used in the next step to discretize gene expression data.
2. Discretizes the gene expression data (float values) into ‘Low’, ‘Medium’ or ‘High’ labels.
3. Calculates a Fuzzy Pattern for each category. To do this, a given percentage of the samples belonging to a category must have the same label (‘Low’, ‘Medium’ or ‘High’).
4. Calculates the Discriminant Fuzzy Pattern (DFP) that includes those genes present in two or more FPs with different assigned labels.
membership.functions | Membership functions used to determine the discrete value corresponding to a given gene expression level. |
discrete.values | Discrete values obtained by discretizing the gene expression values according to the overlapping parameter. |
fuzzy.patterns | Genes belonging to each Fuzzy Pattern. There is one FP for each class. |
discriminant.fuzzy.pattern | Genes belonging to the final DFP. |
params | The parameters used to tune the algorithm (as arguments in the function). |
#########################################
############ Get sample data ############
#########################################
library(DFP)
data(rmadataset)
#########################################
# Filters the most representative genes #
#########################################
res <- discriminantFuzzyPattern(rmadataset)
summary(res)
A virtual class which represents a generic Membership Function.
A virtual Class: No objects may be created from it.
center: Object of class "numeric". Represents the peak point in the function curve.
width: Object of class "numeric". Represents the length of the interval over which the function curve takes values greater than 0 and lower than 1.
signature(object = "ExpressionLevel"): Prints the ExpressionLevel subclass of the object.
signature(object = "ExpressionLevel", values = "numeric"): Generic function to be implemented in the subclasses.
signature(object = "ExpressionLevel", x = "numeric"): Generic function to be implemented in the subclasses.
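The pattern described here, a virtual S4 class holding shared slots with generics implemented by concrete subclasses, can be sketched as follows. The class names (prefixed with Toy), the generic toyMembership and the Gaussian formula are hypothetical and chosen so as not to suggest they are the package's own definitions.

```r
library(methods)

# Virtual parent: holds the shared slots and cannot be instantiated.
setClass("ToyExpressionLevel",
         representation("VIRTUAL", center = "numeric", width = "numeric"))

# Concrete subclass standing in for one linguistic label.
setClass("ToyMediumLevel", contains = "ToyExpressionLevel")

# Generic to be implemented by each subclass.
setGeneric("toyMembership",
           function(object, x) standardGeneric("toyMembership"))

# Assumed Gaussian shape, for illustration only.
setMethod("toyMembership",
          signature(object = "ToyMediumLevel", x = "numeric"),
          function(object, x) {
            exp(-((x - object@center)^2) / (2 * object@width^2))
          })

lvl <- new("ToyMediumLevel", center = 5, width = 1)
toyMembership(lvl, c(3, 5, 7))

# Instantiating the virtual class raises an error, caught here.
virtualAttempt <- tryCatch(new("ToyExpressionLevel"),
                           error = function(e) "error")
```

Keeping the slots in the virtual parent means every subclass shares the same center and width fields while supplying its own curve.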
showClass("ExpressionLevel")
A class which represents a Membership Function to determine the membership of a numeric value to the ‘High’ discrete label. The result depends on the ‘center’ and ‘width’ values.
Objects can be created by calls of the form new("HighExpressionLevel").
center: Object of class "numeric". Represents the peak point in the function curve.
width: Object of class "numeric". Represents the length of the interval over which the function curve takes values greater than 0 and lower than 1.
Extends class "ExpressionLevel", directly.
signature(object = "HighExpressionLevel", values = "numeric"): Establishes the ‘center’ and ‘width’ slots of the object, given a vector of numeric values.
signature(object = "HighExpressionLevel", x = "numeric"): Returns a value in the [0,1] interval, which represents the membership to the ‘High’ discrete label.
showClass("HighExpressionLevel")
A class which represents a Membership Function to determine the membership of a numeric value to the ‘Low’ discrete label. The result depends on the ‘center’ and ‘width’ values.
Objects can be created by calls of the form new("LowExpressionLevel").
center: Object of class "numeric". Represents the peak point in the function curve.
width: Object of class "numeric". Represents the length of the interval over which the function curve takes values greater than 0 and lower than 1.
Extends class "ExpressionLevel", directly.
signature(object = "LowExpressionLevel", values = "numeric"): Establishes the ‘center’ and ‘width’ slots of the object, given a vector of numeric values.
signature(object = "LowExpressionLevel", x = "numeric"): Returns a value in the [0,1] interval, which represents the membership to the ‘Low’ discrete label.
showClass("LowExpressionLevel")
A class which represents a Membership Function to determine the membership of a numeric value to the ‘Medium’ discrete label. The result depends on the ‘center’ and ‘width’ values.
Objects can be created by calls of the form new("MediumExpressionLevel").
center: Object of class "numeric". Represents the peak point in the function curve.
width: Object of class "numeric". Represents the length of the interval over which the function curve takes values greater than 0 and lower than 1.
Extends class "ExpressionLevel", directly.
signature(object = "MediumExpressionLevel", values = "numeric"): Establishes the ‘center’ and ‘width’ slots of the object, given a vector of numeric values.
signature(object = "MediumExpressionLevel", x = "numeric"): Returns a value in the [0,1] interval, which represents the membership to the ‘Medium’ discrete label.
showClass("MediumExpressionLevel")
This function plots the Discriminant Fuzzy Pattern of the relevant genes (in rows) for the sample classes (in columns), as well as the impact factor which determines if a gene belongs to a Fuzzy Pattern in a class (if its value is higher than the piVal).
The relevant genes are those which are present in at least two different Fuzzy Patterns with different linguistic labels.
The plot is produced in both graphical and text mode.
plotDiscriminantFuzzyPattern(dfp, overlapping = 2)
dfp | A matrix with the fuzzy patterns and impact factors for the relevant genes. |
overlapping | Modifies the number of membership functions used in the discretization process. |
A matrix with the discriminant genes in rows, along with the Fuzzy Pattern for each class (in columns). This object contains an attribute (ifs) which stores the Impact Factors used to determine if a gene belongs to a Fuzzy Pattern in a class (if the value is higher than the piVal).
Each gene has 3 Membership Functions (‘Low’, ‘Medium’ and ‘High’) which can be plotted as curves in graphical mode.
In the text mode a membership function is represented with its center and width.
This function receives one or more gene names and plots the results in both graphical and text mode.
If a set of genes containing more than 36 elements is provided, only the text mode is available.
plotMembershipFunctions(rmadataset, mfs, genes)
rmadataset | An |
mfs | A list of 3 ExpressionLevel objects (‘Low’, ‘Medium’ and ‘High’) for each gene (a list of lists). |
genes | The set of genes to plot (a vector). |
A data frame with the values of the membership functions (‘Low’, ‘Medium’ and ‘High’) for each gene (in rows) received as a parameter.
This function creates an ExpressionSet with an AnnotatedDataFrame. To do this, it requires two CSV files in a predefined format:
‘exprsData’ with the expression values of genes (in rows) of different samples (in columns).
‘pData’ with the samples (in columns) and the metadata ‘class’ (the most important for the algorithm discriminantFuzzyPattern), ‘age’ and ‘sex’.
readCSV(fileExprs, filePhenodata)
fileExprs | The path to the |
filePhenodata | The path to the |
An ExpressionSet object with an AnnotatedDataFrame storing ‘class’, ‘age’ and ‘sex’ information.
dataDir <- system.file("extdata", package="DFP"); dataDir
fileExprs <- file.path(dataDir, "exprsData.csv"); fileExprs
filePhenodata <- file.path(dataDir, "pData.csv"); filePhenodata
rmadataset <- readCSV(fileExprs, filePhenodata); rmadataset
pData(phenoData(rmadataset))
exprs(rmadataset)[1:10,1:5]
This ExpressionSet object includes an AnnotatedDataFrame with metadata about ‘Disease type’ (the most important for the algorithm), ‘Patient age’ and ‘Patient gender’.
This data set gives the expression values of 500 genes in 35 samples.
data(rmadataset)
ExpressionSet | str(pData(phenoData(rmadataset))) |
AnnotatedDataFrame | str(exprs(rmadataset)) |
data(rmadataset)
featureNames(rmadataset)[1:20]
sampleNames(rmadataset)
varLabels(rmadataset)
pData(phenoData(rmadataset))
exprs(rmadataset)[1:10,1:5]
Prints the slots (center and width) of an "ExpressionLevel" object.
See "ExpressionLevel".
In an intermediate step, the algorithm discriminantFuzzyPattern converts the gene expression values into discrete labels (combining ‘Low’, ‘Medium’ and ‘High’, depending on the value of the parameter ‘overlapping’).
This function prints these labels for a given set of genes (a vector) and/or classes of samples.
showDiscreteValues(dvs, genes, classes)
dvs | A matrix with discrete labels for a set of genes (in rows) of several samples (in columns). |
genes | [optional] The set of genes to plot. |
classes | [optional] A set of classes to which the samples belong. It must be one of the classes stored in the phenoData of the original |
A subset of the matrix dvs determined by the restrictions (genes and/or classes).
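The kind of subsetting described above can be sketched in base R. The example assumes that the discrete-value matrix carries a types attribute giving the class of each sample, as stated for the value of discretizeExpressionValues. The helper toySubsetDiscreteValues and the toy labels are hypothetical and do not reproduce the package's implementation.

```r
# Toy discrete values: genes in rows, samples in columns.
dvs <- matrix(
  c("Low",    "Low",        "High", "High",
    "Medium", "Low-Medium", "High", "Medium-High",
    "High",   "High",       "Low",  "Low"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3", "s4"))
)
# Class of each sample, stored as an attribute of the matrix.
attr(dvs, "types") <- c("healthy", "healthy", "AML-inv", "AML-inv")

# Hypothetical helper: restrict the matrix to some genes and/or classes.
# Both restrictions are optional and default to "keep everything".
toySubsetDiscreteValues <- function(dvs, genes = rownames(dvs),
                                    classes = unique(attr(dvs, "types"))) {
  inClass <- attr(dvs, "types") %in% classes
  dvs[genes, inClass, drop = FALSE]
}

sub <- toySubsetDiscreteValues(dvs, genes = c("g1", "g3"), classes = "AML-inv")
sub
```

Note that plain matrix subsetting drops the types attribute, so a fuller version would need to re-attach the subset of types to the result.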
This function prints (in text mode) the Fuzzy Patterns (discrete labels) calculated for a single class of samples.
showFuzzyPatterns(fps, class)
fps | A matrix with the Fuzzy Patterns (discrete labels) for all the samples and genes. |
class | A class to which the samples belong. It must be one of the classes stored in the phenoData of the original |
A vector of Fuzzy Patterns (discrete labels) for a single class of samples, with the associated genes.