NEWS
GOexpress 1.9.2
BUG FIXES
- Remove function overlap_GO because the VennDiagram package has issues
with condition tests on vectors with length greater than 1
GOexpress 1.9.1
BUG FIXES
- Remove duplicated plot.title argument in ggplot call.
GENERAL UPDATES
GOexpress 1.7.1
GENERAL UPDATES
GOexpress 1.5.5
BUG FIX
- Function subEset also removes empty levels in factors not mentionned in
the subset argument. Necessary for randomForest as the factor of interest
cannot have empty levels.
GOexpress 1.5.4
NEW FEATURES
- plot_design supports subset argument to restrict the plot to certain
subsets of samples.
GOexpress 1.5.3
GENERAL UPDATES
- Clarify that GOexpress in not a conventional gene set enrichment
analysis tool. It is based on ranking, not enrichment.
GOexpress 1.5.2
BUG FIX
- Rename "namespace" column to "namespace_1003"", not "namespace_1006".
GOexpress 1.5.1
GENERAL UPDATES
- Updated maintainer email address.
GOexpress 1.3.2
BUG FIX
- expression_plot() functions return error if an empty gene name is given
GOexpress 1.3.1
BUG FIX
- heatmap_GO() function does not automatically resize margins when using
either gene names or gene identifiers. This caused the user-defined
margins to be systematically ignored.
GOexpress 1.1.12
NEW FEATURES
- Summarisation function of scores and rank from feature-level to
ontology-level also affects the calculation of P-values by pValue_GO. The
default behaviour is to use the same function as used in the call to
GO_analyse, stored in the result object.
GOexpress 1.1.11
NEW FEATURES
- Summarisation function of scores and rank from feature-level to
ontology-level can be overriden from the default "mean" to any function
specified by the user. The function need to support a list of numeric
values as an input, and return a unique numeric value as an output.
GOexpress 1.1.10
BUG FIX
- margins argument in heatmap_GO() was not used in the call to heatmap.2()
GOexpress 1.1.9
TYPOS
- Missing space character in error message.
GOexpress 1.1.8
TYPOS
- Simone Coughlan, not Simone Coughland. Apologies!
GOexpress 1.1.7
BUG FIXES
- Changed the rank of annotated features absent from the given
ExpressionSet to (number of features in ExpressionSet + 1) instead of
max(rank) + 1 to match the documentation, and desired behaviour. This
follows the logic of giving the same rank R to N tied-scored features,
while the next feature receives a rank of R + N.
GOexpress 1.1.6
BUG FIXES
- overlap_GO() was crashing for 3-group Venn diagrams, except if the
VennDiagram was loaded manually loaded in the workspace using
libray(VennDiagram). The function will now run seemlessly without that
manual step, as loading GOexpress will immediately load VennDiagram
in the workspace (stated as a dependency in the DESCRIPTION file).
NEW FEATURES
- New function pValue_GO() allows calculation of P-value for each
ontology using permutation of genes labels. This allows users to estimate
the chance of seeing a GO term reach a particular rank (or score).
Features a fancy progres bar shamelessly adapted from StackOverflow.
- heatmap_GO now semi-autmoatically resizes the bottom and right margins
to accomodate large gene and sample labels, respectively. The user may
control those margins using the "margins" argument of the function.
- heatmap_GO default call now shows the gene feature identifier for those
missing an annotated gene name, when gene names are requested (also
the default).
- A rank.by slot is now created by the GO_analyse() function in the result
object to state the metric used to order the result tables.
- a filters.GO slot stating the filters and cutoffs applied to the result
object is now created or updated by successive uses of the subset_scores()
function. Warnings and notes are displayed if conflicting filters and
cutoffs are applied on a previously filtered result object.
- rerank() function now supports re-ordering by P-value. Note that this
is only applicable to the output of the pValue_GO() function mentioned
above.
- rerank() function now updates the rank.by slot of the result object
to state the current ordering metric.
- subset_scores() function now allows filtering by P-value. Note that this
is only applicable to the output of the pValue_GO() function mentioned
above.
- Backward compatibility with Ensembl annotation releases 75 and earlier,
which used 'external_gene_id', which was renamed to 'external_gene_name'
in releases 76 and later.
- table_genes() function defaults to sorting genes by decreasing score
(equivalent to increasing rank). Gene feature name or identifier are
supported alternative filters for sorting.
UPDATED FEATURES
- Allow user to override row_labels in heatmap_GO. This way, the
color-coding of the sample can be kept, while better description of the
samples can be used to label them, instead of the phenodata values.
- In heatmap_GO(), if the labRow argument is of length 1, it is assumed to
be the name of a column in the phenoData slot. Useful to re-label
subsetted ExpressionSet objects.
GENERAL UPDATES
- Updated the AlvMac training dataset to include 'RPL36A' an example
of multiple Ensembl gene identifier annotated to the same gene name.
- Updated the AlvMac example custom annotations to match the updated
dataset.
- Updated the example AlvMac_results to match the updated dataset.
- Set the random seed prior to running the GO_analyse() example in
the vignette. Hopefully, this should allow reproducible testing by the
users.
- In User Guide, load package before loading the attached data.
- In User Guide, new sections and examples dealing with the re-labelling
of heatmap samples, the use of P-values, the re-ranking and subsetting
of results using P-values. New sub-sections for clarity. Emphasis on
the use and generation of local annotation, rather than use of current
online Ensembl annotation release.
- No more code connecting to the Ensembl server in any the help files
and User Guide.
- Help pages examples with more consistent indentation of code.
GOexpress 1.1.5
BUG FIXES
- Forgot to commit image file of shiny screenshot in release 1.1.5
GOexpress 1.1.4
NEW FEATURES
- Custom annotations can be provided to the GO_analyse function using
three new arguments: "GO_genes", "all_GO", and "all_genes". See below for
individual description.
- The GO_genes argument allows the user to provide associations between
feature identifiers in the ExpressionSet and gene ontology identifiers.
This will skip all calls to the Ensembl BioMart server. Consequently, we
recommend the use of the "all_GO" and "all_genes" arguments whenever
"GO_genes" is used. Otherwise, some downstream functions may not work.
For instance, the expression_plot, expression_profiles, and heatmap_GO
function can be used to generate plots,although lacking gene and gene
ontology names. See below for more details.
- The all_GO argument allows the user to provide the name and namespace
("biological_process", "molecular_function", or "cellular_component")
corresponding to gene ontology identifiers. This enables subsequent
filtering of result tables by namespace, and annotation of heatmaps with
the name of the gene ontology.
- The all_genes argument allows the user to provide the name and
description corresponding to feature identifiers in the ExpressionSet.
This enables annotation of expression plots with the gene name associated
with the feature.
UPDATED FEATURES
- In the GO_analyse function, the eSet argument is now formally checked
to be of class ExpressionSet prior to any calculation. If not, the
function returns an appropriate error message.
- The error messages caused when the user gives a name that does not exist
in the phenoData slot of the ExpressionSet were updated to use the word
"column", rather than "factor".
GENERAL UPDATES
- Updated the help page for function GO_analyse to describe the new
features described above and provide a code example.
- Used the BiocStyle package to format the vignette.
- Added a new section in the vignette to document the new features
described above.
- Added a new section in the vignette to mention the creation of shiny
applications using the output of GOexpress. Included a screenshot of
a shiny application developed from the original AlvMac dataset.
- Added a new section in the vignette to highlight the availability of
a "subset" argument to avoid the need for additional ExpressionSet objects
to analyse or visualise subsets of samples.
GOexpress 1.1.3
NEW FEATURES
- Included private function used to generate the prefix2dataset table.
- Included private function used to generate the microarray2dataset table.
- GO_analyse states the number of gene features in the given dataset that
were found in the BioMart dataset. This allows the user to interrupt the
script if a suspiciously low number of mapped features suggests an
incorrect BioMart dataset was used.
UPDATED FEATURES
- Updated the prefix2dataset table. Three more species
("Chlorocebus sabaeus", "Papio anubis" , and "Poecilia formosa"), and two
more columns ("species" and "sample").
- Updated the microarray2dataset table. Probeset patterns are now hard
coded in the package, but dynamically defined as unique to a platform
(or not) by the function building the microarray2dataset table. Species
without microarray platforms were also included in the table with NA
values. Code was added in the GO_analyse method to ignore those before
trying to resolve the origin of the expression data, if not specified
by the user.
- Renamed column "prefix" to "pattern" in microarray2dataset table.
- Disabled manual check of C. elegans and D. melanogaster, as the
automated detection is doing the exact same thing. Only S. cerevisiae
requires a manual check of the "Y" prefix instead of the automatically
extracted "Y[:LETTERS:]{2}" pattern.
- Methods expression_plot_symbol and expression_profiles_symbol now
return the list of gene feature identifiers instead of NULL when multiple
plots are produced. That may be confusing as they return the list of close
matches when an invalid gene symbol was given, and they return the ggplot
when obly one plot is produced.
GENERAL UPDATES
- Code layout. Lines of code over 80 characters were split around
brackets and commas to use the built-in indentation defaulted to 4 space
characters. Plus, this made the code more readable in many cases.
GOexpress 1.1.2
GENERAL UPDATES
- DESCRIPTION file minor correction: Inappropriate "Metagenomics"
biocView removed.
GOexpress 1.1.1
BUG FIXES
- Ensembl BioMart has changed the column 'external_gene_id' to
'external_gene_name'. Renamed my biomaRt queries accordingly to prevent
GO_analyse() from crashing during the analysis step.
- Updated the AlvMac_results variable containing sample results annotated
with the deprecated 'external_gene_id'. It now includes the new
'external_gene_name'.
GENERAL UPDATES
- DESCRIPTION file minor correction.
GOexpress 0.99.17
BUG FIXES
- NAs were introduced in the average score and rank of GO terms following
analysis of microarray data (ANOVA and randomForest) for GO terms without
associated features in the ExpressionSet. The problem was not found in any
case for analysis of RNA-Seq data. This was causing issues during the
subset_scores function and subsequent plots, such as main titles made of
multiple lines and NAs. GO terms without associated probesets are now
given average score of 0 and average rank equal to the maximum in the
dataset plus 1.
GOexpress 0.99.16
GENERAL UPDATES
- Included co-authors who participated in the generation and analysis of
data used to test the package.
GOexpress 0.99.15
GENERAL UPDATES
- Updated man pages GO_analyse to describe the recently added subset
slot in the result variable.
GOexpress 0.99.14
BUG FIXES
- The anova method of the GO_analyse method was broken since the
introduction of ExpressionSet, release 0.99.4.
UPDATED FEATURES
- The subset argument was also added to the GO_analyse method. This allows
the identification of genes clustering a subset of groups, at a subset of
time-points, etc. while plotting the expression profile of the entire
dataset, if desired.
GENERAL UPDATES
- Updated man pages GO_analyse to allow only one example to be run. This
saves time during CMD check, while making the man page more easily
readable.
GOexpress 0.99.13
UPDATED FEATURES
- All expression plots were updated to a default ylim range corresponding
to the minimum and maximum expression values found in the entire
ExpressionSet. This is meant to avoid mis-interpretation of the amplitude
of variation between sample groups, as suggested in the paper:
Rougier, N.P., Droettboom, M., and Bourne, P.E. (2014). Ten simple rules
for better figures. PLoS computational biology 10, e1003833.
GOexpress 0.99.12
UPDATED FEATURES
- All expression plots were given new arguments for more control: xlab
allows users to change the default title for the X-axis, ylim allows users
to override the lower and upper boundaries of the Y axis.
GENERAL UPDATES
- Updated man pages AlvMac, AlvMac_results, microarray2dataset, and
prefix2dataset. Replaced the LaTeX describe statement by an itemise
statement to make the man page more readable in a terminal window.
- Moved functions between the post_analysis script and the toolkit
script. From now on, only functions accessible to the users should be
present in the post_analysis script, while toolkit should contain
method called internally.
- The User's Guide was updated to redirect the users to the new support
site for Bioconductor rather than the bioc-devel mailing list.
GOexpress 0.99.11
NEW FEATURES
- The new subEset() method subsets an ExpressionSet given a list where
item names are column names of the phenoData slots and item values are
vectors of values corresponding to sample to retain (e.g.
list(Time=c("2H", "6H")) will retain samples with value "2H" or "6H" in
the "Time" column of the phenoData slot).
UPDATED FEATURES
- All relevant visualisation methods have been added an argument to
subset samples to plot to those with a given set of values for a given
column in the phenoData (uses the new function subEset described above).
- The example analysis results was renamed from "raw_results" to
"AlvMac_results", so that it can now loaded running data(AlvMac_results).
The new name is meant to be more specific to the package.
GOexpress 0.99.10
GENERAL UPDATES
- Replaced \format by \value sections to the following man pages:
AlvMac.Rd, microarray2dataset.Rd, prefix2dataset.Rd, raw_results.Rd
to avoid having an empty value section, as recommended by the Bioconductor
package tracker. Hopefully, it does not require a non-empty \format
section as well...
- Restricted lines to less than 80 characters and indentation by multiple
of four space characters.
GOexpress 0.99.9
UPDATED FEATURES
- all post analysis functions were given more sanity checks to verify
that the "result" argument contains the required slots of a GO_analyse()
output and the the arguments pointing at a phenotypic data column are
valid column names.
BUG FIXES
- expression_plot_symbol and expression_profiles symbol used the default
value of col.palette and colourF instead of forwarding the user-defined
one to the expression_plot and expression_profiles functions.
GENERAL UPDATES
- Added cross-reference in UsersGuide to point at examples of usages of
factor and numeric values for the expression plots
GOexpress 0.99.8
GENERAL UPDATES
- Resaved R data files to reduced package disk size. No more WARNING in
R CMD check.
- Removed reference to GitHub in the README file. The weblink is given
in the DESCRIPTION file anyway if users are interested.
GOexpress 0.99.7
UPDATED FEATURES
- fixed a typo in the code of heatmap_GO which made it crash for any other
dataset than the example dataset.
GOexpress 0.99.6
UPDATED FEATURES
- expression_profiles_symbol() method was missing the "index" argument to
select the feature identifier to plot alone.
GOexpress 0.99.5
NEW FEATURES
- expression_profiles() method plots the expression profiles of individual
samples series, as opposed to grouped samples series handled by
expression_plot functions.
- expression_profiles_symbol() method plots the expression profiles of
individual samples series using a gene name instead of an Ensembl gene
identifier.
UPDATED FEATURES
- overlap_GO can print to screen, if filename argument is set to NULL
(Default).
- heatmap_GO, cluter_GO and plot_design can resize title font and wrap the
text on multiple lines.
- expression_plot and expression_plot_symbol can orient X axis labels at
a given angle.
- replaced return(NULL) statement by stop() when no close match is found
to a gene name in the family of expression_plot functions.
GENERAL UPDATES
- User's Guide updated.
- List of contributors updated in User's Guide and DESCRIPTION.
GOexpress 0.99.4
UPDATED FEATURES
- Use of ExpressionSet instead of numeric named matrix and
AnnotatedDataFrame. Better consistency with other Bioconductor packages.
GENERAL UPDATES
- Implemented corrections requested following the Bioconductor review.
Includes typos, consistent terminology through the package code and
metafiles, additional information in help files, no reference to GitHub
as an alternate installation option, use of arrow signs instead of equal
signs for value assignment.
- Restricted lines to 80 characters, and used 4-space tabulations.
- Corrected out-of-date documentation.
GOexpress 0.99.3
UPDATED FEATURES
- Control the size of the legend text in the two expression plot figures.
Updated help files accordingly.
- Updated vignette with new section "Statistics".
- Complete cleaning of code files for lines shorter than 80 columns.
- Cleanup of help files for lines shorter than 80 columns.
- Enabled filtering of raw results on the average score of a GO term.
GOexpress 0.99.2
UPDATED FEATURES
- Metadata lines in the preamble of the Sweave file
GOexpress 0.99.1
UPDATED FEATURES
- Sweave vignette implemented.
- Replaced all message() statements by cat() to make Sweave
output the full message in the vignette.
- Updated a missed F into FALSE
- Updated an invalid biocViews (typo)
- Date field added for a proper citation() method.
GOexpress 0.99.0
UPDATED FEATURES
- Replaced all cat() statements by message() to match the
Biocondcutor guidelines.
GOexpress 0.6.2
UPDATED FEATURES
- Fixed typo in the example of the vignette of GO_anova().
- Advice to use the subset_scores() function in the GO_anova()
vignette.
- Helpful warning message in the vignette of GO_anova() to make
sure that the experimental factor to analyse is actually formatted
as a factor in the true R language meaning.
- Updated license to GPL (>= 3) instead of GPL-2.
GOexpress 0.6.1
NEW FEATURES
- GOexpress now implements the randomForest framework to
estimate the importance of each gene on the clustering of the samples
according to the different levels of the desired factor (See section
UPDATED FEATURES below for more details).
- rerank() function return a re-ordered version of the result
variable given, ordered using either the rank or the score metrics.
- Sample data raw_result was added to provide an example output
of the analysis, and also allow to test the visualisation functions
on a readily available variable instead of having to run the analysis
function everytime (would have killed the R CMD check duration).
UPDATED FEATURES
- GO_analyse() default has been changed to a random forest
statistical framework. This framework has various advantages over the
previous ANOVA approach; namely 1) the bootstrapping and re-sampling
of gene subsets to measure the importance of each gene in predicting
the desired factor over many iterations (Default: 1,000 trees), 2)
a shorter analysis duration due to the Fortran code underlying the
randomForest package, 3) randomForest does not make assumptions on the
distribution of the expression level within and between genes, 4)
randomForest deals implicitely with interactions in multifactorial
experiments.
- Genes are now initially ranked according to their rank and GO terms
according to the average rank of their associated genes. Genes
associated with a GO term but absent from the dataset are assigned a
value of max(rank)+1 and a score of 0. Ties between genes are resolved
by giving all genes the minimal rank, the next gene being given
rank+length(tied_genes). The average_rank metric is expected to be
more robust than the average "score". Depending on the statistical
framework used, "score" may mean "importance" (randomForest) or
"F.value" (ANOVA).
- More sanity checks at the start of some methods to ensure smooth
execution of the downstream code and more helpful error messages.
- More synonyms allowed for subsetting and filtering of the result
variable.
- list_genes() and table_genes() methods now allow to chose whether
to return all feature identifiers associated with a GO term, or only
those also present in the expression dataset.
GOexpress 0.5.5
UPDATED FEATURES
- heatmap_GO() default colorscale has been changed to
blue-white-red better suited to the representation of expression
level. The previous colorscale green-black-red is now suggested if
differential expression data is used (e.g. log2FC).
GOexpress 0.5.4
NEW FEATURES
- Sample data accessible with the data() method.
GOexpress 0.5.3
NEW FEATURES
- Method overlap_GO() produces a Venn diagram showing the overlap
of gene sets associated with two to five GO terms. The Venn diagram
is saved in a TIFF image file.
GOexpress 0.5.2
UPDATED FEATURES
- Method heatmap_GO() updated to color-code the samples by levels
of a factor, and a better colormap is used to represent the level of
expression of genes.
GOexpress 0.5.1
UPDATED FEATURES
- Package renamed GOexpress after verifying that no other package
on Bioconductor uses this name.
- Method GO_anova() renamed to GO_analyse() as other metrics than
ANOVA may be implemented in the future.
GOexpress 0.4.1
NEW FEATURES
- Supports microarray probeset identifiers as gene identifiers
in the expression dataset.
- Argument "result=" is no more optional in all the post-analysis
functions.
UPDATED FEATURES
- expression_ploto() functions now dynamically adapt the colormap
to the number of groups of samples instead of three hard-coded colors
used for the colormap.
GOexpress 0.3.1
NEW FEATURES
- expression_plot() plots the expression profile of the gene
corresponding to an Ensembl identifier, given valid variable name
for the X-axis and a grouping factor for the Y-axis.
- expression_plot_symbol() plots the expression profile of the
gene(s) with the Ensembl identifier(s) corresponding to a gene
symbol, given valid variable name for the X-axis and a grouping
factor for the Y-axis.
- plot_design() plots the univariate effect of each level of each
factor available in the AnnotatedDataFrame on the expression levels
of genes associated with a GO term.
UPDATED FEATURES
- The scoring function now uses the total count of genes associated
with a GOterm instead of the count of genes in the dataset and
associated with the GO term. Genes absent from the expression dataset
are assigned a F.value of 0, similarly to the genes with a non-
significant ANOVA result. This was not found to significantly impact
the ranking of GO terms, but is a more objective scoring of the
GO terms if the expression data was generated through objective
filtering (e.g. filtering out lowly expressed genes).
- subset_scores() now also filters the $mapping and $anova slots
to retain only mappings and gene information related to the GO terms
left in the $scores slot after filtering. This reduces significantly
the memory space of the filtered variable, while the object containing
the raw results of GO_anova() will still contain the full information
fetched from the Ensembl BioMart server if the user wants it.
- ggplot2 and grid are now required for the proper installation of
anovaGO, even though these packages are only required for a subset
of non-essential features in anovaGO. This way, all features in
anovaGO will be available as soon as the package is installed.
- Messages printed to the user during the executing of the funcions
were made as clear as possible, particularly when fetching information
from the Ensembl BioMart server.
GOexpress 0.2
UPDATED FEATURES
- subset_scores() can now filter and keep only GO terms of a given
type, i.e. "Biological process", "Molecular function", or "Cellular
Component".
GOexpress 0.1
OVERVIEW
- This package was designed for the analysis of bioinformatics-related
data based on gene expression measurements. It requires 3 input
values: (1) a gene-by-sample table providing the expression level
of genes (rows) in each sample (columns), (2) an AnnotatedDataFrame
from the Biobase package providing phenotypic information about the
samples grouping them by levels of the factor, (3) the name of the
grouping factor to investigate, which must be a valid column name
in the AnnotatedDataFrame.
- The analysis will identify all Gene Ontology (GO) terms represented
by at least one gene in the expression dataset. A one-way ANOVA
will be performed on the grouping factor for each gene present in
the expression dataset. Following multiple-testing correction, genes
below the threshold for significance will be assigned an F.value of
0. GO terms will be scored and ranked on the average F.value of
associated genes.
- Functions are provided to investigate and visualise the results of
the above analysis. The score table can be filtered for rows over
given thresholds. The distribution of scores can be visualised. The
quantiles of scores can be obtained. The genes associated with a
given GO term can be listed, with or without descriptive information.
Hierarchical clustering of the samples can be performed based on the
expression levels of genes associated with a given GO term. Heatmaps
accompanied by hierarchical clustering of samples and genes can be
drawn and customised.
FEATURES
- GO_anova() scores all Gene Ontology (GO) terms represented in
the dataset based on the ability of their associated genes to cluster
samples according to a predefined grouping factor. It also returns
the table used to map genes to GO terms, the table summarising the
one-way ANOVA results for each gene, and finally the predefined
grouping factor used for ANOVA. Genes annotated to a GO term but
absent from the expression dataset are ignored.
- get_mart_dataset() returns a connection to the appropriate BioMart
dataset of the Ensembl server based on the gene name of the first gene in
the expression dataset. The choice of the dataset can be overriden by the
user if a valid BioMart Ensembl dataset is specified.
- subset_scores() filters the table of scored GO terms in the
output of GO_anova() and returns a list formatted identically to the
output of GO_anova() with the resulting filtered table of scores.
- hist_scores() plots the distribution of average F scores in the
output of GO_anova() or subset_scores().
- quantiles_scores() returns the quantile values corresponding
to defined percentiles.
- list_genes() returns the list of Ensembl gene identifiers
associated with a given GO term.
- table_genes() returns a table of information about the Ensembl
gene identifiers associated with a given GO term.
- cluster_GO() plots a hierarchical clustering of the samples
based on the expression levels of genes associated with a given
GO term.
- heatmap_GO() plots a heatmap with hierarchical clustering of
the samples and genes based on the expression levels of genes
associated with a given GO term.