Title: | Facilities for Filtering Bioconductor Annotation Resources |
---|---|
Description: | This package provides class and other infrastructure to implement filters for manipulating Bioconductor annotation resources. The filters will be used by ensembldb, Organism.dplyr, and other packages. |
Authors: | Martin Morgan [aut], Johannes Rainer [aut], Joachim Bargsten [ctb], Daniel Van Twisk [ctb], Bioconductor Package Maintainer [cre] |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.31.0 |
Built: | 2024-12-27 06:00:20 UTC |
Source: | https://github.com/bioc/AnnotationFilter |
The filters extending the base AnnotationFilter class represent a simple filtering concept for annotation resources. Each filter object is intended to filter on a single (database) table column using the provided values and the defined condition. Filter instances are created using the constructor functions (e.g. GeneIdFilter).
supportedFilters() lists all defined filters. It returns a two-column data.frame with the filter class name and its default field. Packages using AnnotationFilter should implement the supportedFilters method for their annotation resource object (e.g. for object = "EnsDb" in the ensembldb package) to list all supported filters for the specific resource.
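As a minimal sketch of inspecting that table, the snippet below assumes the AnnotationFilter package is installed. The columns are addressed by position rather than by name, since the column names are not stated above; the "gene_id" field is taken from the filter expression examples in this page.

```r
library(AnnotationFilter)

## Table of all filter classes defined by the package and their default fields.
sf <- supportedFilters()
head(sf)

## Look up the row(s) whose default field (assumed here to be the second
## column) is "gene_id".
gene_id_rows <- sf[sf[[2]] == "gene_id", ]
gene_id_rows
```

A resource package would typically return a subset of this table from its own supportedFilters method.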
condition() gets the condition value for the filter object.
value() gets the value for the filter object.
field() gets the field for the filter object.
not() gets the not value for the filter object.
feature() gets the feature for the GRangesFilter object.
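The accessors can be tried on any filter instance. The following is a small sketch, assuming AnnotationFilter and GenomicRanges are installed; the filter values are arbitrary illustration choices borrowed from the examples in this page.

```r
library(AnnotationFilter)

## A filter on gene start coordinates; 10000 and ">" are arbitrary choices.
gsf <- GeneStartFilter(10000, condition = ">")

condition(gsf)  # the comparison operator passed to the constructor
value(gsf)      # the value(s) the field is compared against
field(gsf)      # the default field of the filter class
not(gsf)        # the value of the `not` argument (FALSE by default)

## feature() applies to GRangesFilter objects.
grf <- GRangesFilter(GenomicRanges::GRanges("chr10:87869000-87876000"),
                     feature = "gene")
feature(grf)
```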
convertFilter() converts an AnnotationFilter object to a character(1) giving an equation that can be used as input to a dplyr filter.
AnnotationFilter translates a filter expression such as ~ gene_id == "BCL2" into a filter object extending the AnnotationFilter class (in the example a GeneIdFilter object) or an AnnotationFilterList if the expression contains multiple conditions (see examples below). Filter expressions have to be written in the form ~ <field> <condition> <value>, with <field> being the default field of the filter class (use the supportedFilters function to list all fields and filter classes), <condition> the logical expression and <value> the value for the filter.
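To make the two possible return types concrete, here is a short sketch (assuming the AnnotationFilter package is installed) that checks the class of the translated object. The variable names single and multi are illustrative only.

```r
library(AnnotationFilter)

## A single condition is translated into a filter of the matching class.
single <- AnnotationFilter(~ gene_id == "BCL2")
methods::is(single, "GeneIdFilter")

## Several conditions are translated into an AnnotationFilterList.
multi <- AnnotationFilter(~ gene_id == "BCL2" & seq_name == "Y")
methods::is(multi, "AnnotationFilterList")
```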
CdsStartFilter(value, condition = "==", not = FALSE)
CdsEndFilter(value, condition = "==", not = FALSE)
ExonIdFilter(value, condition = "==", not = FALSE)
ExonNameFilter(value, condition = "==", not = FALSE)
ExonRankFilter(value, condition = "==", not = FALSE)
ExonStartFilter(value, condition = "==", not = FALSE)
ExonEndFilter(value, condition = "==", not = FALSE)
GeneIdFilter(value, condition = "==", not = FALSE)
GeneNameFilter(value, condition = "==", not = FALSE)
GeneBiotypeFilter(value, condition = "==", not = FALSE)
GeneStartFilter(value, condition = "==", not = FALSE)
GeneEndFilter(value, condition = "==", not = FALSE)
EntrezFilter(value, condition = "==", not = FALSE)
SymbolFilter(value, condition = "==", not = FALSE)
TxIdFilter(value, condition = "==", not = FALSE)
TxNameFilter(value, condition = "==", not = FALSE)
TxBiotypeFilter(value, condition = "==", not = FALSE)
TxStartFilter(value, condition = "==", not = FALSE)
TxEndFilter(value, condition = "==", not = FALSE)
ProteinIdFilter(value, condition = "==", not = FALSE)
UniprotFilter(value, condition = "==", not = FALSE)
SeqNameFilter(value, condition = "==", not = FALSE)
SeqStrandFilter(value, condition = "==", not = FALSE)

## S4 method for signature 'AnnotationFilter'
condition(object)

## S4 method for signature 'AnnotationFilter'
value(object)

## S4 method for signature 'AnnotationFilter'
field(object)

## S4 method for signature 'AnnotationFilter'
not(object)

GRangesFilter(value, feature = "gene",
              type = c("any", "start", "end", "within", "equal"))

feature(object)

## S4 method for signature 'AnnotationFilter,missing'
convertFilter(object)

## S4 method for signature 'missing'
supportedFilters(object)

AnnotationFilter(expr)
object |
An |
value |
|
feature |
|
type |
|
expr |
A filter expression, written as a |
condition |
|
not |
|
By default, filters are only available for tables containing the field on which the filter acts (i.e. tables that contain a column whose name matches the value of the field slot of the object). See the vignette for a description of how to use filters for databases in which the database table column name differs from the default field of the filter.
Filter expressions for the AnnotationFilter class have to be written as formulas, i.e. starting with a ~.
The constructor functions return an object extending AnnotationFilter. For the return value of the other methods see the methods' descriptions.
convertFilter returns a character(1) that can be used as input to a dplyr filter.
AnnotationFilter returns an AnnotationFilter or an AnnotationFilterList.
Translation of nested filter expressions using the AnnotationFilter function is not yet supported.
AnnotationFilterList for combining AnnotationFilter objects.
## filter by GRanges
GRangesFilter(GenomicRanges::GRanges("chr10:87869000-87876000"))

## Create a SymbolFilter to filter on a gene's symbol.
sf <- SymbolFilter("BCL2")
sf

## Create a GeneStartFilter to filter based on the genes' chromosomal start
## coordinates
gsf <- GeneStartFilter(10000, condition = ">")
gsf

filter <- SymbolFilter("ADA", "==")
result <- convertFilter(filter)
result

supportedFilters()

## Convert a filter expression based on a gene ID to a GeneIdFilter
gnf <- AnnotationFilter(~ gene_id == "BCL2")
gnf

## Same conversion but for two gene IDs.
gnf <- AnnotationFilter(~ gene_id %in% c("BCL2", "BCL2L11"))
gnf

## Converting an expression that combines multiple filters. As a result we
## get an AnnotationFilterList containing the corresponding filters.
## Be aware that nesting of expressions/filters does not work.
flt <- AnnotationFilter(~ gene_id %in% c("BCL2", "BCL2L11") &
                        tx_biotype == "nonsense_mediated_decay" |
                        seq_name == "Y")
flt
The AnnotationFilterList allows combining filter objects extending the AnnotationFilter class to construct more complex queries. Consecutive filter objects in the AnnotationFilterList can be combined by a logical and (&) or or (|). The AnnotationFilterList extends list; individual elements can thus be accessed with [[.
value() gets a list with the AnnotationFilter objects. Use [[ to access individual filters.
logicOp() gets the logical operators separating successive AnnotationFilter objects.
not() gets the not value of the AnnotationFilterList, i.e. the negation flag set through the not argument of the constructor.
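A brief sketch of these accessors on a three-element list follows, assuming the AnnotationFilter package is installed. The filters are the ones used in the examples of this page.

```r
library(AnnotationFilter)

gf   <- GeneNameFilter(c("BCL2", "BCL2L11"))
tbtf <- TxBiotypeFilter("protein_coding", condition = "!=")

## Three filters joined as: gf & tbtf | SeqNameFilter("Y")
afl <- AnnotationFilterList(gf, tbtf, SeqNameFilter("Y"),
                            logicOp = c("&", "|"))

length(value(afl))  # number of filters held by the list
logicOp(afl)        # operators between successive filters
not(afl)            # value of the `not` argument given to the constructor
afl[[3]]            # third filter, accessed like a list element
```

Note that n filters are separated by n - 1 operators, which is why logicOp has length two here.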
convertFilter() converts an AnnotationFilterList object to a character(1) giving an equation that can be used as input to a dplyr filter.
AnnotationFilterList(..., logicOp = character(), logOp = character(),
                     not = FALSE, .groupingFlag = FALSE)

## S4 method for signature 'AnnotationFilterList'
value(object)

## S4 method for signature 'AnnotationFilterList'
logicOp(object)

## S4 method for signature 'AnnotationFilterList'
not(object)

## S4 method for signature 'AnnotationFilterList'
distributeNegation(object, .prior_negation = FALSE)

## S4 method for signature 'AnnotationFilterList,missing'
convertFilter(object)

## S4 method for signature 'AnnotationFilterList'
show(object)
... |
individual |
logicOp |
|
logOp |
Deprecated; use |
not |
|
.groupingFlag |
Flag designated for internal use only. |
object |
An object of class |
.prior_negation |
|
AnnotationFilterList returns an AnnotationFilterList.
value() returns a list with AnnotationFilter objects.
logicOp() returns a character() vector of “&” or “|” symbols.
not() returns the not value of the AnnotationFilterList (a logical; FALSE by default in the constructor).
distributeNegation returns an AnnotationFilterList object with De Morgan's law applied to it, such that it is equal to the original AnnotationFilterList object but all !'s are distributed out of the AnnotationFilterList object and to the nested AnnotationFilter objects.
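For reference, the two De Morgan identities behind this transformation can be written as follows, with A and B standing for two filter conditions (symbols chosen here for illustration only):

```latex
\neg (A \lor B) \;\equiv\; \neg A \land \neg B,
\qquad
\neg (A \land B) \;\equiv\; \neg A \lor \neg B
```

Read this way, a negated expression such as ~!(symbol == 'ADA' | symbol %startsWith% 'SNORD') would be expected to become two individually negated filters joined by &, although the exact printed form of the result is not shown here.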
convertFilter returns a character(1) that can be used as input to a dplyr filter.
The AnnotationFilterList does not support empty elements; hence all elements of length == 0 are removed in the constructor function.
supportedFilters for available AnnotationFilter objects.
## Create some AnnotationFilters
gf <- GeneNameFilter(c("BCL2", "BCL2L11"))
tbtf <- TxBiotypeFilter("protein_coding", condition = "!=")

## Combine both to an AnnotationFilterList. By default elements are combined
## using a logical "and" operator. The filter list represents thus a query
## like: get all features where the gene name is either ("BCL2" or "BCL2L11")
## and the transcript biotype is not "protein_coding".
afl <- AnnotationFilterList(gf, tbtf)
afl

## Access individual filters.
afl[[1]]

## Create a filter in the form of: get all features where the gene name is
## either ("BCL2" or "BCL2L11") and the transcript biotype is not
## "protein_coding" or the seq_name is "Y". Hence, this will get all feature
## also found by the previous AnnotationFilterList and returns also all
## features on chromosome Y.
afl <- AnnotationFilterList(gf, tbtf, SeqNameFilter("Y"),
                            logicOp = c("&", "|"))
afl

afl <- AnnotationFilter(~!(symbol == 'ADA' | symbol %startsWith% 'SNORD'))
afl <- distributeNegation(afl)
afl

afl <- AnnotationFilter(~symbol=="ADA" & tx_start > "400000")
result <- convertFilter(afl)
result
The GenenameFilter class and functions are deprecated. Please use GeneNameFilter() instead.
GenenameFilter(value, condition = "==", not = FALSE)
value |
|
condition |
|
not |
|
The constructor function returns a GenenameFilter.