Title: | R/Bioconductor Package for interfacing with Snaptron for rapid querying of expression counts |
---|---|
Description: | snapcount is a client interface to the Snaptron webservices which support querying by gene name or genomic region. Results include raw expression counts derived from alignment of RNA-seq samples and/or various summarized measures of expression across one or more regions/genes per-sample (e.g. percent spliced in). |
Authors: | Rone Charles [aut, cre] |
Maintainer: | Rone Charles <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.19.0 |
Built: | 2024-10-31 05:25:11 UTC |
Source: | https://github.com/bioc/snapcount |
snapcount is a client interface to the Snaptron webservice which supports querying by gene name or genomic region.
Results include raw expression counts derived from alignment of RNA-seq samples and/or various summarized measures of expression across one or more regions/genes per-sample (e.g. percent spliced in).
To learn more about snapcount, check out the vignette:
browseVignettes(package = "snapcount")
snapcount.host
Change the host that snapcount uses when
connecting to Snaptron. Default: snaptron.cs.jhu.edu
snapcount.port
Change the port that snapcount uses when
connecting to Snaptron. Default: 80
Maintainer: Rone Charles [email protected]
Useful links:
Report bugs at https://github.com/langmead-lab/snapcount/issues
The variants for this enum will be populated dynamically after the package has been loaded. If the package cannot connect to the internet the variants will default to:
Compilation
Compilation
An object of class environment
of length 21.
gtex
tcga
srav2
sra
http://snaptron.cs.jhu.edu/data.html for more information about Snaptron compilations.
qb <- QueryBuilder(compilation = Compilation$gtex, regions = "KCNIP4") query_jx(qb)
qb <- QueryBuilder(compilation = Compilation$gtex, regions = "KCNIP4") query_jx(qb)
Enum for Snaptron Coordinate modifiers
Coordinates
Coordinates
An object of class environment
of length 4.
Exact
Return junctions whose start and end coordinates match the boundaries of the region requested.
Within
Return junctions whose start and end coordinates are within the boundaries of the region requested.
StartIsExactOrWithin
Return junctions whose start coordinate matches, or is within, the boundaries of the region requested.
EndIsExactOrWithin
Return junctions whose end coordinate matches, or is within, the boundaries of the region requested.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") qb <- set_coordinate_modifier(qb, Coordinates$Exact) qb
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") qb <- set_coordinate_modifier(qb, Coordinates$Exact) qb
QueryBuilder
object from the given urlConstructs a QueryBuilder
object from the given url
from_url(url)
from_url(url)
url |
a well-formed url preferably obtained from a call to the
|
Returns a QueryBuilder
object with attributes set from the
parsed url.
sb <- from_url("http://snaptron.cs.jhu.edu/gtex/snaptron?regions=CD99") get_regions(sb) get_compilation(sb)
sb <- from_url("http://snaptron.cs.jhu.edu/gtex/snaptron?regions=CD99") get_regions(sb) get_compilation(sb)
Get or set sample-related contraints for query
get_column_filters(qb) set_column_filters(qb, ...)
get_column_filters(qb) set_column_filters(qb, ...)
qb |
a QueryBuilder object constructed using the
|
... |
one or more boolean predicates as either strings or unevaluated expressions |
get_column_filters
returns the current filters as a list of strings.
set_column_filters
returns a new QueryBuilder
object
with the column filters set to the value of column_filters
.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") # column filters set using a string qb <- set_column_filters(qb, "SMTS == Brain") get_column_filters(qb) # column filters set using unevaluated expression qb <- set_column_filters(qb, SMTS == "Spleen") get_column_filters(qb)
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") # column filters set using a string qb <- set_column_filters(qb, "SMTS == Brain") get_column_filters(qb) # column filters set using unevaluated expression qb <- set_column_filters(qb, SMTS == "Spleen") get_column_filters(qb)
Get or set query compilation
get_compilation(qb) set_compilation(qb, compilation)
get_compilation(qb) set_compilation(qb, compilation)
qb |
A QueryBuilder object constructed using the
|
compilation |
A single string containing the name of the Snaptron data
source. Any variant of the |
get_compilation
returns the current compilation as string.
set_compilation
returns a new QueryBuilder
object with
the compilation set to the value of compilation
.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") get_compilation(qb) qb <- set_compilation(qb, Compilation$tcga) get_compilation(qb)
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") get_compilation(qb) qb <- set_compilation(qb, Compilation$tcga) get_compilation(qb)
Get or set coordinate modifiers for the query.
get_coordinate_modifier(qb) set_coordinate_modifier(qb, coordinate_modifier)
get_coordinate_modifier(qb) set_coordinate_modifier(qb, coordinate_modifier)
qb |
a QueryBuilder object constructed using the
|
coordinate_modifier |
any of the variants of the Coordinates enum. |
get_coordinate_modifier
returns the current coodinate modifier
as a string.
set_coordinate_modifier
returns a new QueryBuilder
object with the coordinate modifier set to the value of
coordinate_modifier
.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") qb <- set_coordinate_modifier(qb, Coordinates$Within) get_coordinate_modifier(qb)
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") qb <- set_coordinate_modifier(qb, Coordinates$Within) get_coordinate_modifier(qb)
Get or set query regions
get_regions(qb) set_regions(qb, regions)
get_regions(qb) set_regions(qb, regions)
qb |
A QueryBuilder object constructed using the
|
regions |
Either a list of 1 more |
get_regions
returns the current regions as a list of strings.
set_regions
returns a new QueryBuilder
object with the
regions set to the value of regions
.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") get_regions(qb) qb <- set_regions(qb, "chr1:1-1000") get_regions(qb) qb <- set_regions(qb, GenomicRanges::GRanges("chr1", "1-1000")) get_regions(qb)
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") get_regions(qb) qb <- set_regions(qb, "chr1:1-1000") get_regions(qb) qb <- set_regions(qb, GenomicRanges::GRanges("chr1", "1-1000")) get_regions(qb)
Get or set range-related contraints for query
get_row_filters(qb) set_row_filters(qb, ...)
get_row_filters(qb) set_row_filters(qb, ...)
qb |
a QueryBuilder object constructed using the
|
... |
one or more boolean predicates as either strings or unevaluated expressions. |
get_row_filters
returns the current row filters as list of strings.
set_row_filters
returns a new QueryBuilder
object with
the row filters set to the value of row_filters
.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") # row filters set as a string qb <- set_row_filters(qb, "strand == +") get_row_filters(qb) # row filters set using unevaluated expression qb <- set_row_filters(qb, strand == "+") get_row_filters(qb)
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") # row filters set as a string qb <- set_row_filters(qb, "strand == +") get_row_filters(qb) # row filters set using unevaluated expression qb <- set_row_filters(qb, strand == "+") get_row_filters(qb)
Get or set query sample ids
get_sids(qb) set_sids(qb, sids)
get_sids(qb) set_sids(qb, sids)
qb |
a QueryBuilder object constructed using the
|
sids |
a vector or 1 or more whole numbers to filter results on. |
get_sids
returns the current sample ids as a vector of integers.
set_sids
returns a new QueryBuilder
object with the
sample ids set to the value of sids
.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") qb <- set_sids(qb, c(1, 2, 3)) get_sids(qb)
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") qb <- set_sids(qb, c(1, 2, 3)) get_sids(qb)
Calculates a coverage summary statistic per sample of the normalized coverage difference between two sets of separate junctions defined by at least two basic queries and organized into two groups.
junction_inclusion_ratio(group1, group2, group_names = NULL)
junction_inclusion_ratio(group1, group2, group_names = NULL)
group1 , group2
|
Each group is a list of 1 or more QueryBuilder objects |
group_names |
Optional vector of strings representing the group names |
The summary statistic is as follows: If the coverage of the first group is "A" and the second is "B":
JIR(A,B)=(A - B) / (A+B+1)
This is calculated for every sample that occurs in one or the other (or both) groups results.
A DataFrame of samples, with their JIR score and metadata, which had > 0 coverage in at least one resulting row in at least one of the groups
sb1 <- QueryBuilder(compilation = "srav2", regions = "chr2:29446395-30142858") sb1 <- set_coordinate_modifier(sb1, Coordinates$Within) sb1 <- set_row_filters(sb1, strand == "-") sb2 <- QueryBuilder(compilation = "srav2", regions = "chr2:29416789-29446394") sb2 <- set_coordinate_modifier(sb2, Coordinates$Within) sb2 <- set_row_filters(sb2, strand == "-") junction_inclusion_ratio(list(sb1), list(sb2))
sb1 <- QueryBuilder(compilation = "srav2", regions = "chr2:29446395-30142858") sb1 <- set_coordinate_modifier(sb1, Coordinates$Within) sb1 <- set_row_filters(sb1, strand == "-") sb2 <- QueryBuilder(compilation = "srav2", regions = "chr2:29416789-29446394") sb2 <- set_coordinate_modifier(sb2, Coordinates$Within) sb2 <- set_row_filters(sb2, strand == "-") junction_inclusion_ratio(list(sb1), list(sb2))
This function operates similar to the junction_union()
function, i.e
it is cross compilation and merging of the same junction from multiple
compilations will be handled exactly the same way. But instead
of every junction which appears in at least one compilation, only
the junctions which appear in every compilation will be returned.
junction_intersection(...)
junction_intersection(...)
... |
One or more QueryBuilder objects |
A RangedSummarizedExperiment of junctions common across compilations
# Using query builder wrappers sb1 <- QueryBuilder(compilation = "gtex", regions = "chr1:1879786-1879786") sb1 <- set_coordinate_modifier(sb1, Coordinates$EndIsExactOrWithin) sb1 <- set_row_filters(sb1, strand == "-") sb2 <- QueryBuilder(compilation = "tcga", regions = "chr1:1879786-1879786") sb2 <- set_coordinate_modifier(sb2, Coordinates$EndIsExactOrWithin) sb2 <- set_row_filters(sb2, strand == "-") junction_intersection(sb1, sb2)
# Using query builder wrappers sb1 <- QueryBuilder(compilation = "gtex", regions = "chr1:1879786-1879786") sb1 <- set_coordinate_modifier(sb1, Coordinates$EndIsExactOrWithin) sb1 <- set_row_filters(sb1, strand == "-") sb2 <- QueryBuilder(compilation = "tcga", regions = "chr1:1879786-1879786") sb2 <- set_coordinate_modifier(sb2, Coordinates$EndIsExactOrWithin) sb2 <- set_row_filters(sb2, strand == "-") junction_intersection(sb1, sb2)
This function queries 2 or more compilations which are on the same reference version (e.g. hg38) and merges the resulting junctions into a single output table, unioning the sample coverage columns and the snaptron_id (jx ID) columns (the latter delimiter will be ":"). All sample IDs will be disjoint between compilations.
junction_union(...)
junction_union(...)
... |
One or more QueryBuilder objects |
Union is based on the following fields (combined into a comparison key):
group
chromosome
start
end
strand
The goal is to have a single list of junctions where every junction occurs in at least one compilation and if a junction occurs in > 1 compilations it still only has a single row representing all the samples across compilations that it appears in. Sample aggregate statistics will be recalculated for junctions which are merged across all samples from all compilations:
sample_count
coverage_sum
coverage_avg
coverage_median
A RangedSummarizedExperiment of junctions appearing in at least one compilation
# Using query builder wrappers sb1 <- QueryBuilder(compilation = "gtex", regions = "chr1:1879786-1879786") sb1 <- set_coordinate_modifier(sb1, Coordinates$EndIsExactOrWithin) sb1 <- set_row_filters(sb1, strand == "-") sb2 <- QueryBuilder(compilation = "tcga", regions = "chr1:1879786-1879786") sb2 <- set_coordinate_modifier(sb2, Coordinates$EndIsExactOrWithin) sb2 <- set_row_filters(sb2, strand == "-") junction_union(sb1, sb2)
# Using query builder wrappers sb1 <- QueryBuilder(compilation = "gtex", regions = "chr1:1879786-1879786") sb1 <- set_coordinate_modifier(sb1, Coordinates$EndIsExactOrWithin) sb1 <- set_row_filters(sb1, strand == "-") sb2 <- QueryBuilder(compilation = "tcga", regions = "chr1:1879786-1879786") sb2 <- set_coordinate_modifier(sb2, Coordinates$EndIsExactOrWithin) sb2 <- set_row_filters(sb2, strand == "-") junction_union(sb1, sb2)
Similar to the JIR, this calculates Percent Spliced In (PSI) statistics for the definition of 2 different groups: inclusion and exclusion. Currently this function only supports the cassette exon use case.
percent_spliced_in( inclusion_group1, inclusion_group2, exclusion_group, min_count = 20, group_names = NULL )
percent_spliced_in( inclusion_group1, inclusion_group2, exclusion_group, min_count = 20, group_names = NULL )
inclusion_group1 , inclusion_group2 , exclusion_group
|
Where each is a list of 1 or more QueryBuilder objects |
min_count |
minimum total count (denominator) required to not be assigned -1 |
group_names |
Optional vector of strings representing the group names |
Inclusion typically defines 2 basic queries, one for the junction preceding the cassette exon, and the second for the junction following the cassette exon. The exclusion group contains one basic query which defines the junction which skips the cassette exon.
The PSI itself is implemented as:
PSI(inclusion1, inclusion2, exclusion) =
mean(inclusion1, inclusion2) / (mean(inclusion1, inclusion2) + exclusion)
where each term denotes the coverage of junctions that resulted from the basic queries in that group in the current sample.
A DataFrame of samples, with their PSI score and metadata, which had > 0 coverage in at least one resulting row in at least one of the groups
in1 <- QueryBuilder(compilation = "srav2", regions = "chr1:94468008-94472172") in1 <- set_coordinate_modifier(in1, Coordinates$Exact) in1 <- set_row_filters(in1, strand == "+") in2 <- QueryBuilder(compilation = "srav2", regions = "chr1:94468008-94472172") in2 <- set_coordinate_modifier(in2, Coordinates$Exact) in2 <- set_row_filters(in2, strand == "+") ex <- QueryBuilder(compilation = "srav2", regions = "chr1:94468008-94475142") ex <- set_coordinate_modifier(ex, Coordinates$Exact) ex <- set_row_filters(ex, strand == "+") percent_spliced_in(list(in1), list(in2), list(ex))
in1 <- QueryBuilder(compilation = "srav2", regions = "chr1:94468008-94472172") in1 <- set_coordinate_modifier(in1, Coordinates$Exact) in1 <- set_row_filters(in1, strand == "+") in2 <- QueryBuilder(compilation = "srav2", regions = "chr1:94468008-94472172") in2 <- set_coordinate_modifier(in2, Coordinates$Exact) in2 <- set_row_filters(in2, strand == "+") ex <- QueryBuilder(compilation = "srav2", regions = "chr1:94468008-94475142") ex <- set_coordinate_modifier(ex, Coordinates$Exact) ex <- set_row_filters(ex, strand == "+") percent_spliced_in(list(in1), list(in2), list(ex))
Given one or more gene names or genomic range intervals it will return a list of 0 or more genes, junctions, or exons (depending on which query form is used) which overlap the ranges.
query_jx(sb, return_rse = TRUE, split_by_region = FALSE) query_gene(sb, return_rse = TRUE, split_by_region = FALSE) query_exon(sb, return_rse = TRUE, split_by_region = FALSE)
query_jx(sb, return_rse = TRUE, split_by_region = FALSE) query_gene(sb, return_rse = TRUE, split_by_region = FALSE) query_exon(sb, return_rse = TRUE, split_by_region = FALSE)
sb |
A SnaptronQueryBuilder object |
return_rse |
Should the query data be returned as a simple data frame or converted to a RangedSummarizedExperiment. |
split_by_region |
By default the results from multiple queries will be
returned in a |
Functions will return either a RangedSummarizedExperiment or
data.table depending on whether the return_rse
parameter is set to
TRUE
or FALSE
.
# Contruct a QueryBuilder object qb <- QueryBuilder(compilation = "gtex", regions = "chr1:1-100000") qb <- set_row_filters(qb, samples_count >= 20) query_jx(qb) qb <- set_row_filters(qb, NULL) qb <- set_column_filters(qb, SMTS == "Brain") query_gene(qb)
# Contruct a QueryBuilder object qb <- QueryBuilder(compilation = "gtex", regions = "chr1:1-100000") qb <- set_row_filters(qb, samples_count >= 20) query_jx(qb) qb <- set_row_filters(qb, NULL) qb <- set_column_filters(qb, SMTS == "Brain") query_gene(qb)
Construct a QueryBuilder object given a compilation and one or regions.
QueryBuilder(compilation, regions)
QueryBuilder(compilation, regions)
compilation |
A single string containing the name of the
Snaptron data source. Any variant of the |
regions |
Either a list of 1 more |
A QueryBuilder object.
# contruct a query builder for GTEX data source and BRAC1 gene qb <- QueryBuilder(compilation = Compilation$gtex, regions = "BRCA1") # contruct a query builder for TCGA data source and chromosome region qb <- QueryBuilder(compilation = "tcga", regions = "chr1:1-1000") # construct a query builder for TCGA data source using GRanges object library(GenomicRanges) qb <- QueryBuilder(compilation = "tcga", regions = GRanges("chr1", "1-1000"))
# contruct a query builder for GTEX data source and BRAC1 gene qb <- QueryBuilder(compilation = Compilation$gtex, regions = "BRCA1") # contruct a query builder for TCGA data source and chromosome region qb <- QueryBuilder(compilation = "tcga", regions = "chr1:1-1000") # construct a query builder for TCGA data source using GRanges object library(GenomicRanges) qb <- QueryBuilder(compilation = "tcga", regions = GRanges("chr1", "1-1000"))
Lists the number of samples labeled with a specific tissue type. Samples are filtered for ones which have junctions across all the user-specified groups. That is, if a sample only appears in the results of some of the groups (from their basic queries) it will be assigned a 0, otherwise if it is in all of the groups' results it will be assigned a 1. This is similar to the SSC high level query type, but doesn't sum the coverage.
tissue_specificity(..., group_names = NULL)
tissue_specificity(..., group_names = NULL)
... |
One or more QueryBuilder objects |
group_names |
Optional vector of strings representing the group names |
The samples are then grouped by their tissue type (e.g. Brain). This is useful for determining if there's an enrichment for a specific tissue in the set of junctions queried. Results from this can be fed to a statistical test, such as the Kruskal-wallis non-parametric rank test. This query is limited to GTEx only, due to the fact that GTEx is one of the few compilations that has consistent and complete tissue metadata.
A DataFrame of all samples in the compilation with either a 0 or 1 indicating their occurrence and shared status (if > 1 group passed in). Occurrence here is if the sample has at least one result with > 0 coverage, and further, if > 1 group is passed in, then if it occurs in the results of all groups. Also includes the sample tissue type and sample_id.
in1 <- QueryBuilder(compilation = "gtex", regions = "chr4:20763023-20763023") in1 <- set_coordinate_modifier(in1, Coordinates$EndIsExactOrWithin) in1 <- set_row_filters(in1, strand == "-") in2 <- QueryBuilder(compilation = "gtex", regions = "chr4:20763098-20763098") in2 <- set_coordinate_modifier(in2, Coordinates$StartIsExactOrWithin) in2 <- set_row_filters(in2, strand == "-") tissue_specificity(list(in1, in2))
in1 <- QueryBuilder(compilation = "gtex", regions = "chr4:20763023-20763023") in1 <- set_coordinate_modifier(in1, Coordinates$EndIsExactOrWithin) in1 <- set_row_filters(in1, strand == "-") in2 <- QueryBuilder(compilation = "gtex", regions = "chr4:20763098-20763098") in2 <- set_coordinate_modifier(in2, Coordinates$StartIsExactOrWithin) in2 <- set_row_filters(in2, strand == "-") tissue_specificity(list(in1, in2))
This function can be paired with the from_url
method from
the QueryBuilder class, allowing users to share sources of data
from Snaptron.
uri_of_last_successful_request()
uri_of_last_successful_request()
URI of last successful request to Snaptron or NULL
if there
have not been any successful requests.
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") query_jx(qb) uri_of_last_successful_request()
qb <- QueryBuilder(compilation = "gtex", regions = "CD99") query_jx(qb) uri_of_last_successful_request()