Title: | R interface to genome annotation files and the UCSC genome browser |
---|---|
Description: | Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport. |
Authors: | Michael Lawrence, Vince Carey, Robert Gentleman |
Maintainer: | Michael Lawrence <[email protected]> |
License: | Artistic-2.0 + file LICENSE |
Version: | 1.67.0 |
Built: | 2024-11-06 06:17:15 UTC |
Source: | https://github.com/bioc/rtracklayer |
Get the active view.
The following methods are defined by rtracklayer.
activeView(object)
: Gets the active
BrowserView
from a browser session.
activeView(object) <- value
: Sets the active
BrowserView
in a browser session.
Coerce the structure of an object to one following BED-like conventions, i.e., with columns for blocks and thick regions.
asBED(x, ...) ## S4 method for signature 'GRangesList' asBED(x) ## S4 method for signature 'GAlignments' asBED(x)
asBED(x, ...) ## S4 method for signature 'GRangesList' asBED(x) ## S4 method for signature 'GAlignments' asBED(x)
x |
Generally, a tabular object to structure as BED |
... |
Arguments to pass to methods |
The exact behavior depends on the class of object
.
GRangesList
This treats object
as if it were a
list of transcripts, i.e., each element contains the exons of a
transcript. The blockStarts
and
blockSizes
columns are derived from the ranges in each
element. Also, add name
column from names(object)
.
GAlignments
Converts to GRangesList via grglist
and procedes accordingly.
A GRanges
, with the columns name
,
blockStarts
and blockSizes
added.
Michael Lawrence
## Not run: library(TxDb.Hsapiens.UCSC.hg19.knownGene) exons <- exonsBy(TxDb_Hsapiens_UCSC_hg19_knownGene) mcols(asBED(exons)) ## End(Not run)
## Not run: library(TxDb.Hsapiens.UCSC.hg19.knownGene) exons <- exonsBy(TxDb_Hsapiens_UCSC_hg19_knownGene) mcols(asBED(exons)) ## End(Not run)
Coerce the structure of an object to one following GFF-like
conventions, i.e., using the Parent
GFF3 attribute to encode
the hierarchical structure. This object is then suitable for export as GFF3.
asGFF(x, ...) ## S4 method for signature 'GRangesList' asGFF(x, parentType = "mRNA", childType = "exon")
asGFF(x, ...) ## S4 method for signature 'GRangesList' asGFF(x, parentType = "mRNA", childType = "exon")
x |
Generally, a tabular object to structure as GFF(3) |
parentType |
The value to store in the |
childType |
The value to store in the |
... |
Arguments to pass to methods |
For the GRangesList
method:
A GRanges
, with the columns: ID
(unique identifier),
Name
(from names(x)
, and the names on each element of
x
, if any), type
(as given by parentType
and
childType
), and Parent
(to relate each child range to
its parent at the top-level).
Michael Lawrence
## Not run: library(TxDb.Hsapiens.UCSC.hg19.knownGene) exons <- exonsBy(TxDb_Hsapiens_UCSC_hg19_knownGene) mcols(asGFF(exons)) ## End(Not run)
## Not run: library(TxDb.Hsapiens.UCSC.hg19.knownGene) exons <- exonsBy(TxDb_Hsapiens_UCSC_hg19_knownGene) mcols(asGFF(exons)) ## End(Not run)
Methods for import and export of
GAlignments
or
GAlignmentPairs
objects from and to BAM
files, represented as BamFile
objects.
## S4 method for signature 'BamFile,ANY,ANY' import(con, format, text, paired = FALSE, use.names = FALSE, param = ScanBamParam(...), genome = NA_character_, ...) ## S4 method for signature 'ANY,BamFile,ANY' export(object, con, format, ...)
## S4 method for signature 'BamFile,ANY,ANY' import(con, format, text, paired = FALSE, use.names = FALSE, param = ScanBamParam(...), genome = NA_character_, ...) ## S4 method for signature 'ANY,BamFile,ANY' export(object, con, format, ...)
object |
The object to export, such as
a |
con |
A path, URL, connection or |
format |
If not missing, should be “bam”. |
text |
Not supported. |
paired |
If |
use.names |
Whether to parse QNAME as the names on the result. |
param |
The |
genome |
Single string or |
... |
Arguments that are passed to |
BAM fields not formally present in the GAlignments[Pairs]
object
are extracted from the metadata columns, if present; otherwise, the
missing value, “"."”, is output. The file is sorted and
indexed. This can be useful for subsetting BAM files, although
filterBam
may eventually become flexible
enough to be the favored alternative.
Michael Lawrence
The readGAlignments
and
readGAlignmentPairs
functions
for reading BAM files.
library(Rsamtools) ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") gal <- import(ex1_file, param=ScanBamParam(what="flag")) gal.minus <- gal[strand(gal) == "-"] ## Not run: export(gal, BamFile("ex1-minus.bam")) ## End(Not run)
library(Rsamtools) ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") gal <- import(ex1_file, param=ScanBamParam(what="flag")) gal.minus <- gal[strand(gal) == "-"] ## Not run: export(gal, BamFile("ex1-minus.bam")) ## End(Not run)
The type of UCSC track line used to annotate most types of tracks (every type except Wiggle).
Objects can be created by calls of the form new("BasicTrackLine",
...)
or parsed from a character vector track line with as(text,
"BasicTrackLine")
or converted from a
GraphTrackLine
using as(wig, "BasicTrackLine")
.
itemRgb
:Object of class "logical"
indicating
whether each feature in a track uploaded as BED should
be drawn in its specified color.
useScore
:Object of class "logical"
indicating
whether the data value should be mapped to color.
group
:Object of class "character"
naming a
group to which this track should belong.
db
:Object of class "character"
indicating the
associated genome assembly.
offset
:Object of class "numeric"
, a number
added to all positions in the track.
url
:Object of class "character"
referring to
additional information about this track.
htmlUrl
:Object of class "character"
referring
to an HTML page to be displayed with this track.
name
:Object of class "character"
specifying
the name of the track.
description
:Object of class "character"
describing the track.
visibility
:Object of class "character"
indicating the default visible mode of the track, see
UCSCTrackModes
.
color
:Object of class "integer"
representing the
track color (as from col2rgb
).
colorByStrand
:Object of class "matrix"
with two columns, as from col2rgb
. The two colors
indicate the color for each strand (positive, negative).
priority
:Object of class "numeric"
specifying
the rank of the track.
Class "TrackLine"
, directly.
Export line to its string representation.
"GraphTrackLine"
)Convert this line to a graph track line, using defaults for slots not held in common.
Michael Lawrence
https://genome.ucsc.edu/goldenPath/help/customTrack.html#TRACK for the official documentation.
GraphTrackLine
for Wiggle/bedGraph tracks.
A UCSC track line for graphical tracks.
Objects can be created by calls of the form new("Bed15TrackLine",
...)
or parsed from a character vector track line with
as(text, "Bed15TrackLine")
.
expStep
:A "numeric"
scalar indicating the step
size for the heatmap color gradient.
expScale
:A positive "numeric"
scalar indicating the
range of the data to be [-expScale, expScale]
for
determining the heatmap color gradient.
expNames
:A "character"
vector naming the
the experimental samples.
name
:Object of class "character"
specifying
the name of the track.
description
:Object of class "character"
describing the track.
visibility
:Object of class "character"
indicating the default visible mode of the track, see
UCSCTrackModes
.
color
:Object of class "integer"
representing the
track color (as from col2rgb
).
priority
:Object of class "numeric"
specifying
the rank of this track.
Class "TrackLine"
, directly.
Export line to its string representation.
Michael Lawrence
Official documentation: https://genomewiki.ucsc.edu/index.php/Microarray_track.
export.bed15
for exporting bed15 tracks.
These functions support the import and export of the UCSC BED format and its variants, including BEDGraph.
## S4 method for signature 'BEDFile,ANY,ANY' import(con, format, text, trackLine = TRUE, genome = NA, colnames = NULL, which = NULL, seqinfo = NULL, extraCols = character(), sep = c("\t", ""), na.strings=character(0L)) import.bed(con, ...) import.bed15(con, ...) import.bedGraph(con, ...) ## S4 method for signature 'ANY,BEDFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,BEDFile,ANY' export(object, con, format, append = FALSE, index = FALSE, ignore.strand = FALSE, trackLine = NULL) ## S4 method for signature 'UCSCData,BEDFile,ANY' export(object, con, format, trackLine = TRUE, ...) export.bed(object, con, ...) export.bed15(object, con, ...) ## S4 method for signature 'GenomicRanges,BED15File,ANY' export(object, con, format, expNames = NULL, trackLine = NULL, ...) export.bedGraph(object, con, ...)
## S4 method for signature 'BEDFile,ANY,ANY' import(con, format, text, trackLine = TRUE, genome = NA, colnames = NULL, which = NULL, seqinfo = NULL, extraCols = character(), sep = c("\t", ""), na.strings=character(0L)) import.bed(con, ...) import.bed15(con, ...) import.bedGraph(con, ...) ## S4 method for signature 'ANY,BEDFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,BEDFile,ANY' export(object, con, format, append = FALSE, index = FALSE, ignore.strand = FALSE, trackLine = NULL) ## S4 method for signature 'UCSCData,BEDFile,ANY' export(object, con, format, trackLine = TRUE, ...) export.bed(object, con, ...) export.bed15(object, con, ...) ## S4 method for signature 'GenomicRanges,BED15File,ANY' export(object, con, format, expNames = NULL, trackLine = NULL, ...) export.bedGraph(object, con, ...)
con |
A path, URL, connection or |
object |
The object to export, should be a |
format |
If not missing, should be one of “bed”, “bed15”, “bedGraph”, “bedpe”, “narrowPeak” or “broadPeak”. |
text |
If |
trackLine |
For import, an imported track line will be stored
in a |
genome |
The identifier of a genome, or a |
colnames |
A character vector naming the columns to parse. These should name columns in the result, not those in the BED spec, so e.g. specify “thick”, instead of “thickStart”. |
which |
A |
index |
If |
ignore.strand |
Whether to output the strand when not required (by the existence of later fields). |
seqinfo |
If not |
extraCols |
A character vector in the same form as
|
sep |
A character vector with a single character indicating the
field separator, like |
na.strings |
Character vector with strings, appended to the
standard |
append |
If |
expNames |
character vector naming columns in
|
... |
Arguments to pass down to methods to other methods. For
import, the flow eventually reaches the |
The BED format is a tab-separated table of intervals, with annotations like name, score and even sub-intervals for representing alignments and gene models. Official (UCSC) child formats currently include BED15 (adding a number matrix for e.g. expression data across multiple samples) and BEDGraph (a compressed means of storing a single score variable, e.g. coverage; overlapping features are not allowed). Many tools and organizations have extended the BED format with additional columns for particular use cases. The advantage of BED is its balance between simplicity and expressiveness. It is also relatively scalable, because only the first three columns (chrom, start and end) are required. Thus, BED is best suited for representing simple features. For specialized cases, one is usually better off with another format. For example, genome-scale vectors belong in BigWig, alignments from high-throughput sequencing belong in BAM, and gene models are more richly expressed in GFF.
The following is the mapping of BED elements to a GRanges
object.
NA values are allowed only where indicated.
These appear as a “.” in the file. Only the first three columns
(chrom, start and strand) are required. The other columns can only be
included if all previous columns (to the left) are included. Upon export,
default values are used to automatically pad the table, if necessary.
the ranges
component.
character vector (NA's allowed) in the name
column; defaults to NA on export.
numeric vector in the score
column, accessible via the score
accessor. Defaults to 0
on export. This is the only column present in BEDGraph (besides
chrom, start and end), and it is required.
strand factor (NA's allowed) in the strand
column, accessible via the strand
accessor; defaults to NA
on export.
IntegerRanges
object in a
column named thick
; defaults to the ranges of the feature
on export.
an integer matrix of color codes, as returned by
col2rgb
, or any valid input to
col2rgb
, in the itemRgb
column; default is NA
on export, which translates to black.
IntegerRangesList
object in a column named blocks
; defaults to empty upon BED15
export.
For BED15 files, there should be a column of scores in
mcols(object)
for each sample in the experiment. The columns
are named according to the expNames
(found in the file, or
passed as an argument during export). NA
scores are stored as
“-10000” in the file.
For a “bedpe” file, a Pairs
object combining two
GRanges
. The name
and score
are carried over to
the metadata columns.
Otherwise, a GRanges
with the metadata columns described in the
details.
To import one of the multitude of BEDX+Y formats, such as those used
to distribute ENCODE data through UCSC (narrowPeaks, etc), specify
the extraCols
argument to indicate the expected names and
classes of the special columns. We assume that the
last length(extraCols)
columns are special, and that the
preceding columns adhere to the BED format. “narrowPeak”
and “broadPeak” types are handled explicitly by specifying
these types as the format
argument, rather than by
using extraCols
.
The BEDFile
class extends BiocFile
and is a
formal represention of a resource in the BED format.
To cast a path, URL or connection to a BEDFile
, pass it to
the BEDFile
constructor. Classes and constructors also exist
for the subclasses BED15File
, BEDGraphFile
and
BEDPEFile
.
Michael Lawrence
https://genome.ucsc.edu/goldenPath/help/customTrack.html http://bedtools.readthedocs.org/en/latest/content/general-usage.html
test_path <- system.file("tests", package = "rtracklayer") test_bed <- file.path(test_path, "test.bed") test <- import(test_bed) test test_bed_file <- BEDFile(test_bed) import(test_bed_file) test_bed_con <- file(test_bed) import(test_bed_con, format = "bed") import(test_bed, trackLine = FALSE) import(test_bed, genome = "hg19") import(test_bed, colnames = c("name", "strand", "thick")) which <- GRanges("chr7:1-127473000") import(test_bed, which = which) bed15_file <- file.path(test_path, "test.bed15") bed15 <- import(bed15_file) ## Not run: test_bed_out <- file.path(tempdir(), "test.bed") export(test, test_bed_out) test_bed_out_file <- BEDFile(test_bed_out) export(test, test_bed_out_file) export(test, test_bed_out, name = "Alternative name") test_bed_gz <- paste(test_bed_out, ".gz", sep = "") export(test, test_bed_gz) export(test, test_bed_out, index = TRUE) export(test, test_bed_out, index = TRUE, trackLine = FALSE) bed_text <- export(test, format = "bed") test <- import(format = "bed", text = bed_text) test_bed15_out <- file.path(tempdir(), "test.bed15") export(bed15, test_bed15_out) # UCSCData knows the expNames export(as(bed15, "GRanges"), test_bed15_out, # have to specify expNames expNames=paste0("breast_", c("A", "B", "C"))) ## End(Not run)
test_path <- system.file("tests", package = "rtracklayer") test_bed <- file.path(test_path, "test.bed") test <- import(test_bed) test test_bed_file <- BEDFile(test_bed) import(test_bed_file) test_bed_con <- file(test_bed) import(test_bed_con, format = "bed") import(test_bed, trackLine = FALSE) import(test_bed, genome = "hg19") import(test_bed, colnames = c("name", "strand", "thick")) which <- GRanges("chr7:1-127473000") import(test_bed, which = which) bed15_file <- file.path(test_path, "test.bed15") bed15 <- import(bed15_file) ## Not run: test_bed_out <- file.path(tempdir(), "test.bed") export(test, test_bed_out) test_bed_out_file <- BEDFile(test_bed_out) export(test, test_bed_out_file) export(test, test_bed_out, name = "Alternative name") test_bed_gz <- paste(test_bed_out, ".gz", sep = "") export(test, test_bed_gz) export(test, test_bed_out, index = TRUE) export(test, test_bed_out, index = TRUE, trackLine = FALSE) bed_text <- export(test, format = "bed") test <- import(format = "bed", text = bed_text) test_bed15_out <- file.path(tempdir(), "test.bed15") export(bed15, test_bed15_out) # UCSCData knows the expNames export(as(bed15, "GRanges"), test_bed15_out, # have to specify expNames expNames=paste0("breast_", c("A", "B", "C"))) ## End(Not run)
These functions support the import and export of the UCSC BigBed format, a compressed, binary form of BED with a spatial index and precomputed summaries. These functions do not work on Windows.
## S4 method for signature 'BigBedFile,ANY,ANY' import(con, format, text, selection = BigBedSelection(which, ...), which = con, ...) import.bb(con, ...) ## S4 method for signature 'ANY,BigBedFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,BigBedFile,ANY' export(object, con, format, compress = TRUE, extraIndexes = "") export.bb(object, con, ...)
## S4 method for signature 'BigBedFile,ANY,ANY' import(con, format, text, selection = BigBedSelection(which, ...), which = con, ...) import.bb(con, ...) ## S4 method for signature 'ANY,BigBedFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,BigBedFile,ANY' export(object, con, format, compress = TRUE, extraIndexes = "") export.bb(object, con, ...)
con |
A path, URL or |
object |
The object to export, should be |
format |
If not missing, should be “bigBed” or “bb” (case insensitive). |
text |
Not supported. |
selection |
A |
which |
A range data structure coercible to |
compress |
If |
extraIndexes |
If set, make an index on each field in a comma separated list |
... |
Arguments to pass down to methods to other methods. For
import, the flow eventually reaches the |
BigBedFile
objectsA BigWigFile
object, an extension of
BiocFile
is a reference to a BigBed file. To cast
a path, URL or connection to a BigBedFile
, pass it to the
BigBedFile
constructor.
BigBed files are more complex than most track files, and there are a
number of methods on BigBedFile
for accessing the additional
information:
seqinfo(x)
:
Gets the Seqinfo
object
indicating the lengths of the sequences for the intervals in the
file. No circularity or genome information is available.
When accessing remote data, the UCSC library caches data in the
‘/tmp/udcCache’ directory. To clean the cache, call
cleanBigBedCache(maxDays)
, where any files older than
maxDays
days old will be deleted.
Michael Lawrence
if (.Platform$OS.type != "windows") { test_path <- system.file("tests", package = "rtracklayer") test_bb <- file.path(test_path, "test.bb") ## Returns ranges with all fields gr <- import(test_bb) gr ## Retuns ranges only for 'chr10' ## between 180185-180185 with all fields which <- GRanges(c("chr10"), IRanges(c(180185, 180185))) import(test_bb, which = which) ## Retuns ranges only for 'chr10' ## between 180185-180185 with name and peak fields selection <- BigBedSelection(which, colnames = c("name", "peak")) import(test_bb, selection = selection) ## Not run: test_bb_out <- file.path(tempdir(), "test_out.bb") export(test, test_bb_out) ## make an index for 'name' test_bb_out <- file.path(tempdir(), "test_out.bb") export(test, test_bb_out, extraIndexes = "name") ## End(Not run) }
if (.Platform$OS.type != "windows") { test_path <- system.file("tests", package = "rtracklayer") test_bb <- file.path(test_path, "test.bb") ## Returns ranges with all fields gr <- import(test_bb) gr ## Retuns ranges only for 'chr10' ## between 180185-180185 with all fields which <- GRanges(c("chr10"), IRanges(c(180185, 180185))) import(test_bb, which = which) ## Retuns ranges only for 'chr10' ## between 180185-180185 with name and peak fields selection <- BigBedSelection(which, colnames = c("name", "peak")) import(test_bb, selection = selection) ## Not run: test_bb_out <- file.path(tempdir(), "test_out.bb") export(test, test_bb_out) ## make an index for 'name' test_bb_out <- file.path(tempdir(), "test_out.bb") export(test, test_bb_out, extraIndexes = "name") ## End(Not run) }
A BigBedSelection
represents a query against a
BigBed file, see import.bb
. It is simply
a RangedSelection with colnames
parameter.colnames
should be a character vector of column names.
Default columns are "name", "score", "thick", "itemRgb"
and "blocks"
, if non-empty, as that is the only column supported
by BigBed.
BigBedSelection(ranges = GRanges(), colnames =
"score")
: Constructs a BigBedSelection
with the given
ranges
and colnames
.
a character
identifying a genome (see
GenomicSelection
), or a
BigBedFile
, in which case the ranges are
derived from the bounds of its sequences.
as(from, "BigBedSelection")
: Coerces from
to a
BigBedSelection
object. Typically, from
is a
GRanges
or
a IntegerRangesList
, the ranges of
which become the ranges in the
new BigBedSelection
.
Michael Lawrence
rl <- IRangesList(chr1 = IRanges::IRanges(c(1, 5), c(3, 6))) BigBedSelection(rl) as(rl, "BigBedSelection") # same as above # do not select any column BigBedSelection(rl, character())
rl <- IRangesList(chr1 = IRanges::IRanges(c(1, 5), c(3, 6))) BigBedSelection(rl) as(rl, "BigBedSelection") # same as above # do not select any column BigBedSelection(rl, character())
These functions support the import and export of the UCSC BigWig format, a compressed, binary form of WIG/BEDGraph with a spatial index and precomputed summaries. These functions do not work on Windows.
## S4 method for signature 'BigWigFile,ANY,ANY' import(con, format, text, selection = BigWigSelection(which, ...), which = con, as = c("GRanges", "RleList", "NumericList"), ...) import.bw(con, ...) ## S4 method for signature 'ANY,BigWigFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,BigWigFile,ANY' export(object, con, format, dataFormat = c("auto", "variableStep", "fixedStep", "bedGraph"), compress = TRUE, fixedSummaries = FALSE) export.bw(object, con, ...)
## S4 method for signature 'BigWigFile,ANY,ANY' import(con, format, text, selection = BigWigSelection(which, ...), which = con, as = c("GRanges", "RleList", "NumericList"), ...) import.bw(con, ...) ## S4 method for signature 'ANY,BigWigFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,BigWigFile,ANY' export(object, con, format, dataFormat = c("auto", "variableStep", "fixedStep", "bedGraph"), compress = TRUE, fixedSummaries = FALSE) export.bw(object, con, ...)
con |
A path, URL or |
object |
The object to export, should be an |
format |
If not missing, should be “bigWig” or “bw” (case insensitive). |
text |
Not supported. |
as |
Specifies the class of the return object. Default is
|
selection |
A |
which |
A range data structure coercible to |
dataFormat |
Probably best left to “auto”. Exists only for historical reasons. |
compress |
If |
fixedSummaries |
If |
... |
Arguments to pass down to methods to other methods. For
import, the flow eventually reaches the |
A GRanges
(default), RleList
or NumericList
.
GRanges
return ranges with non-zero score values in a score
metadata column. The length of the NumericList
is the same length
as the selection
argument (one list element per range).
The return order in the NumericList
matches the order of the
BigWigSelection
object.
BigWigFile
objectsA BigWigFile
object, an extension of
BiocFile
is a reference to a BigWig file.
To cast a path, URL or connection to a BigWigFile
, pass it to the
BigWigFile
constructor.
BigWig files are more complex than most track files, and there are a
number of methods on BigWigFile
for accessing the additional
information:
seqinfo(x)
:
Gets the Seqinfo
object
indicating the lengths of the sequences for the intervals in the
file. No circularity or genome information is available.
summary(ranges = as(seqinfo(object), "GenomicRanges"), size
= 1L, type = c("mean", "min", "max", "coverage", "sd"),
defaultValue = NA_real_),
as = c("GRangesList", "RleList", "matrix"), ...
:
Aggregates the intervals in the file
that fall into ranges
, which should be something
coercible to GRanges
. The aggregation essentially
compresses each sequence to a length of size
. The
algorithm is specified by type
; available algorithms
include the mean, min, max, coverage (percent sequence covered
by at least one feature), and standard deviation. When a window
contains no features, defaultValue
is assumed. The result
type depends on as
, and can be a GRangesList, RleList or
matrix, where the number elements (or rows) is equal to the
length of ranges
. For as="matrix"
, there must be
one unique value of size
, which is equal to the number of
columns in the result. The as="matrix"
case is the only
one that supports a size
greater than the width of the
corresponding element in ranges
, where values are
interpolated to yield the matrix result. The
driving use case for this is visualization of coverage when the
screen space is small compared to the viewed portion of the
sequence. The operation is very fast, as it leverages cached
multi-level summaries present in every BigWig file.
If a summary statistic is not available / cannot be computed
for a given range a warning is thrown and the defaultValue
NA_real_
is returned.
When accessing remote data, the UCSC library caches data in the
‘/tmp/udcCache’ directory. To clean the cache, call
cleanBigWigCache(maxDays)
, where any files older than
maxDays
days old will be deleted.
BigWigFileList
objectsA BigWigFileList()
provides a convenient way of managing a list
of BigWigFile
instances.
Michael Lawrence
wigToBigWig
for converting a WIG file to BigWig.
if (.Platform$OS.type != "windows") { test_path <- system.file("tests", package = "rtracklayer") test_bw <- file.path(test_path, "test.bw") ## GRanges ## Returns ranges with non-zero scores. gr <- import(test_bw) gr which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000))) import(test_bw, which = which) ## RleList ## Scores returned as an RleList is equivalent to the coverage. ## Best option when 'which' or 'selection' contain many small ranges. mini <- narrow(unlist(tile(which, 50)), 2) rle <- import(test_bw, which = mini, as = "RleList") rle ## NumericList ## The 'which' is stored as metadata: track <- import(test_bw, which = which, as = "NumericList") metadata(track) ## Not run: test_bw_out <- file.path(tempdir(), "test_out.bw") export(test, test_bw_out) ## End(Not run) bwf <- BigWigFile(test_bw) track <- import(bwf) seqinfo(bwf) summary(bwf) # for each sequence, average all values into one summary(bwf, range(head(track))) # just average the first few features summary(bwf, size = seqlengths(bwf) / 10) # 10X reduction summary(bwf, type = "min") # min instead of mean summary(bwf, track, size = 10, as = "matrix") # each feature 10 windows }
if (.Platform$OS.type != "windows") { test_path <- system.file("tests", package = "rtracklayer") test_bw <- file.path(test_path, "test.bw") ## GRanges ## Returns ranges with non-zero scores. gr <- import(test_bw) gr which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000))) import(test_bw, which = which) ## RleList ## Scores returned as an RleList is equivalent to the coverage. ## Best option when 'which' or 'selection' contain many small ranges. mini <- narrow(unlist(tile(which, 50)), 2) rle <- import(test_bw, which = mini, as = "RleList") rle ## NumericList ## The 'which' is stored as metadata: track <- import(test_bw, which = which, as = "NumericList") metadata(track) ## Not run: test_bw_out <- file.path(tempdir(), "test_out.bw") export(test, test_bw_out) ## End(Not run) bwf <- BigWigFile(test_bw) track <- import(bwf) seqinfo(bwf) summary(bwf) # for each sequence, average all values into one summary(bwf, range(head(track))) # just average the first few features summary(bwf, size = seqlengths(bwf) / 10) # 10X reduction summary(bwf, type = "min") # min instead of mean summary(bwf, track, size = 10, as = "matrix") # each feature 10 windows }
A BigWigSelection
represents a query against a
BigWig file, see import.bw
. It is simply
a RangedSelection that requires its colnames
parameter to be "score", if non-empty, as that is the only column
supported by BigWig.
BigWigSelection(ranges = GRanges(), colnames =
"score")
: Constructs a BigWigSelection
with the given
ranges
and colnames
. ranges
can be either
something coercible to a IntegerRangesList
,
a character
identifying a genome (see
GenomicSelection
), or a
BigWigFile
, in which case the ranges are
derived from the bounds of its sequences.
as(from, "BigWigSelection")
: Coerces from
to a
BigWigSelection
object. Typically, from
is a
GRanges
or
a IntegerRangesList
, the ranges of
which become the ranges in the
new BigWigSelection
.
Michael Lawrence
rl <- IRangesList(chr1 = IRanges::IRanges(c(1, 5), c(3, 6))) BigWigSelection(rl) as(rl, "BigWigSelection") # same as above # do not select the 'score' column BigWigSelection(rl, character())
rl <- IRangesList(chr1 = IRanges::IRanges(c(1, 5), c(3, 6))) BigWigSelection(rl) as(rl, "BigWigSelection") # same as above # do not select the 'score' column BigWigSelection(rl, character())
Obtains the block ranges (subranges, usually exons) from an object,
such as a GRanges
imported
from a BED file.
blocks(x, ...)
blocks(x, ...)
x |
The instance from which to obtain the block/exon
information. Currently must be a |
... |
Additional arguments for methods |
A GRangesList
with an
element for each range in x
. The original block ranges are
relative to the start of the containing range, so the returned ranges
are shifted to absolute coordinates. The seqname and strand are
inherited from the containing range.
Michael Lawrence
import.bed
for importing a track from BED, which
can store block information; asBED
for coercing a
GenomicRanges
into a BED-like structure that can be passed to
this function.
A generic function for launching a genome browser.
browseGenome(object, ...) ## S4 method for signature 'GenomicRanges_OR_GenomicRangesList' browseGenome(object, browser = "UCSC", range = base::range(object), view = TRUE, trackParams = list(), viewParams = list(), name = "customTrack", ...)
browseGenome(object, ...) ## S4 method for signature 'GenomicRanges_OR_GenomicRangesList' browseGenome(object, browser = "UCSC", range = base::range(object), view = TRUE, trackParams = list(), viewParams = list(), name = "customTrack", ...)
object |
A |
browser |
The name of the genome browser. |
range |
A genome identifier or a
|
view |
Whether to open a view. |
trackParams |
Named list of parameters to pass to
|
viewParams |
Named list of parameters to pass to
|
name |
The name for the track. |
... |
Arguments passed to |
Returns a BrowserSession
.
Michael Lawrence
BrowserSession
and
BrowserView
, the two main classes for interfacing
with genome browsers.
## Not run: ## open UCSC genome browser: browseGenome() ## to view a specific range: range <- GRangesForUCSCGenome("hg18", "chr22", IRanges(20000, 50000)) browseGenome(range = range) ## a slightly larger range: browseGenome(range = range, end = 75000) ## with a track: track <- import(system.file("tests", "v1.gff", package = "rtracklayer")) browseGenome(GRangesList(track)) ## End(Not run)
## Not run: ## open UCSC genome browser: browseGenome() ## to view a specific range: range <- GRangesForUCSCGenome("hg18", "chr22", IRanges(20000, 50000)) browseGenome(range = range) ## a slightly larger range: browseGenome(range = range, end = 75000) ## with a track: track <- import(system.file("tests", "v1.gff", package = "rtracklayer")) browseGenome(GRangesList(track)) ## End(Not run)
An object representing a genome browser session. As a
derivative of TrackDb
, each
session contains a set of loaded tracks. In addition, it has
a set of views, in the form of BrowserView
instances, on those tracks. Note that
this is a virtual class; a concrete implementation is provided by each
backend driver.
A virtual Class: No objects may be
created from it. See browserSession
for obtaining
an instance of an implementation for a particular genome browser.
This specifies the API implemented by each browser backend. Note that
a backend is not required to support all operations, and that each
backend often has additional parameters for each of the methods. See
the backend-specific documentation for more details. The only built-in
backend is UCSCSession
.
If a method is denoted as virtual, it must be implemented by the backend to support the corresponding feature. Otherwise, the fallback behavior is described.
browserView(object, range =
range(object), track = trackNames(object), ...)
Constructs a BrowserView
of range
for this session.
browserViews(object, ...)
Gets the BrowserView
instances belonging to
this session.
activeView(object, ...)
Returns the BrowserView
that is currently
active in the session. Fallback calls browserViews
and
queries each view with activeView
.
range(x, ...)
Gets the GRanges
representing the range of the genome currently displayed by the
browser (i.e. the range shown by the active view) or a default
value (possibly NULL
) if no views exist.
getSeq(object, range =
range(object), ...)
gets a genomic sequence of range
from this session.
sequence(object, ...) <- value
Loads a sequence into the session.
track(object, name =
deparse(substitute(track)), view = TRUE, ...) <- value
Loads one or more track
s into the session and optionally open a
view
of the track.
x[[i]] <- value
Loads the track value
into
session x
, under the name i
. Shortcut to above.
x$name <- value
Loads the track value
into
session x
, under the name name
. Shortcut to above.
track(object, ...)
Gets a track from a session.
x[[i]]
Gets the track named i
from session
x
. A shortcut to track
.
x$name
Gets the track named name
from session
x
. A shortcut to track
.
trackNames(object, ...)
Gets the names of the tracks stored in this session.
genome(x)
,
genome(x) <- value
Gets or sets the genome identifier (e.g. “hg18”) for the session.
close(con, ...)
Close this session.
show(object, ...)
Output a textual description of this session.
Michael Lawrence
browserSession
for obtaining implementations of this
class for a particular genome browser.
Methods for getting browser sessions.
The following methods are defined by rtracklayer.
browserSession(object, ...)
: Creates a
BrowserSession
from a genome browser
identifier. The identifier corresponds to the prefix of the
session class name (e.g. "UCSC" in "UCSCSession"). The arguments
in ... are passed to the initialization function of the class.
Gets the
BrowserSession
for the view.
Calls browserSession("ucsc", ...)
.
An object representing a genome browser view of a particular segment of a genome.
A virtual Class: No objects may be created from it directly. See
browserView
for obtaining
an instance of an implementation for a particular genome browser.
session
:Object of class "BrowserSession"
the
browser session to which this view belongs.
This specifies the API implemented by each browser backend. Note that
a backend is not guaranteed to support all operations. See
the backend-specific documentation for more details. The only built-in
backend is UCSCView
.
browserSession(object)
Obtains the
BrowserSession
to which this view belongs.
close(object)
Close this view.
range(object)
Obtains the
GRanges
displayed by
this view.
trackNames(object)
Gets the names of the visible tracks in the view.
trackNames(object) <- value
Sets the visible tracks by their names.
show(object)
Outputs a textual description of this view.
visible(object)
Get a named logical vector indicating whether each track is visible.
visible(object) <- value
Set a logical vector
indicating the visibility of each track, with the same names and
in the same order as that returned by visible(object)
.
Michael Lawrence
browserView
for obtaining instances of this class.
Methods for creating and getting browser views.
browserView(object, range, track, ...)
browserView(object, range, track, ...)
object |
The object from which to get the views. |
range |
The |
track |
List of track names to make visible in the view. |
... |
Arguments to pass to methods |
The following methods are defined by rtracklayer.
browserView(object, range =
range(object), track = trackNames(object), imagewidth = 800,
...)
: Creates a
BrowserView
of range
with visible
tracks specified by track
. The imagewidth
parameter
specifies the width of the track image in pixels. track
may be an instance
of UCSCTrackModes
. Arguments in ...
are passed to ucscTrackModes
to create the
UCSCTrackModes
instance that will override modes
indicated by the track
parameter.
## Not run: session <- browserSession() browserView(session, GRangesForUCSCGenome("hg19", "chr2", IRanges(20000, 50000))) ## only view "knownGene" track browserView(session, track = "knownGene") ## End(Not run)
## Not run: session <- browserSession() browserView(session, GRangesForUCSCGenome("hg19", "chr2", IRanges(20000, 50000))) ## only view "knownGene" track browserView(session, track = "knownGene") ## End(Not run)
A formal list of
BrowserView
objects. Extends and inherits all
its methods from
Vector
. Usually generated by
passing multiple ranges to the browserView
function.
BrowserViewList(...)
: Concatenates the
BrowserView
objects in ...
into a new
BrowserViewList
. This is rarely called by the user.
Michael Lawrence
Methods for getting browser views.
The following methods are defined by rtracklayer.
Gets the instances of
BrowserView
in the session.
browserView
for creating a browser view.
## Not run: session <- browseGenome() browserViews(session) ## End(Not run)
## Not run: session <- browseGenome() browserViews(session) ## End(Not run)
A Chain
object represents a UCSC chain alignment, typically
imported from a chain
file, and is essentially a list of
ChainBlock
objects. Each ChainBlock
has a corresponding
chromosome (its name in the list) and is a run-length
encoded alignment, mapping a set of intervals on that chromosome to
intervals on the same or other chromosomes.
In the code snippets below, x
and object
are
ChainBlock
objects.
ranges(x)
:
Get the IntegerRanges
object holding
the starts and ends of the "from" ranges. Each range is a
contiguous block of positions aligned without gaps to the other
sequence.
offset(x)
: Integer offset from the "from" start to the
"end" start (which could be in another chromosome).
score(x)
: The score for each mapping.
space(x)
: The space (chromosome) of the "to" range.
reversed(x)
: Whether the mapping inverts the region, i.e.,
the alignment is between different strands.
A Chain
object can be loaded from a UCSC chain format file
simply by passing the path import
function. If the file
extension is not “chain”, then either pass “chain” to
the format
argument, or cast the path to a ChainFile
object. The import.chain
function is provided as a (slight)
convenience. It is documented below, along with the extra
exclude
argument to the import
method.
import.chain(con, exclude = "_", ...)
:
Imports a chain file named con
as a Chain
object, a
list of ChainBlock
s. Alignments for chromosomes matching the
exclude
pattern are not imported.
A chain file essentially details many local alignments, so it is possible for the "from" ranges to map to overlapping regions in the other sequence. The "from" ranges are guaranteed to be disjoint (but do not necessarily cover the entire "from" sequence).
Michael Lawrence
liftOver
for performing lift overs using a chain alignment
A GRanges
object (created by the GGtools
package)
with features from a subset of the SNPs on chromosome 20 from 60
HapMap founders in the CEU cohort. Each SNP has an associated data
value indicating its association with the expression of the CPNE1 gene
according to a Cochran-Armitage 1df test. The top 5000 scoring SNPs
were selected for the track.
data(cpneTrack)
data(cpneTrack)
Each feature (row) is a SNP. The association test scores are
accessible via score
.
Vince Carey and the GGtools
package.
data(cpneTrack) plot(start(cpneTrack), score(cpneTrack))
data(cpneTrack) plot(start(cpneTrack), score(cpneTrack))
These functions support the import and export of the Fasta sequence format, using the Biostrings package.
## S4 method for signature 'FastaFile,ANY,ANY' import(con, format, text, type = c("DNA", "RNA", "AA", "B"), ...) ## S4 method for signature 'ANY,FastaFile,ANY' export(object, con, format, ...) ## S4 method for signature 'XStringSet,FastaFile,ANY' export(object, con, format, ...)
## S4 method for signature 'FastaFile,ANY,ANY' import(con, format, text, type = c("DNA", "RNA", "AA", "B"), ...) ## S4 method for signature 'ANY,FastaFile,ANY' export(object, con, format, ...) ## S4 method for signature 'XStringSet,FastaFile,ANY' export(object, con, format, ...)
con |
A path or |
object |
The object to export, should be an |
format |
If not missing, should be “fasta”. |
text |
If |
type |
Type of biological sequence. |
... |
Arguments to pass down
to |
The FastaFile
class extends
BiocFile
and is
a formal represention of a resource in the Fasta format.
To cast a path, URL or connection to a FastaFile
, pass it to
the FastaFile
constructor.
Michael Lawrence
These functions are implemented by the Biostrings
writeXStringSet
(export) and the
readDNAStringSet
family of functions
(import).
See export-methods in the BSgenome package for exporting a BSgenome object as a FASTA file.
Gets the identifiers of the loaded genome browser drivers.
genomeBrowsers(where = topenv(parent.frame()))
genomeBrowsers(where = topenv(parent.frame()))
where |
The environment in which to search for drivers. |
This searches the specified environment for classes that extend
BrowserSession
. The prefix of the class name,
e.g. "ucsc" in "UCSCSession", is returned for each driver.
A character vector of driver identifiers.
Michael Lawrence
browseGenome
and browserSession
that create browserSession
implementations given an identifier
returned from this function.
The rtracklayer
package adds
convenience methods on top of GenomicRanges
and IntegerRangesList
to manipulate data on genomic ranges.
In the code snippets below,
x
is a GenomicRanges
or IntegerRangesList
object.
chrom(x), chrom(x) <- value
: Gets or
sets the chromosome names for x
. The length of value
should equal the length of x
.
score(x)
: Gets the “score” column from the element
metadata of a GenomicRanges
or GRangesList
. Many
track formats have a score column, so this is often used during
export. The ANY
fallback for this method simply returns
NULL
.
GenomicData(ranges, ..., strand = NULL, chrom = NULL,
genome = NULL)
: Constructs a GRanges
instance with
the given ranges
and variables in ...
(see the
GRanges
constructor).
If non-NULL
, the strand
argument specifies the strand
of each range. It should be a character vector or factor of length
equal to that of ranges
. All values should be either -
,
+
, or *
. To get the levels for strand
, call
levels(strand())
.
chrom
argument is analogous to seqnames
in the
GRanges
constructor.
The genome
argument should be a scalar string. See the examples.
Michael Lawrence and Patrick Aboyoun
range1 <- IRanges(c(1,2,3), c(5,2,8)) ## with some data ## filter <- c(1L, 0L, 1L) score <- c(10L, 2L, NA) strand <- factor(c("+", NA, "-"), levels = levels(strand())) ## GRanges instance gr <- GenomicData(range1, score, chrom = "chr1", genome = "hg18") mcols(gr)[["score"]] strand(gr) ## all '*' gr <- GenomicData(range1, score, filt = filter, strand = strand, chrom = "chr1") mcols(gr)[["filt"]] strand(gr) ## equal to 'strand' ## coercion from data.frame ## df <- as.data.frame(gr)
range1 <- IRanges(c(1,2,3), c(5,2,8)) ## with some data ## filter <- c(1L, 0L, 1L) score <- c(10L, 2L, NA) strand <- factor(c("+", NA, "-"), levels = levels(strand())) ## GRanges instance gr <- GenomicData(range1, score, chrom = "chr1", genome = "hg18") mcols(gr)[["score"]] strand(gr) ## all '*' gr <- GenomicData(range1, score, filt = filter, strand = strand, chrom = "chr1") mcols(gr)[["filt"]] strand(gr) ## equal to 'strand' ## coercion from data.frame ## df <- as.data.frame(gr)
Convenience constructor of a RangedSelection object for selecting a data on a per-chromosome basis for a given genome.
GenomicSelection(genome, chrom = NULL, colnames = character(0))
GenomicSelection(genome, chrom = NULL, colnames = character(0))
genome |
A string identifying a genome. Should match the end of a BSgenome package name, e.g. "hg19". |
chrom |
Character vector naming chromosomes to select. |
colnames |
The column names to select from the dataset. |
A RangedSelection object, selecting entire chromosomes
Michael Lawrence
RangedSelection
, BigWigSelection
# every chromosome from hg19 GenomicSelection("hg19") # chr1 and 2 from hg19, with a score column GenomicSelection("hg19", c("chr1", "chr2"), "score")
# every chromosome from hg19 GenomicSelection("hg19") # chr1 and 2 from hg19, with a score column GenomicSelection("hg19", c("chr1", "chr2"), "score")
These functions support the import and export of the GFF format, of which there are three versions and several flavors.
## S4 method for signature 'GFFFile,ANY,ANY' import(con, format, text, version = c("", "1", "2", "3"), genome = NA, colnames = NULL, which = NULL, feature.type = NULL, sequenceRegionsAsSeqinfo = FALSE) import.gff(con, ...) import.gff1(con, ...) import.gff2(con, ...) import.gff3(con, ...) ## S4 method for signature 'ANY,GFFFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,GFFFile,ANY' export(object, con, format, version = c("1", "2", "3"), source = "rtracklayer", append = FALSE, index = FALSE) ## S4 method for signature 'GenomicRangesList,GFFFile,ANY' export(object, con, format, ...) export.gff(object, con, ...) export.gff1(object, con, ...) export.gff2(object, con, ...) export.gff3(object, con, ...)
## S4 method for signature 'GFFFile,ANY,ANY' import(con, format, text, version = c("", "1", "2", "3"), genome = NA, colnames = NULL, which = NULL, feature.type = NULL, sequenceRegionsAsSeqinfo = FALSE) import.gff(con, ...) import.gff1(con, ...) import.gff2(con, ...) import.gff3(con, ...) ## S4 method for signature 'ANY,GFFFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,GFFFile,ANY' export(object, con, format, version = c("1", "2", "3"), source = "rtracklayer", append = FALSE, index = FALSE) ## S4 method for signature 'GenomicRangesList,GFFFile,ANY' export(object, con, format, ...) export.gff(object, con, ...) export.gff1(object, con, ...) export.gff2(object, con, ...) export.gff3(object, con, ...)
con |
A path, URL, connection or |
object |
The object to export, should be a |
format |
If not missing, should be one of “gff”, “gff1” “gff2”, “gff3”, “gvf”, or “gtf”. |
version |
If the format is given as “gff”, i.e., it does
not specify a version, then this should indicate the GFF version as
one of “” (for import only, from the |
text |
If |
genome |
The identifier of a genome, or a |
colnames |
A character vector naming the columns to parse. These
should name either fixed fields, like |
which |
A |
feature.type |
|
sequenceRegionsAsSeqinfo |
If |
source |
The value for the source column in GFF. This is typically the name of the package or algorithm that generated the feature. |
index |
If |
append |
If |
... |
Arguments to pass down to methods to other methods. For
import, the flow eventually reaches the |
The Generic Feature Format (GFF) format is a tab-separated table of intervals. There are three different versions of GFF, and they all have the same number of columns. In GFF1, the last column is a grouping factor, whereas in the later versions the last column holds application-specific attributes, with some conventions defined for those commonly used. This attribute support facilitates specifying extensions to the format. These include GTF (Gene Transfer Format, an extension of GFF2) and GVF (Genome Variation Format, an extension of GFF3). The rtracklayer package recognizes the “gtf” and “gvf” extensions and parses the extra attributes into columns of the result; however, it does not perform any extension-specific processing. Both GFF1 and GFF2 have been proclaimed obsolete; however, the UCSC Genome Browser only supports GFF1 (and GTF), and GFF2 is still in broad use.
GFF is distinguished from the simpler BED format by its flexible
attribute support and its hierarchical structure, as specified by the
group
column in GFF1 (only one level of grouping) and the
Parent
attribute in GFF3. GFF2 does not specify a convention
for representing hierarchies, although its GTF extension provides this
for gene structures. The combination of support for hierarchical data
and arbitrary descriptive attributes makes GFF(3) the preferred format
for representing gene models.
Although GFF features a score
column, large quantitative data
belong in a format like BigWig and alignments from
high-throughput experiments belong in
BAM. For variants, the VCF format (supported
by the VariantAnnotation package) seems to be more widely adopted than
the GVF extension.
A note on the UCSC track line metaformat: track lines are a means for
passing hints to visualization tools like the UCSC Genome Browser and
the Integrated Genome Browser (IGB), and they allow multiple tracks to
be concatenated in the same file. Since GFF is not a UCSC format, it
is not common to annotate GFF data with track lines, but rtracklayer
still supports it. To export or import GFF data in the track line
format, call export.ucsc
or import.ucsc
.
The following is the mapping of GFF elements to a GRanges
object.
NA values are allowed only where indicated.
These appear as a “.” in the file. GFF requires that all columns
are included, so export
generates defaults for missing columns.
the ranges
component.
character vector in the source
column; defaults to “rtracklayer” on export.
character vector in the type
column; defaults
to “sequence_feature” in the output, i.e., SO:0000110.
numeric vector (NA's allowed) in the score
column, accessible via the score
accessor; defaults
to NA
upon export.
strand factor (NA's allowed) in the strand
column, accessible via the strand
accessor; defaults
to NA
upon export.
integer vector, either 0, 1 or 2 (NA's allowed);
defaults to NA
upon export.
a factor (GFF1 only); defaults to the seqid
(e.g., chromosome) on export.
In GFF versions 2 and 3, attributes map to arbitrary columns in the
result. In GFF3, some attributes (Parent
, Alias
,
Note
, DBxref
and Ontology_term
) can have
multiple, comma-separated values; these columns are thus always
CharacterList
objects.
A GRanges
with the metadata columns described in the details.
The GFFFile
class extends BiocFile
and is a formal represention of a resource in the GFF format.
To cast a path, URL or connection to a GFFFile
, pass it to
the GFFFile
constructor. The GFF1File
, GFF2File
,
GFF3File
, GVFFile
and GTFFile
classes all extend
GFFFile
and indicate a particular version of the format.
It has the following utility methods:
genome
: Gets the genome identifier from
the “genome-build” header directive.
Michael Lawrence
test_path <- system.file("tests", package = "rtracklayer") test_gff3 <- file.path(test_path, "genes.gff3") ## basic import test <- import(test_gff3) test ## import.gff functions import.gff(test_gff3) import.gff3(test_gff3) ## GFFFile derivatives test_gff_file <- GFF3File(test_gff3) import(test_gff_file) test_gff_file <- GFFFile(test_gff3) import(test_gff_file) test_gff_file <- GFFFile(test_gff3, version = "3") import(test_gff_file) ## from connection test_gff_con <- file(test_gff3) test <- import(test_gff_con, format = "gff") ## various arguments import(test_gff3, genome = "hg19") import(test_gff3, colnames = character()) import(test_gff3, colnames = c("type", "geneName")) ## 'which' which <- GRanges("chr10:90000-93000") import(test_gff3, which = which) ## Not run: ## 'append' test_gff3_out <- file.path(tempdir(), "genes.gff3") export(test[seqnames(test) == "chr10"], test_gff3_out) export(test[seqnames(test) == "chr12"], test_gff3_out, append = TRUE) import(test_gff3_out) ## 'index' export(test, test_gff3_out, index = TRUE) test_bed_gz <- paste(test_gff3_out, ".gz", sep = "") import(test_bed_gz, which = which) ## End(Not run)
test_path <- system.file("tests", package = "rtracklayer") test_gff3 <- file.path(test_path, "genes.gff3") ## basic import test <- import(test_gff3) test ## import.gff functions import.gff(test_gff3) import.gff3(test_gff3) ## GFFFile derivatives test_gff_file <- GFF3File(test_gff3) import(test_gff_file) test_gff_file <- GFFFile(test_gff3) import(test_gff_file) test_gff_file <- GFFFile(test_gff3, version = "3") import(test_gff_file) ## from connection test_gff_con <- file(test_gff3) test <- import(test_gff_con, format = "gff") ## various arguments import(test_gff3, genome = "hg19") import(test_gff3, colnames = character()) import(test_gff3, colnames = c("type", "geneName")) ## 'which' which <- GRanges("chr10:90000-93000") import(test_gff3, which = which) ## Not run: ## 'append' test_gff3_out <- file.path(tempdir(), "genes.gff3") export(test[seqnames(test) == "chr10"], test_gff3_out) export(test[seqnames(test) == "chr12"], test_gff3_out, append = TRUE) import(test_gff3_out) ## 'index' export(test, test_gff3_out, index = TRUE) test_bed_gz <- paste(test_gff3_out, ".gz", sep = "") import(test_bed_gz, which = which) ## End(Not run)
These functions assist in the creation of
Seqinfo
or
GRanges
for a genome.
GRangesForUCSCGenome(genome, chrom = NULL, ranges = NULL, ...) GRangesForBSGenome(genome, chrom = NULL, ranges = NULL, ...) SeqinfoForUCSCGenome(genome) SeqinfoForBSGenome(genome)
GRangesForUCSCGenome(genome, chrom = NULL, ranges = NULL, ...) GRangesForBSGenome(genome, chrom = NULL, ranges = NULL, ...) SeqinfoForUCSCGenome(genome) SeqinfoForBSGenome(genome)
genome |
A string identifying a genome, usually one assigned by UCSC, like "hg19". |
chrom |
A character vector of chromosome names, or |
ranges |
A |
... |
Additional arguments to pass to
the |
The genome ID is stored in the metadata of the ranges and is
retrievable via the genome
function. The sequence
lengths are also properly initialized for the genome. This mitigates
the possibility of accidentally storing intervals for the wrong
genome.
GRangesForUCSCGenome
obtains sequence information from the UCSC
website, while GRangesForBSGenome
looks for it in an
installed BSGenome
package. Using the latter is more efficient
in the long-run, but requires downloading and installing a potentially
large genome package, or creating one from scratch if it does not yet
exist for the genome of interest.
For the GRangesFor*
functions, a GRanges
object, with the
appropriate seqlengths
and
genome
ID.
The SeqinfoFor*
functions return a Seqinfo
for the
indicated genome.
Michael Lawrence
A UCSC track line for graphical tracks.
Objects can be created by calls of the form new("GraphTrackLine",
...)
or parsed from a character vector track line with
as(text, "GraphTrackLine")
or converted from a
BasicTrackLine
using as(basic, "GraphTrackLine")
.
altColor
:Object of class "integer"
giving an
alternate color, as from col2rgb
.
autoScale
:Object of class "logical"
indicating
whether to automatically scale to min/max of the data.
alwaysZero
:Object of class "logical"
indicating
whether to fix the lower limit of the Y axis at zero.
gridDefault
:Object of class "logical"
indicating whether a grid should be drawn.
maxHeightPixels
:Object of class "numeric"
of
length three (max, default, min), giving the allowable range for
the vertical height of the graph.
graphType
:Object of class "character"
,
specifying the graph type, either "bar" or "points".
viewLimits
:Object of class "numeric"
and of
length two specifying the data range (min, max) shown in the graph.
yLineMark
:Object of class "numeric"
giving the
position of a horizontal line.
yLineOnOff
:Object of class "logical"
indicating whether the yLineMark
should be visible.
windowingFunction
:Object of class "character"
,
one of "maximum", "mean", "minimum", for removing points when the
graph shrinks.
smoothingWindow
:Object of class "numeric"
giving the window size of a smoother to pass over the graph.
type
:Scalar "character"
indicating the type of
the track, either "wig"
or "bedGraph"
.
name
:Object of class "character"
specifying
the name of the track.
description
:Object of class "character"
describing the track.
visibility
:Object of class "character"
indicating the default visible mode of the track, see
UCSCTrackModes
.
color
:Object of class "integer"
representing the
track color (as from col2rgb
).
priority
:Object of class "numeric"
specifying
the rank of this track.
Class "TrackLine"
, directly.
Export line to its string representation.
"BasicTrackLine"
)Convert this line to a basic UCSC track line, using defaults for slots not held in common.
Michael Lawrence
Official documentation: https://genome.ucsc.edu/goldenPath/help/wiggle.html.
export.wig
, export.bedGraph
for exporting
graphical tracks.
Genomic coordinates are often specified in terms of a genome identifier,
chromosome name, start position and end
position. The rtracklayer
package adds convenience methods to
IntegerRangesList
for the manipulation of genomic ranges.
The spaces (or names) of IntegerRangesList
are the chromosome names.
The universe
slot indicates the genome, usually as given by UCSC
(e.g. “hg18”).
In the code snippets below,
x
is a IntegerRangesList
object.
chrom(x), chrom(x) <- value
: Gets or sets the chromosome
names for x
. This is an alias for
names(x)
.
Michael Lawrence
A reimplementation of the UCSC liftover tool for lifting features from one genome build to another. In our preliminary tests, it is significantly faster than the command line tool. Like the UCSC tool, a chain file is required input.
liftOver(x, chain, ...)
liftOver(x, chain, ...)
x |
The intervals to lift-over, usually
a |
chain |
A |
... |
Arguments for methods. |
A GRangesList
object. Each element contains the ranges mapped
from the corresponding element in the input (may be one-to-many).
Michael Lawrence
https://genome.ucsc.edu/cgi-bin/hgLiftOver
## Not run: chain <- import.chain("hg19ToHg18.over.chain") library(TxDb.Hsapiens.UCSC.hg19.knownGene) tx_hg19 <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene) tx_hg18 <- liftOver(tx_hg19, chain) ## End(Not run)
## Not run: chain <- import.chain("hg19ToHg18.over.chain") library(TxDb.Hsapiens.UCSC.hg19.knownGene) tx_hg19 <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene) tx_hg18 <- liftOver(tx_hg19, chain) ## End(Not run)
The Quickload
class represents a Quickload data source,
essentially directory layout separating tracks and sequences by
genome, along with a few metadata files. This interface abstracts
those details and provides access to a Quickload at any URL supported
by R (HTTP, FTP, and local files). This is an easy way to make data
accessible to the Integrated Genome Browser (IGB).
Quickload(uri = "quickload", create = FALSE)
: Constructs a
new Quickload
object, representing a repository at
uri
. If create
is TRUE
, and uri
is
writeable (i.e., local), the repository is created if it does not
already exist. If it does exist, then a message is emitted to
indicate that the repository was not recreated.
In the code snippets below, x
represents a Quickload
object.
x$genome
, x[["genome"]]
: Get
the QuickloadGenome
object for the genome named
genome
. This is where all the data is stored.
length(x)
: number of genomes in the repository
uri(x)
:
Get the URI pointing to the Quickload repository.
genome(x)
:
Get the identifiers of the genomes present in the repository.
Michael Lawrence
ql <- Quickload(system.file("tests", "quickload", package = "rtracklayer")) uri(ql) genome(ql) ql$T_species_Oct_2011
ql <- Quickload(system.file("tests", "quickload", package = "rtracklayer")) uri(ql) genome(ql) ql$T_species_Oct_2011
A Quickload data source is a collection of tracks and sequences,
separated by genome. This class, QuickloadGenome
provides
direct access to the data for one particular genome.
QuickloadGenome(quickload, genome, create = FALSE,
seqinfo = seqinfo(genome),
title = toString(genome))
: Constructs a
new QuickloadGenome
object, representing genome
in
the repository quickload
(a URI string or a
Quickload
object).
The genome
argument can be an ID corresponding to a
genome (potentially) in quickload
or an installed
BSgenome
package. It can also be any instance of a class
which has methods for organism
and releaseDate
. A
good example is BSgenome
or any other
derivative of GenomeDescription
.
Those items are necessary for constructing the canonical
Quickload genome string (G_Species_Month_Year).
If create
is TRUE
, and the genome does not already
exist, the genome will be created, using seqinfo
for the
sequence lengths and title
for the display name of the
genome in a UI. Creation only works if the repository is local
and writeable. Reasonable defaults are used for seqinfo
and title
when the necessary methods are available (and
they are for BSgenome
).
In the code snippets below, x
and object
represent a
Quickload
object.
seqinfo(x)
, seqinfo(x) <- value
: Gets or sets the
Seqinfo
object indicating the lengths
of the sequences in the genome. No circularity information or genome
identifier is stored.
quickload(x)
: Get the Quickload object that contains this
genome.
uri(x)
: Get the uri pointing to the genome directory in the
Quickload repository
genome(x)
: Get the name of the genome, e.g.
“H_sapiens_Feb_2009”.
releaseDate(x)
: Get the release portion of the genome name,
e.g., “Feb_2009”.
organism(object)
: Get the organism portion of the genome name,
e.g., “H sapiens”.
length(x)
: number of datasets
names(x), trackNames(x)
: names of the datasets
mcols(x)
: merged metadata on the datasets
track(x, name), x$name
: get the track called name
track(x, name, format = bestFileFormat(value), ...) <-
value, x$name <- value
: store the track value
under
name
. Note that track storing is only supported
for local repositories, i.e., those with a file://
URI
scheme.
Currently, supported value
types include a
GenomicRanges
, GRangesList
, or a file resource
(copied to the repository). The file resource may be
represented as a path, URL,
BiocFile
or
RsamtoolsFile
. If
not a file name, value
is written in format
. For
generic interval data, this means a BigWig file (if there is a
numeric “score” column) or a BED file otherwise. An
RleList
(e.g., coverage) is output as BigWig. For
UCSCData
values, the format is chosen according to the
type of track line. For RsamtoolsFile
objects, the file
and its index are copied.
The arguments in ...
become attributes in the XML
metadata. The “description” attribute is standard and is
a blurb for describing the track in a UI. For the rest, the
interpretation is up to the client. IGB supports an ever-growing
list; please see its documentation.
referenceSequence(x)
: Get the reference sequence, as a
DNAStringSet
.
referenceSequence(x) <- value
: Set the reference sequence, as a
DNAStringSet
. It is written as a 2bit file. This only works
on local repositories.
Michael Lawrence
tests_dir <- system.file("tests", package = "rtracklayer") ql <- Quickload(file.path(tests_dir, "quickload")) qlg <- QuickloadGenome(ql, "T_species_Oct_2011") seqinfo(qlg) organism(qlg) releaseDate(qlg) names(qlg) mcols(qlg) if (.Platform$OS.type != "windows") { # temporary qlg$bedData } ## Not run: ## populating the test repository ql <- Quickload(file.path(tests_dir, "quickload"), create = TRUE) reference_seq <- import(file.path(tests_dir, "test.2bit")) names(reference_seq) <- "test" qlg <- QuickloadGenome(ql, "T_species_Oct_2011", create = TRUE, seqinfo = seqinfo(reference_seq)) referenceSequence(qlg) <- reference_seq test_bed <- import(file.path(tests_dir, "test.bed")) names(test_bed) <- "test" qlg$bedData <- test_bed test_bedGraph <- import(file.path(tests_dir, "test.bedGraph")) names(test_bedGraph) <- "test" start(test_bedGraph) <- seq(1, 90, 10) width(test_bedGraph) <- 10 track(qlg, "bedGraphData", format = "bw") <- test_bedGraph ## End(Not run)
tests_dir <- system.file("tests", package = "rtracklayer") ql <- Quickload(file.path(tests_dir, "quickload")) qlg <- QuickloadGenome(ql, "T_species_Oct_2011") seqinfo(qlg) organism(qlg) releaseDate(qlg) names(qlg) mcols(qlg) if (.Platform$OS.type != "windows") { # temporary qlg$bedData } ## Not run: ## populating the test repository ql <- Quickload(file.path(tests_dir, "quickload"), create = TRUE) reference_seq <- import(file.path(tests_dir, "test.2bit")) names(reference_seq) <- "test" qlg <- QuickloadGenome(ql, "T_species_Oct_2011", create = TRUE, seqinfo = seqinfo(reference_seq)) referenceSequence(qlg) <- reference_seq test_bed <- import(file.path(tests_dir, "test.bed")) names(test_bed) <- "test" qlg$bedData <- test_bed test_bedGraph <- import(file.path(tests_dir, "test.bedGraph")) names(test_bedGraph) <- "test" start(test_bedGraph) <- seq(1, 90, 10) width(test_bedGraph) <- 10 track(qlg, "bedGraphData", format = "bw") <- test_bedGraph ## End(Not run)
Reads a file in GFF format and creates a data frame or DataFrame object from it. This is a low-level function that should not be called by user code.
readGFF(filepath, version=0, columns=NULL, tags=NULL, filter=NULL, nrows=-1, raw_data=FALSE) GFFcolnames(GFF1=FALSE)
readGFF(filepath, version=0, columns=NULL, tags=NULL, filter=NULL, nrows=-1, raw_data=FALSE) GFFcolnames(GFF1=FALSE)
filepath |
A single string containing the path or URL to the file to read. Alternatively can be a connection. |
version |
|
columns |
The standard GFF columns to load. All of them are loaded by default. |
tags |
The tags to load. All of them are loaded by default. |
filter |
|
nrows |
|
raw_data |
|
GFF1 |
A DataFrame with columns corresponding to those in the GFF.
H. Pages
makeGRangesFromDataFrame
in the
GenomicRanges package for making a
GRanges object from a data frame or
DataFrame object.
makeTxDbFromGFF
in the
txdbmaker package for importing a GFF file as a
TxDb object.
The DataFrame class in the S4Vectors package.
## Standard GFF columns. GFFcolnames() GFFcolnames(GFF1=TRUE) # "group" instead of "attributes" tests_dir <- system.file("tests", package="rtracklayer") test_gff3 <- file.path(tests_dir, "genes.gff3") ## Load everything. df0 <- readGFF(test_gff3) head(df0) ## Load some tags only (in addition to the standard GFF columns). my_tags <- c("ID", "Parent", "Name", "Dbxref", "geneID") df1 <- readGFF(test_gff3, tags=my_tags) head(df1) ## Load no tags (in that case, the "attributes" standard column ## is loaded). df2 <- readGFF(test_gff3, tags=character(0)) head(df2) ## Load some standard GFF columns only (in addition to all tags). my_columns <- c("seqid", "start", "end", "strand", "type") df3 <- readGFF(test_gff3, columns=my_columns) df3 table(df3$seqid, df3$type) makeGRangesFromDataFrame(df3, keep.extra.columns=TRUE) ## Combine use of 'columns' and 'tags' arguments. readGFF(test_gff3, columns=my_columns, tags=c("ID", "Parent", "Name")) readGFF(test_gff3, columns=my_columns, tags=character(0)) ## Use the 'filter' argument to load only features of type "gene" ## or "mRNA" located on chr10. my_filter <- list(type=c("gene", "mRNA"), seqid="chr10") readGFF(test_gff3, filter=my_filter) readGFF(test_gff3, columns=my_columns, tags=character(0), filter=my_filter)
## Standard GFF columns. GFFcolnames() GFFcolnames(GFF1=TRUE) # "group" instead of "attributes" tests_dir <- system.file("tests", package="rtracklayer") test_gff3 <- file.path(tests_dir, "genes.gff3") ## Load everything. df0 <- readGFF(test_gff3) head(df0) ## Load some tags only (in addition to the standard GFF columns). my_tags <- c("ID", "Parent", "Name", "Dbxref", "geneID") df1 <- readGFF(test_gff3, tags=my_tags) head(df1) ## Load no tags (in that case, the "attributes" standard column ## is loaded). df2 <- readGFF(test_gff3, tags=character(0)) head(df2) ## Load some standard GFF columns only (in addition to all tags). my_columns <- c("seqid", "start", "end", "strand", "type") df3 <- readGFF(test_gff3, columns=my_columns) df3 table(df3$seqid, df3$type) makeGRangesFromDataFrame(df3, keep.extra.columns=TRUE) ## Combine use of 'columns' and 'tags' arguments. readGFF(test_gff3, columns=my_columns, tags=c("ID", "Parent", "Name")) readGFF(test_gff3, columns=my_columns, tags=character(0)) ## Use the 'filter' argument to load only features of type "gene" ## or "mRNA" located on chr10. my_filter <- list(type=c("gene", "mRNA"), seqid="chr10") readGFF(test_gff3, filter=my_filter) readGFF(test_gff3, columns=my_columns, tags=character(0), filter=my_filter)
Methods for loading sequences.
No methods are defined by rtracklayer for the
sequence(object, ...) <- value
generic.
These methods support the import and export of
Rsamtools:TabixFile
TabixFile objects. These are generally
useful when working with tabix-indexed files that have a non-standard
format (i.e., not BED nor GFF), as well as exporting an object with
arbitrary columns (like a GRanges) to an indexed, tab-separated
file. This relies on the tabix header, which indicates the columns in
the file that correspond to the chromosome, start and end. The BED and
GFF parsers handle tabix transparently.
## S4 method for signature 'TabixFile,ANY,ANY' import(con, format, text, which = if (is.na(genome)) NULL else as(seqinfoForGenome(genome), "GenomicRanges"), genome = NA, header = TRUE, ...) exportToTabix(object, con, ...)
## S4 method for signature 'TabixFile,ANY,ANY' import(con, format, text, which = if (is.na(genome)) NULL else as(seqinfoForGenome(genome), "GenomicRanges"), genome = NA, header = TRUE, ...) exportToTabix(object, con, ...)
con |
For |
object |
The object to export. It is coerced to a
|
format |
If any known format, like “bed” or “gff” (or one of their variants), then the appropriate parser is applied. If any other value, then the tabix header is consulted for the format. By default, this is taken from the file extension. |
text |
Ignored. |
which |
A range data structure coercible to |
genome |
The identifier of a genome, or |
header |
If |
... |
Extra arguments to pass to the underlying import routine,
which for non-standard formats is |
For import
, a GRanges
object.
For exportToTabix
, a TabixFile
object that is directly
passable to import
.
Michael Lawrence
http://samtools.sourceforge.net/tabix.shtml
scanTabix
and friends
A data frame of human microRNA target sites retrieved from MiRBase. This is a
subset of the hsTargets
data frame in the microRNA
package. See the rtracklayer
vignette for more details.
data(targets)
data(targets)
A data frame with 2981 observations on the following 6 variables.
name
The miRBase ID of the microRNA.
target
The Ensembl ID of the targeted transcript.
chrom
The name of the chromosome for target site.
start
Target start position.
end
Target stop position.
strand
The strand of the target site, "+"
, or
"-"
.
The microRNA
package, dataset hsTargets
. Originally
MiRBase (http://microrna.sanger.ac.uk/).
data(targets) targetTrack <- with(targets, GenomicData(IRanges::IRanges(start, end), strand = strand, chrom = chrom))
data(targets) targetTrack <- with(targets, GenomicData(IRanges::IRanges(start, end), strand = strand, chrom = chrom))
Methods for loading tracks into genome browsers.
track(object, ...) <- value
track(object, ...) <- value
object |
A |
value |
The track(s) to load. |
... |
Arguments to pass on to methods. Can be:
|
track
for getting a track from a session.
## Not run: session <- browserSession() track <- import(system.file("tests", "v1.gff", package = "rtracklayer")) track(session, "My Track") <- track ## End(Not run)
## Not run: session <- browserSession() track <- import(system.file("tests", "v1.gff", package = "rtracklayer")) track(session, "My Track") <- track ## End(Not run)
The TrackDb
class is an abstraction around a database of
tracks. Implementations include BrowserSession
derivatives and QuickloadGenome
. Here, a track is
defined as an interval dataset.
Every implementation should support these methods:
length(x)
: number of tracks
names(x)
, trackNames(x)
: names of the tracks
mcols(x)
: merged metadata on the tracks
track(x, name)
, x$name
, x[[name]]
:
get the track called name
track(x, name) <- value
, x$name <- value
,
x[[name]] <- value
: store the track value
under
name
. Different implementations will support different
types for value
. Generally, an interval data structure like
GenomicRanges
.
Michael Lawrence
The TrackHub
class represents a TrackHub data source,
essentially directory layout separating tracks and sequences by
genome, along with a few metadata files. This interface abstracts
those details and provides access to a TrackHub at any URL supported
by R (HTTP, FTP, and local files). This is an easy way to make data
accessible to the UCSC Genome Browser.
TrackHub(uri, create = FALSE)
: Constructs a
new TrackHub
object, representing a repository at
uri
. If create
is TRUE
, and uri
is
writeable (i.e., local), the repository is created if it does not
already exist. If it does exist, then a message is emitted to
indicate that the repository was not recreated.
In the code snippets below, x
represents a TrackHub
object.
x$genome
, x[["genome"]]
: Get
the TrackHubGenome
object for the genome named
genome
.
length(x)
: number of genomes in the repository.
uri(x)
:
Get the URI pointing to the TrackHub repository.
genome(x)
:
Get the identifiers of the genomes present in the repository.
writeTrackHub(x)
:
Write hub content and genomes from memory representation to the hub file and genomes file.
It also create resources if they are missing like genomes file and genome directory for
newly add genome.
Note that all storing methods(like hub()<-
) are only supported for local repositories, i.e., those with a file:// URI scheme.
hub(x)
: get the value of hub.
hub(x) <- value
: store the value
of hub for x
.
shortLabel(x)
: get the value of hub.
shortLabel(x) <- value
: store the value
of shortLabel for x
.
longLabel(x)
: get the value of hub.
longLabel(x) <- value
: store the value
of longLabel for x
.
genomeFile(x)
: get the value of hub.
genomeFile(x) <- value
: store the value
of genomesFile for x
.
email(x)
: get the value of hub.
email(x) <- value
: store the value
of email for x
.
descriptionUrl(x)
: get the value of hub.
descriptionUrl(x) <- value
: store the value
of descriptionUrl for x
.
genomeField(x, name, field)
: Get the value
of field
for name
genome.
genomeField(x, name, field) <- value
: Set or Update the field
and value
for name
genome.
genomeInfo(x, name)
: Get the Genome
object for name
genome.
genomeInfo(x) <- value
: Add value
(Genome object) to existing genomes list.
Genome
takes named arguemnts of all UCSC supported fields for genome
file(like genome, trackDb, twoBitPath
, etc).
Michael Lawrence
th <- TrackHub(system.file("tests", "trackhub", package = "rtracklayer")) uri(th) genome(th) length(th) th$hg19 th[["hg19"]] hub(th) email(th) ## Not run: hub(th) <- "new_hub" writeTrackHub(th) ## End(Not run)
th <- TrackHub(system.file("tests", "trackhub", package = "rtracklayer")) uri(th) genome(th) length(th) th$hg19 th[["hg19"]] hub(th) email(th) ## Not run: hub(th) <- "new_hub" writeTrackHub(th) ## End(Not run)
A TrackHub data source is a collection of tracks and sequences,
separated by genome. This class, TrackHubGenome
provides
direct access to the data for one particular genome.
TrackHubGenome(trackhub, genome, create = FALSE
: Constructs a
new TrackHubGenome
object, representing genome
in
the repository trackhub
(a URI string or a
TrackHub
object).
The genome
argument can be an ID corresponding to a
genome (potentially) in trackhub
or an installed
BSgenome
package.
If create
is TRUE
, and the trackDb file does not
already exist, it will be created.
Creation only works if the repository is local and
writeable.
In the code snippets below, x
represent a TrackHubGenome
object.
uri(x)
: Get the uri pointing to the genome directory in the
TrackHub repository.
genome(x)
: Get the name of the genome, e.g.
“hg19”.
length(x)
: number of tracks
names(x), trackNames(x)
: names of the tracks
getTracks(x)
: Get the List
of Track
from the tracks
trackhub(x)
: Get the TrackHub object that contains this
genome.
organism(x)
: Get the organism name for this genome,
e.g., “H sapiens”.
trackField(x, name, field)
: Get the value
of field
for name
track.
trackField(x, name, field) <- value
: Store the field
and value
for name
track.
writeTrackHub(x)
:
Write tracks from memory representation to the trackDb file.
track(x, name), x$name
: get the track called name
track(x, name, format = bestFileFormat(value)) <-
value, x$name <- value
: store the track value
under
name
. Note that track storing is only supported
for local repositories, i.e., those with a file://
URI
scheme.
Currently, supported value
types include a
GenomicRanges
, GRangesList
, or a file resource
(copied to the repository). The file resource may be
represented as a path, URL, BiocFile
or
RsamtoolsFile
. If
not a file name, value
is written in format
. For
generic interval data, this means a BigWig file (if there is a
numeric “score” column) or a BED file otherwise. An
RleList
(e.g., coverage) is output as BigWig. For
UCSCData
values, the format is chosen according to the
type of track line. For RsamtoolsFile
objects, the file
and its index are copied.
referenceSequence(x)
: Get the reference sequence, as a
DNAStringSet
.
referenceSequence(x) <- value
: Set the reference sequence, as a
DNAStringSet
. It is written as a 2bit file. This only works
on local repositories.
Michael Lawrence
tests_dir <- system.file("tests", package = "rtracklayer") th <- TrackHub(file.path(tests_dir, "trackhub")) thg <- TrackHubGenome(th, "hg19") length(thg) organism(thg) names(thg) ## Not run: th <- TrackHub(file.path(tests_dir, "trackhub"), create = TRUE) genomesFile(th) <- "genomes.txt" genomeInfo(th) <- Genome(genome = "hg38", trackDb = "hg38/trackDb.txt") genomeField(th, "hg38", "twoBitPath") <- "hg38/seq.2bit" writeTrackHub(th) thg <- TrackHubGenome(th, "hg38", create = TRUE) seq <- import(file.path(tests_dir, "test.2bit")) referenceSequence(thg) <- seq track(thg, "PeaksData") <- paste0(tests_dir, "/test.bigWig") trackField(thg, "wgEncodeUWDukeDnaseGM12878FdrPeaks", "bigDataUrl") <- "hg38/wgEncodeCshlShortRnaSeq.bigWig" trackField(thg, "wgEncodeUWDukeDnaseGM12878FdrPeaks", "color") <- "8,104,172" writeTrackHub(thg) ## End(Not run)
tests_dir <- system.file("tests", package = "rtracklayer") th <- TrackHub(file.path(tests_dir, "trackhub")) thg <- TrackHubGenome(th, "hg19") length(thg) organism(thg) names(thg) ## Not run: th <- TrackHub(file.path(tests_dir, "trackhub"), create = TRUE) genomesFile(th) <- "genomes.txt" genomeInfo(th) <- Genome(genome = "hg38", trackDb = "hg38/trackDb.txt") genomeField(th, "hg38", "twoBitPath") <- "hg38/seq.2bit" writeTrackHub(th) thg <- TrackHubGenome(th, "hg38", create = TRUE) seq <- import(file.path(tests_dir, "test.2bit")) referenceSequence(thg) <- seq track(thg, "PeaksData") <- paste0(tests_dir, "/test.bigWig") trackField(thg, "wgEncodeUWDukeDnaseGM12878FdrPeaks", "bigDataUrl") <- "hg38/wgEncodeCshlShortRnaSeq.bigWig" trackField(thg, "wgEncodeUWDukeDnaseGM12878FdrPeaks", "color") <- "8,104,172" writeTrackHub(thg) ## End(Not run)
An object representing a "track line" in the UCSC
format. There are two concrete types of track lines:
BasicTrackLine
(used for most types of tracks) and
GraphTrackLine
(used for graphical tracks). This class
only declares the common elements between the two.
Objects can be created by calls of the form new("TrackLine",
...)
or parsed from a character vector track line with
as(text, "TrackLine")
.
But note that UCSC only understands one of the
subclasses mentioned above.
name
:Object of class "character"
specifying
the name of the track.
description
:Object of class "character"
describing the track.
visibility
:Object of class "character"
indicating the default visible mode of the track, see
UCSCTrackModes
.
color
:Object of class "integer"
representing the
track color (as from col2rgb
).
priority
:Object of class "numeric"
specifying
the rank of this track.
Export line to its string representation.
Michael Lawrence
https://genome.ucsc.edu/goldenPath/help/customTrack.html#TRACK for the official documentation.
BasicTrackLine
(used for most types of tracks) and
GraphTrackLine
(used for Wiggle/bedGraph tracks).
Methods for getting and setting track names.
The following methods are defined by rtracklayer for getting
track names via the generic trackNames(object, ...)
.
Get the tracks loaded in the session.
Get the visible tracks according to the modes (all tracks not set to "hide").
Get the visible tracks in the view.
The following methods are defined by rtracklayer for setting
track names via the generic trackNames(object) <- value
.
Sets the tracks that should be
visible in the modes. All specified tracks with mode "hide" in
object
are set to mode "full". Any tracks in object
that
are not specified in the value are set to "hide". No other modes are
changed.
Sets the visible tracks in the view. This opens a new web browser with only the specified tracks visible.
These functions support the import and export of the UCSC 2bit
compressed sequence format. The main advantage is speed of subsequence
retrieval, as it only loads the sequence in the requested
intervals. Compared to the FA format supported by Rsamtools, 2bit
offers the additional feature of masking and also has better support
in Java (and thus most genome browsers). The supporting
TwoBitFile
class is a reference to a TwoBit file.
## S4 method for signature 'TwoBitFile,ANY,ANY' import(con, format, text, which = as(seqinfo(con), "GenomicRanges"), ...) ## S4 method for signature 'TwoBitFile' getSeq(x, which = as(seqinfo(x), "GenomicRanges")) import.2bit(con, ...) ## S4 method for signature 'ANY,TwoBitFile,ANY' export(object, con, format, ...) ## S4 method for signature 'DNAStringSet,TwoBitFile,ANY' export(object, con, format) ## S4 method for signature 'DNAStringSet,character,ANY' export(object, con, format, ...) export.2bit(object, con, ...)
## S4 method for signature 'TwoBitFile,ANY,ANY' import(con, format, text, which = as(seqinfo(con), "GenomicRanges"), ...) ## S4 method for signature 'TwoBitFile' getSeq(x, which = as(seqinfo(x), "GenomicRanges")) import.2bit(con, ...) ## S4 method for signature 'ANY,TwoBitFile,ANY' export(object, con, format, ...) ## S4 method for signature 'DNAStringSet,TwoBitFile,ANY' export(object, con, format) ## S4 method for signature 'DNAStringSet,character,ANY' export(object, con, format, ...) export.2bit(object, con, ...)
con |
A path, URL or |
object , x
|
The object to export, either a |
format |
If not missing, should be “twoBit” or “2bit” (case insensitive). |
text |
Not supported. |
which |
A range data structure coercible to |
... |
Arguments to pass down to methods to other methods. For
import, the flow eventually reaches the |
For import, a DNAStringSet
.
TwoBitFile
objectsA TwoBitFile
object, an extension of
BiocFile-class
is a reference to a TwoBit file. To cast
a path, URL or connection to a TwoBitFile
, pass it to the
TwoBitFile
constructor.
A TwoBit file embeds the sequence information, which can be retrieved with the following:
seqinfo(x)
:
Gets the Seqinfo
object indicating
the lengths of the sequences for the intervals in the
file. No circularity or genome information is available.
The 2bit format only suports A, C, G, T and N (via an internal
mask). To export sequences with additional IUPAC ambiguity codes,
first pass the object through
replaceAmbiguities
from the Biostrings
package.
Michael Lawrence
export-methods in the BSgenome package for exporting a BSgenome object as a twoBit file.
test_path <- system.file("tests", package = "rtracklayer") test_2bit <- file.path(test_path, "test.2bit") test <- import(test_2bit) test test_2bit_file <- TwoBitFile(test_2bit) import(test_2bit_file) # the whole file which_range <- IRanges(c(10, 40), c(30, 42)) which <- GRanges(names(test), which_range) import(test_2bit, which = which) seqinfo(test_2bit_file) ## Not run: test_2bit_out <- file.path(tempdir(), "test_out.2bit") export(test, test_2bit_out) ## just a character vector test_char <- as.character(test) export(test_char, test_2bit_out) ## End(Not run)
test_path <- system.file("tests", package = "rtracklayer") test_2bit <- file.path(test_path, "test.2bit") test <- import(test_2bit) test test_2bit_file <- TwoBitFile(test_2bit) import(test_2bit_file) # the whole file which_range <- IRanges(c(10, 40), c(30, 42)) which <- GRanges(names(test), which_range) import(test_2bit, which = which) seqinfo(test_2bit_file) ## Not run: test_2bit_out <- file.path(tempdir(), "test_out.2bit") export(test, test_2bit_out) ## just a character vector test_char <- as.character(test) export(test_char, test_2bit_out) ## End(Not run)
Each track in UCSC has an associated
TrackLine
that contains metadata on the track.
trackLine
:Object of class "TrackLine"
holding track metadata.
export.bed(object, con, variant = c("base",
"bedGraph", "bed15"), color, trackLine = TRUE, ...)
Exports the track and its track line (if
trackLine
is TRUE
) to con
in the Browser
Extended Display (BED) format. The arguments
in ...
are passed to export.ucsc
.
export.bed15(object, con, expNames = NULL, ...)
Exports the track and its track line (if
trackLine
is TRUE
) to con
in the Bed15
format. The data is taken from the columns named in
expNames
, which defaults to the expNames
in the
track line, if any, otherwise all column names. The arguments
in ...
are passed to export.ucsc
.
export.gff(object)
Exports the track and its track line (as a comment) to con
in the
General Feature Format (GFF).
export.ucsc(object, con, subformat, ...)
Exports the track and its track line to con
in the
UCSC meta-format.
as(object, "UCSCData")
Constructs a UCSCData
from a GRanges
instance, by adding a default track line
and ensuring that the sequence/chromosome names are compliant with
UCSC conventions. If there is a numeric score, the track line type
is either "bedGraph" or "wig", depending on the feature
density. Otherwise, "bed" is chosen.
Michael Lawrence
import
and export
for reading and writing
tracks to and from connections (files), respectively.
These functions support the import and export of tracks emucscded
within the UCSC track line metaformat, whereby multiple tracks may be
concatenated within a single file, along with metadata mostly oriented
towards visualization. Any UCSCData
object is
automatically exported in this format, if the targeted format is known
to be compatible. The BED and WIG import methods check for a track
line, and delegate to these functions if one is found. Thus, calling
this API directly is only necessary when importing embedded GFF
(rare), or when one wants to create the track line during the export
process.
## S4 method for signature 'UCSCFile,ANY,ANY' import(con, format, text, subformat = "auto", drop = FALSE, genome = NA, ...) import.ucsc(con, ...) ## S4 method for signature 'ANY,UCSCFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,UCSCFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRangesList,UCSCFile,ANY' export(object, con, format, append = FALSE, index = FALSE, ...) ## S4 method for signature 'UCSCData,UCSCFile,ANY' export(object, con, format, subformat = "auto", append = FALSE, index = FALSE, ...) export.ucsc(object, con, ...)
## S4 method for signature 'UCSCFile,ANY,ANY' import(con, format, text, subformat = "auto", drop = FALSE, genome = NA, ...) import.ucsc(con, ...) ## S4 method for signature 'ANY,UCSCFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,UCSCFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRangesList,UCSCFile,ANY' export(object, con, format, append = FALSE, index = FALSE, ...) ## S4 method for signature 'UCSCData,UCSCFile,ANY' export(object, con, format, subformat = "auto", append = FALSE, index = FALSE, ...) export.ucsc(object, con, ...)
con |
A path, URL, connection or |
object |
The object to export, should be a |
format |
If not missing, should be “ucsc”. |
text |
If |
subformat |
The file format to use for the actual features,
between the track lines. Must be a text-based format that is
compatible with track lines (most are). If an
|
genome |
The identifier of a genome, or |
drop |
If |
append |
If |
index |
If |
... |
Should either specify track line parameters or arguments to pass down to the import and export routine for the subformat. |
The UCSC track line permits the storage of multiple tracks in a single
file by separating them with a so-called “track line”, a line
belonging with the word “track” and containing
various key=value
pairs encoding metadata, most related to
visualization. The standard fields in a track depend on the type of
track being annotated. See TrackLine
and its
derivatives for how these lines are represented in R. The
class UCSCData
is an extension
of GRanges
with a formal slot for a TrackLine
.
Each GRanges
in the returned GenomicRangesList
has the
track line stored in its metadata, under the trackLine
key.
For each track object to be exported, if the object is not a
UCSCData
, and there is no trackLine
element in the
metadata, then a new track line needs to be generated. This happens
through the coercion of object
to UCSCData
. The track line
is initialized to have the appropriate type
parameter for the
subformat, and the required name
parameter is taken from the
name of the track in the input list (if any). Otherwise, the default
is simply “R Track”. The db
parameter (specific to BED
track lines) is taken as genome(object)
if not
NA
. Additional arguments passed to the export routines
override parameters in the provided track line.
If the subformat is either WIG or BEDGraph, and the features are stranded, a separate track will be output in the file for each strand. Neither of those formats encodes the strand and disallow overlapping features (which might occur upon destranding).
A GenomicRangesList
unless drop
is TRUE
and there is only a single track in the file. In that case, the first and
only object is extracted from the list and returned.
The structure of that object depends on the format of the
data. The GenomicRangesList
contains UCSCData
objects.
The UCSCFile
class extends
BiocFile
and is a
formal represention of a resource in the UCSC format.
To cast a path, URL or connection to a UCSCFile
, pass it to
the UCSCFile
constructor.
Michael Lawrence
https://genome.ucsc.edu/goldenPath/help/customTrack.html
Get a data.frame
describing the available UCSC genomes.
ucscGenomes(organism=FALSE)
ucscGenomes(organism=FALSE)
organism |
A |
For populating the organism column, the web url https://genome.ucsc.edu/cgi-bin is scraped for every assembly version to get the scientific name.
A data.frame
with the following columns:
db |
UCSC DB identifier (e.g. "hg18") |
species |
The name of the species (e.g. "Human") |
date |
The date the genome was built |
name |
The official name of the genome build |
organism |
The scientific name of the species (e.g. "Homo sapiens") |
Michael Lawrence
UCSCSession
for details on specifying the
genome.
ucscGenomes()
ucscGenomes()
This is a preliminary class that describes a table in the UCSC database. The description includes the table name, corresponding genome, row count, and a textual description of the format. In the future, we could provide more table information, like the links and sample data frame. This is awaiting a use-case.
In the code snippets below, x
/object
is a
UCSCSchema
object.
genome(x)
: Get the genome for the table.
tableName(x)
: Get the name of the table.
nrow(x)
: Get the number of rows in the table.
Michael Lawrence
## Not run: session <- browserSession() genome(session) <- "mm9" query <- ucscTableQuery(session, "knownGene") schema <- ucscSchema(query) nrow(schema) ## End(Not run)
## Not run: session <- browserSession() genome(session) <- "mm9" query <- ucscTableQuery(session, "knownGene") schema <- ucscSchema(query) nrow(schema) ## End(Not run)
An implementation of BrowserSession
for the UCSC genome browser.
Objects can be created by calls of the form
browserSession("ucsc", url =
"https://genome.ucsc.edu/cgi-bin", ...)
. The arguments in ...
correspond to libcurl options, see
httr_options
. Setting these options may be
useful e.g. for getting past a proxy.
url
:Object of class "character"
holding the
base URL of the UCSC browser.
hguid
:Object of class "numeric"
holding the
user identification code.
views
:Object of class "environment"
containing
a list stored under the name "instances". The list holds the instances of
BrowserView
for this session.
Class "BrowserSession"
, directly.
browserView(object, range =
range(object), track = trackNames(object), ...)
Creates a
BrowserView
of range
with visible
tracks specified by track
. track
may be an instance
of UCSCTrackModes
. Arguments in ...
should
match parameters
to a ucscTrackModes
method for creating a
UCSCTrackModes
instance that will be merged with and
override modes indicated by the track
parameter.
browserViews
(object) Gets the
BrowserView
instances for this session.
range
(x) Gets the
GRanges
last
displayed in this session.
genome(x)
Gets the genome identifier of the session,
i.e. genome(range(x))
.
seqinfo
Gets
the Seqinfo
object with the
lengths of the chromosomes in the currenet genome. No
circularity information is available.
range(x) <- value
Sets value
, usually a
GRanges
object or IntegerRangesList
, as the range
of session x
. Note that
this setting only lasts until a view is created or
manipulated. This mechanism is useful, for example, when treating the UCSC
browser as a database, rather than a genome viewer.
genome(x) <- value
Sets the genome identifier on the range of session x
.
getSeq(object, range, track = "Assembly")
Gets the sequence in range
and track
.
track(object, name = names(track),
format = "auto", ...) <- value
Loads a track, stored under name
and formatted as
format
. The "auto" format resolves to "bed" for qualitative
data. For quantitative data, i.e., data with a numeric score
column, "wig" or "bedGraph" is chosen, depending on how well the
data compresses into wig. The arguments in ...
are passed on to
export.ucsc
, so they could be slots in a
TrackLine
subclass (and thus specify
visual attributes like color) or parameters to pass
on to the export function for format
. The value
may
be either a range object (like a GRanges
) or a file
object (like a BEDFile
).
track(object, name, range = range(object),
table = NULL)
Retrieves a GRanges
with features in
range
from track named name
. Some built-in tracks
have multiple series, each stored in a separate database
table. A specific table may be retrieved by passing its name in
the table
parameter. See tableNames
for a way
to list the available tables.
getTable(object, name, range = base::range(object), table =
NULL)
: Retrieves the table indicated by the track name
and
table
name, over range
, as a data.frame
. See
getTable
.
trackNames
(object)Gets the names of the tracks stored in the session.
ucscTrackModes
(object)Gets the default view modes for the tracks in the session.
Michael Lawrence
browserSession
for creating instances of this class.
The UCSC genome browser is backed by a large database,
which is exposed by the Table Browser web interface. Tracks are
stored as tables, so this is also the mechanism for retrieving tracks. The
UCSCTableQuery
class represents a query against the Table
Browser. Storing the query fields in a formal class facilitates
incremental construction and adjustment of a query.
There are six supported fields for a table query:
The provider should be a session, a genome identifier, or a TrackHub URI.
session
: The UCSCSession
instance from
the tables are retrieved. Although all sessions are based on the
same database, the set of user-uploaded tracks, which are represented
as tables, is not the same, in general.
The name of the specific table to retrieve. May be
NULL
, in which case the behavior depends on how the query
is executed, see below.
A genome identifier, a
GRanges
or
a IntegerRangesList
indicating
the portion of the table to retrieve, in genome coordinates.
Simply specifying the genome string is the easiest way to download
data for the entire genome, and GRangesForUCSCGenome
facilitates downloading data for e.g. an entire chromosome.
The URI of the specific TrackHub
A genome identifier of the specific TrackHub, only need to provide it if the provider is up of TrackHub URI.
Names/accessions of the desired features
A common workflow for querying the UCSC database is to create an
instance of UCSCTableQuery
using the ucscTableQuery
constructor, invoke tableNames
to list the available tables for
a track, and finally to retrieve the desired table either as a
data.frame
via getTable
or as a track
via track
. See the examples.
The reason for a formal query class is to facilitate multiple queries
when the differences between the queries are small. For example, one
might want to query multiple tables within the track and/or same
genomic region, or query the same table for multiple regions. The
UCSCTableQuery
instance can be incrementally adjusted for each
new query. Some caching is also performed, which enhances performance.
ucscTableQuery(x, range = seqinfo(x), table = NULL,
names = NULL, hubUrl = NULL, genome = NULL)
: Creates a UCSCTableQuery
with the
UCSCSession
, genome identifier or TrackHub URI given as x
and
the table name given by the single string table
. range
should
be a genome string identifier, a GRanges
instance or
IntegerRangesList
instance, and it effectively defaults to
genome(x)
. If the genome is missing, it is taken from the
provider. Feature names, such as gene identifiers, may be
passed via names
as a character vector.
Below, object
is a UCSCTableQuery
instance.
track(object)
:
Retrieves the indicated table as a track, i.e. a GRanges
object. Note that not all tables are available as tracks.
getTable(object)
: Retrieves the indicated table as a
data.frame
. Note that not all tables are output in
parseable form, and that UCSC will truncate responses if they
exceed certain limits (usually around 100,000 records). The safest
(and most efficient) bet for large queries is to download the file
via FTP and query it locally.
tableNames(object)
: Gets the names of the tables available
for the provider, table and range specified by the query.
In the code snippets below, x
/object
is a
UCSCTableQuery
object.
genome(x)
, genome(x) <- value
: Gets or sets
the genome identifier (e.g. “hg18”) of the object.
hubUrl(x)
, hubUrl(x) <- value
: Gets or sets
the TrackHub URI.
tableName(x)
, tableName(x) <- value
: Get or
set the single string indicating the name of the table to
retrieve. May be NULL
, in which case the table is
automatically determined.
range(x)
, range(x) <- value
: Get or set the
GRanges
indicating the portion of the table to retrieve in
genomic coordinates. Any missing information, such as the genome
identifier, is filled in using range(browserSession(x))
. It
is also possible to set the genome identifier string or
a IntegerRangesList
.
names(x)
, names(x) <- value
: Get or set the
names of the features to retrieve. If NULL
, this filter is
disabled.
ucscSchema(x)
: Get
the UCSCSchema
object describing the selected table.
ucscTables(genome, track)
: Get the list of tables for the
specified track(e.g. “Assembly”) and genome identifier (e.g. “hg19”).
Here genome
and track
must be a single non-NA string.
Michael Lawrence
## Not run: # query using `session` provider session <- browserSession() genome(session) <- "mm9" ## choose the phastCons30way table for a portion of mm9 chr1 query <- ucscTableQuery(session, table = "phastCons30way", range = GRangesForUCSCGenome("mm9", "chr12", IRanges(57795963, 57815592))) ## list the table names tableNames(query) ## retrieve the track data track(query) # a GRanges object ## get the multiz30waySummary track tableName(query) <- "multiz30waySummary" ## get a data.frame summarizing the multiple alignment getTable(query) # query using `genome identifier` provider query <- ucscTableQuery("hg18", table = "snp129", names = c("rs10003974", "rs10087355", "rs10075230")) ucscSchema(query) getTable(query) # query using `TrackHub URI` provider query <- ucscTableQuery("https://ftp.ncbi.nlm.nih.gov/snp/population_frequency/TrackHub/20200227123210/", genome = "hg19", table = "ALFA_GLB") getTable(query) # get the list of tables for 'Assembly' track and 'hg19' genome identifier ucscTables("hg19", "Assembly") ## End(Not run)
## Not run: # query using `session` provider session <- browserSession() genome(session) <- "mm9" ## choose the phastCons30way table for a portion of mm9 chr1 query <- ucscTableQuery(session, table = "phastCons30way", range = GRangesForUCSCGenome("mm9", "chr12", IRanges(57795963, 57815592))) ## list the table names tableNames(query) ## retrieve the track data track(query) # a GRanges object ## get the multiz30waySummary track tableName(query) <- "multiz30waySummary" ## get a data.frame summarizing the multiple alignment getTable(query) # query using `genome identifier` provider query <- ucscTableQuery("hg18", table = "snp129", names = c("rs10003974", "rs10087355", "rs10075230")) ucscSchema(query) getTable(query) # query using `TrackHub URI` provider query <- ucscTableQuery("https://ftp.ncbi.nlm.nih.gov/snp/population_frequency/TrackHub/20200227123210/", genome = "hg19", table = "ALFA_GLB") getTable(query) # get the list of tables for 'Assembly' track and 'hg19' genome identifier ucscTables("hg19", "Assembly") ## End(Not run)
A vector of view modes ("hide", "dense", "full", "pack", "squish") for each track in a UCSC view.
Objects may be created by calls of the form
ucscTrackModes(object =
character(), hide = character(), dense = character(), pack = character(),
squish = character(), full = character())
,
where object
should be a character vector of mode names (with
its names
attribute specifying the corresponding track names).
The other parameters should contain track names that override the
modes in object
. Later parameters override earlier ones, so,
for example, if a track is named in hide
and full
,
it is shown in the full view mode.
.Data
:Object of class "character"
holding the
modes ("hide", "dense", "full", "pack", "squish"), with its
names
attribute holding corresponding track names.
labels
:Object of class "character"
holding
labels (human-readable names) corresponding to each track/mode.
Class "character"
, from data part.
Class "vector"
, by class "character", distance 2.
trackNames(object)
Gets the names of the visible tracks (those that do not have mode "hide").
trackNames(object) <- value
Sets the names of the
visible tracks. Any tracks named in value
are set to "full"
if the are currently set to "hide" in this object. Any tracks not
in value
are set to "hide". All other modes are preserved.
object[i]
Gets the track mode of the tracks indexed
by i
, which can be any type of index supported by character
vector subsetting. If i
is a character vector, it indexes
first by the internal track IDs (the names
on .Data
)
and then by the user-level track names (the labels
slot).
object[i] <- value
Sets the track modes indexed by
i
(in the same way as in object[i]
above) to those
specified in value
.
Michael Lawrence
UCSCView
on which track view modes may be set.
Generics for getting and setting UCSC track visibility modes ("hide", "dense", "full", "pack", "squish").
The following methods are defined by rtracklayer for getting the
track modes through the generic ucscTrackModes(object, ...)
.
function(object, hide = character(),
dense = character(), pack = character(),
squish = character(), full = character())
Creates an instance of UCSCTrackModes
from
object
, a character vector of mode names, with the
corresponding track ids given in the names
attribute. Note
that object
can be a UCSCTrackModes
instance, as
UCSCTrackModes
extends character
.
The other parameters are character vectors identifying the tracks for
each mode and overriding the modes specified by
object
.
The same interface as above, except object
defaults to an empty
character vector.
Gets modes for tracks in the view.
Gets default modes for the tracks in the session. These are the modes that will be used as the default for a newly created view.
The following methods are defined by rtracklayer for setting the
track modes through the generic ucscTrackModes(object) <- value
.
Sets the modes for the tracks in the view.
Sets the modes from a character vector of mode names, with the
corresponding track names given in the names
attribute.
trackNames
and trackNames<-
for just
getting or setting which tracks are visible (not of mode "hide").
# Tracks "foo" and "bar" are fully shown, "baz" is hidden modes <- ucscTrackModes(full = c("foo", "bar"), hide = "baz") # Update the modes to hide track "bar" modes2 <- ucscTrackModes(modes, hide = "bar")
# Tracks "foo" and "bar" are fully shown, "baz" is hidden modes <- ucscTrackModes(full = c("foo", "bar"), hide = "baz") # Update the modes to hide track "bar" modes2 <- ucscTrackModes(modes, hide = "bar")
An object representing a view of a genome in the UCSC browser.
Calling browserView(session, range = range(object), track
= trackNames(object), browse = TRUE, ...)
creates BrowserView
of range
with visible
tracks specified by track
. track
may be an instance of
UCSCTrackModes
. Arguments in ...
should
match parameters to a
ucscTrackModes
method for creating a
UCSCTrackModes
instance that will be merged with and override
modes indicated by the track
parameter. If browse
is
TRUE
(default), automatically launch a browser to display the
view.
hgsid
:Object of class "numeric"
, which
identifies this view to UCSC.
session
:Object of class "BrowserSession"
to
which this view belongs.
Class "BrowserView"
, directly.
activeView(object)
Obtains a logical indicating whether this view is the active view.
range(object)
Obtains the
GRanges
displayed by
this view.
range(object) <- value
Sets the
GRanges
or
IntegerRangesList
displayed by
this view.
trackNames(object)
Gets the names of the visible tracks in this view.
trackNames(object) <- value
Sets the visible tracks by name.
visible(object)
Get a named logical vector indicating whether each track is visible.
visible(object) <- value
Set a logical vector
indicating the visibility of each track, in the same order as
returned by visible(object)
.
ucscTrackModes(object)
Obtains the
UCSCTrackModes
for this view.
ucscTrackModes(object) <- value
Sets the
UCSCTrackModes
for this view. The
value
may be either a UCSCTrackModes
instance or a
character vector that will be coerced by a call to
ucscTrackModes
.
Gets the URL corresponding to the view
x
. Typically used when passing browse=FALSE
to the
constructor in order to display the view in another way.
Michael Lawrence
browserView
for creating instances of this class.
These functions support the import and export of the UCSC WIG (Wiggle) format.
## S4 method for signature 'WIGFile,ANY,ANY' import(con, format, text, genome = NA, trackLine = TRUE, which = NULL, seqinfo = NULL, ...) import.wig(con, ...) ## S4 method for signature 'ANY,WIGFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,WIGFile,ANY' export(object, con, format, dataFormat = c("auto", "variableStep", "fixedStep"), writer = .wigWriter, append = FALSE, ...) ## S4 method for signature 'GenomicRangesList,WIGFile,ANY' export(object, con, format, ...) ## S4 method for signature 'UCSCData,WIGFile,ANY' export(object, con, format, trackLine = TRUE, ...) export.wig(object, con, ...)
## S4 method for signature 'WIGFile,ANY,ANY' import(con, format, text, genome = NA, trackLine = TRUE, which = NULL, seqinfo = NULL, ...) import.wig(con, ...) ## S4 method for signature 'ANY,WIGFile,ANY' export(object, con, format, ...) ## S4 method for signature 'GenomicRanges,WIGFile,ANY' export(object, con, format, dataFormat = c("auto", "variableStep", "fixedStep"), writer = .wigWriter, append = FALSE, ...) ## S4 method for signature 'GenomicRangesList,WIGFile,ANY' export(object, con, format, ...) ## S4 method for signature 'UCSCData,WIGFile,ANY' export(object, con, format, trackLine = TRUE, ...) export.wig(object, con, ...)
con |
A path, URL, connection or |
object |
The object to export, should be a |
format |
If not missing, should be “wig”. |
text |
If |
trackLine |
Whether to parse/output a UCSC track line. An
imported track line will be stored in a |
genome |
The identifier of a genome, or |
seqinfo |
If not |
which |
A range data structure like |
append |
If |
dataFormat |
Probably best left to “auto”. Exists only for historical reasons. |
writer |
Function for writing out the blocks; for internal use only. |
... |
Arguments to pass down to methods to other methods. For
import, the flow eventually reaches the |
The WIG format is a text-based format for efficiently representing a
dense genome-scale score vector. It encodes, for each feature, a range
and score. Features from the same sequence (chromosome) are grouped
together into a block, with a single block header line indicating the
chromosome. There are two block formats: fixed step and variable
step. For fixed step, the number of positions (or step) between
intervals is the same across an entire block. For variable step, the
start position is specified for each feature. For both fixed and
variable step, the span (or width) is specified in the header and thus
must be the same across all features. This requirement of uniform
width dramatically limits the applicability of WIG. For scored
features of variable width, consider BEDGraph or
BigWig, which is generally preferred over both WIG
and BEDGraph. To efficiently convert an existing WIG or BEDGraph file
to BigWig, call wigToBigWig
. Neither WIG, BEDGraph nor
BigWig allow overlapping features.
A GRanges
with the score values in the score
metadata column,
which is accessible via the score
function.
The WIGFile
class extends BiocFile
and is a formal represention of a resource in the WIG format.
To cast a path, URL or connection to a WIGFile
, pass it to
the WIGFile
constructor.
Michael Lawrence
https://genome.ucsc.edu/goldenPath/help/wiggle.html
test_path <- system.file("tests", package = "rtracklayer") test_wig <- file.path(test_path, "step.wig") ## basic import calls test <- import(test_wig) test import.wig(test_wig) test_wig_file <- WIGFile(test_wig) import(test_wig_file) test_wig_con <- file(test_wig) import(test_wig_con, format = "wig") test_wig_con <- file(test_wig) import(WIGFile(test_wig_con)) ## various options import(test_wig, genome = "hg19") import(test_wig, trackLine = FALSE) which <- as(test[3:4,], "IntegerRangesList") import(test_wig, which = which) ## Not run: ## basic export calls test_wig_out <- file.path(tempdir(), "test.wig") export(test, test_wig_out) export.wig(test, test_wig_out) test_foo_out <- file.path(tempdir(), "test.foo") export(test, test_foo_out, format = "wig") test_wig_out_file <- WIGFile(test_wig_out) export(test, test_wig_out_file) ## appending test2 <- test metadata(test2)$trackLine <- initialize(metadata(test)$trackLine, name = "test2") export(test2, test_wig_out_file, append = TRUE) ## passing track line parameters export(test, test_wig_out, name = "test2") ## no track line export(test, test_wig_out, trackLine = FALSE) ## gzip test_wig_gz <- paste(test_wig_out, ".gz", sep = "") export(test, test_wig_gz) ## End(Not run)
test_path <- system.file("tests", package = "rtracklayer") test_wig <- file.path(test_path, "step.wig") ## basic import calls test <- import(test_wig) test import.wig(test_wig) test_wig_file <- WIGFile(test_wig) import(test_wig_file) test_wig_con <- file(test_wig) import(test_wig_con, format = "wig") test_wig_con <- file(test_wig) import(WIGFile(test_wig_con)) ## various options import(test_wig, genome = "hg19") import(test_wig, trackLine = FALSE) which <- as(test[3:4,], "IntegerRangesList") import(test_wig, which = which) ## Not run: ## basic export calls test_wig_out <- file.path(tempdir(), "test.wig") export(test, test_wig_out) export.wig(test, test_wig_out) test_foo_out <- file.path(tempdir(), "test.foo") export(test, test_foo_out, format = "wig") test_wig_out_file <- WIGFile(test_wig_out) export(test, test_wig_out_file) ## appending test2 <- test metadata(test2)$trackLine <- initialize(metadata(test)$trackLine, name = "test2") export(test2, test_wig_out_file, append = TRUE) ## passing track line parameters export(test, test_wig_out, name = "test2") ## no track line export(test, test_wig_out, trackLine = FALSE) ## gzip test_wig_gz <- paste(test_wig_out, ".gz", sep = "") export(test, test_wig_gz) ## End(Not run)
This function calls the Kent C library to efficiently convert a WIG file to a BigWig file, without loading the entire file into memory. This solves the problem where simple tools write out text WIG files, instead of more efficiently accessed binary BigWig files.
wigToBigWig(x, seqinfo, dest = paste(file_path_sans_ext(x, TRUE), "bw", sep = "."), clip = FALSE)
wigToBigWig(x, seqinfo, dest = paste(file_path_sans_ext(x, TRUE), "bw", sep = "."), clip = FALSE)
x |
Path or URL to the WIG file. Connections are not supported. |
seqinfo |
|
dest |
The path to which to write the BigWig file. Defaults to |
clip |
If |
Michael Lawrence
BigWig
import and export support