Package 'ballgown'

Title:	Flexible, isoform-level differential expression analysis
Description:	Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.
Authors:	Jack Fu [aut], Alyssa C. Frazee [aut, cre], Leonardo Collado-Torres [aut], Andrew E. Jaffe [aut], Jeffrey T. Leek [aut, ths]
Maintainer:	Jack Fu <[email protected]>
License:	Artistic-2.0
Version:	2.39.0
Built:	2025-01-28 05:41:53 UTC
Source:	https://github.com/bioc/ballgown

Help Index

The ballgown package for analysis of transcript assemblies
match assembled transcripts to annotated transcripts
Ballgown
constructor function for ballgown objects
load RSEM data into a ballgown object
Toy ballgown object
plot annotated and assembled transcripts together
group a gene's assembled transcripts into clusters
cluster a gene's transcripts and calculate cluster-level expression
determine if one set of GRanges fully contains any of another set of GRanges
extract paths to tablemaker output
extract exon-level expression measurements from ballgown objects
extract expression components from ballgown objects
Replacement method for expr slot in ballgown objects
subset ballgown objects using an expression filter
get gene IDs from a ballgown object
get gene names from a ballgown object
extract a specific field of the "attributes" column of a data frame created from a GTF/GFF file
label assembled transcripts with gene names
extract gene-level expression measurements from ballgown objects
read in GTF/GFF file as a data frame
read in gtf file as GRanges object
extract intron-level expression measurements from ballgown objects
extract the indexes from ballgown objects
Replace method for indexes slot in ballgown objects
get the last element
extract package version & creation date from ballgown object
calculate percent overlap between two GRanges objects
extract phenotype data from a ballgown object
Replacement method for pData slot in ballgown objects
cluster assembled transcripts and plot the results
visualize transcript abundance by group
visualize structure of assembled transcripts
get names of samples in a ballgown object
get sequence (chromosome) names from ballgown object
statistical tests for differential expression in ballgown
extract structure components from ballgown objects
subset ballgown objects to specific samples or genomic locations
extract transcript-level expression measurements from ballgown objects
Connect a transcript to its gene
get numeric transcript IDs from a ballgown object
get transcript names from a ballgown object
write files to disk from ballgown object

The ballgown package for analysis of transcript assemblies

Description

Super awesome transcript-level expression analysis

match assembled transcripts to annotated transcripts

Description

match assembled transcripts to annotated transcripts

Usage

annotate_assembly(assembled, annotated)

Arguments

`assembled`	`GRangesList` object representing assembled transcripts
`annotated`	`GRangesList` object representing annotated transcripts

Details

If gown is a ballgown object, assembled can be structure(gown)$trans (or any subset). You can generate a GRangesList object containing annotated transcripts from a gtf file using the gffReadGR function and setting splitByTranscript=TRUE.

Value

data frame, where each row contains assembledInd and annotatedInd (indexes of overlapping transcripts in assembled and annotated), and the percent overlap between the two transcripts.

Author(s)

Alyssa Frazee

Examples

data(bg)
gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
annot = gffReadGR(gtfPath, splitByTranscript=TRUE)
info = annotate_assembly(assembled=structure(bg)$trans, annotated=annot)

Ballgown

Description

S4 class for storing and manipulating expression data from assembled transcriptomes

Slots

expr: tables containing expression data for genomic features (introns, exons, transcripts)
structure: genomic locations of features and their relationships to one another
indexes: tables connecting components of the assembly and providing other experimental information (e.g., phenotype data and locations of read alignment files)
dirs: directories holding data created by tablemaker
mergedDate: date the ballgown object was created
meas: which expression measurement(s) the object contains in its data slot. Vector of one or more of "rcount", "ucount", "mrcount", "cov", "cov_sd", "mcov", "mcov_sd", or "FPKM", if Tablemaker output is used, or one of "TPM" or "FPKM" if RSEM output is used. Can also be "all" for all measurements. See vignette for details.
RSEM: TRUE if object was made from RSEM output, FALSE if object was made from Tablemaker/Cufflinks output.

Author(s)

Alyssa Frazee, Leonardo Collado-Torres, Jeff Leek

Examples

  data(bg)
  class(bg) #"ballgown"
  dim(bg@expr$exon)
  bg@structure$exon
  head(bg@indexes$t2g)
  head(bg@dirs)
  bg@mergedDate
  bg@meas
  bg@RSEM

constructor function for ballgown objects

Description

constructor function for ballgown objects

Usage

ballgown(
  samples = NULL,
  dataDir = NULL,
  samplePattern = NULL,
  bamfiles = NULL,
  pData = NULL,
  verbose = TRUE,
  meas = "all"
)

Arguments

`samples`	vector of file paths to folders containing sample-specific ballgown data (generated by `tablemaker`). If `samples` is provided, `dataDir` and `samplePattern` are not used.
`dataDir`	file path to top-level directory containing sample-specific folders with ballgown data in them. Only used if `samples` is NULL.
`samplePattern`	regular expression identifying the subdirectories of `dataDir` containing data to be loaded into the ballgown object (and only those subdirectories). Only used if `samples` is NULL.
`bamfiles`	optional vector of file paths to read alignment files for each sample. If provided, make sure to sort properly (e.g., in the same order as `samples`). Default NULL.
`pData`	optional `data.frame` with rows corresponding to samples and columns corresponding to phenotypic variables.
`verbose`	if `TRUE`, print status messages and timing information as the object is constructed.
`meas`	character vector containing either "all" or one or more of: "rcount", "ucount", "mrcount", "cov", "cov_sd", "mcov", "mcov_sd", or "FPKM". The resulting ballgown object will only contain the specified expression measurements, for the appropriate features. See vignette for which expression measurements are available for which features. "all" creates the full object.

Details

Because experimental data is recorded so variably, it is the user's responsibility to format pData correctly. In particular, it's really important that the rows of pData (corresponding to samples) are ordered the same way as samples or the dataDir/samplePattern combo. You can run list.files(path = dataDir, pattern = samplePattern) to see the sample order if samples was not used.

If you are creating a ballgown object for a large experiment, this function may run slowly and use a large amount of RAM. We recommend running this constructor as a batch job and saving the resulting ballgown object as an rda file. The rda file usually has reasonable size on disk, and the object in it shouldn't take up too much RAM when loaded, so the time and memory use in creating the object is a one-time cost.
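
The row-ordering requirement described above can be checked before the constructor is called. The following is a minimal base-R sketch, not part of the package: the directory layout, the sample names, and the `pheno` table are all made up for illustration, and only `list.files` with `path` and `pattern` comes from the text above.

```r
# Hypothetical layout: one top-level directory with a folder per sample.
dataDir <- file.path(tempdir(), "ballgown_demo")
dir.create(dataDir, showWarnings = FALSE)
for (s in c("sample02", "sample01", "sample03")) {
  dir.create(file.path(dataDir, s), showWarnings = FALSE)
}

# list.files() returns matches in sorted order; per the Details text, this is
# the order the rows of pData are expected to follow.
sampleOrder <- list.files(path = dataDir, pattern = "sample")

# Phenotype table as it might arrive from a spreadsheet, in arbitrary order.
pheno <- data.frame(id = c("sample03", "sample01", "sample02"),
                    group = c(1, 0, 1))

# Reorder the rows to match the directory listing, then verify.
pheno <- pheno[match(sampleOrder, pheno$id), ]
stopifnot(identical(as.character(pheno$id), sampleOrder))
```

A table reordered this way could then be supplied as the `pData` argument; the final `stopifnot` fails loudly if any sample is missing or misordered.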

Value

an object of class ballgown

Author(s)

Leonardo Collado-Torres, Alyssa Frazee

Examples

bg = ballgown(dataDir=system.file('extdata', package='ballgown'),
    samplePattern='sample')
pData(bg) = data.frame(id=sampleNames(bg), group=rep(c(1,0), each=10))

load RSEM data into a ballgown object

Description

Loads results of rsem-calculate-expression into a ballgown object for easy visualization, processing, and statistical testing

Usage

ballgownrsem(
  dir = "",
  samples,
  gtf,
  UCSC = TRUE,
  tfield = "transcript_id",
  attrsep = "; ",
  bamout = "transcript",
  pData = NULL,
  verbose = TRUE,
  meas = "all",
  zipped = FALSE
)

Arguments

`dir`	output directory containing RSEM output for all samples (i.e. for each run of rsem-calculate-expression)
`samples`	vector of sample names (i.e., of the `sample_name` arguments used in each RSEM run)
`gtf`	path to GTF file of genes/transcripts used in your RSEM reference (where the reference location was denoted by the `reference_name` argument used in rsem-calculate-expression). RSEM references can be created with or without a GTF file, but currently the ballgown reader requires the GTF file.
`UCSC`	set to TRUE if `gtf` comes from UCSC: quotes will be stripped from transcript identifiers if so.
`tfield`	What keyword identifies transcripts in the "attributes" field of `gtf`? Default `'transcript_id'`.
`attrsep`	How are attributes separated in the "attributes" field of `gtf`? Default `'; '` (semicolon-space).
`bamout`	set to `'genome'` if `--output-genome-bam` was used when running rsem-calculate-expression; set to `'none'` if `--no-bam-output` was used when running rsem-calculate-expression; otherwise use the default (`'transcript'`).
`pData`	data frame of phenotype data, with rows corresponding to `samples`. The first column of `pData` must be equal to `samples`, and rows must be in the same order as `samples`.
`verbose`	If TRUE (as by default), status messages are printed during data loading.
`meas`	character vector containing either "all" or one of "FPKM" or "TPM". The resulting ballgown object will only contain the specified expression measurement for the transcripts. "all" creates the full object.
`zipped`	set to TRUE if all RSEM results files have been gzipped (end in ".gz").

Details

Currently exon- and intron-level measurements are not available for RSEM-generated ballgown objects, but development is ongoing.

Value

a ballgown object with the specified expression measurements and structure specified by GTF.

Examples

dataDir = system.file('extdata', package='ballgown')
gtf = file.path(dataDir, 'hg19_genes_small.gtf.gz')
rsemobj = ballgownrsem(dir=dataDir, samples=c('tiny', 'tiny2'), gtf=gtf,
    bamout='none', zipped=TRUE)
rsemobj

Toy ballgown object

Description

Small ballgown object created with simulated toy data, for demonstration purposes

Format

a ballgown object: 100 transcripts, 633 exons, 536 introns

Author(s)

Alyssa Frazee

Examples

data(bg)
bg
# ballgown instance with 100 transcripts and 20 samples

plot annotated and assembled transcripts together

Description

plot annotated and assembled transcripts together

Usage

checkAssembledTx(
  assembled,
  annotated,
  ind = 1,
  main = "Assembled and Annotated Transcripts",
  customCol = NULL
)

Arguments

`assembled`	a GRangesList object where the GRanges objects in the list represent sets of exons comprising assembled transcripts
`annotated`	a GRangesList object where the GRanges objects in the list represent sets of exons comprising annotated transcripts
`ind`	integer; index of `annotated` specifying which annotated transcript to plot. All transcripts (assembled and annotated) overlapping `annotated[[ind]]` will be plotted. Default 1.
`main`	optional character string giving the title for the resulting plot. Default: "Assembled and Annotated Transcripts"
`customCol`	optional vector of custom colors for the annotated transcripts. If not the same length as the number of annotated transcripts in the plot, recycling or truncation might occur.

Value

Plots annotated transcripts on the bottom panel (shaded in gray) and assembled transcripts on the top panel (shaded with diagonal lines).

Author(s)

Alyssa Frazee

Examples


gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
annot = gffReadGR(gtfPath, splitByTranscript=TRUE)
data(bg)
checkAssembledTx(annotated=annot, assembled=structure(bg)$trans, ind=4)

group a gene's assembled transcripts into clusters

Description

group a gene's assembled transcripts into clusters

Usage

clusterTranscripts(gene, gown, k = NULL, method = c("hclust", "kmeans"))

Arguments

`gene`	name of gene whose transcripts will be clustered. When using Cufflinks output, usually of the form `"XLOC_######"`
`gown`	ballgown object containing experimental data
`k`	number of clusters to use
`method`	clustering method to use. Must be one of `"hclust"`, for hierarchical clustering, or `"kmeans"`, for k-means clustering.

Value

list with elements clusters and pctvar. clusters contains columns "cluster" and "t_id", and denotes which transcripts belong to which clusters. pctvar is only non-NULL when using k-means clustering and is the percentage of variation explained by these clusters, defined as the ratio of the between-cluster sum of squares to the total sum of squares.
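
The pctvar ratio described above can be illustrated with base R's `kmeans`, which reports both sums of squares. This is only a sketch on made-up data: the toy matrix `x` stands in for whatever per-transcript features are clustered, and nothing here is taken from the package's own code.

```r
set.seed(1)

# Toy stand-in for per-transcript features: six "transcripts" in two
# well-separated groups (hypothetical data, not from a ballgown object).
x <- rbind(matrix(rnorm(6, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(6, mean = 5, sd = 0.1), ncol = 2))

km <- kmeans(x, centers = 2, nstart = 5)

# Between-cluster sum of squares divided by total sum of squares.
pctvar <- km$betweenss / km$totss
pctvar
```

A value close to 1 means the clusters account for nearly all of the variation; values near 0 suggest that the chosen k does not separate the transcripts well.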

Author(s)

Alyssa Frazee

Examples

data(bg)
clusterTranscripts('XLOC_000454', bg, k=2, method='kmeans')
# transcripts 1294 and 1301 cluster together, 91% variation explained.

cluster a gene's transcripts and calculate cluster-level expression

Description

cluster a gene's transcripts and calculate cluster-level expression

Usage

collapseTranscripts(
  gene,
  gown,
  meas = "FPKM",
  method = c("hclust", "kmeans"),
  k = NULL
)

Arguments

`gene`	which gene's transcripts should be clustered
`gown`	ballgown object
`meas`	which transcript-level expression measurement to use (`'cov'`, average per-base coverage, or `'FPKM'`)
`method`	which clustering method to use: `'hclust'` (hierarchical clustering) or `'kmeans'` (k-means clustering).
`k`	how many clusters to use.

Value

list with two elements:

tab, a cluster-by-sample table of expression measurements (meas, either cov or FPKM), where the expression measurement for each cluster is the mean (for 'cov') or aggregate (for 'FPKM', as in gexpr) expression measurement for all the transcripts in that cluster. This table can be used as the gowntable argument to stattest, if differential expression results for transcript *clusters* are desired.
cl, the output from the clusterTranscripts run that produced tab, for reference. Cluster IDs in the cluster component correspond to row names of tab.

Author(s)

Alyssa Frazee

Examples

data(bg)
collapseTranscripts(bg, gene='XLOC_000454', meas='FPKM', method='kmeans')

determine if one set of GRanges fully contains any of another set of GRanges

Description

determine if one set of GRanges fully contains any of another set of GRanges

Usage

contains(transcripts, cds)

Arguments

`transcripts`	`GRangesList` object (assume for now that it represents transcripts)
`cds`	`GRangesList` object (assume for now that it represents sets of coding sequences)

Details

If gown is a ballgown object, transcripts can be structure(gown)$trans (or any subset).

Value

vector with length equal to length(transcripts), where each entry is TRUE if the corresponding transcript contains a coding sequence (i.e., is a superset of at least one entry of cds).

Author(s)

Alyssa Frazee

Examples

## pretend this annotation is coding sequence:
gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
annot = gffReadGR(gtfPath, splitByTranscript=TRUE)
data(bg)
results = contains(structure(bg)$trans, annot)
# results is a boolean vector
sum(results) #61

extract paths to tablemaker output

Description

extract paths to tablemaker output

Usage

dirs(x)

## S4 method for signature 'ballgown'
dirs(x)

Arguments

`x`	a ballgown object

Examples

data(bg)
dirs(bg)

extract exon-level expression measurements from ballgown objects

Description

extract exon-level expression measurements from ballgown objects

Usage

eexpr(x, meas = "rcount")

## S4 method for signature 'ballgown'
eexpr(x, meas = "rcount")

Arguments

`x`	a ballgown object
`meas`	type of measurement to extract. Can be "rcount", "ucount", "mrcount", "cov", "mcov", or "all". Default "rcount".

Value

exon-by-sample matrix containing exon-level expression values (measured by meas). If meas is "all", or x@RSEM is TRUE, a data frame is returned, containing all measurements and location information.

Examples

data(bg)
exon_rcount_matrix = eexpr(bg)
exon_ucount_matrix = eexpr(bg, 'ucount')
exon_data_frame = eexpr(bg, 'all')

extract expression components from ballgown objects

Description

extract expression components from ballgown objects

Usage

expr(x)

## S4 method for signature 'ballgown'
expr(x)

Arguments

`x`	a ballgown object

Value

list containing elements intron, exon, and trans, which are feature-by-sample data frames of expression data.

Examples

data(bg)
names(expr(bg))
class(expr(bg))
dim(expr(bg)$exon)

Replacement method for expr slot in ballgown objects

Description

Replacement method for expr slot in ballgown objects

Usage

expr(x) <- value

## S4 replacement method for signature 'ballgown'
expr(x) <- value

Arguments

`x`	a ballgown object
`value`	the updated value for `expr(x)` or a subcomponent

Examples

data(bg)
n = ncol(bg@expr$trans)
#multiply all transcript expression measurements by 10:
bg@expr$trans[,11:n] = 10*bg@expr$trans[,11:n]

subset ballgown objects using an expression filter

Description

Create a new ballgown object containing only transcripts passing a mean expression filter

Usage

exprfilter(gown, cutoff, meas = "FPKM")

Arguments

`gown`	a ballgown object
`cutoff`	transcripts must have mean expression across samples above this value to be included in the return
`meas`	how should transcript expression be measured? Default FPKM, but can also be `'cov'`.

Value

A new ballgown object derived from gown, but only containing transcripts (and associated exons/introns) with mean meas greater than cutoff across all samples.
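
The filtering rule can be pictured on a plain matrix. The sketch below uses base R only and a made-up transcript-by-sample FPKM table; it illustrates the "mean across samples above cutoff" criterion and is not the package's implementation, which also subsets the associated exons and introns.

```r
# Hypothetical transcript-by-sample FPKM values.
fpkm <- matrix(c(150, 120,  90,
                  10,  20,  15,
                 300,  80,  40),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("tx1", "tx2", "tx3"),
                               c("s1", "s2", "s3")))

cutoff <- 100

# A transcript passes when its mean across samples is strictly above cutoff.
keep <- rowMeans(fpkm) > cutoff

# drop = FALSE keeps the result a matrix even if one transcript survives.
filtered <- fpkm[keep, , drop = FALSE]
rownames(filtered)
```

Here tx1 (mean 120) and tx3 (mean 140) pass, while tx2 (mean 15) is dropped.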

Examples

  data(bg)
  # make a ballgown object containing only transcripts with mean FPKM > 100:
  over100 = exprfilter(bg, cutoff=100)

get gene IDs from a ballgown object

Description

get gene IDs from a ballgown object

Usage

geneIDs(x)

## S4 method for signature 'ballgown'
geneIDs(x)

Arguments

`x`	a ballgown object

Details

This vector differs from that produced by geneNames in that geneIDs produces names of loci created during the assembly process, not necessarily annotated genes.

Value

named vector of gene IDs included in the ballgown object. If object was created using Tablemaker, these gene IDs will be of the form "XLOC_*". Vector is named and ordered by corresponding numeric transcript ID.

Examples

data(bg)
geneIDs(bg)

get gene names from a ballgown object

Description

get gene names from a ballgown object

Usage

geneNames(x)

## S4 method for signature 'ballgown'
geneNames(x)

Arguments

`x`	a ballgown object

Details

This vector differs from that produced by geneIDs in that geneNames produces *annotated* gene names that correspond to assembled transcripts. The return will be empty/blank/NA if the transcriptome assembly is de novo (i.e., was not compared to an annotation before the ballgown object was created). See getGenes for matching transcripts to gene names. Some entries of this vector will be empty/blank/NA if the corresponding transcript did not overlap any annotated genes.

Value

named vector of gene names included in the ballgown object, named and ordered by corresponding numeric transcript ID.

Examples

data(bg)
# this is a de novo assembly, so it does not contain gene info as it stands
# but we can add it:
annot = system.file('extdata', 'annot.gtf.gz', package='ballgown')
gnames = getGenes(annot, structure(bg)$trans, UCSC=FALSE)
gnames_first = lapply(gnames, function(x) x[1]) #just take 1 overlapping gene
expr(bg)$trans$gene_name = gnames_first

# now we can extract these gene names:
geneNames(bg)

extract a specific field of the "attributes" column of a data frame created from a GTF/GFF file

Description

extract a specific field of the "attributes" column of a data frame created from a GTF/GFF file

Usage

getAttributeField(x, field, attrsep = "; ")

Arguments

`x`	vector representing the "attributes" column of GTF/GFF file
`field`	name of the field you want to extract from the "attributes" column
`attrsep`	separator for the fields in the attributes column. Defaults to '; ', the separator for GTF files output by Cufflinks.

Value

vector with one entry per element of x, giving the value of field in that attributes string
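
To make the role of `field` and `attrsep` concrete, here is a base-R sketch of how one field might be pulled out of GTF-style attribute strings. The helper `extractField` and the two attribute strings are hypothetical; this is not the package's code, and unlike the real function it also strips quotes and trailing semicolons for readability.

```r
# Made-up attribute strings in the Cufflinks-style "key value; key value;" form.
attrs <- c('gene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
           'gene_id "XLOC_000002"; transcript_id "TCONS_00000002";')

# Hypothetical helper: split on attrsep, then on the space between key and value.
extractField <- function(x, field, attrsep = "; ") {
  pieces <- strsplit(x, split = attrsep, fixed = TRUE)
  vapply(pieces, function(p) {
    kv <- strsplit(p, split = " ", fixed = TRUE)
    keys <- vapply(kv, `[`, character(1), 1)
    hit <- match(field, keys)
    value <- if (is.na(hit)) NA_character_ else kv[[hit]][2]
    # Remove surrounding quotes and any trailing semicolon.
    gsub('[";]', "", value)
  }, character(1))
}

extractField(attrs, "transcript_id")
```

A field that is absent from a string comes back as `NA` in this sketch, so the result always has the same length as the input.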

Author(s)

Wolfgang Huber, in the davidTiling R package (LGPL license)

Examples

gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
gffdata = gffRead(gtfPath)
gffdata$transcriptID = getAttributeField(gffdata$attributes, 
  field = "transcript_id")

label assembled transcripts with gene names

Description

label assembled transcripts with gene names

Usage

getGenes(gtf, assembled, UCSC = TRUE, attribute = "gene_id")

Arguments

`gtf`	path to a GTF file containing locations of annotated transcripts
`assembled`	GRangesList object, with each set of ranges representing exons of an assembled transcript.
`UCSC`	set to `TRUE` if you're using a UCSC gtf file. (Requires some extra text processing).
`attribute`	set to attribute name in `gtf` that gives desired gene identifiers. Default `"gene_id"`; another common one is `"gene_name"` (for the gene symbol).

Details

Chromosome labels in gtf and assembled should match (i.e., you should provide the path to a gtf corresponding to the same annotation you used when constructing assembled).

Value

an IRanges CharacterList of the same length as assembled, providing the name(s) of the gene(s) that overlaps each transcript in assembled.

Author(s)

Alyssa Frazee, Andrew Jaffe

Examples

data(bg)
gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
geneoverlaps = getGenes(gtfPath, structure(bg)$trans, UCSC=FALSE)

extract gene-level expression measurements from ballgown objects

Description

For objects created with Cufflinks/Tablemaker, gene-level measurements are calculated by appropriately combining FPKMs from the transcripts comprising the gene. For objects created with RSEM, gene-level measurements are extracted directly from the RSEM output.

Usage

gexpr(x)

## S4 method for signature 'ballgown'
gexpr(x)

Arguments

`x`	a ballgown object

Value

gene-by-sample matrix containing per-sample gene measurements.

Examples

data(bg)
gene_matrix = gexpr(bg)

read in GTF/GFF file as a data frame

Description

read in GTF/GFF file as a data frame

Usage

gffRead(gffFile, nrows = -1, verbose = FALSE)

Arguments

`gffFile`	name of GTF/GFF on disk
`nrows`	number of rows to read in (default -1, which means read all rows)
`verbose`	if TRUE, print status info at beginning and end of file read. Default FALSE.

Value

data frame representing the GTF/GFF file

Author(s)

Kasper Hansen

Examples

gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
annot = gffRead(gtfPath)
gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
annot = gffRead(gtfPath)

read in gtf file as GRanges object

Description

(very) light wrapper for rtracklayer::import

Usage

gffReadGR(
  gtf,
  splitByTranscript = FALSE,
  identifier = "transcript_id",
  sep = "; "
)

Arguments

`gtf`	name of GTF/GFF file on disk
`splitByTranscript`	if `TRUE`, return a `GRangesList` of transcripts; otherwise return a `GRanges` object containing all genomic features in `gtf`. Default `FALSE`.
`identifier`	name of transcript identifier column of `attributes` field in `gtf`. Default `"transcript_id"`. Only used if `splitByTranscript` is `TRUE`.
`sep`	field separator in the `attributes` field of `gtf`. Default `"; "` (semicolon + space). Only used if `splitByTranscript` is `TRUE`.

Value

if splitByTranscript is FALSE, an object of class GRanges representing the genomic features in gtf. If splitByTranscript is TRUE, an object of class GRangesList, where each element is a GRanges object corresponding to an annotated transcript (designated in names).

Author(s)

Alyssa Frazee

Examples

gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')

# read in exons as GRanges:
annotgr = gffReadGR(gtfPath)

# read in groups of exons as transcripts, in GRangesList:
transcripts_grl = gffReadGR(gtfPath, splitByTranscript=TRUE)

extract intron-level expression measurements from ballgown objects

Description

extract intron-level expression measurements from ballgown objects

Usage

iexpr(x, meas = "rcount")

## S4 method for signature 'ballgown'
iexpr(x, meas = "rcount")

Arguments

`x`	a ballgown object
`meas`	type of measurement to extract. Can be "rcount", "ucount", "mrcount", or "all". Default "rcount".

Value

intron-by-sample matrix containing the number of reads (measured as specified by meas) supporting each intron, in each sample. If meas is "all", a data frame is returned, containing all measurements and location information.

Examples

data(bg)
intron_rcount_matrix = iexpr(bg)
intron_data_frame = iexpr(bg, 'all')

extract the indexes from ballgown objects

Description

extract the indexes from ballgown objects

Usage

indexes(x)

## S4 method for signature 'ballgown'
indexes(x)

Arguments

`x`	a ballgown object

Value

list containing elements e2t, i2t, t2g, bamfiles, and pData, where e2t and i2t are data frames linking exons and introns (respectively) to transcripts, t2g is a data frame linking transcripts to genes, and bamfiles and pData are described in ?ballgown.

Examples

data(bg)
names(indexes(bg))
class(indexes(bg))
head(indexes(bg)$t2g)

Replace method for indexes slot in ballgown objects

Description

Replace method for indexes slot in ballgown objects

Usage

indexes(x) <- value

## S4 replacement method for signature 'ballgown'
indexes(x) <- value

Arguments

`x`	a ballgown object
`value`	the updated value for `indexes(x)` or a subcomponent

Examples

data(bg)
indexes(bg)$bamfiles = paste0('/path/to/bamfolder/', 
  sampleNames(bg), '_accepted_hits.bam')

get the last element

Description

get the last element

Usage

last(x)

Arguments

`x`	anything you can call `tail` on (vector, data frame, etc.)

Details

this function is made of several thousand lines of complex code, so be sure to read it carefully.

Value

the last element of x

Author(s)

Alyssa Frazee

Examples

last(c('h', 'e', 'l', 'l', 'o'))

extract package version & creation date from ballgown object

Description

extract package version & creation date from ballgown object

Usage

mergedDate(x)

## S4 method for signature 'ballgown'
mergedDate(x)

Arguments

`x`	a ballgown object

Examples

data(bg)
mergedDate(bg)

calculate percent overlap between two GRanges objects

Description

calculate percent overlap between two GRanges objects

Usage

pctOverlap(tx1, tx2)

Arguments

`tx1`	GRanges object
`tx2`	GRanges object

Details

In the ballgown context, tx1 and tx2 are two transcripts, each represented by GRanges objects whose ranges represent the exons comprising the transcripts. The percent overlap is the number of nucleotides falling within both transcripts divided by the number of nucleotides falling within either transcript. Useful as a measure of transcript closeness (it is essentially the Jaccard index, i.e., one minus the Jaccard distance).

Value

percent overlap between tx1 and tx2, as defined by the ratio of the intersection of tx1 and tx2 to the union of tx1 and tx2.

Author(s)

Alyssa Frazee

Examples

data(bg)
gtfPath = system.file('extdata', 'annot.gtf.gz', package='ballgown')
annot_grl = gffReadGR(gtfPath, splitByTranscript=TRUE)
pctOverlap(structure(bg)$trans[[2]], annot_grl[[369]]) #79.9%

extract phenotype data from a ballgown object

Description

extract phenotype data from a ballgown object

Usage

pData(object)

## S4 method for signature 'ballgown'
pData(object)

Arguments

`object`	a ballgown object

Value

sample-by-phenotype data frame

Examples

data(bg)
pData(bg)

Replacement method for pData slot in ballgown objects

Description

Replacement method for pData slot in ballgown objects

Usage

pData(object) <- value

## S4 replacement method for signature 'ballgown,ANY'
pData(object) <- value

Arguments

`object`	a ballgown object
`value`	the updated value for `pData(object)`.

Examples

# add "timepoint" covariate to ballgown object:
data(bg) # already contains pData
pData(bg) = data.frame(pData(bg), timepoint=rep(1:10, 2))
head(pData(bg))

cluster assembled transcripts and plot the results

Description

This is an experimental, first-pass function that clusters assembled transcripts based on their overlap percentage, then plots and colors the transcript clusters.

Usage

plotLatentTranscripts(
  gene,
  gown,
  method = c("hclust", "kmeans"),
  k = NULL,
  choosek = c("var90", "thumb"),
  returncluster = TRUE,
  labelTranscripts = TRUE,
  ...
)

Arguments

`gene`	string, name of gene whose transcripts should be clustered (e.g., "XLOC_000001")
`gown`	object of class `ballgown` being used for analysis
`method`	clustering method to use. Currently can choose from hierarchical clustering (`hclust`) or K-means (`kmeans`). More methods are in development.
`k`	number of transcript clusters to use. By default, `k` is `NULL` and the number of clusters is chosen according to `choosek`; providing `k` overrides that choice.
`choosek`	if `k` is not provided, how should the number of clusters be chosen? Must be one of "var90" (choose a `k` that explains 90 percent of the observed variation) or "thumb" (`k` is set to be approximately `sqrt(n)`, where n is the total number of transcripts for `gene`)
`returncluster`	if TRUE (as it is by default), return the results of the call to `clusterTranscripts` so the data is available for later use. Nothing is returned if FALSE.
`labelTranscripts`	if TRUE (as it is by default), print transcript IDs on the y-axis
`...`	other arguments to pass to plotTranscripts

Value

if returncluster is TRUE, the transcript clusters are returned as described in clusterTranscripts. A plot of the transcript clusters is also produced, in the style of plotTranscripts.

Author(s)

Alyssa Frazee

Examples


data(bg)
plotLatentTranscripts('XLOC_000454', bg, method='kmeans', k=2)

visualize transcript abundance by group

Description

visualize transcript abundance by group

Usage

plotMeans(
  gene,
  gown,
  overall = FALSE,
  groupvar,
  groupname = "all",
  meas = c("cov", "FPKM", "rcount", "ucount", "mrcount", "mcov"),
  colorby = c("transcript", "exon"),
  legend = TRUE,
  labelTranscripts = FALSE
)

Arguments

`gene`	name of gene whose transcripts will be plotted. When using Cufflinks/Tablemaker output, usually of the form `"XLOC_######"`
`gown`	ballgown object containing experimental and phenotype data
`overall`	if `TRUE`, color features by the overall (experiment-wide) mean rather than a group-specific mean
`groupvar`	string representing the name of the variable denoting which sample belongs to which group. Can be `"none"` (if you want the study-wide mean), or must correspond to the name of a column of `pData(gown)`. Usually a categorical variable.
`groupname`	string representing which group's expression means you want to plot. Can be `"none"` (if you want the study-wide mean), `"all"` (if you want a multipanel plot of each group's mean expression), or any of the levels of `groupvar`.
`meas`	type of expression measurement to plot. One of "cov", "FPKM", "rcount", "ucount", "mrcount", or "mcov". Not all types are valid for all features. (See description of tablemaker output for more information).
`colorby`	one of `"transcript"` or `"exon"`, indicating which feature's abundances should dictate plot coloring.
`legend`	if `TRUE` (as it is by default), a color legend is drawn on top of the plot indicating the scale for feature abundances.
`labelTranscripts`	if `TRUE`, transcript ids are labeled on the left side of the plot. Default `FALSE`.

Value

produces a plot of the transcript structure for the specified gene in the current graphics device, colored by study-wide or group-specific mean expression level.

Author(s)

Alyssa Frazee

Examples


data(bg)
plotMeans('XLOC_000454', bg, groupvar='group', meas='FPKM',
  colorby='transcript')

visualize structure of assembled transcripts

Description

visualize structure of assembled transcripts

Usage

plotTranscripts(
  gene,
  gown,
  samples = NULL,
  colorby = "transcript",
  meas = "FPKM",
  legend = TRUE,
  labelTranscripts = FALSE,
  main = NULL,
  blackBorders = TRUE,
  log = FALSE,
  logbase = 2,
  customCol = NULL,
  customOrder = NULL
)

Arguments

`gene`	name of gene whose transcripts will be plotted. When using Cufflinks output, usually of the form `"XLOC_######"`
`gown`	ballgown object containing experimental and phenotype data
`samples`	vector of sample(s) to plot. Can be `'none'` if only one plot (showing transcript structure in gray) is desired. Use `sampleNames(gown)` to see sample names for `gown`. Defaults to `sampleNames(gown)[1]`.
`colorby`	one of `"transcript"`, `"exon"`, or `"none"`, indicating which feature's abundances should dictate plot coloring. If `"none"`, all transcripts are drawn in gray.
`meas`	which expression measurement to color features by, if any. Must match an available measurement for whatever feature you're plotting.
`legend`	if `TRUE` (as it is by default), a color legend is drawn on top of the plot indicating scales for feature abundances.
`labelTranscripts`	if `TRUE`, transcript ids are labeled on the left side of the plot. Default `FALSE`.
`main`	optional string giving the desired plot title.
`blackBorders`	if `TRUE`, exon borders are drawn in black. Otherwise, they are drawn in the same color as their transcript or exon. Switching blackBorders to FALSE can be useful for visualizing abundances for skinny exons and/or smaller plots, which can be the case when `length(samples)` is large.
`log`	if `TRUE`, color transcripts on the log scale. Default `FALSE`. To account for expression values of 0, we add 1 to all expression values before taking the log.
`logbase`	log base to use if `log = TRUE`. Default 2.
`customCol`	an optional vector of custom colors to color transcripts by. There must be the same number of colors as transcripts in the gene being plotted.
`customOrder`	an optional vector of transcript ids (matching ids in `texpr(gown, 'all')$t_id`), indicating which order transcripts will appear in the plot. All transcripts in `gene` must appear in the vector exactly once.

Value

produces a plot of the transcript structure for the specified gene in the current graphics device.

Author(s)

Alyssa Frazee

Examples


data(bg)

# plot one gene for one sample:
plotTranscripts(gene='XLOC_000454', gown=bg, samples='sample12', meas='FPKM',
    colorby='transcript', 
    main='transcripts from gene XLOC_000454: sample 12, FPKM')

# plot one gene for many samples:
plotTranscripts('XLOC_000454', bg, 
    samples=c('sample01', 'sample06', 'sample12', 'sample19'), 
    meas='FPKM', colorby='transcript')


get names of samples in a ballgown object

Description

get names of samples in a ballgown object

Usage

sampleNames(object)

## S4 method for signature 'ballgown'
sampleNames(object)

Arguments

`object`	a ballgown object

Value

vector of sample IDs for object. If pData exists, samples in its rows correspond to samples in sampleNames(object) (in order).

Examples

data(bg)
sampleNames(bg)

get sequence (chromosome) names from ballgown object

Description

get sequence (chromosome) names from ballgown object

Usage

seqnames(x)

## S4 method for signature 'ballgown'
seqnames(x)

Arguments

`x`	a ballgown object

Value

vector of sequence (i.e., chromosome) names included in the ballgown object

Examples

data(bg)
seqnames(bg)

statistical tests for differential expression in ballgown

Description

Test each transcript, gene, exon, or intron in a ballgown object for differential expression, using comparisons of linear models.

Usage

stattest(
  gown = NULL,
  gowntable = NULL,
  pData = NULL,
  mod = NULL,
  mod0 = NULL,
  feature = c("gene", "exon", "intron", "transcript"),
  meas = c("cov", "FPKM", "rcount", "ucount", "mrcount", "mcov"),
  timecourse = FALSE,
  covariate = NULL,
  adjustvars = NULL,
  gexpr = NULL,
  df = 4,
  getFC = FALSE,
  libadjust = NULL,
  log = TRUE
)

Arguments

`gown`	name of an object of class `ballgown`
`gowntable`	matrix or matrix-like object with `rownames` representing feature IDs and columns representing samples, with expression estimates in the cells. Provide the feature name with `feature`. You must provide exactly one of `gown` or `gowntable`. NB: gowntable is log-transformed within `stattest` if `log` is `TRUE`, so provide un-logged expression values in `gowntable`.
`pData`	Required if `gowntable` is provided: data frame giving phenotype data for the samples in the columns of `gowntable`. (Rows of `pData` correspond to columns of `gowntable`). If `gown` is used instead, it must have a non-null, valid `pData` slot (and the `pData` argument to `stattest` should be left `NULL`).
`mod`	object of class `model.matrix` representing the design matrix for the linear regression model including covariates of interest
`mod0`	object of class `model.matrix` representing the design matrix for the linear regression model without the covariates of interest.
`feature`	the type of genomic feature to be tested for differential expression. If `gown` is used, must be one of `"gene"`, `"transcript"`, `"exon"`, or `"intron"`. If `gowntable` is used, this is just used for labeling and can be whatever the rows of `gowntable` represent.
`meas`	the expression measurement to use for statistical tests. Must be one of `"cov"`, `"FPKM"`, `"rcount"`, `"ucount"`, `"mrcount"`, or `"mcov"`. Not all expression measurements are available for all features. Leave as default if `gowntable` is provided.
`timecourse`	if `TRUE`, tests whether or not the expression profiles of genomic features vary over time (or another continuous covariate) in the study. Default `FALSE`. Natural splines are used to fit time profiles, so you must have more timepoints than degrees of freedom used to fit the splines. The default df is 4.
`covariate`	string representing the name of the covariate of interest for the differential expression tests. Must correspond to the name of a column of `pData(gown)`. If `timecourse=TRUE`, this should be the study's time variable.
`adjustvars`	optional vector of strings representing the names of potential confounders. Must correspond to names of columns of `pData(gown)`.
`gexpr`	optional data frame that is the result of calling `gexpr(gown)`. (You can speed this function up by pre-creating `gexpr(gown)`.)
`df`	degrees of freedom used for modeling expression over time with natural cubic splines. Default 4. Only used if `timecourse=TRUE`.
`getFC`	if `TRUE`, also return estimated fold changes (adjusted for library size and confounders) between populations. Only available for 2-group comparisons at the moment. Default `FALSE`.
`libadjust`	library-size adjustment to use in linear models. By default, the adjustment is defined as the sum of the sample's log expression measurements below the 75th percentile of those measurements. To use a different library-size adjustment, provide a numeric vector of each sample's adjustment value. Entries of this vector correspond to samples in rows of `pData`. If no library size adjustment is desired, set to FALSE.
`log`	if `TRUE`, the outcome variable in linear models is log2(expression + 1); otherwise it is the untransformed expression. Default TRUE.
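The `timecourse` and `df` arguments refer to a natural spline basis over the time covariate. The sketch below shows what such a basis looks like using `ns` from the `splines` package that ships with R. It is an illustration under assumptions, not the code `stattest` runs: the variable names and the intercept-only null design are made up for the example.

```r
library(splines)

# Dummy time covariate, matching the style of the examples in this manual
time <- rep(1:10, 2)
spline_df <- 4

# The text above requires more distinct timepoints than spline degrees of freedom
stopifnot(length(unique(time)) > spline_df)

# Natural cubic spline basis: one column per degree of freedom
basis <- ns(time, df = spline_df)

# Illustrative design matrices: intercept plus spline terms vs. intercept only
mod  <- cbind(1, basis)
mod0 <- matrix(1, nrow = length(time), ncol = 1)

dim(basis)  # 20 rows (samples) by 4 columns (spline terms)
```

Comparing a model with the spline columns against one without them is what lets a test ask whether expression varies over time at all.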

Details

At minimum, you need to provide a ballgown object or count table, the type of feature you want to test (gene, transcript, exon, or intron), the expression measurement you want to use (FPKM, cov, rcount, etc.), and the covariate of interest, which must be the name of one of the columns of the 'pData' component of your ballgown object (or provided pData). This covariate is automatically converted to a factor during model fitting in non-timecourse experiments.

By default, models are fit using log2(meas + 1) as the outcome for each feature. To disable the log transformation, provide 'log = FALSE' as an argument to 'stattest'. You can use the gowntable option if you'd like to use a different transformation.

Library size adjustment is performed by default by using the sum of the log nonzero expression measurements for each sample, up to the 75th percentile of those measurements. This adjustment can be disabled by setting libadjust=FALSE. You can use mod and mod0 to specify alternative library size adjustments.
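One reading of the default adjustment described above can be sketched in base R. This is written from the prose, not from the package source: the function name `lib_adjust_sketch`, the strict "below" comparison, and the toy matrix are all assumptions.

```r
# Hypothetical helper: for each sample (column), sum the log2(x + 1) values
# that are nonzero and fall below the 75th percentile of that sample's
# nonzero log values.
lib_adjust_sketch <- function(expr) {
  log_expr <- log2(expr + 1)
  apply(log_expr, 2, function(x) {
    nonzero <- x[x != 0]
    q3 <- quantile(nonzero, 0.75)
    sum(nonzero[nonzero < q3])
  })
}

# Toy feature-by-sample matrix with made-up values
expr <- cbind(s1 = c(0, 1, 3, 7, 15), s2 = c(0, 0, 1, 3, 255))

adj <- lib_adjust_sketch(expr)
adj  # s1 = 6, s2 = 3
```

A vector computed this way (or any other per-sample numeric vector) is the kind of value that can be passed as `libadjust`.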

mod and mod0 are optional arguments. If mod is specified, you must also specify mod0. If neither is specified, mod0 defaults to the design matrix for a model including only a library-size adjustment, and mod defaults to the design matrix for a model including a library-size adjustment and covariate. Note that if you supply mod and mod0, covariate, timecourse, adjustvars, and df are ignored, so make sure your covariate of interest and all appropriate confounder adjustments, including library size, are specified in mod and mod0. By default, the library-size adjustment is the sum of all counts below the 75th percentile of nonzero counts, on the log scale, i.e., log2(count + 1).
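The comparison of `mod` against `mod0` can be illustrated for a single feature with a standard nested-model F-test in base R. This is a sketch of the general technique under assumptions, not the package's code: the simulated outcome `y`, the stand-in library-size variable `lib`, and all variable names are made up.

```r
set.seed(1)
n <- 20
group <- factor(rep(c(0, 1), each = n / 2))
lib <- rnorm(n, mean = 10)            # stand-in library-size adjustment
y <- log2(rexp(n, rate = 0.1) + 1)    # stand-in log2(expression + 1), one feature

mod  <- model.matrix(~ lib + group)   # includes the covariate of interest
mod0 <- model.matrix(~ lib)           # omits the covariate of interest

# Least-squares fits of the full and null models
fit  <- lm.fit(mod, y)
fit0 <- lm.fit(mod0, y)

rss  <- sum(fit$residuals^2)
rss0 <- sum(fit0$residuals^2)
df1  <- ncol(mod) - ncol(mod0)        # extra parameters in the full model
df2  <- n - ncol(mod)                 # residual degrees of freedom

# F statistic for the nested comparison and its p-value
f_stat <- ((rss0 - rss) / df1) / (rss / df2)
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)
```

The same p-value comes out of `anova(lm(y ~ lib), lm(y ~ lib + group))`, which is a convenient cross-check when building custom design matrices.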

Full model details are described in the supplement of http://biorxiv.org/content/early/2014/03/30/003665.

Value

data frame containing the columns feature, id representing feature id, pval representing the p-value for testing whether this feature was differentially expressed according to covariate, and qval, the estimated false discovery rate using this feature's signal strength as a significance cutoff. An additional column, fc, is included if getFC is TRUE.

Author(s)

Jeff Leek, Alyssa Frazee

References

http://biorxiv.org/content/early/2014/03/30/003665

Examples

data(bg)

# two-group comparison:
stat_results = stattest(bg, feature='transcript', meas='FPKM', 
  covariate='group')

# timecourse test:
pData(bg) = data.frame(pData(bg), time=rep(1:10, 2)) #dummy time covariate
timecourse_results = stattest(bg, feature='transcript', meas='FPKM', 
  covariate='time', timecourse=TRUE)

# timecourse test, adjusting for group:
group_adj_timecourse_results = stattest(bg, feature='transcript', 
  meas='FPKM', covariate='time', timecourse=TRUE, adjustvars='group')

# custom model matrices:
### create example data:
set.seed(43)
sex = sample(c('M','F'), size=nrow(pData(bg)), replace=TRUE)
age = sample(21:52, size=nrow(pData(bg)), replace=TRUE)

### create design matrices:
mod = model.matrix(~ sex + age + pData(bg)$group + pData(bg)$time)
mod0 = model.matrix(~ pData(bg)$group + pData(bg)$time)

### build model: 
adjusted_results = stattest(bg, feature='transcript', meas='FPKM', 
  mod0=mod0, mod=mod)

extract structure components from ballgown objects

Description

extract structure components from ballgown objects

Usage

structure(x)

## S4 method for signature 'ballgown'
structure(x)

Arguments

`x`	a ballgown object

Value

list containing elements intron, exon, and trans. exon and intron are GRanges objects, where each range is an exon or intron, and trans is a GRangesList object, where each GRanges element is a set of exons representing a transcript.

Examples

data(bg)
names(structure(bg))
class(structure(bg))
structure(bg)$exon

subset ballgown objects to specific samples or genomic locations

Description

subset ballgown objects to specific samples or genomic locations

Usage

subset(x, ...)

## S4 method for signature 'ballgown'
subset(x, cond, genomesubset = TRUE)

Arguments

`x`	a ballgown object
`...`	further arguments to generic subset
`cond`	Condition on which to subset. See details.
`genomesubset`	if TRUE, subset `x` to a specific part of the genome. Otherwise, subset x to only include specific samples. TRUE by default.

Details

To use subset, you must provide the cond argument as a string representing a logical expression specifying your desired subset. The subset expression can either involve column names of texpr(x, "all") (if genomesubset is TRUE) or of pData(x) (if genomesubset is FALSE). For example, if you wanted a ballgown object for only chromosome 22, you might call subset(x, "chr == 'chr22'"). (Be sure to handle quotes within character strings appropriately).
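How a condition string can select rows is easiest to see on a plain data frame. The sketch below is an assumption about the general mechanism (parse the string, then evaluate it with the table's columns in scope), not a copy of the method's code. The toy table `tx` merely stands in for `texpr(x, "all")`.

```r
# Toy table with illustrative columns and made-up values
tx <- data.frame(
  t_id    = 1:4,
  chr     = c("chr21", "chr22", "chr22", "chrX"),
  gene_id = c("XLOC_000001", "XLOC_000002", "XLOC_000002", "XLOC_000003"),
  stringsAsFactors = FALSE
)

# Double quotes delimit the condition; single quotes delimit values inside it
cond <- "chr == 'chr22' & t_id > 2"

# Parse the string into an expression and evaluate it against the columns
keep <- eval(parse(text = cond), envir = tx)
tx_sub <- tx[keep, ]

tx_sub$t_id  # 3
```

Mixing the two quote styles as shown is the simplest way to handle quotes within the condition string.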

Value

a subsetted ballgown object, containing only the regions or samples satisfying cond.

Author(s)

Alyssa Frazee

Examples

data(bg)
bg_twogenes = subset(bg, "gene_id=='XLOC_000454' | gene_id=='XLOC_000024'")
bg_twogenes 
# ballgown instance with 4 assembled transcripts and 20 samples

bg_group0 = subset(bg, "group == 0", genomesubset=FALSE)
bg_group0 
# ballgown instance with 100 assembled transcripts and 10 samples

extract transcript-level expression measurements from ballgown objects

Description

extract transcript-level expression measurements from ballgown objects

Usage

texpr(x, meas = "FPKM")

## S4 method for signature 'ballgown'
texpr(x, meas = "FPKM")

Arguments

`x`	a ballgown object
`meas`	type of measurement to extract. Can be "cov", "FPKM", or "all". Default "FPKM".

Value

transcript-by-sample matrix containing expression values (measured by meas). If meas is "all", a data frame is returned, containing all measurements and location information.

Examples

data(bg)
transcript_fpkm_matrix = texpr(bg)
transcript_data_frame = texpr(bg, 'all')

Connect a transcript to its gene

Description

find the gene to which a transcript belongs

Usage

tGene(bg, transcript, tid = TRUE, gid = TRUE, warnme = TRUE)

Arguments

`bg`	ballgown object
`transcript`	transcript identifier
`tid`	set to `TRUE` if `transcript` is a numeric transcript identifier (i.e., `t_id` in expression tables), or `FALSE` if `transcript` is a named identifier (e.g., `TCONS_000001` or similar).
`gid`	if `FALSE`, return the gene name associated with `transcript` in `bg` instead of the gene id, which is returned by default. Take care to remember that not all ballgown objects include gene name information. (They do all include gene IDs).
`warnme`	if `TRUE`, and if `gid` is `FALSE`, print a warning if no gene name is available for the transcript. This could either mean the transcript didn't overlap an annotated gene, or that no gene names were included when `bg` was created.

Examples

  data(bg)
  tGene(bg, 10)
  tGene(bg, 'TCONS_00000010', tid=FALSE)
  tGene(bg, 10, gid=FALSE) #empty: no gene names included in bg.

get numeric transcript IDs from a ballgown object

Description

get numeric transcript IDs from a ballgown object

Usage

transcriptIDs(x)

## S4 method for signature 'ballgown'
transcriptIDs(x)

Arguments

`x`	a ballgown object

Value

vector of numeric transcript IDs included in the ballgown object

Examples

data(bg)
transcriptIDs(bg)

get transcript names from a ballgown object

Description

get transcript names from a ballgown object

Usage

transcriptNames(x)

## S4 method for signature 'ballgown'
transcriptNames(x)

Arguments

`x`	a ballgown object

Value

vector of transcript names included in the ballgown object. If object was created using Cufflinks/Tablemaker, these transcript names will be of the form "TCONS_*". Return vector is named and ordered by corresponding numeric transcript ID.

Examples

data(bg)
transcriptNames(bg)

write files to disk from ballgown object

Description

create tablemaker-like files on disk from a ballgown object

Usage

writeFiles(gown, dataDir)

Arguments

`gown`	ballgown object
`dataDir`	top-level directory for sample-specific folders

Examples


  data(bg)
  writeFiles(bg, dataDir=getwd())
