Package 'sagenhaft' reference manual

Title:	Collection of functions for reading and comparing SAGE libraries
Description:	This package implements several functions useful for analysis of gene expression data by sequencing tags as done in SAGE (Serial Analysis of Gene Expressen) data, i.e. extraction of a SAGE library from sequence files, sequence error correction, library comparison. Sequencing error correction is implementing using an Expectation Maximization Algorithm based on a Mixture Model of tag counts.
Authors:	Tim Beissbarth <[email protected]>, with contributions from Gordon Smyth <[email protected]>
Maintainer:	Tim Beissbarth <[email protected]>
License:	GPL (>= 2)
Version:	1.77.0
Built:	2025-02-17 03:29:10 UTC
Source:	https://github.com/bioc/sagenhaft

Estimate sequencing errors and compute corrected counts

Description

These functions are used to compute sequencing error correction in a library. They are automatically called when extracting tags from sequences and therefore usually do not have to be called directly.

Usage

estimate.errors.mean(lib)
compute.sequence.neighbors(tags, taglength=10, quality.scores=NULL,
                           output="character") 
em.estimate.error.given(lib, maxstep=50, ...)
estimate.errors.mean(lib)
compute.sequence.neighbors(tags, taglength=10, quality.scores=NULL,
                           output="character") 
em.estimate.error.given(lib, maxstep=50, ...)

Arguments

`lib`	A sage library object
`tags`	A character vector or numeric vector containing tags
`taglength`	length of tag
`quality.scores`	A matrix containing base quality scores as -10 log10 Pe
`maxstep`	iterations of EM algorithm
`output`	Output type for `compute.sequence.neighbors`, either character or numeric
`...`	Other arguments ignored.

Author(s)

Tim Beissbarth

References

http://tagcalling.mbgproject.org

Examples

library(sagenhaft)
B6Hypo <-read.sage.library(system.file("extdata", "B6HypothalHFI.sage",
                           package="sagenhaft")) 
E15post <- read.sage.library(system.file("extdata", "E15postHFI.sage",
                             package="sagenhaft")) 
testlib <- combine.libs(B6Hypo, E15post)
testlib <- estimate.errors.mean(testlib)
testlib <- em.estimate.error.given(testlib)
tagneighbors <- compute.sequence.neighbors(testlib$seqs[,"seq"], 10,
                          testlib$seqs[,paste("q", 1:10, sep="")]) 
library(sagenhaft)
B6Hypo <-read.sage.library(system.file("extdata", "B6HypothalHFI.sage",
                           package="sagenhaft")) 
E15post <- read.sage.library(system.file("extdata", "E15postHFI.sage",
                             package="sagenhaft")) 
testlib <- combine.libs(B6Hypo, E15post)
testlib <- estimate.errors.mean(testlib)
testlib <- em.estimate.error.given(testlib)
tagneighbors <- compute.sequence.neighbors(testlib$seqs[,"seq"], 10,
                          testlib$seqs[,paste("q", 1:10, sep="")])

Functions for SAGE library extraction

Description

Functions to extract the tags in a library from sequences or base-caller output.

Usage

extract.lib.from.zip(zipfile, libname=sub(".zip","",basename(zipfile)),
                     ...)
extract.lib.from.directory(dirname, libname=basename(dirname),
                           pattern, ...)
extract.library.tags(filelist, base.caller.format="phd",
                     remove.duplicate.ditags=TRUE, 
                     remove.N=FALSE, remove.low.quality=10,
                     taglength=10, min.ditag.length=(2*taglength-2),
                     max.ditag.length=(2*taglength+4),
                     cut.site="catg", default.quality=NA, verbose=TRUE,
                     ...) 
reestimate.lib.from.tagcounts(tagcounts, libname, default.quality=20, ...) 
compute.unique.tags(lib)
combine.libs(..., artifacts=c("Linker", "Ribosomal", "Mitochondrial"))
remove.sage.artifacts(lib,
                      artifacts=c("Linker","Ribosomal","Mitochondrial"),
                      ...)
read.phd.file(file)
read.seq.qual.filepair(file, default.quality=NA)
extract.ditags(sequence, taglength=10, filename=NA,
               min.ditag.length=(2*taglength-2),
               max.ditag.length=(2*taglength+4), cut.site="catg")

extract.lib.from.zip(zipfile, libname=sub(".zip","",basename(zipfile)),
                     ...)
extract.lib.from.directory(dirname, libname=basename(dirname),
                           pattern, ...)
extract.library.tags(filelist, base.caller.format="phd",
                     remove.duplicate.ditags=TRUE, 
                     remove.N=FALSE, remove.low.quality=10,
                     taglength=10, min.ditag.length=(2*taglength-2),
                     max.ditag.length=(2*taglength+4),
                     cut.site="catg", default.quality=NA, verbose=TRUE,
                     ...) 
reestimate.lib.from.tagcounts(tagcounts, libname, default.quality=20, ...) 
compute.unique.tags(lib)
combine.libs(..., artifacts=c("Linker", "Ribosomal", "Mitochondrial"))
remove.sage.artifacts(lib,
                      artifacts=c("Linker","Ribosomal","Mitochondrial"),
                      ...)
read.phd.file(file)
read.seq.qual.filepair(file, default.quality=NA)
extract.ditags(sequence, taglength=10, filename=NA,
               min.ditag.length=(2*taglength-2),
               max.ditag.length=(2*taglength+4), cut.site="catg")

Arguments

`zipfile`, `dirname`	Name of a ZIP file or a directory that contains base-caller output files
`libname`	`libname` a character string to be assigned as library name
`pattern`	Regular expression to specify pattern for the files that will be read
`filelist`	List of files to be read
`base.caller.format`	`base.caller.format` can be "phd" or "seq" or a character vector of the length of the filelist
`remove.duplicate.ditags`	Remove duplicate ditags. TRUE or FALSE
`remove.N`	Remove all tags that contain N. TRUE or FALSE
`remove.low.quality`	Remove all tags with an average quality score of less than `remove.low.quality`. Skipped if < 0
`taglength`	Length of tags. Usually 10 or 17
`min.ditag.length`, `max.ditag.length`	Minimum and maximum length for ditags
`cut.site`	Restriction enzyme cut site. Usually CATG
`verbose`	Display information during process
`lib`	Library object
`file`, `filename`	Character string indicating file name
`default.quality`	Quality value to use on sequences, if quality files are missing
`sequence`	Construct containing sequence and quality values returned by read.phd.file or read.seq.qual.filepair
`artifacts`	Types of artificially generated tags to remove.
`...`	Arguments passed on to extraction functions.
`tagcounts`	Tagcounts from library. Integer Vecotor with Tag sequences as names.

Details

The functions extract.lib.from.zip or extract.lib.from.directory should be used to extract the SAGE TAGS from the sequences of a library, the sequences need to be provided by the output files from the base caller software either in a ZIP archive or in a directory. These are usually the only functions that should directly be called by the user. The other functions are called by these and should only be used directly by experienced users to get more direct control over the process. Most arguments are passed on and can be specified in the high level functions. Zipfilenames must be specified using relative pathnames!

Value

lib returns an SAGE library object.

Author(s)

Tim Beissbarth

References

http://tagcalling.mbgproject.org

Examples

#library(sagenhaft)
#file.copy(system.file("extdata", "E15postHFI.zip",package="sagenhaft"),
#          "E15postHFI.zip")
#E15post<-extract.lib.from.zip("E15postHFI.zip", taglength=10,
#                              min.ditag.length=20, max.ditag.length=24)
#E15post
#library(sagenhaft)
#file.copy(system.file("extdata", "E15postHFI.zip",package="sagenhaft"),
#          "E15postHFI.zip")
#E15post<-extract.lib.from.zip("E15postHFI.zip", taglength=10,
#                              min.ditag.length=20, max.ditag.length=24)
#E15post

Class sage.library

Description

The SAGE library class contains all the data and annotation for a SAGE library. It can contain two data.frames.

Usage

read.sage.library(file)
write.sage.library(x, file=paste(x$libname, "sage", sep="."),
                   what="complete") 
read.sage.library(file)
write.sage.library(x, file=paste(x$libname, "sage", sep="."),
                   what="complete")

Arguments

`x`	A sage library object
`file`	File name to read or write to
`what`	"complete", read complete librarary tags and sequences; "tags", read only tags and counts

Details

SAGE library objects consists of one or two data.frames. The data.frame "tags" contains all the unique tags in the library and its counts. The data.frame "seqs" contains all the individual tag sequences and associated quality values. read.sage.library and write.sage.library are utility functions to read and write SAGE libraries.

Author(s)

Tim Beissbarth

References

http://tagcalling.mbgproject.org

Examples

library(sagenhaft)
E15postHFI <- read.sage.library(system.file("extdata", "E15postHFI.sage",
                                package="sagenhaft")) 
E15postHFI
library(sagenhaft)
E15postHFI <- read.sage.library(system.file("extdata", "E15postHFI.sage",
                                package="sagenhaft")) 
E15postHFI

Class sage.library.comparison

Description

Class for storing the data of a pairwise comparison between two SAGE libraries.

Usage

read.sage.library.comparison(file)
write.sage.library.comparison(x, file=paste(x$name, "sagecomp", sep="."))
compare.lib.pair(lib1, lib2) 
read.sage.library.comparison(file)
write.sage.library.comparison(x, file=paste(x$name, "sagecomp", sep="."))
compare.lib.pair(lib1, lib2)

Arguments

`x`, `lib1`, `lib2`	A sage library object
`file`	File name to read or write to

Details

SAGE library comparison objects consists of one data.frames. It stores a A and an M value which are the log2 average expression and log2 ratio, respectively. It also has a column for the resulting p.values from sage.test. read.sage.library.comparison and write.sage.library.comparison are utility functions to read and write SAGE library comparisons. compare.lib.pair can be used to generate SAGE library comparisons.

Author(s)

Tim Beissbarth

References

http://tagcalling.mbgproject.org

Examples

library(sagenhaft)
B6Hypo <- read.sage.library(system.file("extdata", "B6HypothalHFI.sage",
                            package="sagenhaft")) 
E15post <- read.sage.library(system.file("extdata","E15postHFI.sage",
                             package="sagenhaft")) 
libcomp <- compare.lib.pair(B6Hypo, E15post)
plot(libcomp)
libcomp
library(sagenhaft)
B6Hypo <- read.sage.library(system.file("extdata", "B6HypothalHFI.sage",
                            package="sagenhaft")) 
E15post <- read.sage.library(system.file("extdata","E15postHFI.sage",
                             package="sagenhaft")) 
libcomp <- compare.lib.pair(B6Hypo, E15post)
plot(libcomp)
libcomp

Compare Two SAGE Libraries

Description

Compute p-values for differential expression for each tag between two SAGE libraries.

Usage

sage.test(x, y, n1=sum(x), n2=sum(y))
sage.test(x, y, n1=sum(x), n2=sum(y))

Arguments

`x`	integer vector giving counts in first library. Non-integer values are rounded to the nearest integer.
`y`	integer vector giving counts in second library. Non-integer values are rounded to the nearest integer.
`n1`	total number of tags in first library. Non-integer values are rounded to the nearest integer.
`n2`	total number of tags in second library. Non-integer values are rounded to the nearest integer.

Details

This function uses a binomial approximation to the Fisher Exact test for each tag. The approximation is accurate when n1 and n2 are large and x and y are small in comparison.

Value

Numeric vector of p-values.

Author(s)

Gordon Smyth

Examples

library(sagenhaft)
sage.test(c(0,5,10),c(0,30,50),n1=10000,n2=15000)
#  Exact equivalents
fisher.test(matrix(c(0,0,10000-0,15000-0),2,2))$p.value
fisher.test(matrix(c(5,30,10000-5,15000-30),2,2))$p.value
fisher.test(matrix(c(10,50,10000-10,15000-50),2,2))$p.value
library(sagenhaft)
sage.test(c(0,5,10),c(0,30,50),n1=10000,n2=15000)
#  Exact equivalents
fisher.test(matrix(c(0,0,10000-0,15000-0),2,2))$p.value
fisher.test(matrix(c(5,30,10000-5,15000-30),2,2))$p.value
fisher.test(matrix(c(10,50,10000-10,15000-50),2,2))$p.value

Utilities

Description

Different utilities to use with SAGE data.

Usage

tagnum2tagmatrix(tags, length)
tagmatrix2tagnum(tags, length=ncol(tags))
tagnum2tagsequence(tags, length)
tagsequence2tagnum(tags, length)
revcomp(seq)
tagnum2tagmatrix(tags, length)
tagmatrix2tagnum(tags, length=ncol(tags))
tagnum2tagsequence(tags, length)
tagsequence2tagnum(tags, length)
revcomp(seq)

Arguments

`tags`	integer or character vector giving SAGE tags.
`length`	Length of SAGE tags.
`seq`	Character vector or list of sequences.
`...`	SAGE library objects.

Details

These functions are utility functions used in SAGE tag extraction, e.g. to convert SAGE tag sequences to numeric values, i.e. base 4 for efficient storage and handling, and to reverse complement sequences.

Author(s)

Tim Beissbarth

Examples

library(sagenhaft)
tags <- c("aaa", "ttt", "ccc")
tagsnumeric <- tagsequence2tagnum(tags, 3)
tagsmatrix <- tagnum2tagmatrix(tagsnumeric, 3)
tags <- tagnum2tagsequence(tagmatrix2tagnum(tagsmatrix, 3), 3)
revcomp(tags)
library(sagenhaft)
tags <- c("aaa", "ttt", "ccc")
tagsnumeric <- tagsequence2tagnum(tags, 3)
tagsmatrix <- tagnum2tagmatrix(tagsnumeric, 3)
tags <- tagnum2tagsequence(tagmatrix2tagnum(tagsmatrix, 3), 3)
revcomp(tags)

Simulate SAGE libraries

Description

Function to simulate SAGE libraries with sequencing errors.

Usage

sagelibrary.simulate(taglength = 4, lambda = 1000, mean.error = 0.01,
                 error.sd = 1, withintagerror.sd = 0.2,
                 ngenes = min(4^taglength, 1e+05), base.lib = NULL,
                 libseed = -1, ...)
sagelibrary.simulate(taglength = 4, lambda = 1000, mean.error = 0.01,
                 error.sd = 1, withintagerror.sd = 0.2,
                 ngenes = min(4^taglength, 1e+05), base.lib = NULL,
                 libseed = -1, ...)

Arguments

`taglength`	Tag length for library.
`lambda`	Aproximate size of library.
`mean.error`	Mean amount of sequencing errors.
`error.sd`	Standard deviation for sequencing errors.
`withintagerror.sd`	Standard deviation for sequencing errors within tags.
`ngenes`	Number of genes to generate tags from.
`base.lib`	Simulate library based on tags in other lib and create variations.
`libseed`	Seed for random number generator.
`...`	Arguments passed to em.estimate.

Details

We set the number of possible transcripts and assign a random SAGE tag to each of them out of all 4\^taglength possible SAGE tags. For each SAGE tag a random proportion p within the library is generated from a log-normal distribution, and the proportions are then adjusted to have a sum of 1. The true counts of a tag are simulated by sampling from Poisson distributions with parameters p lambda, where p is the proportion of the tag in the library and lambda is a parameter for setting the size of the library. The simulation of the sequencing errors is done on each individual occurrence of a tag sequence. For each tag sequence a mean sequencing quality value is generated from a log-normal distribution. The individual quality values for each base are then generated from log-normal distributions with means equal to the simulated sequencing quality values for the tag sequences. We have noticed that with experimentally generated data the within tag sequence variation of sequencing quality values is usually about 1/5 of the between tag sequence variation. From each true tag sequence one observed tag sequence is generated using the simulated quality values of the true sequence as the multinomial probabilities, i.e. replacing each base with either one of the 3 other bases with the probability specified by the sequencing quality value of that base. The counts of these generated tags are then summed to represent the observed tags. When generating several simulated libraries for comparisons, we use the same proportions of the genes for all libraries, replacing up to 1/3 of the proportions by proportions with a known differential factor.

Author(s)

Tim Beissbarth

References

http://tagcalling.mbgproject.org

Examples

library(sagenhaft)
testlib1 <- sagelibrary.simulate(taglength=10, lambda=10000,
                             mean.error=0.01)
testlib2 <- sagelibrary.simulate(taglength=10, lambda=20000,
                             mean.error=0.02, base.lib=testlib1)
testlib3 <- sagelibrary.simulate(taglength=10, lambda=10000,
                             mean.error=0.01, libseed=testlib1$seed)
library(sagenhaft)
testlib1 <- sagelibrary.simulate(taglength=10, lambda=10000,
                             mean.error=0.01)
testlib2 <- sagelibrary.simulate(taglength=10, lambda=20000,
                             mean.error=0.02, base.lib=testlib1)
testlib3 <- sagelibrary.simulate(taglength=10, lambda=10000,
                             mean.error=0.01, libseed=testlib1$seed)

Package 'sagenhaft'

Help Index

Estimate sequencing errors and compute corrected counts

Description

Usage

Arguments

Author(s)

References

See Also

Examples

Functions for SAGE library extraction

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Class sage.library

Description

Usage

Arguments

Details

Author(s)

References

See Also

Examples

Class sage.library.comparison

Description

Usage

Arguments

Details

Author(s)

References

See Also

Examples

Compare Two SAGE Libraries

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Utilities

Description

Usage

Arguments

Details

Author(s)

Examples

Simulate SAGE libraries

Description

Usage

Arguments

Details

Author(s)

References

See Also

Examples