Package 'occugene' reference manual

Title:	Functions for Multinomial Occupancy Distribution
Description:	Statistical tools for building random mutagenesis libraries for prokaryotes. The package has functions for handling the occupancy distribution for a multinomial and for estimating the number of essential genes in random transposon mutagenesis libraries.
Authors:	Oliver Will <[email protected]>
Maintainer:	Oliver Will <[email protected]>
License:	GPL (>= 2)
Version:	1.67.0
Built:	2025-03-29 04:30:46 UTC
Source:	https://github.com/bioc/occugene

Histogram Breakpoints

Description

Returns the histogram breakpoints for fast insertion.

Usage

binHist(orf,overlap=NULL,bp=6264403)

Arguments

`orf`	2-column matrix of annotation
`overlap`	number position of overlap
`bp`	number of base pairs in genome

Details

Returns a vector of breakpoints for the binInsertHist function.

Value

`end.pt`	Position of last target
`orf`	orfID
`overlap`	Number of targets in overlap

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **.

Examples

# **

Insert Locations

Description

Returns the number of ORF knockouts.

Usage

binInsert(insert,orf,returnCounts=FALSE,overlap=NULL,DEBUG=FALSE)

Arguments

`insert`	List of insertion locations
`orf`	2-column matrix of annotation
`returnCounts`	Return the number of insertions
`overlap`	Number of shared targets
`DEBUG`	Flag to debug the code

Details

Finds the number of ORFs that have an insertion given a list of locations. If the returnCounts flag is true, the function returns the number of insertions per ORF. Uses the function hist for gains in speed.
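
As a rough illustration of the counting described above, here is a minimal base R sketch. The `orf` and `insert` values are made up, the per-ORF comparison is a plain stand-in rather than the package's implementation, and the `hist` variant assumes contiguous ORFs so that a single set of breakpoints covers them.

```r
# Hypothetical annotation: each row holds the first and last target of an ORF.
orf <- cbind(first = c(1, 11, 21), last = c(10, 20, 30))
# Hypothetical insertion locations; 40 falls outside every ORF.
insert <- c(3, 5, 12, 40)

# Count insertions per ORF with a direct comparison.
counts <- vapply(seq_len(nrow(orf)), function(i) {
  sum(insert >= orf[i, 1] & insert <= orf[i, 2])
}, numeric(1))

# Number of ORFs with at least one insertion.
n_hit <- sum(counts > 0)

# Same counts through hist(): breakpoints at the ORF ends plus an
# assumed genome length of 50 for the trailing non-ORF region.
breaks <- c(0, orf[, "last"], 50)
hist_counts <- hist(insert, breaks = breaks, plot = FALSE)$counts[seq_len(nrow(orf))]
```

Here `counts` corresponds to what the text describes for `returnCounts=TRUE`, and `n_hit` to the default return.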

Value

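Returns a numeric (the number of ORFs with at least one insertion) or, if returnCounts is TRUE, an object holding the number of insertions per ORF.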

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **.

Examples

# **

Insert Locations Quickly

Description

Given a list of locations, returns the number of ORFs hit.

Usage

binInsertHist(insert,orfHist,returnCounts=FALSE)

Arguments

`insert`	List of insertion locations
`orfHist`	Histogram breakpoints
`returnCounts`	Return the number of insertions

Details

Value

Returns a numeric (the number of ORFs hit) or, if returnCounts is TRUE, an object holding the number of insertions.

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

# **

Checks the Format of Annotation and Insertions

Description

Checks the format of the annotation and insertions.

Usage

checkFormat(anno,clone)

Arguments

`anno`	2-column matrix of annotation
`clone`	vector

Details

Checks the format of the annotation and the insertions list. The annotation has to be a matrix giving the first and last target of each ORF. The insertions have to be a vector. The function stops if either input is not in the correct format.
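
The kind of check described above can be sketched in base R as follows. `check_format_sketch` is a hypothetical stand-in, not the package's code, and the test that the first target does not exceed the last is an added assumption.

```r
# Hypothetical stand-in for a format check; not the occugene implementation.
check_format_sketch <- function(anno, clone) {
  # The annotation must be a matrix with exactly two columns.
  if (!is.matrix(anno) || ncol(anno) != 2) {
    stop("anno must be a 2-column matrix")
  }
  # The insertions must be a numeric vector.
  if (!is.vector(clone) || !is.numeric(clone)) {
    stop("clone must be a numeric vector")
  }
  # Assumption: the first target of an ORF cannot come after its last target.
  if (any(anno[, 1] > anno[, 2])) {
    stop("first target must not exceed last target")
  }
  TRUE
}

ok <- check_format_sketch(cbind(c(1, 11), c(10, 20)), c(3, 12))
```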

Value

Returns a boolean.

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first,sampleAnnotation$last)
clone <- sampleInsertions$position
if (checkFormat(anno,clone)) {print("Looks good.");}

Number of New Knockouts

Description

Point estimate for the number of new ORF knockouts in the next d clones.

Usage

delta0(d,anno,clone)

Arguments

`d`	Number of clones to be made
`anno`	2-column matrix of annotation
`clone`	Vector of insertions

Details

Uses the parametric form of the cumulative occupancy distribution to estimate the number of new ORF knockouts in the next d clones.
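
One way to read this is as a difference of the fitted curve at two library sizes. The sketch below assumes the exponential form b0-b1*exp(-b2*x) documented for fCumul, assumes the estimate is f(n+d)-f(n) for a library of n clones, and uses made-up parameter values; the package may compute it differently.

```r
# Exponential model for the cumulative occupancy curve.
f_cumul <- function(x, b0, b1, b2) b0 - b1 * exp(-b2 * x)

# Hypothetical fitted parameters and library sizes.
b0 <- 8; b1 <- 8; b2 <- 0.05
n <- 20   # clones already made
d <- 10   # clones still to be made

# Assumed point estimate: growth of the curve from n to n + d clones.
delta <- f_cumul(n + d, b0, b1, b2) - f_cumul(n, b0, b1, b2)
```

Because the curve flattens as x grows, the same d yields fewer new knockouts in a larger library.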

Value

A numeric

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first,sampleAnnotation$last)
clone <- sampleInsertions$position
delta0(10,anno,clone)

Expected Value of the Occupancy Distribution

Description

Returns the expected value of the occupancy distribution based on a multinomial distribution.

Usage

eMult(n, p, iter=NULL, seed=NULL, experimental=NULL)

Arguments

`n`	number of attempts in the multinomial distribution
`p`	probabilities for landing in a specific bin
`iter`	number of iterations used in the Monte-Carlo approximation
`seed`	seed for the random number generator
`experimental`	access to other functions of multinomials

Details

This function computes the expected value of the occupancy distribution for a multinomial, that is, the expected number of bins with at least one ball. The experimental argument "oneBall" computes the expected number of bins with exactly one ball, and the experimental argument "nextTo" computes the expected number of bins with one ball next to a bin with zero balls. Consider any functionality accessed through the experimental argument untested.
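
For reference, the expected number of occupied bins has a closed form: bin i is empty with probability (1-p_i)^n, so the expectation is the sum of 1-(1-p_i)^n over the bins. The base R sketch below computes that sum and a Monte Carlo approximation with `rmultinom`. It is independent of eMult, whose internals and seed handling are not described here, so the Monte Carlo value need not match eMult's.

```r
n <- 20
p <- c(seq(10, 1, -1), 47) / 100
p <- p / sum(p)

# Closed form: sum over bins of P(bin has at least one ball).
expected_exact <- sum(1 - (1 - p)^n)

# Monte Carlo check: simulate 1000 multinomial draws (one per column)
# and average the number of non-empty bins.
set.seed(4)
draws <- rmultinom(1000, size = n, prob = p)
expected_mc <- mean(colSums(draws > 0))
```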

Value

Returns a numeric

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in ** for specific details about this package or Johnson, N. L. and Kotz, S. (1977) Urn Models and Their Application: An Approach to Modern Discrete Probability Theory. John Wiley & Sons, New York, NY.

Examples

n <- 20
p <- c(seq(10,1,-1),47)/100
p <- p/sum(p)
eMult(n,p)
eMult(n,p,iter=1000,seed=4)

Number of New ORF Knockouts

Description

Estimates the number of new knockouts in next d clones.

Usage

etDelta(d,anno,clone)

Arguments

`d`	number of new clones
`anno`	2-column matrix of annotation
`clone`	vector

Details

Estimates the number of new ORF knockouts in the next d clones using the method outlined by Efron and Thisted.
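
The Efron and Thisted estimator can be sketched in base R as an alternating sum over n_x, the number of ORFs hit exactly x times, with ratio d/n for a library of n clones. The counts below are made up, the variance is the usual Poisson-model plug-in, and whether etDelta follows exactly this form is an assumption. The series is only well behaved when d is no larger than n.

```r
# Hypothetical number of insertions observed in each ORF.
counts <- c(2, 1, 0, 1, 3, 1)
n <- sum(counts)   # clones made so far (8)
d <- 4             # new clones
ratio <- d / n

# n_x: number of ORFs hit exactly x times, for x >= 1.
freq <- table(counts[counts > 0])
x <- as.integer(names(freq))
n_x <- as.vector(freq)

# Alternating-series estimate of new ORFs hit in the next d clones.
expected <- sum((-1)^(x + 1) * ratio^x * n_x)
# Plug-in variance, treating the n_x as independent Poisson counts.
variance <- sum(ratio^(2 * x) * n_x)
```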

Value

`expected`	Expected value
`variance`	Variance

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in ** and also Efron, B. and Thisted, R. (1976) Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63, 435-447.

Examples

data(sampleAnnotation)
data(sampleInsertions)
a.data <- sampleAnnotation
experiment <- sampleInsertions
orf <- cbind(a.data$first,a.data$last)
clone <- experiment$position
etDelta(10,orf,clone)

Parametric Function for the Cumulative Occupancy Distribution

Description

Returns values for parameterized cumulative occupancy distributions.

Usage

fCumul(x,b0,b1,b2)

Arguments

`x`	Point to evaluate
`b0`	Parameter b0
`b1`	Parameter b1
`b2`	Parameter b2

Details

Function fitted to the cumulative occupancy distribution for a multinomial distribution. Exponential model := b0-b1*exp(-b2*x).
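
Written out, the exponential model and its two limiting values are shown below. Reading x as the number of clones and b0 as the plateau of the curve is an interpretation, not something stated here, and the limit assumes b2 > 0.

```latex
f(x) = b_0 - b_1 e^{-b_2 x}, \qquad
f(0) = b_0 - b_1, \qquad
\lim_{x \to \infty} f(x) = b_0 \quad (b_2 > 0)
```

With b0 = b1 the curve starts at zero. For x = 2, b0 = 3, b1 = 3, b2 = 0.01, the value is 3 - 3e^{-0.02}, approximately 0.0594.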

Value

Returns a numeric

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

x <- 2
b0 <- 3
b1 <- 3
b2 <- 0.01
val <- fCumul(x,b0,b1,b2)

Parametric Fit for the Cumulative Occupancy Distribution

Description

Parameterizes the cumulative occupancy distribution.

Usage

fFit(anno,clone,TR=TRUE,b0=0,b1=0,b2=.0)

Arguments

`anno`	2-column matrix of annotation
`clone`	vector
`TR`	Report a trace
`b0`	Starting value b0
`b1`	Starting value b1
`b2`	Starting value b2

Details

Fits various parametric functions to the occupancy distribution for a multinomial. Using the starting values of b0=0, b1=0, and b2=0 forces the function to find starting values for you.
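
To give a feel for such a fit, the sketch below fits the exponential model b0-b1*exp(-b2*x) to a made-up cumulative curve with `nls` from base R. The data, the starting values, and the use of `nls` are all assumptions made for illustration; the fitting routine fFit uses is not stated here.

```r
# Hypothetical cumulative curve: x is the number of clones, y the number of
# distinct ORFs hit so far. A small deterministic wiggle stands in for noise.
x <- 1:40
y <- 8 - 8 * exp(-0.1 * x) + 0.05 * sin(x)

# Nonlinear least squares fit; the starting values are rough guesses.
fit <- nls(y ~ b0 - b1 * exp(-b2 * x),
           start = list(b0 = 10, b1 = 10, b2 = 0.05))

# Named vector of fitted parameters b0, b1, b2.
est <- coef(fit)
```

In this sketch `est[["b0"]]` is the fitted plateau of the curve.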

Value

Returns an object.

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first,sampleAnnotation$last)
clone <- sampleInsertions$position
TR <- TRUE
fm <- fFit(anno,clone,TR)

Loads Annotation File

Description

Loads and checks an annotation file.

Usage

loadAnnotation(fileName)

Arguments

`fileName`	Name of file

Details

The annotation file needs four columns: idNum, first, last, and overlap.
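
A file with these four columns can be read and checked in base R along the following lines. The whitespace-delimited layout with a header row is an assumption, since the delimiter loadAnnotation expects is not stated, and the file contents are made up.

```r
# Write a small hypothetical annotation file to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("idNum first last overlap",
             "1 1 10 0",
             "2 11 20 0"), path)

# Read it back and confirm the four required columns are present.
anno_df <- read.table(path, header = TRUE)
required <- c("idNum", "first", "last", "overlap")
stopifnot(all(required %in% names(anno_df)))

# Build the 2-column matrix that the other functions take as annotation.
orf <- cbind(anno_df$first, anno_df$last)
```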

Value

Returns a data frame

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

# No self contained example

Load Genome Annotation File

Description

Loads a list of insertion locations.

Usage

loadInsertions(fileName)

Arguments

`fileName`	Name of the file

Details

Loads a list of insertion locations created in a transposon mutagenesis library.

Value

Returns a data frame

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

# No self contained example

Convert Occupancy Format to Negenes

Description

Converts the annotation and insertion format of the occugene package into the format for the negenes package.

Usage

occup2Negenes(anno,clone,INTERGENIC=FALSE)

Arguments

`anno`	2-column matrix of annotation
`clone`	vector of insertion locations
`INTERGENIC`	Process the intergenic region as last ORF.

Details

Converts the annotation and insertion format of the occugene package into the format for the negenes package. In the returned data frame, column 1 is n.sites, column 2 is n.sites2, column 3 is counts, and column 4 is counts2.

Value

Returns a data frame

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first,sampleAnnotation$last)
clone <- sampleInsertions$position
occup2Negenes(anno,clone)

Annotation for a Hypothetical Prokaryote

Description

This dataset has the annotation for a hypothetical bacterium.

Usage

data(sampleAnnotation)

Format

A data frame containing 4 columns with 10 rows.

Author(s)

Oliver Will [email protected]

Source

Randomly generated.

References

See the book chapter O. Will (**) in **

Insertions for a Hypothetical Clonal Library

Description

Insertion locations for a simple random mutagenesis library example.

Usage

data(sampleInsertions)

Format

A data frame containing 1 column with 20 rows.

Author(s)

Oliver Will [email protected]

Source

Randomly generated.

References

See the book chapter O. Will (**) in **

Unbiased Estimator of the Number of Non-essential ORFs

Description

Unbiased point estimate and confidence intervals for the number of non-essential ORFs.

Usage

unbiasB0(anno,clone,iter=1000,seed=NULL,alpha=0.05,TR=TRUE)

Arguments

`anno`	2-column matrix of annotation
`clone`	Vector of insertions
`iter`	Number of iterations for the bootstrap
`seed`	Seed for the random number generator
`alpha`	Type I error
`TR`	Report a trace

Details

Fits a parametric function to the cumulative occupancy distribution. Uses a parametric bootstrap to correct for bias and find confidence intervals for the number of non-essential ORFs.

Value

`b0`	Unbiased point estimate
`CI`	Confidence interval at the alpha specified

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first,sampleAnnotation$last)
clone <- sampleInsertions$position
TR <- TRUE
iter <- 10
seed <- 4
unbiasB0(anno,clone,iter,seed,TR=TR)

Unbiased Number of New Knockouts

Description

Unbiased point estimate and confidence intervals for the number of new ORF knockouts in the next d clones.

Usage

unbiasDelta0(d,anno,clone,iter=1000,seed=NULL,alpha=0.05,TR=TRUE)

Arguments

`d`	Number of new clones
`anno`	2-column matrix of annotation
`clone`	Vector of insertions
`iter`	Number of iterations for the bootstrap
`seed`	Seed for the random number generator
`alpha`	Type I error
`TR`	Report a trace

Details

Fits a parametric function to the cumulative occupancy distribution. Uses a parametric bootstrap to correct for bias and find confidence intervals for the number of new ORF knockouts in the next d clones.

Value

`delta0`	Unbiased point estimate
`CI`	Confidence interval at the alpha specified

Author(s)

Oliver Will [email protected]

References

See the book chapter O. Will (**) in **

Examples

data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first,sampleAnnotation$last)
clone <- sampleInsertions$position
TR <- TRUE
iter <- 10
seed <- 4
unbiasDelta0(10,anno,clone,iter,seed,TR=TR)

Variance of the Occupancy Distribution

Description

Returns the variance of the occupancy distribution based on a multinomial distribution.

Usage

varMult(n, p, iter=NULL, seed=NULL, experimental=NULL)

Arguments

`n`	number of attempts in the multinomial distribution
`p`	probabilities for landing in a specific bin
`iter`	number of iterations used in the Monte-Carlo approximation
`seed`	seed for the random number generator
`experimental`	access to other functions of multinomials

Details

This function computes the variance of the occupancy distribution for a multinomial, that is, the variance of the number of bins with at least one ball. The experimental argument "oneBall" computes the variance of the number of bins with exactly one ball, and the experimental argument "nextTo" computes the variance of the number of bins with one ball next to a bin with zero balls. Consider any functionality accessed through the experimental argument untested.
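
The variance of the number of occupied bins also has a closed form. Bin i is empty with probability q_i = (1-p_i)^n and bins i and j are both empty with probability (1-p_i-p_j)^n, so the variance is the sum of all variances and covariances of the empty-bin indicators. The base R sketch below computes this and compares it with a simulation. It is independent of varMult, whose internals are not described here.

```r
n <- 20
p <- c(seq(10, 1, -1), 47) / 100
p <- p / sum(p)

# Probability that each bin is empty.
q <- (1 - p)^n

# Covariance matrix of the empty-bin indicators:
# off-diagonal entries are P(i and j both empty) - q_i * q_j.
cov_mat <- (1 - outer(p, p, "+"))^n - outer(q, q)
# Diagonal entries are the Bernoulli variances q_i * (1 - q_i).
diag(cov_mat) <- q * (1 - q)

# Occupied count = number of bins - empty count, so the variances agree.
var_exact <- sum(cov_mat)

# Monte Carlo check with 20000 simulated libraries.
set.seed(4)
draws <- rmultinom(20000, size = n, prob = p)
var_mc <- var(colSums(draws > 0))
```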

Value

Returns a numeric

Author(s)

Oliver Will [email protected]

References

Examples

n <- 20
p <- c(seq(10,1,-1),47)/100
p <- p/sum(p)
varMult(n,p)
varMult(n,p,iter=1000,seed=4)
