Title: | Functions for Multinomial Occupancy Distribution |
---|---|
Description: | Statistical tools for building random mutagenesis libraries for prokaryotes. The package has functions for handling the occupancy distribution for a multinomial and for estimating the number of essential genes in random transposon mutagenesis libraries. |
Authors: | Oliver Will <[email protected]> |
Maintainer: | Oliver Will <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.67.0 |
Built: | 2024-10-30 09:04:30 UTC |
Source: | https://github.com/bioc/occugene |
Returns the histogram breakpoints for fast insertion.
binHist(orf,overlap=NULL,bp=6264403)
orf | 2-column matrix of annotation |
overlap | number position of overlap |
bp | number of base pairs in genome |
Returns a vector of breakpoints for the binInsertHist function.
end.pt | Position of last target |
orf | orfID |
overlap | Number of targets in overlap |
Oliver Will [email protected]
See the book chapter O. Will (**) in **.
binInsertHist
# **
Returns the number of ORF knockouts.
binInsert(insert,orf,returnCounts=FALSE,overlap=NULL,DEBUG=FALSE)
insert | List of insertion locations |
orf | 2-column matrix of annotation |
returnCounts | Return the number of insertions |
overlap | Number of shared targets |
DEBUG | Flag to debug the code |
Finds the number of ORFs that have an insertion given a list of locations. If the returnCounts flag is true, the function returns the number of insertions per ORF. Uses the function hist for gains in speed.
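As a rough illustration of the counting step described above, the following base R sketch tallies insertions per ORF on toy data. It is not the package's implementation: it uses findInterval rather than hist, it assumes the ORFs are sorted and do not overlap, and every name and position in it is made up for the example.

```r
# Toy annotation: each row holds the first and last target of one ORF.
# Assumption for this sketch: ORFs are sorted and non-overlapping.
orf <- cbind(first = c(10, 150, 300), last = c(100, 250, 400))
insert <- c(5, 42, 60, 120, 399, 450)  # hypothetical insertion positions

# Breakpoints alternate "ORF start" and "one past ORF end", so odd-numbered
# intervals are ORFs and even-numbered intervals are intergenic gaps.
breaks <- as.vector(rbind(orf[, "first"], orf[, "last"] + 1))

# findInterval() returns 0 for positions before the first breakpoint.
bin <- findInterval(insert, breaks)
in.orf <- bin %% 2 == 1

# Map interval numbers 1, 3, 5, ... to ORF indices 1, 2, 3, ...
counts <- tabulate((bin[in.orf] + 1) / 2, nbins = nrow(orf))

counts            # insertions per ORF (the returnCounts = TRUE idea)
sum(counts > 0)   # number of ORFs with at least one insertion
```

Precomputing the breakpoints once and reusing them for many insertion lists is the same division of labour that binHist and binInsertHist suggest.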
Returns a numeric (the number of ORFs hit) or, if returnCounts is true, an object holding the number of insertions per ORF.
Oliver Will [email protected]
See the book chapter O. Will (**) in **.
# **
Given a list of locations, returns the number of ORFs hit.
binInsertHist(insert,orfHist,returnCounts=FALSE)
insert | List of insertion locations |
orfHist | Histogram breakpoints |
returnCounts | Return the number of insertions |
Finds the number of ORFs that have an insertion given a list of locations. If the returnCounts flag is true, the function returns the number of insertions per ORF. Uses the function hist for gains in speed.
Returns a numeric (the number of ORFs hit) or, if returnCounts is true, an object holding the number of insertions per ORF.
Oliver Will [email protected]
See the book chapter O. Will (**) in **
binHist
# **
Checks the format of the annotation and insertions.
checkFormat(anno,clone)
anno | 2-column matrix of annotation |
clone | vector |
Checks the format of the annotation and the insertions list. The annotation must be a matrix holding the first and last target of each ORF, and the insertions must be a vector. The function stops with an error if either argument is not in the correct format.
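The checks described above could look roughly like the sketch below. The function name checkFormatSketch is hypothetical, and the requirement that positions be numeric is an assumption of this example, since the text only asks for a vector.

```r
# Hypothetical stand-in for the validation described above.
checkFormatSketch <- function(anno, clone) {
  if (!is.matrix(anno) || ncol(anno) != 2) {
    stop("anno must be a 2-column matrix of first and last targets")
  }
  # Assumption: positions are numeric; the documentation only says "vector".
  if (!is.vector(clone) || !is.numeric(clone)) {
    stop("clone must be a numeric vector of insertion positions")
  }
  TRUE
}

anno <- cbind(c(10, 150, 300), c(100, 250, 400))
clone <- c(42, 120, 399)
checkFormatSketch(anno, clone)

# A plain vector is not a 2-column matrix, so this call stops with an error;
# tryCatch() captures the message instead of aborting the script.
bad <- tryCatch(checkFormatSketch(1:3, clone),
                error = function(e) conditionMessage(e))
bad
```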
Returns a boolean.
Oliver Will [email protected]
See the book chapter O. Will (**) in **
data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first, sampleAnnotation$last)
clone <- sampleInsertions$position
if (checkFormat(anno, clone)) { print("Looks good.") }
Point estimate for the number of new ORF knockouts in the next d clones.
delta0(d,anno,clone)
d | Number of clones to be made |
anno | 2-column matrix of annotation |
clone | Vector of insertions |
Use the parametric form of the cumulative occupancy distribution to estimate the number of new ORF knockouts in the next d clones.
A numeric
Oliver Will [email protected]
See the book chapter O. Will (**) in **
unbiasDelta0
data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first, sampleAnnotation$last)
clone <- sampleInsertions$position
delta0(10, anno, clone)
Returns the expected value of the occupancy distribution based on a multinomial distribution.
eMult(n, p, iter=NULL, seed=NULL, experimental=NULL)
n | number of attempts in the multinomial distribution |
p | probabilities for landing in a specific bin |
iter | number of iterations used in the Monte-Carlo approximation |
seed | seed for the random number generator |
experimental | access to other functions of multinomials |
This function computes the expected value of the occupancy distribution for a multinomial. In other words, the expected number of bins with at least one ball. The experimental argument "oneBall" computes the expected number of bins with exactly one ball and the experimental argument "nextTo" computes the expected number of bins with one ball next to a bin with zero balls. Consider any functionality through the experimental argument untested.
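The expected number of occupied bins has a simple closed form, because bin i stays empty after n balls with probability (1 - p[i])^n. The base R sketch below compares that formula with a Monte-Carlo estimate. The helper names expectedOccupied and simulateOccupied are hypothetical and not part of the package, and nothing here is meant to describe how eMult is implemented.

```r
# Exact expectation: by linearity, sum over bins of P(bin i is non-empty).
expectedOccupied <- function(n, p) sum(1 - (1 - p)^n)

# Monte-Carlo check: simulate multinomial draws and count non-empty bins.
simulateOccupied <- function(n, p, iter = 2000, seed = 4) {
  set.seed(seed)
  draws <- rmultinom(iter, size = n, prob = p)  # one column per replicate
  mean(colSums(draws > 0))
}

n <- 20
p <- c(seq(10, 1, -1), 47) / 100
p <- p / sum(p)

exact <- expectedOccupied(n, p)
approx <- simulateOccupied(n, p)
c(exact = exact, approx = approx)
```

The two numbers should agree to within Monte-Carlo error, which shrinks as iter grows.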
Returns a numeric
Oliver Will [email protected]
See the book chapter O. Will (**) in ** for specific details about this package or Johnson, N. L. and Kotz, S. (1977) Urn Models and Their Application: An Approach to Modern Discrete Probability Theory. John Wiley & Sons, New York, NY.
n <- 20
p <- c(seq(10, 1, -1), 47) / 100
p <- p / sum(p)
eMult(n, p)
eMult(n, p, iter=1000, seed=4)
Estimates the number of new knockouts in next d clones.
etDelta(d,anno,clone)
d | number of new clones |
anno | 2-column matrix of annotation |
clone | vector |
Estimates the number of new ORF knockouts in the next d clones using the method outlined by Efron and Thisted.
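For orientation, the Efron and Thisted estimator for extending a sample by a fraction t is the alternating series n1*t - n2*t^2 + n3*t^3 - ..., where nx is the number of classes seen exactly x times, and under their Poisson model its variance is estimated by the sum of nx*t^(2x). The sketch below applies this to hypothetical per-ORF hit counts. Treating each ORF as a class and taking t as d divided by the number of existing clones are assumptions of this example, not a description of etDelta's internals. The series is only reliable for t up to about 1.

```r
# Hypothetical helper: Efron-Thisted series on per-ORF insertion counts.
etSketch <- function(hits, t) {
  n.x <- tabulate(hits[hits > 0])   # n.x[x] = number of ORFs hit exactly x times
  x <- seq_along(n.x)
  list(expected = sum((-1)^(x + 1) * n.x * t^x),
       variance = sum(n.x * t^(2 * x)))
}

# Toy data: insertions observed in each of 10 ORFs (made up for the example).
hits <- c(0, 1, 1, 2, 0, 3, 1, 0, 2, 1)

# Assumption: t = d / (number of clones so far); here 11 clones, so d = 5.5
# would give t = 0.5. We pass t directly to keep the arithmetic visible.
est <- etSketch(hits, t = 0.5)
est
```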
expected | Expected value |
variance | Variance |
Oliver Will [email protected]
See the book chapter O. Will (**) in ** and also Efron, B. and Thisted, R. (1976) Estimating the number of unseen species: How many words did Shakespeare know? Biometrika. 63, 435-447.
data(sampleAnnotation)
data(sampleInsertions)
a.data <- sampleAnnotation
experiment <- sampleInsertions
orf <- cbind(a.data$first, a.data$last)
clone <- experiment$position
etDelta(10, orf, clone)
Returns values for parameterized cumulative occupancy distributions.
fCumul(x,b0,b1,b2)
x | Point to evaluate |
b0 | Parameter b0 |
b1 | Parameter b1 |
b2 | Parameter b2 |
Function fitted to the cumulative occupancy distribution for a multinomial distribution. Exponential model := b0-b1*exp(-b2*x).
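To make the shape of the exponential model concrete, the sketch below evaluates it at a few points. The function expCumul is a hypothetical stand-in written from the formula above, not the package's fCumul.

```r
# Exponential model from the text: b0 - b1 * exp(-b2 * x).
expCumul <- function(x, b0, b1, b2) b0 - b1 * exp(-b2 * x)

at.zero  <- expCumul(0,   b0 = 3, b1 = 3, b2 = 0.01)  # equals b0 - b1
at.two   <- expCumul(2,   b0 = 3, b1 = 3, b2 = 0.01)
at.large <- expCumul(1e6, b0 = 3, b1 = 3, b2 = 0.01)  # approaches b0

c(at.zero, at.two, at.large)
```

The curve starts at b0 - b1 and levels off at b0, so b0 acts as the asymptote of the cumulative occupancy curve and b2 controls how quickly the curve approaches it.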
Returns a numeric
Oliver Will [email protected]
See the book chapter O. Will (**) in **
x <- 2
b0 <- 3
b1 <- 3
b2 <- 0.01
val <- fCumul(x, b0, b1, b2)
Parameterizes the cumulative occupancy distribution.
fFit(anno,clone,TR=TRUE,b0=0,b1=0,b2=.0)
anno | 2-column matrix of annotation |
clone | vector |
TR | Report a trace |
b0 | Starting value b0 |
b1 | Starting value b1 |
b2 | Starting value b2 |
Fits various parametric functions to the occupancy distribution for a multinomial. Using the starting values of b0=0, b1=0, and b2=0 forces the function to find starting values for you.
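A fit of this kind can be sketched with base R's nls on synthetic data. This is an illustration only: the data are generated from the exponential model b0 - b1*exp(-b2*x) with added noise, the rule used to pick starting values is an assumption of this example, and it does not show how fFit chooses starting values or which optimiser it uses.

```r
# Synthetic cumulative curve from the exponential model, plus small noise.
set.seed(1)
x <- 1:200
y <- 50 - 50 * exp(-0.02 * x) + rnorm(length(x), sd = 0.5)

# Crude data-driven starting values (an assumption of this sketch):
# asymptote near the largest value, drop near the observed range,
# rate on the scale of 1 / typical x.
start <- list(b0 = max(y), b1 = max(y) - min(y), b2 = 1 / mean(x))

fit <- nls(y ~ b0 - b1 * exp(-b2 * x), start = start)
est <- coef(fit)
est
```

The estimates should land close to the generating values b0 = 50, b1 = 50, and b2 = 0.02.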
Returns an object.
Oliver Will [email protected]
See the book chapter O. Will (**) in **
data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first, sampleAnnotation$last)
clone <- sampleInsertions$position
TR <- TRUE
fm <- fFit(anno, clone, TR)
Loads and checks an annotation file.
loadAnnotation(fileName)
fileName | Name of file |
The annotation file needs four columns: idNum, first, last, and overlap.
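The sketch below writes a tiny file with those four columns and reads it back with base R. The whitespace-delimited layout with a header row is an assumption made for the example, since the actual file format expected by loadAnnotation is not specified here, and the values are made up.

```r
# Write a toy annotation file (assumed layout: header row, whitespace-delimited).
tmp <- tempfile(fileext = ".txt")
writeLines(c("idNum first last overlap",
             "1 1 100 0",
             "2 101 250 0"), tmp)

anno.df <- read.table(tmp, header = TRUE)

# Confirm the four required columns are present before using the data.
required <- c("idNum", "first", "last", "overlap")
stopifnot(all(required %in% names(anno.df)))

# Build the 2-column matrix that the other functions take as annotation.
orf <- cbind(anno.df$first, anno.df$last)
orf
```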
Returns a data frame
Oliver Will [email protected]
See the book chapter O. Will (**) in **
# No self contained example
Loads a list of insertion locations.
loadInsertions(fileName)
fileName | Name of the file |
Loads a list of insertion locations created in a transposon mutagenesis library.
Returns a data frame
Oliver Will [email protected]
See the book chapter O. Will (**) in **
# No self contained example
Converts the annotation and insertion format of the occugene package into the format for the negenes package.
occup2Negenes(anno,clone,INTERGENIC=FALSE)
anno | 2-column matrix of annotation |
clone | vector of insertion locations |
INTERGENIC | Process the intergenic region as last ORF. |
Converts the annotation and insertion format of the occugene package into the format for the negenes package. The returned data frame has four columns, in this order: n.sites, n.sites2, counts, and counts2.
Returns a data frame
Oliver Will [email protected]
See the book chapter O. Will (**) in **
data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first, sampleAnnotation$last)
clone <- sampleInsertions$position
occup2Negenes(anno, clone)
This dataset has the annotation for a hypothetical bacterium.
data(sampleAnnotation)
A data frame containing 4 columns with 10 rows.
Oliver Will [email protected]
Randomly generated.
See the book chapter O. Will (**) in **
Insertion locations for a simple random mutagenesis library example.
data(sampleInsertions)
A data frame containing 1 column with 20 rows.
Oliver Will [email protected]
Randomly generated.
See the book chapter O. Will (**) in **
Unbiased point estimate and confidence intervals for the number of non-essential ORFs.
unbiasB0(anno,clone,iter=1000,seed=NULL,alpha=0.05,TR=TRUE)
anno | 2-column matrix of annotation |
clone | Vector of insertions |
iter | Number of iterations for the bootstrap |
seed | Seed for the random number generator |
alpha | Type I error |
TR | Report a trace |
Fits a parametric function to the cumulative occupancy distribution. Uses a parametric bootstrap to correct for bias and find confidence intervals for the number of non-essential ORFs.
b0 | Unbiased point estimate |
CI | Confidence interval at the alpha specified |
Oliver Will [email protected]
See the book chapter O. Will (**) in **
fFit
data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first, sampleAnnotation$last)
clone <- sampleInsertions$position
TR <- TRUE
iter <- 10
seed <- 4
unbiasB0(anno, clone, iter, seed, TR=TR)
Unbiased point estimate and confidence intervals for the number of new ORF knockouts in the next d clones.
unbiasDelta0(d,anno,clone,iter=1000,seed=NULL,alpha=0.05,TR=TRUE)
d | Number of new clones |
anno | 2-column matrix of annotation |
clone | Vector of insertions |
iter | Number of iterations for the bootstrap |
seed | Seed for the random number generator |
alpha | Type I error |
TR | Report a trace |
Fits a parametric function to the cumulative occupancy distribution. Uses a parametric bootstrap to correct for bias and find confidence intervals for the number of new ORF knockouts in the next d clones.
delta0 | Unbiased point estimate |
CI | Confidence interval at the alpha specified |
Oliver Will [email protected]
See the book chapter O. Will (**) in **
delta0
data(sampleAnnotation)
data(sampleInsertions)
anno <- cbind(sampleAnnotation$first, sampleAnnotation$last)
clone <- sampleInsertions$position
TR <- TRUE
iter <- 10
seed <- 4
unbiasDelta0(10, anno, clone, iter, seed, TR=TR)
Returns the variance of the occupancy distribution based on a multinomial distribution.
varMult(n, p, iter=NULL, seed=NULL, experimental=NULL)
n | number of attempts in the multinomial distribution |
p | probabilities for landing in a specific bin |
iter | number of iterations used in the Monte-Carlo approximation |
seed | seed for the random number generator |
experimental | access to other functions of multinomials |
This function computes the variance of the occupancy distribution for a multinomial. In other words, the variance of the number of bins with at least one ball. The experimental argument "oneBall" computes the variance of the number of bins with exactly one ball and the experimental argument "nextTo" computes the variance of the number of bins with one ball next to a bin with zero balls. Consider any functionality through the experimental argument untested.
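The variance of the number of occupied bins can be written in closed form from empty-bin indicators: bin i is empty with probability (1 - p[i])^n, and two distinct bins i and j are both empty with probability (1 - p[i] - p[j])^n. The base R sketch below sums the resulting covariance matrix and checks it against simulation. The helper name varOccupied is hypothetical, and this is not claimed to be how varMult is implemented.

```r
# Exact variance of the number of occupied bins (equal to the variance of
# the number of empty bins, since the two counts sum to a constant).
varOccupied <- function(n, p) {
  q <- (1 - p)^n                      # P(bin i empty)
  joint <- (1 - outer(p, p, "+"))^n   # P(bins i and j both empty), i != j
  covar <- joint - outer(q, q)        # covariances of the empty indicators
  diag(covar) <- q * (1 - q)          # variances on the diagonal
  sum(covar)
}

n <- 20
p <- c(seq(10, 1, -1), 47) / 100
p <- p / sum(p)

exact <- varOccupied(n, p)

# Monte-Carlo check with simulated multinomial draws.
set.seed(4)
draws <- rmultinom(5000, size = n, prob = p)
approx <- var(colSums(draws > 0))

c(exact = exact, approx = approx)
```

As a sanity check, a single ball always occupies exactly one bin, so varOccupied(1, p) is zero up to rounding.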
Returns a numeric
Oliver Will [email protected]
See the book chapter O. Will (**) in ** for specific details about this package or Johnson, N. L. and Kotz, S. (1977) Urn Models and Their Application: An Approach to Modern Discrete Probability Theory. John Wiley & Sons, New York, NY.
n <- 20
p <- c(seq(10, 1, -1), 47) / 100
p <- p / sum(p)
varMult(n, p)
varMult(n, p, iter=1000, seed=4)