Title: | NPMLE for Censored and Truncated Data |
---|---|
Description: | Many functions for computing the NPMLE for censored and truncated data. |
Authors: | R. Gentleman and Alain Vandal |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.79.0 |
Built: | 2024-11-27 04:39:16 UTC |
Source: | https://github.com/bioc/Icens |
Most of the optimizations in Icens have a one dimensional root-finding component. Since the quantities involved are generally restricted to a subset of [0,1] we use bisection to find the roots.
Bisect(tA, pvec, ndir, Meps, tolbis=1e-07)
Bisect(tA, pvec, ndir, Meps, tolbis=1e-07)
tA |
The transpose of the clique matrix. |
pvec |
The current estimate of the probability vector. |
ndir |
The direction to explore. |
Meps |
Machine epsilon, elements of |
tolbis |
The tolerance used to determine if the algorithm has converged. |
We search from pvec
in the direction ndir
to obtain the
new value of pvec
that maximizes the likelihood.
The new estimate of pvec
.
Alain Vandal and Robert Gentleman.
Any book on optimization.
The maximal cliques of the intersection graph are obtained by first finding the cliques for the marginal data and then combining them using the algorithm in Gentleman and Vandal (1999).
BVcliques(intvlx, intvly, Lxopen=TRUE, Rxopen=FALSE, Lyopen=TRUE, Ryopen=FALSE )
BVcliques(intvlx, intvly, Lxopen=TRUE, Rxopen=FALSE, Lyopen=TRUE, Ryopen=FALSE )
intvlx |
The cliques for one marginal component, alternatively the marginal intervals can be supplied. |
intvly |
The cliques for the other marginal component, alternatively the marginal intervals can be supplied. |
Lxopen |
Boolean indicating whether the left end point in the x coordinate is open. |
Rxopen |
Boolean indicating whether the right end point in the x coordinate is open. |
Lyopen |
Boolean indicating whether the left end point in the y coordinate is open. |
Ryopen |
Boolean indicating whether the right end point in the y coordinate is open. |
A list of the maximal cliques of the intersection graph of the data.
A. Vandal and R. Gentleman
Graph–Theoretical Aspects of Bivariate Censored Data, R. Gentleman and A. Vandal, 1999, submitted.
data(cmv) cmv.cl <- BVcliques(cmv[,1:2], cmv[,3:4], Lxopen=FALSE, Lyopen=FALSE )
data(cmv) cmv.cl <- BVcliques(cmv[,1:2], cmv[,3:4], Lxopen=FALSE, Lyopen=FALSE )
Given the clique list, obtained from BVcliques
, the
clique matrix is obtained. This is the m (number of cliques) by n
(number of observations) matrix. A[i,j] is one if individual j is in
maximal clique i.
BVclmat(cliques)
BVclmat(cliques)
cliques |
The clique list. |
The m by n clique matrix.
A. Vandal and R. Gentleman
Graph–Theoretical Aspects of Bivariate Censored Data, R. Gentleman and A. Vandal, 1999, submitted.
data(cmv) bcl <- BVcliques(cmv[,1:2], cmv[,3:4]) A <- BVclmat(bcl)
data(cmv) bcl <- BVcliques(cmv[,1:2], cmv[,3:4]) A <- BVclmat(bcl)
Given the regions where the events occurred and the cliques of the intersection graph the support of the cliques is computed. For each clique it is the intersection of the event time regions for all observations in that clique.
BVsupport(intvlx, intvly, cliques=BVcliques(intvlx, intvly))
BVsupport(intvlx, intvly, cliques=BVcliques(intvlx, intvly))
intvlx |
The event time intervals for one dimension. |
intvly |
The event time intervals for the other dimension. |
cliques |
The list of maximal cliques of the intersection graph, optionally. |
An m by 4 matrix containing the corners of the intervals of support for the maximal cliques of the intersection graph corresponding to the first two arguments to the function.
A. Vandal and R. Gentleman
Graph–Theoretical Aspects of Bivariate Censored Data, R. Gentleman and A. Vandal, 1999, submitted.
data(cmv) cmv.cl <- BVcliques(cmv[,1:2], cmv[,3:4]) boxes <- BVsupport(cmv[,1:2], cmv[,3:4], cmv.cl)
data(cmv) cmv.cl <- BVcliques(cmv[,1:2], cmv[,3:4]) boxes <- BVsupport(cmv[,1:2], cmv[,3:4], cmv.cl)
The cmv
data frame has 204 rows and 4 columns.
The intervals should be treated as closed at both ends to replicate
the analysis in Betensky and Finkelstein.
This data frame contains the following columns:
The left end of the CMV shedding interval.
The right end of the CMV shedding interval.
The left end of the MAC colonization interval.
The right end of the MAC colonization interval.
Betensky and Finkelstein, 1999 present data from the AIDS Clinical Trials Group protocol ACTG 181. This was a natural history substudy of a comparative trial. Patients were scheduled for clinic visits during follow–up and data was collected on the time until two events; shedding of cytomegalovirus (CMV) in the urine and blood and for colonization of mycobacterium avium complex (MAC) in the sputum or stool.
Betensky, R. A. and Finkelstein, D. M., 1999, A nonparametric maximum likelihood estimator for bivariate interval censored data, Statistics in Medicine,
data(cmv)
data(cmv)
The cosmesis
data frame has 95 rows and 3 columns.
This data frame contains the following columns:
The left end point of the cosmetic deterioration interval.
The right end point of the cosmetic deterioration interval.
The treatment indicator. It is zero for those that received radiotherapy.
A semiparametric model for regression analysis of interval-censored failure time data, D. M. Finkelstein and R. A. Wolfe, 1985, Biometrics.
data(cosmesis)
data(cosmesis)
The incidence matrix, A
is the m by n matrix that represents the
data. There are m probabilities that must be estimated. The EM, or
expectation maximization, method is applied to these data.
EM(A, pvec, maxiter=500, tol=1e-12)
EM(A, pvec, maxiter=500, tol=1e-12)
A |
The incidence matrix. |
pvec |
The probability vector. |
maxiter |
The maximum number of iterations. |
tol |
The tolerance used to judge convergence. |
Lots.
An object of class icsurv
containing the following
components:
pf |
The NPMLE of the probability vector. |
numiter |
The number of iterations used. |
converge |
A boolean indicating whether the algorithm converged. |
intmap |
If present indicates the real representation of the
support for the values in |
Alain Vandal and Robert Gentleman.
The EM algorithm applied to the maximal cliques of the intersection graph of the censored data. The empirical distribution function with arbitrarily grouped, censored and truncated data, B. W. Turnbull, 1976, JRSS;B.
data(cosmesis) csub1 <- subset(cosmesis, subset= Trt==0, select=c(L,R)) EM(csub1) data(pruitt) EM(pruitt)
data(cosmesis) csub1 <- subset(cosmesis, subset= Trt==0, select=c(L,R)) EM(csub1) data(pruitt) EM(pruitt)
An implementation of the hybrid EM ICM (Iterative convex minorant) estimator of the distribution function proposed by Wellner and Zahn (1997).
EMICM(A, EMstep=TRUE, ICMstep=TRUE, keepiter=FALSE, tol=1e-07, maxiter=1000)
EMICM(A, EMstep=TRUE, ICMstep=TRUE, keepiter=FALSE, tol=1e-07, maxiter=1000)
A |
Either the m by n clique matrix or the n by 2 matrix containing the event time intervals. |
EMstep |
Boolean, indicating whether to take an EM step in the iteration. |
ICMstep |
Boolean, indicating whether to take an ICM step. |
keepiter |
Boolean determining whether to keep the iteration states. |
tol |
The maximal L1 distance between successive estimates before stopping iteration. |
maxiter |
The maximal number of iterations to perform before stopping. |
Lots, and they're complicated too!
An object of class icsurv
containing the following
components:
pf |
The estimated probabilities. |
sigma |
The NPMLE of the survival function on the maximal antichains. |
weights |
The diagonal of the likelihood function's second derivative. |
lastchange |
A vector of differences between the last two iterations. |
numiter |
The total number of iterations performed. |
iter |
Is only present if |
intmap |
The real representation associated with the
probabilities reported in |
Alain Vandal and Robert Gentleman
A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data, J. A. Wellner and Y. Zhan, 1997, JASA.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) EMICM(csub1) data(pruitt) EMICM(pruitt)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) EMICM(csub1) data(pruitt) EMICM(pruitt)
The hiv
data frame has 257 rows and 4 columns.
This data frame contains the following columns:
The left end point of the infection time interval.
The right end point of the infection time interval.
The left end point of the disease onset interval.
The right end point of the disease onset interval.
Coded as 1 if the estimated age at infection was less than 20 and 2 if the estimated age at infection was greater than 20.
Treatment, Light or Heavy
The setting is as follows. Individuals were infected with the HIV virus
at some unknown time they subsequently develop AIDS at a second
unknown time. The data consist of two intervals,
and
, such that the infection time was in the first
interval and the time of disease onset was in the second interval.
A quantity of interest is the incubation time of the disease which is
. The authors argue persuasively that
this should be considered as bivariate interval censored data.
They note that simply forming the differences
and
analysing the
resultant data assumes an incorrect likelihood.
DeGruttola and Lagakos transform the problem slightly to study the joint
distribution of
and
.
This is equivalent to estimating the joint distribution of
and
then transforming.
The data, as reported, have been discretized into six month intervals.
We use the data as reported in Table 1 of DeGruttola and Lagakos, 1989. The patients were 257 persons with Type A or B hemophilia treated at two hospitals in France. They were then examined intermittently (as they came in for treatment?) and their HIV and AIDS status was determined. Kim, De Gruttola and Lagakos report some covariate information and their paper is concerned with the modeling of that information. In this paper we concentrate only on the event times and ignore the covariate information; that topic being worthy of separate investigation.
DeGruttola, V. and Lagakos, S.W., 1989, Analysis of doubly-censored survival data, with application to AIDS, Biometrics.
Kim, Mimi Y. and De Gruttola, Victor G. and Lagakos, Stephen W., 1993, Analyzing Doubly Censored Data With Covariates, With Application to AIDS, Biometrics.
data(hiv)
data(hiv)
Internal Icens functions
These are not to be called by the user.
An object of class icsurv
must contain the following
components:
A boolean indicating whether the iteration producing
pf
converged.
The probability vector.
It can optionally contain any of the following components:
The clique matrix used to obtain pf
.
The real representations of the support for the
components of pf
.
A matrix containing every iterative estimate of
pf
, useful for debugging.
The value of the log likelihood at pf
.
The number of iterations taken.
The cumulative sum of pf
.
Weights used in the EMICM algorithm.
Alain Vandal and Robert Gentleman.
ISDM is a method for estimating the NPMLE of censored data.
ISDM(A, pvec, maxiter=500, tol=1e-07, tolbis=1e-08, verbose=FALSE)
ISDM(A, pvec, maxiter=500, tol=1e-07, tolbis=1e-08, verbose=FALSE)
A |
The m by n incidence, or clique, matrix. Or the n by 2 matrix containing the event intervals. |
pvec |
An initial estimate of the probability vector; not required. |
maxiter |
Maximum number of iterations to be made. |
tol |
The tolerance used to determine convergence. |
tolbis |
A second tolerance used for the steps. |
verbose |
Boolean, should verbose output be printed. |
Lots of complicated stuff should go here.
A list containing:
pf |
The estimated NPMLE of the probability vector. |
numiter |
The number of iterations performed. |
Alain Vandal and Robert Gentleman
An Algorithm for Computing the Nonparametric MLE of a Mixing Distribution, Lesperance, Mary L. and Kalbfleisch, John D., JASA, 1992
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) ISDM(csub1) # data(pruitt) # ISDM(pruitt)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) ISDM(csub1) # data(pruitt) # ISDM(pruitt)
Returns a list of maximal cliques of the intersection graph of the
real valued intervals supplied in m
. These are one dimensional
intervals with one interval for each individual. The algorithm is
coded in interpreted code and should be moved to compiled code for speed.
How do we handle exact failure times?
Which algorithm is used?
Maclist(intvls, Lopen=TRUE, Ropen=FALSE)
Maclist(intvls, Lopen=TRUE, Ropen=FALSE)
intvls |
A n by 2 matrix, the first column is the left endpoints and the second column contains the right endpoints of the failure time intervals. |
Lopen |
A boolean indicating whether the intervals are open on the left. |
Ropen |
A boolean indicating whether the intervals are open on the right. |
A list of length m. Each element of the list corresponds to one
maximal antichain. The row numbers (from m
) identify the
individuals and all row numbers for the individuals in the maximal
clique. Maximal cliques occur in their natural (left to right) order.
Alain Vandal and Robert Gentleman
Computational Methods for Censored Data using Intersection Graphs, R. Gentleman and A. Vandal, JCGS, 2000.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) ml1 <- Maclist(csub1)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) ml1 <- Maclist(csub1)
Returns the Petrie matrix and Petrie pairs of an interval order given its
list of maximal antichains. These can be obtained from
Maclist
.
Macmat(ml)
Macmat(ml)
ml |
A list containing the maximal cliques of the intersection graph of the data. |
Not worth mentioning?
A list containing two components.
pmat |
The Petrie or clique matrix of the underlying interval order. |
ppairs |
The Petrie pairs for each observation. These indicate the first and last maximal clique occupied by the observation. |
Alain Vandal and Robert Gentleman
Computational Methods for Censored Data using Intersection Graphs, R. Gentleman and A. Vandal, JCGS, 2000.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) ml1 <- Maclist(csub1) mm1 <- Macmat(ml1)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) ml1 <- Maclist(csub1) mm1 <- Macmat(ml1)
The intervals on the real line that corresponds to the intersections of the maximal cliques are computed and returned.
MLEintvl(intvls, ml=Maclist(intvls))
MLEintvl(intvls, ml=Maclist(intvls))
intvls |
The n by 2 matrix containing the event time intervals for the individuals under study. |
ml |
The |
An m by 2 matrix, where m is the number of maximal cliques. The first column contains the left end point of the real representation for the appropriate maximal clique and the second column contains the right end point.
Alain Vandal and Robert Gentleman
Computational Methods for Censored Data using Intersection Graphs, R. Gentleman and A. Vandal, JCGS, 2000.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) MLEintvl(csub1)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) MLEintvl(csub1)
An estimate of the NPMLE is obtained by using projected gradient methods. This method is a special case of the methods described in Wu (1978).
PGM(A, pvec, maxiter = 500, tol=1e-07, told=2e-05, tolbis=1e-08, keepiter=FALSE)
PGM(A, pvec, maxiter = 500, tol=1e-07, told=2e-05, tolbis=1e-08, keepiter=FALSE)
A |
|
pvec |
An initial estimate of the probability vector. |
maxiter |
The maximum number of iterations to take. |
tol |
The tolerance for decreases in likelihood. |
told |
|
tolbis |
The tolerance used in the bisection code. |
keepiter |
A boolean indicating whether to return the number of iterations. |
New directions are selected by the projected gradient method. The new
optimal pvec
is obtained using the bisection algorithm, moving
in the selected direction. Convergence requires both the
distance for the improved
pvec
and the change in likelihood to
be below tol
.
An object of class icsurv
containing the following
components:
pf |
The NPMLE of |
sigma |
The cumulative sum of |
lval |
The value of the log likelihood at |
clmat |
The clique matrix. |
method |
The method used, currently only "MPGM" is possible. |
lastchange |
The difference between |
numiter |
The number of iterations carried out. |
eps |
The tolerances used. |
converge |
A boolean indicating whether convergence occurred
within |
iter |
If |
Alain Vandal and Robert Gentleman.
Some Algorithmic Aspects of the Theory of Optimal Designs, C.–F. Wu, 1978, Annals.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) PGM(csub1) data(pruitt) PGM(pruitt)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) PGM(csub1) data(pruitt) PGM(pruitt)
Produces nice plots of the estimated NPMLE.
## S3 method for class 'icsurv' plot(x, type="eq", surv=FALSE, bounds=FALSE, shade=3, density=30, angle=45, lty=1, new=TRUE, xlab="Time", ylab="Probability", main="GMLE", ltybnds=2, ...)
## S3 method for class 'icsurv' plot(x, type="eq", surv=FALSE, bounds=FALSE, shade=3, density=30, angle=45, lty=1, new=TRUE, xlab="Time", ylab="Probability", main="GMLE", ltybnds=2, ...)
x |
The estimate of the NPMLE. |
type |
Three options, "eq" for equivalence call, "gw" for the Groeneboom-Wellner estimate, and "lc" for the left-continuous estimate. |
surv |
Logical indicating whether or not to plot the survival curve. |
bounds |
Logical indicating whether or not to include bounds around the estimate. |
shade |
An integer in 1, 2, or 3 denoting what component of the plot to shade. |
density |
The density of shading lines, in lines per inch. |
angle |
The slope of shading lines, given as an angle in degrees (counter-clockwise). |
lty |
The line type for the estimates. |
new |
Logical indicating whether or not to create a new plot. |
xlab |
The x-axis label. |
ylab |
The y-axis label. |
main |
The main title for the plot. |
ltybnds |
The line type for the bounds on the estimates. |
... |
Additional arguments passed to the plot function. |
No value is returned. A plot of the NPMLE is made on the active graphics device.
Alain Vandal and Robert Gentleman.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) e1 <- VEM(csub1) par(mfrow=c(2,2)) plot(e1) data(pruitt) e2 <- EM(csub1) plot(e2) e3 <- PGM(csub1) plot(e3) e4 <- EMICM(csub1) plot(e4)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) e1 <- VEM(csub1) par(mfrow=c(2,2)) plot(e1) data(pruitt) e2 <- EM(csub1) plot(e2) e3 <- PGM(csub1) plot(e3) e4 <- EMICM(csub1) plot(e4)
Plot rectangles described by the interval given in the first two arguments.
Plotboxes(int1, int2, textp=FALSE, showmac=FALSE, showsupp=FALSE, showmp=FALSE, cliques=NULL, macprod=NULL, density=c(2, 8, 20), col=c(2, 3, 4), offsetx=0.02, offsety=0.03)
Plotboxes(int1, int2, textp=FALSE, showmac=FALSE, showsupp=FALSE, showmp=FALSE, cliques=NULL, macprod=NULL, density=c(2, 8, 20), col=c(2, 3, 4), offsetx=0.02, offsety=0.03)
int1 |
The intervals for the x dimension. |
int2 |
The intervals for the y dimension. |
textp |
Boolen, if true add text. |
showmac |
Boolean, if true then the maximal cliques are shown in a different color? |
showsupp |
Boolean, if true show support boxes. |
showmp |
Boolean |
cliques |
Maximal cliques. |
macprod |
macprod |
density |
The density of the polygon shading lines, in lines per inch. |
col |
Color for plotting features. |
offsetx |
Offset for x-axis. |
offsety |
Offset for y-axis. |
No value is returned. The event rectangles are plotted on the active graphics device.
A. Vandal and R. Gentleman
Graph–Theoretical Aspects of Bivariate Censored Data, R. Gentleman and A. Vandal, 1999, submitted.
data(cmv) Plotboxes(cmv[,1:2], cmv[,3:4], showmac=TRUE)
data(cmv) Plotboxes(cmv[,1:2], cmv[,3:4], showmac=TRUE)
For isotonization problems some increase in speed and decrease in complexity can be achieved through the use of the pool monotone groups algorithm of Y.L. Zhang and M.A. Newton (1997). It isotonizes a weighted and ordered set of values.
PMGA(est, ww=rep(1, length(est)))
PMGA(est, ww=rep(1, length(est)))
est |
The vector of values, in the appropriate order. |
ww |
The weight vector. |
To be supplied at some later date.
An object containing the following components:
est |
The isotonized estimates. |
ww |
The weights associated with the isotonized estimates. |
poolnum |
The number of values pooled in the current estimate. |
passes |
The number of passes which were required to isotonize the list. |
Alain Vandal and Robert Gentleman.
Y.L. Zhang and M.A. Newton (1997), http://www.stat.wisc.edu/~newton/newton.html)
The pruitt
data was given in Pruitt (1993) as an example for
testing different methods of estimating the bivariate NPMLE for right
censored data.
This matrix represents the clique matrix of the intersection graph of
the data set given by Pruitt.
This data frame contains 8 columns, labeled A through H that represent the observations. There are seven rows corresponding to the seven maximal cliques in the intersection graph.
Small Sample Comparison of Six Bivariate Survival Curve Estimators, Journal of Statistical Computation and Simulation, R. Pruitt, 1993.
data(pruitt)
data(pruitt)
The Vertex Exchange Method is used to obtain the NPMLE of p
.
VEM(A, pvec, maxiter=500, tol=1e-07, tolbis=1e-07, keepiter=FALSE)
VEM(A, pvec, maxiter=500, tol=1e-07, tolbis=1e-07, keepiter=FALSE)
A |
The m by n incidence matrix or the n by 2 matrix of intervals. |
pvec |
The initial estimate for the probability vector. |
maxiter |
The maximum number of iterations allowed. |
tol |
The tolerance used to determine convergence. |
tolbis |
The tolerance used in the bisection stage of the algorithm. |
keepiter |
Should iteration information be retained and returned. |
Lots.
An object of class icsurv
with the following components.
pf |
The NPMLE of the probability vector. |
numiter |
The number of iterations used. |
lval |
The value of the logarithm of the likelihood at the NPMLE. |
converge |
Boolean stating whether the iteration converged. |
intmap |
If present it contains the real representations for the
maximal cliques. These are the intervals (on the real line) where
the mass in |
Robert Gentleman and Alain Vandal
A Vertex-exchange-method in $D$-optimal Design Theory , D. Bohning, Metrika, 1986.
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) VEM(csub1) data(pruitt) VEM(pruitt)
data(cosmesis) csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R)) VEM(csub1) data(pruitt) VEM(pruitt)