Anderson-Darling test for complete and right-censored data

ADcens computes the Anderson-Darling test statistic and a bootstrap p-value for right-censored data against one of eight predefined distributions or a user-specified distribution. The function also handles complete (uncensored) data.

# Default S3 method
ADcens(times, cens = rep(1, length(times)),
       distr = c("exponential", "gumbel", "weibull", "normal",
                 "lognormal", "logistic", "loglogistic", "beta"),
       betaLimits = c(0, 1), igumb = c(10, 10), BS = 999,
       params0 = list(shape = NULL, shape2 = NULL,
                       location = NULL, scale = NULL, theta = NULL),
       tol = 1e-04, start = NULL, ...)
# S3 method for class 'formula'
ADcens(formula, data, ...)

Arguments

times: Numeric vector of times until the event of interest.
cens: Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact.
distr: A string specifying the name of the distribution to be studied. The possible distributions are the exponential ("exponential"), the Weibull ("weibull"), the Gumbel ("gumbel"), the normal ("normal"), the lognormal ("lognormal"), the logistic ("logistic"), the loglogistic ("loglogistic"), and the beta ("beta") distribution. In addition, if the character string is "name", any distribution for which the corresponding density (dname), distribution (pname), and random generation (rname) functions are defined can be used.
betaLimits: Two-component vector with the lower and upper bounds of the beta distribution. This argument is only required if the beta distribution is considered.
igumb: Two-component vector with the initial values for the estimation of the Gumbel distribution parameters.
BS: Number of bootstrap samples.
params0: List specifying the parameters of the theoretical distribution. By default, parameters are set to NULL and estimated with the maximum likelihood method. This argument is only considered if all parameters of the studied distribution are specified.
tol: Precision of survival times.
formula: A formula whose response is either a numeric vector (which assumes no censoring) or a Surv object.
data: Data frame for variables in formula.
start: A named list giving the initial values of parameters of the named distribution or a function of data computing initial values and returning a named list. This argument may be omitted (default) for the eight prespecified distributions. See more details in mledist.
...: Additional arguments for the boot function of the boot package.
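
As an illustration of the dname/pname/rname convention described for distr, the following sketch defines such a triplet in base R. The Rayleigh distribution, the suffix "rayl", and the parameter name sigma are hypothetical choices made for this example and are not part of the package; the final ADcens call is an assumed usage and is left commented out.

```r
# Hypothetical Rayleigh distribution with parameter sigma
# (illustrative only, not provided by the package).
drayl <- function(x, sigma = 1) {
  ifelse(x >= 0, x / sigma^2 * exp(-x^2 / (2 * sigma^2)), 0)
}
prayl <- function(q, sigma = 1) {
  ifelse(q >= 0, 1 - exp(-q^2 / (2 * sigma^2)), 0)
}
rrayl <- function(n, sigma = 1) {
  # Inverse-transform sampling; 1 - U is uniform whenever U is.
  sigma * sqrt(-2 * log(runif(n)))
}

# Sanity check: the density should integrate to the distribution function.
area <- integrate(drayl, lower = 0, upper = 2, sigma = 1.5)$value
cdf_at_2 <- prayl(2, sigma = 1.5)

# Assumed call (needs the package that provides ADcens):
# ADcens(times = rrayl(100, sigma = 1.5), distr = "rayl",
#        start = list(sigma = 1), BS = 199)
```

Since the distribution is not one of the eight prespecified ones, initial parameter values would presumably have to be supplied through start.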

Details

Parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
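
For readers who want to reproduce the estimation step by hand, the sketch below shows one way to convert times and cens into the left/right data frame that fitdistcens expects. The data values are made up for illustration, and the conversion is an assumption about what is needed, not a copy of the internal ADcens code.

```r
# Illustrative data following the ADcens convention
# (1 = exact time, 0 = right-censored time).
times <- c(2.1, 3.4, 1.7, 5.0, 4.2, 2.8)
cens  <- c(1,   0,   1,   1,   0,   1)

# fitdistcens() expects columns 'left' and 'right':
# exact observations have left == right,
# right-censored observations have right = NA.
censdata <- data.frame(left  = times,
                       right = ifelse(cens == 1, times, NA))

if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  fit <- fitdistrplus::fitdistcens(censdata, "weibull")
  print(fit$estimate)  # maximum likelihood estimates of shape and scale
}
```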

For complete data, the function ad.test of the goftest package is an alternative that avoids the long computation times caused by bootstrapping.
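
A minimal sketch of the complete-data case follows. It computes the classical Anderson-Darling statistic in base R for a fully specified null distribution and, if goftest is installed, compares it with ad.test. The helper ad_stat is a hypothetical name introduced here, and the parameters are treated as known rather than estimated.

```r
# Anderson-Darling statistic for complete data against a fully specified CDF:
# A^2 = -n - (1/n) * sum((2i - 1) * (log(u_i) + log(1 - u_(n+1-i))))
ad_stat <- function(x, cdf, ...) {
  n <- length(x)
  u <- cdf(sort(x), ...)   # null CDF evaluated at the order statistics
  i <- seq_len(n)
  -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
}

set.seed(123)
x <- rweibull(100, shape = 12, scale = 4)
a2 <- ad_stat(x, pweibull, shape = 12, scale = 4)

if (requireNamespace("goftest", quietly = TRUE)) {
  # p-value without bootstrapping; shape and scale are taken as known.
  res <- goftest::ad.test(x, null = "pweibull", shape = 12, scale = 4)
  print(c(manual = a2, goftest = unname(res$statistic)))
}
```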

The precision of the survival times is important mainly in the data generation step of the bootstrap samples.

Value

ADcens returns an object of class "ADcens".

An object of class "ADcens" is a list containing the following components:

Distribution: Null distribution.
Hypothesis: Parameters under the null hypothesis (if params0 is provided).
Test: Vector containing the value of the Anderson-Darling statistic (AD) and the estimated p-value (p-value).
Estimates: Vector with the maximum likelihood estimates of the parameters of the distribution under study.
StdErrors: Vector containing the estimated standard errors.
aic: The Akaike information criterion.
bic: The so-called BIC or SBC (Schwarz Bayesian criterion).
BS: The number of bootstrap samples used.

References

G. Marsaglia and J. Marsaglia. Evaluating the Anderson-Darling Distribution. In: Journal of Statistical Software, Articles, 9 (2) (2004), 1-5. URL: https://doi.org/10.18637/jss.v009.i02

Author

K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.

Warning

If the data set is large, the execution time of the function can be long. Reducing the argument BS lowers the number of bootstrap samples generated and hence the execution time.
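
To see why BS drives the run time, consider the following base R sketch of a parametric bootstrap for complete, exponentially distributed data. It is a simplified illustration and not the algorithm ADcens uses for censored data; the helper names ad_exp and boot_pvalue are hypothetical, and the p-value formula (1 + k) / (BS + 1) is a common convention assumed here.

```r
# Anderson-Darling statistic against an exponential distribution whose rate
# is estimated by maximum likelihood (complete data only).
ad_exp <- function(x) {
  n <- length(x)
  u <- pexp(sort(x), rate = 1 / mean(x))
  -n - mean((2 * seq_len(n) - 1) * (log(u) + log(1 - rev(u))))
}

boot_pvalue <- function(x, BS = 199) {
  observed <- ad_exp(x)
  rate_hat <- 1 / mean(x)
  # One simulated sample and one refit per bootstrap replicate,
  # so the cost grows linearly with BS.
  simulated <- replicate(BS, ad_exp(rexp(length(x), rate = rate_hat)))
  (1 + sum(simulated >= observed)) / (BS + 1)
}

set.seed(123)
x <- rexp(50, rate = 0.5)
system.time(p_small <- boot_pvalue(x, BS = 99))
system.time(p_large <- boot_pvalue(x, BS = 999))
```

Under this convention a smaller BS also coarsens the p-value, because the smallest attainable value is 1 / (BS + 1).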

Examples

# Complete data
set.seed(123)
ADcens(times = rweibull(100, 12, scale = 4), distr = "weibull",
       BS = 199)
#> Null hypothesis: the data follows a weibull distribution 
#> 
#> AD Test results:
#>      AD p-value 
#>   0.176   0.955 
#> 
summary(ADcens(times = rweibull(100, 12, scale = 4), distr = "exponential",
       BS = 199), outp = "table", print.BIC = FALSE, print.infoBoot = TRUE)
#> Distribution: exponential 
#> 
#> AD Test results:
#> ------- | -------
#> Metric  | Value  
#> ------- | -------
#> AD      | 37.067 
#> p-value | 0.005  
#> ------- | -------
#> 
#> Parameter estimates:
#> --------- | --------- | ---------
#> Parameter | Value     | s.e.     
#> --------- | --------- | ---------
#> scale     | 3.834     | 0.383    
#> --------- | --------- | ---------
#> 
#> AIC: 470.79 
#> 
#> Number of bootstrap samples: 199 
#> 

if (FALSE) { # \dontrun{
# Censored data
set.seed(123)
colonsamp <- colon[sample(nrow(colon), 300), ]
ADcens(Surv(time, status) ~ 1, colonsamp, distr = "normal")
} # }