NAG Library Routine Document

g11caf (condl_logistic)

1
Purpose

g11caf returns parameter estimates for the conditional logistic analysis of stratified data, for example, data from case-control studies and survival analyses.

2
Specification

Fortran Interface
Subroutine g11caf ( n, m, ns, z, ldz, isz, ip, ic, isi, dev, b, se, sc, cov, nca, nct, tol, maxit, iprint, wk, lwk, ifail)
Integer, Intent (In):: n, m, ns, ldz, isz(m), ip, ic(n), isi(n), maxit, iprint, lwk
Integer, Intent (Inout):: ifail
Integer, Intent (Out):: nca(ns), nct(ns)
Real (Kind=nag_wp), Intent (In):: z(ldz,m), tol
Real (Kind=nag_wp), Intent (Inout):: b(ip)
Real (Kind=nag_wp), Intent (Out):: dev, se(ip), sc(ip), cov(ip*(ip+1)/2), wk(lwk)
C Header Interface
#include <nagmk26.h>
void  g11caf_ (const Integer *n, const Integer *m, const Integer *ns, const double z[], const Integer *ldz, const Integer isz[], const Integer *ip, const Integer ic[], const Integer isi[], double *dev, double b[], double se[], double sc[], double cov[], Integer nca[], Integer nct[], const double *tol, const Integer *maxit, const Integer *iprint, double wk[], const Integer *lwk, Integer *ifail)

3
Description

In the analysis of binary data, the logistic model is commonly used. This relates the probability of one of the outcomes, say y=1, to p explanatory variates or covariates by
Proby=1=expα+zTβ 1+expα+zTβ ,  
where β is a vector of unknown coefficients for the covariates z and α is a constant term. If the observations come from different strata or groups, α would vary from strata to strata. If the observed outcomes are independent then the ys follow a Bernoulli distribution, i.e., a binomial distribution with sample size one and the model can be fitted as a generalized linear model with binomial errors.
In some situations the number of observations for which y=1 may not be independent. For example, in epidemiological research, case-control studies are widely used in which one or more observed cases are matched with one or more controls. The matching is based on fixed characteristics such as age and sex, and is designed to eliminate the effect of such characteristics in order to more accurately determine the effect of other variables. Each case-control group can be considered as a stratum. In this type of study the binomial model is not appropriate, except if the strata are large, and a conditional logistic model is used. This considers the probability of the cases having the observed vectors of covariates given the set of vectors of covariates in the strata. In the situation of one case per stratum, the conditional likelihood for ns strata can be written as
L=i=1nsexpziTβ lSiexpzlTβ , (1)
where Si is the set of observations in the ith stratum, with associated vectors of covariates zl, lSi, and zi is the vector of covariates of the case in the ith stratum. In the general case of ci cases per strata then the full conditional likelihood is
L=i=1nsexpsiTβ lCiexpslTβ , (2)
where si is the sum of the vectors of covariates for the cases in the ith stratum and sl, lCi refer to the sum of vectors of covariates for all distinct sets of ci observations drawn from the ith stratum. The conditional likelihood can be maximized by a Newton–Raphson procedure. The covariances of the parameter estimates can be estimated from the inverse of the matrix of second derivatives of the logarithm of the conditional likelihood, while the first derivatives provide the score function, Ujβ, for j=1,2,,p, which can be used for testing the significance of parameters.
If the strata are not small, Ci can be large so to improve the speed of computation, the algorithm in Howard (1972) and described by Krailo and Pike (1984) is used.
A second situation in which the above conditional likelihood arises is in fitting Cox's proportional hazard model (see g12baf) in which the strata refer to the risk sets for each failure time and where the failures are cases. When ties are present in the data g12baf uses an approximation. For an exact estimate, the data can be expanded using g12zaf to create the risk sets/strata and g11caf used.

4
References

Cox D R (1972) Regression models in life tables (with discussion) J. Roy. Statist. Soc. Ser. B 34 187–220
Cox D R and Hinkley D V (1974) Theoretical Statistics Chapman and Hall
Howard S (1972) Remark on the paper by Cox, D R (1972): Regression methods J. R. Statist. Soc. B 34 and life tables 187–220
Krailo M D and Pike M C (1984) Algorithm AS 196. Conditional multivariate logistic analysis of stratified case-control studies Appl. Statist. 33 95–103
Smith P G, Pike M C, Hill P, Breslow N E and Day N E (1981) Algorithm AS 162. Multivariate conditional logistic analysis of stratum-matched case-control studies Appl. Statist. 30 190–197

5
Arguments

1:     n – IntegerInput
On entry: n, the number of observations.
Constraint: n2.
2:     m – IntegerInput
On entry: the number of covariates in array z.
Constraint: m1.
3:     ns – IntegerInput
On entry: the number of strata, ns.
Constraint: ns1.
4:     zldzm – Real (Kind=nag_wp) arrayInput
On entry: the ith row must contain the covariates which are associated with the ith observation.
5:     ldz – IntegerInput
On entry: the first dimension of the array z as declared in the (sub)program from which g11caf is called.
Constraint: ldzn.
6:     iszm – Integer arrayInput
On entry: indicates which subset of covariates are to be included in the model.
If iszj1, the jth covariate is included in the model.
If iszj=0, the jth covariate is excluded from the model and not referenced.
Constraint: iszj0 and at least one value must be nonzero.
7:     ip – IntegerInput
On entry: p, the number of covariates included in the model as indicated by isz.
Constraint: ip1 and ip= number of nonzero values of isz.
8:     icn – Integer arrayInput
On entry: indicates whether the ith observation is a case or a control.
If ici=0, indicates that the ith observation is a case.
If ici=1, indicates that the ith observation is a control.
Constraint: ici=0 or 1, for i=1,2,,n.
9:     isin – Integer arrayInput
On entry: stratum indicators which also allow data points to be excluded from the analysis.
If isii=k, indicates that the ith observation is from the kth stratum, where k=1,2,,ns.
If isii=0, indicates that the ith observation is to be omitted from the analysis.
Constraint: 0isiins and more than ip values of isii>0, for i=1,2,,n.
10:   dev – Real (Kind=nag_wp)Output
On exit: the deviance, that is, minus twice the maximized log-likelihood.
11:   bip – Real (Kind=nag_wp) arrayInput/Output
On entry: initial estimates of the covariate coefficient parameters β. bj must contain the initial estimate of the coefficent of the covariate in z corresponding to the jth nonzero value of isz.
Suggested value: in many cases an initial value of zero for bj may be used. For another suggestion see Section 9.
On exit: bj contains the estimate β^i of the coefficient of the covariate stored in the ith column of z where i is the jth nonzero value in the array isz.
12:   seip – Real (Kind=nag_wp) arrayOutput
On exit: sej is the asymptotic standard error of the estimate contained in bj and score function in scj, for j=1,2,,ip.
13:   scip – Real (Kind=nag_wp) arrayOutput
On exit: scj is the value of the score function Ujβ for the estimate contained in bj.
14:   covip×ip+1/2 – Real (Kind=nag_wp) arrayOutput
On exit: the variance-covariance matrix of the parameter estimates in b stored in packed form by column, i.e., the covariance between the parameter estimates given in bi and bj, ji, is given in covjj-1/2+i.
15:   ncans – Integer arrayOutput
On exit: ncai contains the number of cases in the ith stratum, for i=1,2,,ns.
16:   nctns – Integer arrayOutput
On exit: ncti contains the number of controls in the ith stratum, for i=1,2,,ns.
17:   tol – Real (Kind=nag_wp)Input
On entry: indicates the accuracy required for the estimation. Convergence is assumed when the decrease in deviance is less than tol×1.0+CurrentDeviance. This corresponds approximately to an absolute accuracy if the deviance is small and a relative accuracy if the deviance is large.
Constraint: tol10×machine precision.
18:   maxit – IntegerInput
On entry: the maximum number of iterations required for computing the estimates. If maxit is set to 0 then the standard errors, the score functions and the variance-covariance matrix are computed for the input value of β in b but β is not updated.
Constraint: maxit0.
19:   iprint – IntegerInput
On entry: indicates if the printing of information on the iterations is required.
iprint0
No printing.
iprint1
The deviance and the current estimates are printed every iprint iterations. When printing occurs the output is directed to the current advisory message unit (see x04abf).
Suggested value: iprint=0.
20:   wklwk – Real (Kind=nag_wp) arrayWorkspace
21:   lwk – IntegerInput
On entry: the dimension of the array wk as declared in the (sub)program from which g11caf is called.
Constraint: lwkpn0+cm+1p+1p+2/2+cm, where n0 is the number of observations included in the model, i.e., the number of observations for which isii0 and cm is the maximum number of observations in any stratum.
22:   ifail – IntegerInput/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this argument, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6
Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
ifail=1
On entry, ip=value.
Constraint: ip1.
On entry, ldz=value.
Constraint: ldzn.
On entry, m=value.
Constraint: m1.
On entry, maxit=value.
Constraint: maxit0.
On entry, n=value.
Constraint: n2.
On entry, ns=value.
Constraint: ns1.
On entry, tol=value.
Constraint: tol10×machine precision.
ifail=2
On entry, i=value, isii=value and ns=value.
Constraint: 0isiins.
On entry, i=value and ici=value.
Constraint: ici=0 or 1.
On entry, i=value and iszi<value.
Constraint: iszi0.
On entry, there are not ip values of isz>0.
On entry, too few observations included in model.
ifail=3
On entry, lwk is too small, lwk=value minimum value =value.
ifail=4
Overflow in calculations. Try using different starting values.
ifail=5
The matrix of second partial derivatives is singular. Try different starting values or include fewer covariates.
ifail=6
Convergence not achieved in value iterations. The progress towards convergence can be examined by using a nonzero value of iprint. Any non-convergence may be due to a linear combination of covariates being monotonic with time. Full results are returned.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.

7
Accuracy

The accuracy is specified by tol.

8
Parallelism and Performance

g11caf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g11caf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9
Further Comments

The other models described in Section 3 can be fitted using the generalized linear modelling routines g02gbf and g02gcf.
The case with one case per stratum can be analysed by having a dummy response variable y such that y=1 for a case and y=0 for a control, and fitting a Poisson generalized linear model with a log link and including a factor with a level for each strata. These models can be fitted by using g02gcf.
g11caf uses mean centering, which involves subtracting the means from the covariables prior to computation of any statistics. This helps to minimize the effect of outlying observations and accelerates convergence. In order to reduce the risk of the sums computed by Howard's algorithm becoming too large, the scaling factor described in Krailo and Pike (1984) is used.
If the initial estimates are poor then there may be a problem with overflow in calculating expβTzi or there may be non-convergence. Reasonable estimates can often be obtained by fitting an unconditional model.

10
Example

The data was used for illustrative purposes by Smith et al. (1981) and consists of two strata and two covariates. The data is input, the model is fitted and the results are printed.

10.1
Program Text

Program Text (g11cafe.f90)

10.2
Program Data

Program Data (g11cafe.d)

10.3
Program Results

Program Results (g11cafe.r)