Numerical Algorithms Group - Binary Data

Binary data is becoming increasingly important in developing predictive relationships for a range of applications, both scientific (e.g., the presence or absence of a genetic trait) and in business analytics (e.g., whether or not a customer has visited a particular web page). The data can be coded as either a 0 or a 1. Further, a variable that takes one of several values can also be modelled by converting it into a series of 'dummy' variables, one of which is set to 1 and the others to 0. A logistic regression model, a particular case of the generalised linear model, can handle these data. NAG has routines in both its C and Fortran libraries to perform logistic regression and to create class, or 'dummy', variables...

The analysis of binary data, in which the response is either a yes or a no, is becoming increasingly important in many areas. Examples of binary data are whether or not an insurance claim is made in a given period, whether a lead becomes a customer, or whether a machine fails under different circumstances. The data can be coded as either a 0 or a 1.

The most useful model for this type of data is logistic regression. A logistic regression is a particular case of what is known as a generalised linear model. These models are generalisations of the ordinary regression model that allow different types of data, such as binary data, and different types of links. A link connects the explanatory part of the model (the linear predictor) to the mean of the response. With a logistic link, the fitted values of the model stay between 0 and 1, so they can be read as probabilities.
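As a small illustration of why the logistic link keeps fitted values between 0 and 1, the sketch below implements the link and its inverse in plain Python. The function names `logit` and `inv_logit` are chosen here for illustration and are not part of any NAG interface.

```python
import math

def logit(p):
    """Logistic link: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: maps any linear predictor value back towards (0, 1)."""
    # Branch on the sign so that exp() is never called with a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# However large or small the linear predictor, the fitted value is a valid probability.
for eta in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"linear predictor {eta:5.1f} -> fitted probability {inv_logit(eta):.4f}")
```

The model is linear on the `logit` scale, and `inv_logit` converts the linear predictor back to a fitted probability.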

To fit a logistic regression, use the NAG routine G02GBF/g02gbc, which fits generalised linear models with binomial data. Set the link parameter to select the logistic link, and set every element of the binomial denominator array to 1, since each binary observation is a single trial.
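To make the role of the denominator array concrete, here is a minimal sketch of fitting a binomial model with a logistic link by Newton-Raphson in plain Python. It is not the NAG routine and does not reproduce its interface or algorithm; the names `fit_logistic`, `solve`, `x_rows`, `y` and `t` are assumptions made for this illustration, and the data are invented.

```python
import math

def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def inv_logit(eta):
    """Inverse logistic link, written to avoid overflow in exp()."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def fit_logistic(x_rows, y, t=None, iterations=25, tol=1e-10):
    """Return [intercept, slopes...] for a binomial model with logistic link.

    x_rows: one list of explanatory values per observation.
    y:      number of successes per observation (0 or 1 for binary data).
    t:      binomial denominators; defaults to all 1, i.e. binary data.
    """
    if t is None:
        t = [1] * len(y)
    design = [[1.0] + list(row) for row in x_rows]  # prepend an intercept column
    p = len(design[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi, ti in zip(design, y, t):
            mu = inv_logit(sum(b * v for b, v in zip(beta, row)))
            w = ti * mu * (1.0 - mu)  # binomial variance weight
            for j in range(p):
                grad[j] += row[j] * (yi - ti * mu)
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Invented binary data: one explanatory variable, denominators all equal to 1.
x_rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic(x_rows, y)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

This sketch has no safeguards for perfectly separated data, where the estimates diverge; a library routine is the better choice for real work.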

Often the explanatory part of the model is given by categorical variables that define groups: for example, occupation or age in ten-year groups. To add these to the model, they need to be converted to a set of 0/1 variables that identify the groups (these are often known as dummy variables). These dummy variables can be calculated using the NAG routine G04EAF/g04eac.
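The conversion described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the interface of G04EAF/g04eac; the function name `dummy_variables`, the `drop_first` option and the sample age groups are assumptions made for the example.

```python
def dummy_variables(factor, drop_first=False):
    """Convert a list of category labels into rows of 0/1 dummy variables.

    Returns the levels that received a column and one row of 0/1 values
    per observation. With drop_first=True the first level gets no column
    and acts as the reference group.
    """
    levels = sorted(set(factor))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if value == level else 0 for level in kept] for value in factor]
    return kept, rows

# Invented example: age recorded in ten-year groups.
age_group = ["30-39", "20-29", "40-49", "20-29", "30-39"]
levels, dummies = dummy_variables(age_group)
print(levels)
for label, row in zip(age_group, dummies):
    print(label, row)
```

With a full set of dummy variables, exactly one entry in each row is 1. If the model also contains an intercept, the full set is linearly dependent on it, which is why one level is commonly dropped (here via `drop_first=True`) and treated as the baseline.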

As well as being a library routine, G02GBF is available in the NAG Statistical Add-Ins for Excel as BINOMIAL_GLM.

If you have any technical questions or queries about using NAG products, please email us at infodesk@nag.com.