# NAG Library Function Document

## nag_blgm_lm_describe_data (g22ybc)

Note: please be advised that this function is classed as ‘experimental’ and its interface may be developed further in the future. Please see Section 3.1.1 in How to Use the NAG Library and its Documentation for further information.

## 1 Purpose

nag_blgm_lm_describe_data (g22ybc) describes a data matrix.

## 2 Specification

 #include <nag.h>
 #include <nagg22.h>
 void nag_blgm_lm_describe_data (void **hddesc, Integer nobs, Integer nvar, const Integer levels[], Integer lvnames, const char *vnames[], NagError *fail)

## 3 Description

Let $D$ denote a data matrix with $n$ observations on ${m}_{d}$ independent variables, denoted ${V}_{1},{V}_{2},\dots ,{V}_{{m}_{d}}$. The $j$th independent variable, ${V}_{j}$, can be classified as binary, categorical, ordinal or continuous, where:
Binary
${V}_{j}$ can take the value $1$ or $0$.
Categorical
${V}_{j}$ can take one of ${L}_{j}$ distinct values or levels. Each level represents a discrete category but does not necessarily imply an ordering. The value used to represent each level is therefore arbitrary and, by convention and for convenience, is taken to be the integers from $1$ to ${L}_{j}$.
Ordinal
As with a categorical variable, ${V}_{j}$ can take one of ${L}_{j}$ distinct values or levels. However, unlike a categorical variable, the levels of an ordinal variable imply an ordering and hence the value used to represent each level is not arbitrary. For example, ${V}_{j}=4$ implies a value that is twice as large as ${V}_{j}=2$.
Continuous
${V}_{j}$ can take any real value.
nag_blgm_lm_describe_data (g22ybc) returns a G22 handle containing a description of a data matrix, $D$. The data matrix makes no distinction between binary, ordinal or continuous variables.
A name can also be assigned to each variable. If names are not supplied then the default vector of names, $\left\{\text{'V1'},\text{'V2'},\dots \right\}$ is used.
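The classification above, together with the default naming rule, can be sketched in plain C. This is a minimal illustration only and does not call the NAG Library: the `var_kind` enumeration and the helpers `levels_entry` and `default_vname` are hypothetical names introduced here, and `long` stands in for the library's Integer type. The level count follows the convention stated for levels in Section 5 (categorical variables carry ${L}_{j}$, all other kinds carry $1$).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical classification of a variable, mirroring the four kinds above. */
typedef enum { VAR_BINARY, VAR_CATEGORICAL, VAR_ORDINAL, VAR_CONTINUOUS } var_kind;

/* Level count to record for one variable: the category count L_j for a
   categorical variable, and 1 for binary, ordinal and continuous variables. */
long levels_entry(var_kind kind, long ncategories)
{
    return (kind == VAR_CATEGORICAL) ? ncategories : 1;
}

/* Write the default name of the j-th variable (1-based), "V<j>", into buf.
   Returns 0 on success and -1 if buf is too small to hold the name. */
int default_vname(long j, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "V%ld", j);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For a data matrix with a continuous variable and a four-level categorical variable, `levels_entry` would give the entries 1 and 4, and `default_vname` would produce the names V1 and V2.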

## 4 References

None.

## 5 Arguments

1:    $\mathbf{hddesc}$    void **    Input/Output
On entry: must be set to NULL.
As an alternative, an existing G22 handle may be supplied, in which case this function will destroy the supplied G22 handle as if nag_blgm_handle_free (g22zac) had been called.
On exit: holds a G22 handle to the internal data structure containing a description of the data matrix, $D$. You must not change the G22 handle other than through the functions in Chapter g22.
2:    $\mathbf{nobs}$    Integer    Input
On entry: $n$, the number of observations in the data matrix, $D$.
Constraint: ${\mathbf{nobs}}\ge 0$.
3:    $\mathbf{nvar}$    Integer    Input
On entry: ${m}_{d}$, the number of variables in the data matrix, $D$.
Constraint: ${\mathbf{nvar}}\ge 0$.
4:    $\mathbf{levels}\left[{\mathbf{nvar}}\right]$    const Integer    Input
On entry: ${\mathbf{levels}}\left[\mathit{j}-1\right]$ contains the number of levels associated with the $\mathit{j}$th variable of the data matrix, for $\mathit{j}=1,2,\dots ,{\mathbf{nvar}}$.
If the $j$th variable is binary, ordinal or continuous, ${\mathbf{levels}}\left[j-1\right]$ should be set to $1$; otherwise ${\mathbf{levels}}\left[j-1\right]$ should be set to the number of levels associated with the $j$th variable, and the corresponding column of the data matrix is assumed to take the values $1$ to ${\mathbf{levels}}\left[j-1\right]$.
Constraint: ${\mathbf{levels}}\left[\mathit{i}-1\right]\ge 1$, for $\mathit{i}=1,2,\dots ,{\mathbf{nvar}}$.
5:    $\mathbf{lvnames}$    Integer    Input
On entry: the number of variable names supplied in vnames.
Constraint: ${\mathbf{lvnames}}=0$ or ${\mathbf{nvar}}$.
6:    $\mathbf{vnames}\left[{\mathbf{lvnames}}\right]$    const char *    Input
On entry: if ${\mathbf{lvnames}}\ne 0$, ${\mathbf{vnames}}\left[\mathit{j}-1\right]$ must contain the name of the $\mathit{j}$th variable, for $\mathit{j}=1,2,\dots ,{\mathbf{nvar}}$. If ${\mathbf{lvnames}}=0$, vnames is not referenced and may be NULL.
The names supplied in vnames should be at most $50$ characters long and be unique. If a name longer than $50$ characters is supplied it will be truncated.
Variable names must not contain any of the characters +.*-:^()@.
7:    $\mathbf{fail}$    NagError *    Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
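The constraints listed for the arguments above can be checked before the call is made. The following is a sketch under stated assumptions, not the library's own validation: `check_describe_args` and the `args_status` codes are hypothetical names introduced here, `long` stands in for the library's Integer type, and the forbidden character set is copied from the description of vnames. The length limit is not treated as an error because over-long names are documented as being truncated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, one per documented constraint. */
typedef enum {
    ARGS_OK,
    ARGS_BAD_NOBS,     /* nobs < 0                              */
    ARGS_BAD_NVAR,     /* nvar < 0                              */
    ARGS_BAD_LEVELS,   /* some levels[j] < 1                    */
    ARGS_BAD_LVNAMES,  /* lvnames is neither 0 nor nvar         */
    ARGS_BAD_NAME      /* a name contains a forbidden character */
} args_status;

/* Check the documented input constraints in the order the arguments appear.
   vnames is only dereferenced when lvnames is non-zero. */
args_status check_describe_args(long nobs, long nvar, const long levels[],
                                long lvnames, const char *vnames[])
{
    static const char forbidden[] = "+.*-:^()@";
    long j;

    if (nobs < 0) return ARGS_BAD_NOBS;
    if (nvar < 0) return ARGS_BAD_NVAR;
    for (j = 0; j < nvar; j++)
        if (levels[j] < 1) return ARGS_BAD_LEVELS;
    if (lvnames != 0 && lvnames != nvar) return ARGS_BAD_LVNAMES;
    for (j = 0; j < lvnames; j++)
        if (strpbrk(vnames[j], forbidden) != NULL) return ARGS_BAD_NAME;
    return ARGS_OK;
}
```

With three variables and levels of 1, 3 and 1, passing either three valid names or lvnames set to 0 with vnames set to NULL satisfies every check, whereas a name such as re.gion is rejected because it contains a period.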

## 6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_ARRAY_SIZE
On entry, ${\mathbf{lvnames}}=〈\mathit{\text{value}}〉$ and ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{lvnames}}=0$ or ${\mathbf{nvar}}$.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
NE_HANDLE
On entry, hddesc is not NULL or a recognised G22 handle.
NE_INT
On entry, ${\mathbf{nobs}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nobs}}\ge 0$.
On entry, ${\mathbf{nvar}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{nvar}}\ge 0$.
NE_INT_ARRAY
On entry, $j=〈\mathit{\text{value}}〉$ and ${\mathbf{levels}}\left[j-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{levels}}\left[\mathit{j}-1\right]\ge 1$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_INVALID_FORMAT
On entry, variable name $i$ contains one or more invalid characters, $i=〈\mathit{\text{value}}〉$.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_NON_UNIQUE
On entry, variable names $i$ and $j$ are not unique (possibly due to truncation), $i=〈\mathit{\text{value}}〉$ and $j=〈\mathit{\text{value}}〉$.
Maximum variable name length is $50$.
On entry, variable names $i$ and $j$ are not unique, $i=〈\mathit{\text{value}}〉$ and $j=〈\mathit{\text{value}}〉$.
NW_TRUNCATED
At least one variable name was truncated to $50$ characters. Each truncated name is unique and will be used in all output.
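The interaction between truncation and uniqueness described under NE_NON_UNIQUE and NW_TRUNCATED can be illustrated with a short sketch. It assumes that names are compared byte by byte and case-sensitively over their first $50$ characters, which the text above does not state explicitly; `clash_after_truncation` and `would_be_truncated` are hypothetical helpers, not library functions.

```c
#include <string.h>

#define MAX_VNAME_LEN 50 /* documented maximum variable name length */

/* Return 1 if names a and b coincide once each is cut to MAX_VNAME_LEN
   characters, 0 otherwise (assumes a case-sensitive comparison). */
int clash_after_truncation(const char *a, const char *b)
{
    return strncmp(a, b, MAX_VNAME_LEN) == 0;
}

/* Return 1 if name is longer than MAX_VNAME_LEN and so would be truncated. */
int would_be_truncated(const char *name)
{
    return strlen(name) > MAX_VNAME_LEN;
}
```

Two names that differ only beyond their 50th character clash under this check, which corresponds to the "possibly due to truncation" case of NE_NON_UNIQUE.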

## 7 Accuracy

Not applicable.

## 8 Parallelism and Performance

nag_blgm_lm_describe_data (g22ybc) is not threaded in any implementation.

## 9 Further Comments

None.

## 10 Example

This example performs a linear regression using nag_regsn_mult_linear (g02dac). The linear regression model is defined via a text string which is parsed using nag_blgm_lm_formula (g22yac). The corresponding design matrix associated with the model and the dataset described via a call to nag_blgm_lm_describe_data (g22ybc) is generated using nag_blgm_lm_design_matrix (g22ycc).
Verbose labels for the parameters of the model are constructed using information returned in vinfo by nag_blgm_lm_submodel (g22ydc).
See also the examples in nag_blgm_lm_formula (g22yac), nag_blgm_lm_design_matrix (g22ycc) and nag_blgm_lm_submodel (g22ydc).

### 10.1 Program Text

Program Text (g22ybce.c)

### 10.2 Program Data

Program Data (g22ybce.d)

### 10.3 Program Results

Program Results (g22ybce.r)

## 11 Optional Parameters

As well as the optional parameters common to all G22 handles described in nag_blgm_optset (g22zmc) and nag_blgm_optget (g22znc), a number of additional optional parameters can be specified for a G22 handle holding the description of a data matrix as returned by nag_blgm_lm_describe_data (g22ybc) in hddesc.
Each writeable optional parameter has an associated default value; to set any of them to a non-default value, use nag_blgm_optset (g22zmc). The value of an optional parameter can be queried using nag_blgm_optget (g22znc).
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The following is a list of the optional parameters available. A full description of each optional parameter is provided in Section 11.1.
• Number of Observations
• Number of Variables
• Storage Order

### 11.1 Description of the Optional Parameters

For each option, we give a summary line, a description of the optional parameter and details of constraints.
The summary line contains:
• a parameter value, where the letters $a$, $i$ and $r$ denote options that take character, integer and real values respectively;
• the default value.
Keywords and character values are case and white space insensitive.
 Number of Observations $i$
If queried, this optional parameter will return $n$, the number of observations in the data matrix.
 Number of Variables $i$
If queried, this optional parameter will return ${m}_{d}$, the number of variables in the data matrix.
 Storage Order $a$ Default $=\mathrm{OBSVAR}$
This optional parameter states how the data matrix, $D$, will be stored in its input array.
If ${\mathbf{Storage Order}}=\mathrm{OBSVAR}$, ${D}_{ij}$, the value for the $j$th variable of the $i$th observation of the data matrix, is stored in ${\mathbf{dat}}\left[\left(j-1\right)\times {\mathbf{pddat}}+i-1\right]$.
If ${\mathbf{Storage Order}}=\mathrm{VAROBS}$, ${D}_{ij}$, the value for the $j$th variable of the $i$th observation of the data matrix, is stored in ${\mathbf{dat}}\left[\left(i-1\right)\times {\mathbf{pddat}}+j-1\right]$.
Here, dat is the input argument of the same name in nag_blgm_lm_design_matrix (g22ycc).
Constraint: ${\mathbf{Storage Order}}=\mathrm{OBSVAR}$ or $\mathrm{VAROBS}$.
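The two storage schemes differ only in which index is scaled by pddat. The sketch below restates the two index formulas given above as plain C helpers; `obsvar_index` and `varobs_index` are hypothetical names introduced for illustration, and the observation index $i$ and variable index $j$ are both 1-based, as in the text.

```c
#include <stddef.h>

/* OBSVAR: D_ij is stored at (j-1)*pddat + (i-1), so the observations of one
   variable are contiguous. i and j are 1-based. */
size_t obsvar_index(size_t i, size_t j, size_t pddat)
{
    return (j - 1) * pddat + (i - 1);
}

/* VAROBS: D_ij is stored at (i-1)*pddat + (j-1), so the variables of one
   observation are contiguous. i and j are 1-based. */
size_t varobs_index(size_t i, size_t j, size_t pddat)
{
    return (i - 1) * pddat + (j - 1);
}
```

For example, with pddat equal to 5, the second observation of the third variable is found at `dat[obsvar_index(2, 3, 5)]` under OBSVAR.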
© The Numerical Algorithms Group Ltd, Oxford, UK. 2017