NAG CL Interface
g22yac (lm_formula)

Note: please be advised that this function is classed as ‘experimental’ and its interface may be developed further in the future. Please see Section 4 in How to Use the NAG Library for further information.

1 Purpose

g22yac parses a text string containing a formula specifying a linear model and outputs a G22 handle to an internal data structure. This G22 handle can then be passed to various functions in Chapter G22. In particular, the G22 handle can be passed to g22ycc to produce a design matrix or g22ydc to produce a vector of column inclusion flags suitable for use with functions in Chapter G02.

2 Specification

#include <nag.h>
void  g22yac (void **hform, const char *formula, NagError *fail)
The function may be called by the names: g22yac or nag_blgm_lm_formula.

3 Description

3.1 Background

Let D denote a data matrix with n observations on m_d independent variables, denoted Vj for j = 1, 2, ..., m_d. Let y denote a vector of n observations on a dependent variable.
A linear model, M, as the term is used in this function, expresses a relationship between the independent variables, Vj, and the dependent variable. This relationship can be expressed as a series of additive terms T1 + T2 + ..., with each term, Tt, representing either a single independent variable Vj, called the main effect of Vj, or the interaction between two or more independent variables. An interaction term, denoted here using the . operator, allows the effect of an independent variable on the dependent variable to depend on the value of one or more other independent variables. As an example, the three-way interaction between V1, V2 and V3 is denoted V1.V2.V3 and describes a situation where the effect of one of these three variables is influenced by the value of the other two.
This function takes a description of M, supplied as a text string containing a formula, and outputs a G22 handle to an internal data structure. This G22 handle can then be passed to g22ycc to produce a design matrix for use in analysis functions from other chapters, for example the regression functions of Chapter G02.
A more detailed description of what is meant by a G22 handle can be found in Section 2.1 in the G22 Chapter Introduction.

3.2 Syntax

In its most verbose form M can be described by one or more variable names, Vj, and the two operators + and . (full stop). In order to allow a wide variety of models to be specified compactly, this syntax is extended to six operators (+, ., *, -, :, ^) and parentheses.
A formula describing the model is supplied to g22yac via a character string which must obey the following rules:
  1. Variables can be denoted by arbitrary names, as long as:
    (i) The names used are a subset of those supplied to g22ybc when describing D.
    (ii) The names do not contain any of the characters in +.*-:^()@.
  2. The . operator denotes an interaction between two or more variables or terms, with V1.V2.V3 denoting the three-way interaction between V1, V2 and V3.
  3. A term in M can contain one or more variable names, separated using the . operator, i.e., a term can be either a main effect or an interaction term between two or more variables.
    (i) If a variable appears in an interaction term more than once, all appearances after the first are ignored; therefore, V1.V2.V1 is the same as V1.V2.
    (ii) The ordering of the variables in an interaction term is ignored when comparing terms; therefore, V1.V2 is the same as V2.V1. However, this ordering may have an effect when the resulting G22 handle is passed to another function, for example g22ycc.
    (iii) Applying the . operator to two terms appends one to the other; for example, if T1 = V1.V2 and T2 = V3.V4, then T1.T2 = V1.V2.V3.V4.
  4. The + operator allows additional terms to be included in M; therefore, T1+T2 is a model that includes terms T1 and T2.
    (i) If a term is added to M more than once, all appearances after the first are ignored; therefore, T1+T2+T1 is the same as T1+T2.
    (ii) The ordering of the terms is ignored whilst parsing the formula; therefore, T1+T2 is the same as T2+T1. However, this ordering may have an effect when the resulting G22 handle is passed to another function, for example g22ycc.
    (iii) Internally, the terms are reordered so that all main effects come first, followed by two-way interactions, then three-way interactions, etc. The ordering within each of these categories is preserved.
  5. The * operator can be used as a shorthand notation denoting the main effects and all interactions between the variables involved. Therefore, T1*T2 is equivalent to T1+T2+T1.T2 and T1*T2*T3 is equivalent to T1+T2+T3+T1.T2+T1.T3+T2.T3+T1.T2.T3.
  6. The - operator removes a term from M; therefore, T1*T2*T3-T1.T2.T3 is equivalent to T1+T2+T3+T1.T2+T1.T3+T2.T3, as the three-way interaction, T1.T2.T3, usually present due to T1*T2*T3, has been removed.
  7. The : operator is a shorthand way of specifying a series of variables, with V1:Vj being equivalent to V1+V2+...+Vj.
    (i) This operator can only be used if the variable names end in a numeric; therefore, VAR2:VAR4 would be valid, but FVAR:LVAR would not.
    (ii) The root part of both variable names (i.e., the part before the trailing numeric, so VAR in the valid example above) must be the same.
    (iii) The trailing numeric parts of the two variable names must be in ascending order.
  8. The ^ operator is a shorthand notation for a series of * operators. (T1+T2+T3)^2 is equivalent to (T1+T2+T3)*(T1+T2+T3), which in turn is equivalent to T1+T2+T3+T1.T2+T1.T3+T2.T3.
    (i) This notation is present primarily for use with the : operator in examples of the form (V1:V5)^3, which specifies a model containing the main effects for variables V1 to V5 as well as all two- and three-way interactions.
    (ii) Using the ^ operator on a single term has no effect; therefore, T2^2 is the same as T2.
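
The way the ., +, * and - operators combine terms can be illustrated with a small stand-alone sketch. It is not part of the NAG Library and does not reflect how g22yac is implemented; the type Model and the functions model_add_term, model_plus, model_dot, model_star, model_minus, term_order and model_sort_by_order are hypothetical names introduced only for this illustration. Each term is stored as a bitmask of variables (bit j-1 set means Vj is present), which is an assumption made for brevity and means the order of variables within a term is not recorded.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TERMS 64

/* Hypothetical model representation: an ordered list of distinct terms,
   each term being a bitmask of the variables it contains. */
typedef struct {
    unsigned terms[MAX_TERMS];
    size_t   n;
} Model;

/* Add one term unless it is already present (repeated terms are ignored). */
void model_add_term(Model *m, unsigned term)
{
    for (size_t i = 0; i < m->n; ++i)
        if (m->terms[i] == term)
            return;
    assert(m->n < MAX_TERMS);
    m->terms[m->n++] = term;
}

/* The + operator: all terms of a followed by the new terms of b. */
Model model_plus(const Model *a, const Model *b)
{
    Model r = *a;
    for (size_t i = 0; i < b->n; ++i)
        model_add_term(&r, b->terms[i]);
    return r;
}

/* The . operator: every pairwise interaction. OR-ing the bitmasks means a
   variable repeated within a term is counted once. */
Model model_dot(const Model *a, const Model *b)
{
    Model r = { {0}, 0 };
    for (size_t i = 0; i < a->n; ++i)
        for (size_t j = 0; j < b->n; ++j)
            model_add_term(&r, a->terms[i] | b->terms[j]);
    return r;
}

/* The * operator: a*b is a + b + a.b. */
Model model_star(const Model *a, const Model *b)
{
    Model ab = model_dot(a, b);
    Model r = model_plus(a, b);
    return model_plus(&r, &ab);
}

/* The - operator: drop the terms of b from a; terms of b that are not
   in a are ignored. */
Model model_minus(const Model *a, const Model *b)
{
    Model r = { {0}, 0 };
    for (size_t i = 0; i < a->n; ++i) {
        int removed = 0;
        for (size_t j = 0; j < b->n; ++j)
            if (a->terms[i] == b->terms[j])
                removed = 1;
        if (!removed)
            r.terms[r.n++] = a->terms[i];
    }
    return r;
}

/* Number of variables in a term: 1 for a main effect, 2 for a two-way
   interaction, and so on. */
unsigned term_order(unsigned term)
{
    unsigned count = 0;
    for (; term != 0; term >>= 1)
        count += term & 1u;
    return count;
}

/* Stable insertion sort by term order: main effects first, then two-way
   interactions, etc., preserving the order within each category. */
void model_sort_by_order(Model *m)
{
    for (size_t i = 1; i < m->n; ++i) {
        unsigned t = m->terms[i];
        size_t j = i;
        while (j > 0 && term_order(m->terms[j - 1]) > term_order(t)) {
            m->terms[j] = m->terms[j - 1];
            --j;
        }
        m->terms[j] = t;
    }
}
```

With single-term models for V1, V2 and V3 (bitmasks 1, 2 and 4), applying model_star twice gives the seven terms of V1*V2*V3, and model_sort_by_order then places the three main effects first. Calling model_star on a model with itself reproduces the behaviour described for ^2.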

3.2.1 Precedence

Each operator has an associated default precedence, but this can be overridden through the use of parentheses. The default precedence is:
  1. The : operator, with the resulting expression treated as if it were surrounded by parentheses. Therefore, V1+V3:V6*V7 is equivalent to V1+(V3+V4+V5+V6)*V7.
  2. The ^ operator, with the resulting expression treated as if it were surrounded by parentheses. Therefore, (T1+T2+T3)^2.T4 is equivalent to ((T1+T2+T3)^2).T4, which is equivalent to T1.T4+T2.T4+T3.T4+T1.T2.T4+T1.T3.T4+T2.T3.T4.
  3. The . operator, so T1*T2.T3 is equivalent to T1*(T2.T3).
  4. The * operator.
    (i) When using parentheses with the * or . operators the usual rules of multiplication apply; therefore, (T1+T3.T4).(T5+T7) is equivalent to T1.T5+T1.T7+T3.T4.T5+T3.T4.T7 and (T1+T3.T4)*(T5+T7) is equivalent to T1+T5+T7+T3.T4+T1.T5+T1.T7+T3.T4.T5+T3.T4.T7.
    (ii) Syntax of the following form is invalid: T1 o (T2) o T3, where o indicates an operator, unless one or more of those operators are + and/or -. Therefore, T1.(T2+T3)*T4 is invalid, whilst T1.(T2+T3)+T4 is valid.
  5. The + and - operators have equal precedence.
    (i) If the terms associated with a - operator do not occur in the current expression they are ignored; therefore, T1+(T2-T1) is equivalent to T1+T2: the (T2-T1) part of the expression is calculated first and results in T2, as the T1 term does not exist in this particular sub-expression and so cannot be removed.
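
The treatment of the : operator as a parenthesised sum can be sketched as follows. This is an illustration only, written without reference to the NAG Library internals; root_length and expand_colon are hypothetical helper names. The sketch assumes that two names with equal trailing numerics are acceptable, which the text above does not state, and it does not preserve leading zeros in the numeric part.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length of the root of a variable name, i.e. the part before the
   trailing digits. */
size_t root_length(const char *name)
{
    size_t len = strlen(name);
    while (len > 0 && isdigit((unsigned char)name[len - 1]))
        --len;
    return len;
}

/* Write the expansion of first:last into out as "(V3+V4+...)".
   Returns 0 on success and -1 if the pair is invalid or out is too small. */
int expand_colon(const char *first, const char *last, char *out, size_t outlen)
{
    assert(first != NULL && last != NULL && out != NULL);

    size_t r1 = root_length(first);
    size_t r2 = root_length(last);

    /* Both names must end in a numeric. */
    if (first[r1] == '\0' || last[r2] == '\0')
        return -1;
    /* The root parts must be the same. */
    if (r1 != r2 || strncmp(first, last, r1) != 0)
        return -1;

    unsigned long lo = strtoul(first + r1, NULL, 10);
    unsigned long hi = strtoul(last + r2, NULL, 10);
    /* The numeric parts must be in ascending order (assumption: equal
       values are accepted in this sketch). */
    if (lo > hi)
        return -1;

    size_t used = 0;
    for (unsigned long k = lo; k <= hi; ++k) {
        int n = snprintf(out + used, outlen - used, "%s%.*s%lu",
                         k == lo ? "(" : "+", (int)r1, first, k);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    if (used + 2 > outlen)
        return -1;
    out[used++] = ')';
    out[used] = '\0';
    return 0;
}
```

Called with "V3" and "V6", the sketch writes (V3+V4+V5+V6), which is the sub-expression that replaces V3:V6 in the precedence example for the : operator.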

3.2.2 Mean Effect / Intercept Term

A mean effect (or intercept term) can be explicitly added to a formula by specifying 1 and can be explicitly excluded from the formula by specifying -1. For example, 1+V1+V2 indicates a model with the main effects of two variables and a mean effect, whereas V1+V2-1 denotes the same model, but without the mean effect. The mean indicator can appear anywhere in the formula string as long as it is not contained within parentheses.
If the mean effect is not explicitly mentioned in the model formula, the model is assumed to include a mean effect.

3.3 Optional Parameters

g22yac accepts a number of optional parameters, described in Section 11. Usually these parameters are set via a call to g22zmc; however, when specifying a subject term in a mixed effects linear regression model it is often more convenient to supply the information along with the rest of the formula. Therefore, writeable optional parameters can also be set via the formula argument. The delimiter / must be used between the main formula and the optional parameter. For example, supplying a formula of the form V1+V2/SUBJECT=V3.V4 would specify a model formula of V1+V2 and set the optional parameter Subject to V3.V4.
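
The layout of such a string can be illustrated by splitting it at the / and = delimiters. The sketch below is not the parser used by g22yac; SplitFormula, copy_span and split_formula are hypothetical names, the buffer sizes are arbitrary, white space is not trimmed, and only a single option after the / is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the pieces of a formula string. */
typedef struct {
    char model[128];  /* text before the '/' delimiter            */
    char name[64];    /* option name, empty if no option is given */
    char value[128];  /* option value, empty if no option is given */
} SplitFormula;

/* Copy n characters of src into dst and terminate; -1 if dst is too small. */
static int copy_span(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (n >= dstlen)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split "model/NAME=value" into its parts. Returns 0 on success and -1 if
   a part does not fit or an option is present without an '=' delimiter. */
int split_formula(const char *formula, SplitFormula *out)
{
    assert(formula != NULL && out != NULL);
    out->name[0] = out->value[0] = '\0';

    const char *slash = strchr(formula, '/');
    if (slash == NULL)  /* no option: the whole string is the model */
        return copy_span(out->model, sizeof out->model,
                         formula, strlen(formula));

    if (copy_span(out->model, sizeof out->model,
                  formula, (size_t)(slash - formula)) != 0)
        return -1;

    const char *opt = slash + 1;
    const char *eq = strchr(opt, '=');
    if (eq == NULL)     /* option supplied without '=' */
        return -1;

    if (copy_span(out->name, sizeof out->name, opt, (size_t)(eq - opt)) != 0)
        return -1;
    return copy_span(out->value, sizeof out->value, eq + 1, strlen(eq + 1));
}
```

For the string V1+V2/SUBJECT=V3.V4 the sketch yields the model V1+V2, the option name SUBJECT and the option value V3.V4.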

4 References

None.

5 Arguments

1: hform void ** Input/Output
On entry: must be set to NULL or, alternatively, to an existing G22 handle, in which case this function will destroy the supplied G22 handle as if g22zac had been called.
On exit: holds a G22 handle to the internal data structure containing a description of the model M as specified in formula. You must not change the G22 handle other than through functions in Chapter G22.
2: formula const char * Input
On entry: a string containing the formula specifying M. See Section 3 for details on the allowed model syntax.
3: fail NagError * Input/Output
The NAG error argument (see Section 7 in the Introduction to the NAG Library CL Interface).

6 Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 3.1.2 in the Introduction to the NAG Library CL Interface for further information.
NE_BAD_PARAM
On entry, argument ⟨value⟩ had an illegal value.
NE_HANDLE
On entry, hform is not NULL or a recognised G22 handle.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 7.5 in the Introduction to the NAG Library CL Interface for further information.
NE_INVALID_FORMAT
After processing, the model contains no terms.
An invalid contrast specifier has been supplied.
The position in the formula string of the error is ⟨value⟩.
An operator was missing.
The position in the formula string of the error is ⟨value⟩.
Invalid specification for the colon operator.
The position in the formula string of the error is ⟨value⟩.
Invalid specification for the mean.
The position in the formula string of the error is ⟨value⟩.
Invalid specification for the power operator.
The position in the formula string of the error is ⟨value⟩.
Invalid use of an operator.
The position in the formula string of the error is ⟨value⟩.
Invalid variable name.
The position in the formula string of the error is ⟨value⟩.
Missing variable name.
The position in the formula string of the error is ⟨value⟩.
On entry, an option was supplied in formula, but the expected delimiter ‘=’ was not found.
On entry, an option was supplied in formula, but the supplied optval was invalid.
The formula contained a mismatched parenthesis.
The position in the formula string of the error is ⟨value⟩.
NE_INVALID_OPTION
On entry, an invalid option was supplied in formula.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library CL Interface for further information.
NW_POTENTIAL_PROBLEM
A term contained a repeated variable with a different contrast specifier.

7 Accuracy

Not applicable.

8 Parallelism and Performance

Background information to multithreading can be found in the Multithreading documentation.
g22yac is not threaded in any implementation.

9 Further Comments

None.

10 Example

This example reads in and parses a formula specifying a model, M, and displays the processed formula. A data matrix, D, is then read in and a design matrix constructed from D and M using g22ycc.
The design matrix includes an explicit term for the mean effect.
See also the examples for g22ybc, g22ycc and g22ydc.

10.1 Program Text

Program Text (g22yace.c)

10.2 Program Data

Program Data (g22yace.d)

10.3 Program Results

Program Results (g22yace.r)

11 Optional Parameters

As well as the optional parameters common to all G22 handles described in g22zmc and g22znc, a number of additional optional parameters can be specified for a G22 handle holding the description of a model, as returned by g22yac in hform.
Each writeable optional parameter has an associated default value; to set any of them to a non-default value, use g22zmc. The value of any optional parameter can be queried using g22znc.
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
The following is a list of the optional parameters available. A full description of each optional parameter is provided in Section 11.1.
  Contrast
  Explicit Mean
  Formula
  Storage Order
  Subject
All functions that make use of the G22 handle returned by g22yac combine it with a description of a data matrix, D, to construct a design matrix, X.

11.1 Description of the Optional Parameters

For each option, we give a summary line, a description of the optional parameter and details of constraints.
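The summary line contains the keyword, a parameter value (the letter a denotes an option that takes a character value) and, where applicable, the default value.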
Keywords and character values are case and white space insensitive.
Contrast    a    Default = FIRST
This parameter controls the default contrasts used for the categorical independent variables appearing in the model. Six types of contrasts, as well as dummy variables, are available:
FIRST
Treatment contrasts relative to the first level of the variable will be used.
LAST
Treatment contrasts relative to the last level of the variable will be used.
SUM FIRST
Sum contrasts relative to the first level of the variable will be used.
SUM LAST
Sum contrasts relative to the last level of the variable will be used.
HELMERT
Helmert contrasts will be used.
POLYNOMIAL
Polynomial contrasts will be used.
DUMMY
Dummy variables will be used rather than a contrast.
See g22ycc for more information on contrasts, their effect on the design matrix and how they are constructed.
This parameter may have an instance identifier associated with it (see g22zmc and g22znc). The instance identifier must be the name of one of the variables appearing in the model supplied in formula when the G22 handle was created. For example, CONTRAST : VAR1 = HELMERT would set Helmert contrasts for the variable named VAR1.
If no instance identifier is specified, the default contrast for all categorical variables in the model is changed, otherwise only the default contrast for the named variable is changed.
In some situations it might be necessary for a variable to use a different contrast, depending on where it appears in the model formula. In order to allow contrasts to be specified on a term by term basis the @ operator can be used in the model formula. The syntax for this operator is Vj@c, where c is one of: F, L, SF, SL, H, P or D, corresponding to treatment contrasts relative to the first and last levels, sum contrasts relative to the first and last levels, Helmert contrasts, polynomial contrasts or dummy variables respectively.
If the contrast has not been explicitly specified via the @ operator, the value obtained from the optional parameter Contrast is used.
For example, setting formula to VAR1 + VAR1@H.VAR2@P + VAR2@H.VAR3, specifies that the variable named VAR1 should use the default contrasts in the first term and Helmert contrasts in the second term. The variable named VAR2 should use polynomial contrasts in the second term and Helmert contrasts in the third term. The variable named VAR3 should use the default contrasts in the third term.
Constraint: Contrast=FIRST, LAST, SUM FIRST, SUM LAST, HELMERT, POLYNOMIAL or DUMMY.
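
The correspondence between the @ codes and the values of Contrast, and the fallback to the default when no @ is present, can be sketched as follows. The functions contrast_from_code and resolve_contrast are hypothetical and are not NAG Library functions. The sketch matches upper-case codes only; whether g22yac accepts lower-case codes is not stated here, so that restriction is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map an @ code (F, L, SF, SL, H, P or D) to the corresponding value of
   the Contrast optional parameter. Returns NULL for an unknown code. */
const char *contrast_from_code(const char *code)
{
    static const struct {
        const char *code;
        const char *value;
    } table[] = {
        { "F",  "FIRST" },     { "L",  "LAST" },
        { "SF", "SUM FIRST" }, { "SL", "SUM LAST" },
        { "H",  "HELMERT" },   { "P",  "POLYNOMIAL" },
        { "D",  "DUMMY" },
    };

    assert(code != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (strcmp(code, table[i].code) == 0)
            return table[i].value;
    return NULL;
}

/* Contrast used for one variable token inside a term, e.g. "VAR1@H" or
   "VAR1". Without an @ the supplied default is returned. */
const char *resolve_contrast(const char *token, const char *default_contrast)
{
    assert(token != NULL);
    const char *at = strchr(token, '@');
    return at != NULL ? contrast_from_code(at + 1) : default_contrast;
}
```

Applied to the tokens of VAR1 + VAR1@H.VAR2@P + VAR2@H.VAR3 with a default of FIRST, the sketch returns FIRST for VAR1 and VAR3, HELMERT for VAR1@H and VAR2@H, and POLYNOMIAL for VAR2@P.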
Explicit Mean    a    Default = NO
If Explicit Mean=YES, any mean effect included in the model will be explicitly added to the design matrix, X, as a column of 1s.
If Explicit Mean=NO, it is assumed that the function to which X will be passed treats the mean effect as a special case (see mean in g02dac, for example).
Constraint: Explicit Mean=YES or NO.
Formula    a
This parameter returns a verbose version of the model formula specified in formula, expanded and simplified to only contain variable names, the operators + and . and any contrast identifiers present.
Storage Order    a    Default = OBSVAR
This optional parameter controls how the design matrix, X, should be stored in its output array and only has an effect if the design matrix is being constructed using g22ycc.
If Storage Order=OBSVAR, Xij, the value for the jth variable of the ith observation of the design matrix is stored in x[(j-1)×pdx+i-1].
If Storage Order=VAROBS, Xij, the value for the jth variable of the ith observation of the design matrix is stored in x[(i-1)×pdx+j-1].
Here, x is the output parameter of the same name in g22ycc.
Constraint: Storage Order=OBSVAR or VAROBS.
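
The two indexing expressions can be written as small helper functions. This is a sketch only; index_obsvar and index_varobs are hypothetical names, and i and j are 1-based as in the text, with pdx taken as given in the expressions above.

```c
#include <assert.h>
#include <stddef.h>

/* Storage Order = OBSVAR: position in x of the value for the jth variable
   of the ith observation, i.e. x[(j-1)*pdx + i-1]. */
size_t index_obsvar(size_t i, size_t j, size_t pdx)
{
    assert(i >= 1 && j >= 1);
    return (j - 1) * pdx + (i - 1);
}

/* Storage Order = VAROBS: position in x of the same value,
   i.e. x[(i-1)*pdx + j-1]. */
size_t index_varobs(size_t i, size_t j, size_t pdx)
{
    assert(i >= 1 && j >= 1);
    return (i - 1) * pdx + (j - 1);
}
```

With OBSVAR, consecutive observations of one variable are adjacent in x; with VAROBS, consecutive variables of one observation are adjacent.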
Subject    a
This parameter gives the subject terms associated with the formula in a linear mixed effects model.
The supplied value must consist of a single term, representing either a single independent variable, or a single interaction term between two or more independent variables. None of the variables in the subject term may also appear in the model formula.