NAG Library Routine Document
G04EAF
1 Purpose
G04EAF computes orthogonal polynomial or dummy variables for a factor or classification variable.
2 Specification
SUBROUTINE G04EAF (TYP, N, LEVELS, IFACT, X, LDX, V, REP, IFAIL)
INTEGER N, LEVELS, IFACT(N), LDX, IFAIL
REAL (KIND=nag_wp) X(LDX,*), V(*), REP(LEVELS)
CHARACTER(1) TYP
3 Description
In the analysis of an experimental design using a general linear model, the factors or classification variables that specify the design have to be coded as dummy variables. G04EAF computes dummy variables that can then be used to fit the general linear model using G02DAF.
If a factor of length $n$ has $k$ levels, the simplest representation is to define $k$ dummy variables $X_j$ such that $X_j=1$ if the factor is at level $j$ and $0$ otherwise, for $j=1,2,\ldots,k$. However, a mean is usually included in the model, and the sum of these dummy variables is 1 for every observation, so it is aliased with the mean. To avoid this extra, redundant parameter, $k-1$ dummy variables can be defined as the contrasts between one level of the factor, the reference level, and the remaining levels. If the reference level is the first level, the dummy variables can be defined as $X_j=1$ if the factor is at level $j$ and $0$ otherwise, for $j=2,3,\ldots,k$. Alternatively, the last level can be used as the reference level.
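The three 0/1 codings just described (complete set, first level as reference, last level as reference) can be sketched as follows. The function `dummy_variables` and its arguments are hypothetical: they mimic IFACT, LEVELS and TYP but are not the NAG interface, and the sketch simply follows the definitions above.

```python
def dummy_variables(ifact, k, typ):
    """Hypothetical helper (not the NAG interface): 0/1 dummy variables.

    ifact holds the factor level (1..k) of each observation.
    typ='C': k columns, X_j = 1 if the observation is at level j, j = 1..k.
    typ='F': k-1 columns for levels 2..k; level 1 is the reference level.
    typ='L': k-1 columns for levels 1..k-1; level k is the reference level.
    """
    if typ == 'C':
        coded = range(1, k + 1)
    elif typ == 'F':
        coded = range(2, k + 1)
    elif typ == 'L':
        coded = range(1, k)
    else:
        raise ValueError("typ must be 'C', 'F' or 'L'")
    # one row per observation, one column per coded level
    return [[1.0 if f == j else 0.0 for j in coded] for f in ifact]
```

For example, `dummy_variables([1, 2, 3, 1], 3, 'F')` gives the rows [0, 0], [1, 0], [0, 1], [0, 0]: observations at the reference level are coded zero in every column.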
A second way of defining the $k-1$ dummy variables is to use a Helmert matrix, in which levels $2,3,\ldots,k$ are compared with the average effect of the previous levels. For example, if $k=4$ and every level is equally replicated, the contrasts would be:

Level  X1  X2  X3
1      -1  -1  -1
2       1  -1  -1
3       0   2  -1
4       0   0   3

Thus variable $j$, for $j=1,2,\ldots,k-1$, is given by
- $X_j=-1$ if the factor is at a level less than $j+1$;
- $X_j=\sum_{i=1}^{j} r_i / r_{j+1}$ if the factor is at level $j+1$;
- $X_j=0$ if the factor is at a level greater than $j+1$,
where $r_j$ is the number of replicates of level $j$.
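A sketch of the Helmert coding, written directly from the three cases above. `helmert_dummies` is a hypothetical helper, not part of the NAG Library; `Fraction` is used so that the fractional code of level $j+1$ stays exact. It assumes every level occurs at least once in `ifact` (G04EAF reports IFAIL=2 otherwise).

```python
from fractions import Fraction


def helmert_dummies(ifact, k):
    """Hypothetical helper: Helmert coding for levels 1..k.

    Column j (j = 1..k-1) is -1 for levels below j+1,
    (r_1 + ... + r_j) / r_{j+1} for level j+1, and 0 for levels above j+1,
    where r_i counts the observations at level i.
    Assumes every level occurs in ifact (otherwise r_{j+1} would be zero).
    """
    r = [0] * (k + 1)                 # r[i] = replicates of level i; r[0] unused
    for f in ifact:
        r[f] += 1
    rows = []
    for f in ifact:
        row = []
        for j in range(1, k):
            if f < j + 1:
                row.append(Fraction(-1))
            elif f == j + 1:
                row.append(Fraction(sum(r[1:j + 1]), r[j + 1]))
            else:
                row.append(Fraction(0))
        rows.append(row)
    return rows
```

With one observation per level, `helmert_dummies([1, 2, 3, 4], 4)` returns the rows (-1, -1, -1), (1, -1, -1), (0, 2, -1), (0, 0, 3). Weighting level $j+1$ by $\sum_{i=1}^{j} r_i / r_{j+1}$ makes each column sum to zero over the $n$ observations and makes the columns mutually orthogonal, even when the levels are unequally replicated.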
If the factor can be considered as a set of values from an underlying continuous variable, the factor can be represented by a set of $k-1$ orthogonal polynomials representing the linear, quadratic, etc., effects of the underlying variable. The orthogonal polynomials are computed using Forsythe's algorithm (Forsythe (1957); see also Cooper (1968)). The values of the underlying continuous variable represented by the factor levels must be supplied to the routine in V.
The orthogonal polynomials are standardized so that the sum of squares of each dummy variable (over the $n$ observations) is one. For the other methods, integer (±1) representations are retained, except that in the Helmert representation the code of level $j+1$ in dummy variable $j$ will be a fraction.
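The following sketch builds orthogonal-polynomial codes with the three-term recurrence of Forsythe (1957), $p_{j+1}(x)=(x-\alpha_{j+1})p_j(x)-\beta_j p_{j-1}(x)$, where $\alpha_{j+1}=\sum x p_j^2/\sum p_j^2$ and $\beta_j=\sum p_j^2/\sum p_{j-1}^2$, and then scales each column to unit sum of squares. It is an illustration under stated assumptions, not the G04EAF implementation: the helper name is hypothetical, the sums run over the $n$ observations (so each level is weighted by its replication $r_i$), and the signs of the columns may differ from those G04EAF returns.

```python
import math


def orthogonal_polynomial_dummies(ifact, v):
    """Hypothetical helper: n rows of k-1 orthogonal-polynomial codes.

    ifact: level (1..k) of each observation; v: the k distinct values of the
    underlying variable. Columns are orthogonal over the n observations and
    scaled to unit sum of squares; signs may differ from G04EAF.
    """
    k = len(v)
    x = [v[f - 1] for f in ifact]     # underlying value at each observation
    n = len(x)
    p_prev = [0.0] * n                # p_{-1} = 0
    p = [1.0] * n                     # p_0 = 1 (represents the mean; not returned)
    beta = 0.0                        # beta_0 = 0
    columns = []
    for _ in range(k - 1):
        ss = sum(pi * pi for pi in p)
        alpha = sum(xi * pi * pi for xi, pi in zip(x, p)) / ss
        # three-term recurrence: p_{j+1} = (x - alpha_{j+1}) p_j - beta_j p_{j-1}
        p_next = [(xi - alpha) * pi - beta * qi for xi, pi, qi in zip(x, p, p_prev)]
        ss_next = sum(q * q for q in p_next)
        if ss_next == 0.0:
            # analogue of IFAIL=3; the tolerance G04EAF uses is not documented here
            raise ArithmeticError("orthogonal polynomial has all values zero")
        beta = ss_next / ss
        p_prev, p = p, p_next
        norm = math.sqrt(ss_next)
        columns.append([q / norm for q in p_next])
    # transpose the k-1 columns into n rows
    return [list(row) for row in zip(*columns)]
```

For four equally spaced values of V and equal replication, the first column is proportional to (-3, -1, 1, 3) and the second to (1, -1, -1, 1), the familiar linear and quadratic contrasts. Because $p_0=1$, every returned column is also orthogonal to the mean, that is, it sums to zero.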
4 References
Cooper B E (1968) Algorithm AS 10. The use of orthogonal polynomials. Appl. Statist. 17 283–287
Forsythe G E (1957) Generation and use of orthogonal polynomials for data fitting with a digital computer. J. Soc. Indust. Appl. Math. 5 74–88
5 Parameters
- 1: TYP – CHARACTER(1) Input
On entry: the type of dummy variable to be computed.
- If TYP='P', an orthogonal Polynomial representation is computed.
- If TYP='H', a Helmert matrix representation is computed.
- If TYP='F', the contrasts relative to the First level are computed.
- If TYP='L', the contrasts relative to the Last level are computed.
- If TYP='C', a Complete set of dummy variables is computed.
Constraint:
TYP='P', 'H', 'F', 'L' or 'C'.
- 2: N – INTEGER Input
On entry: n, the number of observations for which the dummy variables are to be computed.
Constraint:
N≥LEVELS.
- 3: LEVELS – INTEGER Input
On entry: k, the number of levels of the factor.
Constraint:
LEVELS≥2.
- 4: IFACT(N) – INTEGER array Input
On entry: the n values of the factor.
Constraint:
1≤IFACT(i)≤LEVELS, for i=1,2,…,n.
- 5: X(LDX,*) – REAL (KIND=nag_wp) array Output
Note: the second dimension of the array X must be at least LEVELS-1 if TYP='P', 'H', 'F' or 'L', and at least LEVELS if TYP='C'.
On exit: the n by k* matrix of dummy variables, where k*=k-1 if TYP='P', 'H', 'F' or 'L', and k*=k if TYP='C'.
- 6: LDX – INTEGER Input
On entry: the first dimension of the array X as declared in the (sub)program from which G04EAF is called.
Constraint:
LDX≥N.
- 7: V(*) – REAL (KIND=nag_wp) array Input
Note: the dimension of the array V must be at least LEVELS if TYP='P', and at least 1 otherwise.
On entry: if TYP='P', the k distinct values of the underlying variable for which the orthogonal polynomial is to be computed.
If TYP≠'P', V is not referenced.
Constraint:
if TYP='P', the k values of V must be distinct.
- 8: REP(LEVELS) – REAL (KIND=nag_wp) array Output
On exit: the number of replications for each level of the factor, $r_i$, for $i=1,2,\ldots,k$.
- 9: IFAIL – INTEGER Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter, you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0.
When the value -1 or 1 is used, it is essential to test the value of IFAIL on exit.
On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).
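The array-size rules for X and V and the contents of REP on exit can be summarized in a small sketch. Both functions are hypothetical helpers that restate the documented rules; they are not part of the NAG Library.

```python
def g04eaf_array_sizes(typ, n, levels):
    """Hypothetical helper: minimum array sizes implied by the parameter notes."""
    if typ not in ('P', 'H', 'F', 'L', 'C'):
        raise ValueError("TYP must be 'P', 'H', 'F', 'L' or 'C'")
    return {
        "LDX": n,                                          # LDX >= N
        "X_cols": levels if typ == 'C' else levels - 1,    # k* columns of X
        "V": levels if typ == 'P' else 1,                  # V only referenced for TYP='P'
        "REP": levels,                                     # one count per level
    }


def replications(ifact, levels):
    """Hypothetical helper: REP on exit, r_i observations at level i, as reals."""
    rep = [0.0] * levels
    for f in ifact:
        rep[f - 1] += 1.0        # levels are numbered 1..LEVELS
    return rep
```

For the example in Section 9 (12 observations, 4 levels), `g04eaf_array_sizes('H', 12, 4)` asks for X with 12 rows and 3 columns, while TYP='C' would need 4 columns.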
6 Error Indicators and Warnings
If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
- IFAIL=1
On entry, LEVELS<2,
or N<LEVELS,
or LDX<N,
or TYP≠'P', 'H', 'F', 'L' or 'C'.
- IFAIL=2
On entry, a value of IFACT is not in the range 1≤IFACT(i)≤LEVELS, for i=1,2,…,n,
or TYP='P' and not all values of V are distinct,
or not all levels are represented in IFACT.
- IFAIL=3
An orthogonal polynomial has all values zero. This will be due to some values of V being very close together. Note that this can only occur if TYP='P'.
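The entry checks behind IFAIL=1 and IFAIL=2 can be restated as a sketch. `g04eaf_check` is hypothetical, and the order in which the conditions are tested is an assumption, since this document does not specify it. IFAIL=3 is not covered, because it can only be detected while the orthogonal polynomials are being computed.

```python
def g04eaf_check(typ, n, levels, ifact, ldx, v=None):
    """Hypothetical helper: return the documented entry IFAIL code (0, 1 or 2).

    The order of the tests is an assumption made for this sketch.
    """
    # IFAIL=1: scalar arguments out of range
    if levels < 2 or n < levels or ldx < n or typ not in ('P', 'H', 'F', 'L', 'C'):
        return 1
    # IFAIL=2: a factor value outside 1..LEVELS
    if any(not 1 <= f <= levels for f in ifact[:n]):
        return 2
    # IFAIL=2: TYP='P' with repeated values in V
    if typ == 'P' and len(set(v[:levels])) < levels:
        return 2
    # IFAIL=2: some level never occurs (all values are in range at this point)
    if len(set(ifact[:n])) < levels:
        return 2
    return 0
```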
7 Accuracy
The computations are stable.
8 Further Comments
Other routines for fitting polynomials can be found in Chapter E02.
9 Example
Data are read in from an experiment with four treatments and three observations per treatment, with the treatment coded as a factor. G04EAF is used to compute the required dummy variables, and the model is then fitted by G02DAF.
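One way to see why a design like this one needs only $k-1$ dummy variables when the model also includes a mean is to compare the rank of the design matrix [mean | dummies] for the complete coding and for contrasts relative to the first level. This is a hypothetical sketch: the ordering of IFACT (treatments 1 to 4, three observations each, grouped by treatment) is assumed, `matrix_rank` is a plain Gaussian elimination written for the illustration, and no response data are used.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        # choose the largest remaining entry in column c as the pivot
        pivot = max(range(rank, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue  # column c depends (numerically) on earlier columns
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n_rows):
            factor = a[i][c] / a[rank][c]
            for j in range(c, n_cols):
                a[i][j] -= factor * a[rank][j]
        rank += 1
    return rank


# Four treatments, three observations each (ordering assumed for illustration).
ifact = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
k = 4

# Mean column followed by the complete set of k dummy variables (TYP='C' style).
with_complete = [[1.0] + [1.0 if f == j else 0.0 for j in range(1, k + 1)] for f in ifact]
# Mean column followed by contrasts relative to the first level (TYP='F' style).
with_first = [[1.0] + [1.0 if f == j else 0.0 for j in range(2, k + 1)] for f in ifact]

print(len(with_complete[0]), matrix_rank(with_complete))   # prints: 5 4
print(len(with_first[0]), matrix_rank(with_first))         # prints: 4 4
```

The complete coding gives five columns of rank four, so one parameter is redundant; the first-level contrasts give four columns of full rank four.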
9.1 Program Text
Program Text (g04eafe.f90)
9.2 Program Data
Program Data (g04eafe.d)
9.3 Program Results
Program Results (g04eafe.r)