NAG Library Manual

NAG Library Routine Document

G02ECF

Note:  before using this routine, please read the Users' Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent details.

1  Purpose

G02ECF calculates $R^2$ and $C_p$-values from the residual sums of squares for a series of linear regression models.

2  Specification

SUBROUTINE G02ECF ( MEAN, N, SIGSQ, TSS, NMOD, NTERMS, RSS, RSQ, CP, IFAIL)
INTEGER  N, NMOD, NTERMS(NMOD), IFAIL
REAL (KIND=nag_wp)  SIGSQ, TSS, RSS(NMOD), RSQ(NMOD), CP(NMOD)
CHARACTER(1)  MEAN

3  Description

When selecting a linear regression model for a set of $n$ observations, a balance has to be found between the number of independent variables in the model and the fit as measured by the residual sum of squares. The more variables included, the smaller the residual sum of squares will be. Two statistics can help in selecting the best model.
(a) $R^2$ represents the proportion of variation in the dependent variable that is explained by the independent variables.
$$R^2 = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}},$$
where Total Sum of Squares $= \mathrm{TSS} = \sum (y - \bar{y})^2$ (if the mean is fitted, otherwise $\mathrm{TSS} = \sum y^2$) and
Regression Sum of Squares $= \mathrm{RegSS} = \mathrm{TSS} - \mathrm{RSS}$, where
$\mathrm{RSS} =$ residual sum of squares $= \sum (y - \hat{y})^2$.
The $R^2$-values can be examined to find a model with a high $R^2$-value but with a small number of independent variables.
(b) $C_p$ statistic.
$$C_p = \frac{\mathrm{RSS}}{\hat{\sigma}^2} - (n - 2p),$$
where $p$ is the number of parameters (including the mean) in the model and $\hat{\sigma}^2$ is an estimate of the true variance of the errors. This can often be obtained from fitting the full model.
A well fitting model will have $C_p \simeq p$. $C_p$ is often plotted against $p$ to see which models are closest to the $C_p = p$ line.
G02ECF may be called after G02EAF which calculates the residual sums of squares for all possible linear regression models.
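
The two statistics described above can be sketched in a few lines of Python. This is only an illustration of the formulas in this section, not the NAG implementation: the function name `r2_and_cp` and the numerical inputs are hypothetical, and the argument names simply mirror the routine's parameters.

```python
def r2_and_cp(mean, n, sigsq, tss, nterms, rss):
    """Return (rsq, cp) lists computed from the formulas in Section 3.

    mean   : 'M' if a mean term is fitted, 'Z' otherwise
    n      : number of observations
    sigsq  : estimate of the true variance of the errors
    tss    : total sum of squares
    nterms : number of independent variables per model (mean not counted)
    rss    : residual sum of squares per model
    """
    extra = 1 if mean == "M" else 0  # the mean counts as one parameter
    rsq, cp = [], []
    for k, r in zip(nterms, rss):
        p = k + extra                        # parameters in this model
        rsq.append((tss - r) / tss)          # R^2 = RegSS / TSS
        cp.append(r / sigsq - (n - 2 * p))   # C_p = RSS / sigma^2 - (n - 2p)
    return rsq, cp


# Hypothetical numbers, chosen only to exercise the formulas.
rsq, cp = r2_and_cp("M", n=20, sigsq=2.0, tss=100.0,
                    nterms=[1, 2, 3], rss=[60.0, 36.0, 32.0])
```

In this made-up case the variance estimate equals RSS/(n - p) for the largest model (32/16 = 2), so that model's $C_p$ equals its $p$ by construction, which is what happens whenever the estimate is taken from the full model.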

4  References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Weisberg S (1985) Applied Linear Regression Wiley

5  Parameters

1:     MEAN – CHARACTER(1)     Input
On entry: indicates if a mean term is to be included.
MEAN='M'
A mean term, intercept, will be included in the model.
MEAN='Z'
The model will pass through the origin, zero-point.
Constraint: MEAN='M' or 'Z'.
2:     N – INTEGER     Input
On entry: $n$, the number of observations used in the regression model.
Constraint: N must be greater than $2 \times p_{\max}$, where $p_{\max}$ is the largest number of independent variables fitted (including the mean if fitted).
3:     SIGSQ – REAL (KIND=nag_wp)     Input
On entry: the best estimate of true variance of the errors, $\hat{\sigma}^2$.
Constraint: SIGSQ>0.0.
4:     TSS – REAL (KIND=nag_wp)     Input
On entry: the total sum of squares for the regression model.
Constraint: TSS>0.0.
5:     NMOD – INTEGER     Input
On entry: the number of regression models.
Constraint: NMOD>0.
6:     NTERMS(NMOD) – INTEGER array     Input
On entry: NTERMS(i) must contain the number of independent variables (not counting the mean) fitted to the $i$th model, for $i = 1, 2, \ldots, \mathrm{NMOD}$.
7:     RSS(NMOD) – REAL (KIND=nag_wp) array     Input
On entry: RSS(i) must contain the residual sum of squares for the $i$th model.
Constraint: $\mathrm{RSS}(i) \le \mathrm{TSS}$, for $i = 1, 2, \ldots, \mathrm{NMOD}$.
8:     RSQ(NMOD) – REAL (KIND=nag_wp) array     Output
On exit: RSQ(i) contains the $R^2$-value for the $i$th model, for $i = 1, 2, \ldots, \mathrm{NMOD}$.
9:     CP(NMOD) – REAL (KIND=nag_wp) array     Output
On exit: CP(i) contains the $C_p$-value for the $i$th model, for $i = 1, 2, \ldots, \mathrm{NMOD}$.
10:   IFAIL – INTEGER     Input/Output
On entry: IFAIL must be set to 0, -1 or 1. If you are unfamiliar with this parameter you should refer to Section 3.3 in the Essential Introduction for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, if you are not familiar with this parameter, the recommended value is 0. When the value -1 or 1 is used it is essential to test the value of IFAIL on exit.
On exit: IFAIL=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6  Error Indicators and Warnings

If on entry IFAIL=0 or -1, explanatory error messages are output on the current error message unit (as defined by X04AAF).
Errors or warnings detected by the routine:
IFAIL=1
On entry, NMOD < 1,
or SIGSQ $\le$ 0.0,
or TSS $\le$ 0.0,
or MEAN $\ne$ 'M' or 'Z'.
IFAIL=2
On entry, the number of parameters for a model is too large for the number of observations, i.e., $2 \times p \ge n$.
IFAIL=3
On entry, $\mathrm{RSS}(i) > \mathrm{TSS}$, for some $i = 1, 2, \ldots, \mathrm{NMOD}$.
IFAIL=4
A value of $C_p$ is less than 0.0. This may occur if SIGSQ is too large or if RSS, N or NTERMS are incorrect.
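
The conditions listed above can be checked before calling the routine. The following Python sketch mirrors them under stated assumptions: the function name `check_inputs` is hypothetical, the order in which the conditions are tested is assumed to follow the numbering of the IFAIL values, and nothing here reproduces the routine's actual messages or internal logic.

```python
def check_inputs(mean, n, sigsq, tss, nterms, rss):
    """Return an IFAIL-style code; 0 means no condition was triggered."""
    nmod = len(rss)
    # IFAIL=1: basic argument checks.
    if nmod < 1 or sigsq <= 0.0 or tss <= 0.0 or mean not in ("M", "Z"):
        return 1
    extra = 1 if mean == "M" else 0  # the mean counts as one parameter
    # IFAIL=2: too many parameters for the number of observations.
    if any(2 * (k + extra) >= n for k in nterms):
        return 2
    # IFAIL=3: a residual sum of squares exceeds the total sum of squares.
    if any(r > tss for r in rss):
        return 3
    # IFAIL=4: a C_p-value would be negative.
    if any(r / sigsq - (n - 2 * (k + extra)) < 0.0
           for k, r in zip(nterms, rss)):
        return 4
    return 0


# Hypothetical inputs: a valid call, then one that trips the RSS check.
ok = check_inputs("M", 20, 2.0, 100.0, [1, 2, 3], [60.0, 36.0, 32.0])
bad = check_inputs("M", 20, 2.0, 100.0, [1], [120.0])
```

A pre-check of this kind is most useful when IFAIL is set to 1 on entry, since no message is printed in that case and the exit value has to be tested explicitly.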

7  Accuracy

Accuracy is sufficient for all practical purposes.

8  Further Comments

None.

9  Example

The data, from an oxygen uptake experiment, is given by Weisberg (1985). The independent and dependent variables are read and the residual sums of squares for all possible models computed using G02EAF. The values of $R^2$ and $C_p$ are then computed and printed along with the names of variables in the models.
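
The final step of this workflow, pairing each model's variable names with its $C_p$-value and looking for models near the $C_p = p$ line, might be sketched as follows. Everything in the snippet is hypothetical: the variable names, the $C_p$-values and the two helper functions are placeholders, and the order in which the models are enumerated is an assumption that need not match the order returned by G02EAF.

```python
from itertools import combinations


def label_models(names):
    """List every non-empty subset of variable names, smallest first."""
    return [combo
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]


def closest_to_cp_line(models, cp, mean="M"):
    """Return (model, p, Cp) rows sorted by the distance |Cp - p|."""
    extra = 1 if mean == "M" else 0  # the mean counts as one parameter
    rows = [(m, len(m) + extra, c) for m, c in zip(models, cp)]
    return sorted(rows, key=lambda row: abs(row[2] - row[1]))


models = label_models(["x1", "x2"])   # hypothetical variable names
cp_values = [9.5, 2.4, 3.0]           # hypothetical Cp-values, one per model
ranked = closest_to_cp_line(models, cp_values)

for model, p, c in ranked:
    print(", ".join(model), p, c)
```

Sorting by distance from the line is only one way to read such a table; plotting $C_p$ against $p$, as Section 3 suggests, conveys the same information graphically.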

9.1  Program Text

Program Text (g02ecfe.f90)

9.2  Program Data

Program Data (g02ecfe.d)

9.3  Program Results

Program Results (g02ecfe.r)


© The Numerical Algorithms Group Ltd, Oxford, UK. 2012