g02ch Method
g02ch performs a multiple linear regression with no constant on a set of variables whose sums of squares and cross-products about zero and correlation-like coefficients are given.

# Syntax

C#
```
public static void g02ch(
int n,
int k1,
int k,
double[,] sspz,
double[,] rz,
double[] result,
double[,] coef,
double[,] rznv,
double[,] cz,
out int ifail
)
```
Visual Basic
```
Public Shared Sub g02ch ( _
n As Integer, _
k1 As Integer, _
k As Integer, _
sspz As Double(,), _
rz As Double(,), _
result As Double(), _
coef As Double(,), _
rznv As Double(,), _
cz As Double(,), _
<OutAttribute> ByRef ifail As Integer _
)
```
Visual C++
```
public:
static void g02ch(
int n,
int k1,
int k,
array<double,2>^ sspz,
array<double,2>^ rz,
array<double>^ result,
array<double,2>^ coef,
array<double,2>^ rznv,
array<double,2>^ cz,
[OutAttribute] int% ifail
)
```
F#
```
static member g02ch :
n : int *
k1 : int *
k : int *
sspz : float[,] *
rz : float[,] *
result : float[] *
coef : float[,] *
rznv : float[,] *
cz : float[,] *
ifail : int byref -> unit
```

#### Parameters

n
Type: System.Int32
On entry: $n$, the number of cases used in calculating the sums of squares and cross-products and correlation-like coefficients.
k1
Type: System.Int32
On entry: the total number of variables, independent and dependent $\left(k+1\right)$, in the regression.
Constraint: $2\le {\mathbf{k1}}\le {\mathbf{n}}$.
k
Type: System.Int32
On entry: the number of independent variables $k$ in the regression.
Constraint: ${\mathbf{k}}={\mathbf{k1}}-1$.
sspz
Type: System.Double[,]
An array of size [dim1, k1]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k1}}$
On entry: ${\mathbf{sspz}}\left[\mathit{i}-1,\mathit{j}-1\right]$ must be set to ${\stackrel{~}{S}}_{\mathit{i}\mathit{j}}$, the sum of cross-products about zero for the $\mathit{i}$th and $\mathit{j}$th variables, for $\mathit{i}=1,2,\dots ,k+1$ and $\mathit{j}=1,2,\dots ,k+1$; terms involving the dependent variable appear in row $k+1$ and column $k+1$.
rz
Type: System.Double[,]
An array of size [dim1, k1]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k1}}$
On entry: ${\mathbf{rz}}\left[\mathit{i}-1,\mathit{j}-1\right]$ must be set to ${\stackrel{~}{R}}_{\mathit{i}\mathit{j}}$, the correlation-like coefficient for the $\mathit{i}$th and $\mathit{j}$th variables, for $\mathit{i}=1,2,\dots ,k+1$ and $\mathit{j}=1,2,\dots ,k+1$; coefficients involving the dependent variable appear in row $k+1$ and column $k+1$.
result
Type: System.Double[]
An array of size [$13$]
On exit: the following information:
${\mathbf{result}}\left[0\right]$
$SSR$, the sum of squares attributable to the regression.
${\mathbf{result}}\left[1\right]$
$DFR$, the degrees of freedom attributable to the regression.
${\mathbf{result}}\left[2\right]$
$MSR$, the mean square attributable to the regression.
${\mathbf{result}}\left[3\right]$
$F$, the $F$ value for the analysis of variance.
${\mathbf{result}}\left[4\right]$
$SSD$, the sum of squares of deviations about the regression.
${\mathbf{result}}\left[5\right]$
$DFD$, the degrees of freedom of deviations about the regression.
${\mathbf{result}}\left[6\right]$
$MSD$, the mean square of deviations about the regression.
${\mathbf{result}}\left[7\right]$
$SST$, the total sum of squares.
${\mathbf{result}}\left[8\right]$
$DFT$, the total degrees of freedom.
${\mathbf{result}}\left[9\right]$
$s$, the standard error estimate.
${\mathbf{result}}\left[10\right]$
$R$, the coefficient of multiple correlation.
${\mathbf{result}}\left[11\right]$
${R}^{2}$, the coefficient of multiple determination.
${\mathbf{result}}\left[12\right]$
${\stackrel{-}{R}}^{2}$, the coefficient of multiple determination corrected for the degrees of freedom.
coef
Type: System.Double[,]
An array of size [dim1, $3$]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k}}$
On exit: for $i=1,2,\dots ,k$, the following information:
${\mathbf{coef}}\left[i-1,0\right]$
${b}_{i}$, the regression coefficient for the $i$th variable.
${\mathbf{coef}}\left[i-1,1\right]$
$se\left({b}_{i}\right)$, the standard error of the regression coefficient for the $i$th variable.
${\mathbf{coef}}\left[i-1,2\right]$
$t\left({b}_{i}\right)$, the $t$ value of the regression coefficient for the $i$th variable.
rznv
Type: System.Double[,]
An array of size [dim1, k]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k}}$
On exit: the inverse of the matrix of correlation-like coefficients for the independent variables; that is, the inverse of the matrix consisting of the first $k$ rows and columns of rz.
cz
Type: System.Double[,]
An array of size [dim1, k]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k}}$
On exit: the modified inverse matrix, $C$, where
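 ${\mathbf{cz}}\left[i-1,j-1\right]=\frac{{\mathbf{rz}}\left[i-1,j-1\right]\times {\mathbf{rznv}}\left[i-1,j-1\right]}{{\mathbf{sspz}}\left[i-1,j-1\right]}, \quad i,j=1,2,\dots ,k.$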
ifail
Type: System.Int32%
On exit: ${\mathbf{ifail}}={0}$ unless the method detects an error or a warning has been flagged (see [Error Indicators and Warnings]).
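
The constraints on n, k1 and k, and the minimum array sizes listed above, can be checked before the call. The following is a minimal sketch, not part of the library: the class `G02chArgumentCheck` and its methods are hypothetical names, the return codes simply mirror the ifail values 1, 2 and 3 listed under Error Indicators and Warnings, and treating the stated column counts as minimum sizes is an assumption.

```csharp
using System;

// Hypothetical helper; none of these names belong to the library.
static class G02chArgumentCheck
{
    // Returns 0 if the documented constraints on n, k1 and k hold, otherwise
    // the ifail value that this page lists for the first violated constraint.
    public static int Check(int n, int k1, int k)
    {
        if (k1 < 2) return 1;        // k1 < 2
        if (k1 != k + 1) return 2;   // k1 != k + 1
        if (n < k1) return 3;        // n < k1
        return 0;
    }

    // True if every array is at least as large as the Parameters section
    // requires (assumption: the stated sizes are treated as minimum sizes).
    public static bool DimensionsOk(int k1, int k, double[,] sspz, double[,] rz,
        double[] result, double[,] coef, double[,] rznv, double[,] cz)
    {
        return sspz.GetLength(0) >= k1 && sspz.GetLength(1) >= k1
            && rz.GetLength(0) >= k1 && rz.GetLength(1) >= k1
            && result.Length >= 13
            && coef.GetLength(0) >= k && coef.GetLength(1) >= 3
            && rznv.GetLength(0) >= k && rznv.GetLength(1) >= k
            && cz.GetLength(0) >= k && cz.GetLength(1) >= k;
    }

    static void Main()
    {
        int n = 5, k1 = 3, k = 2;
        var sspz = new double[k1, k1];
        var rz = new double[k1, k1];
        var result = new double[13];
        var coef = new double[k, 3];
        var rznv = new double[k, k];
        var cz = new double[k, k];

        Console.WriteLine(Check(n, k1, k));   // prints 0
        Console.WriteLine(DimensionsOk(k1, k, sspz, rz, result, coef, rznv, cz));   // prints True
        Console.WriteLine(Check(n, k1, 3));   // prints 2, because k1 != k + 1
    }
}
```

A check of this kind only catches the argument errors; the numerical failures (ifail = 5 and 6) can be detected only by the method itself.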

# Description

g02ch fits a curve of the form
 $y={b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\cdots +{b}_{k}{x}_{k}$
to the data points
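 $\begin{array}{c}\left({x}_{11},{x}_{21},\dots ,{x}_{k1},{y}_{1}\right)\\ \left({x}_{12},{x}_{22},\dots ,{x}_{k2},{y}_{2}\right)\\ \vdots \\ \left({x}_{1n},{x}_{2n},\dots ,{x}_{kn},{y}_{n}\right)\end{array}$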
such that
 ${y}_{i}={b}_{1}{x}_{1i}+{b}_{2}{x}_{2i}+\cdots +{b}_{k}{x}_{ki}+{e}_{i}, \quad i=1,2,\dots ,n.$
The method calculates the regression coefficients, ${b}_{1},{b}_{2},\dots ,{b}_{k}$, (and various other statistical quantities) by minimizing
 $\sum _{i=1}^{n}{e}_{i}^{2}.$
The actual data values $\left({x}_{1i},{x}_{2i},\dots ,{x}_{ki},{y}_{i}\right)$ are not provided as input to the method. Instead, input to the method consists of:
(i) The number of cases, $n$, on which the regression is based.
(ii) The total number of variables, dependent and independent, in the regression, $\left(k+1\right)$.
(iii) The number of independent variables in the regression, $k$.
(iv) The $\left(k+1\right)$ by $\left(k+1\right)$ matrix $\left[{\stackrel{~}{S}}_{ij}\right]$ of sums of squares and cross-products about zero of all the variables in the regression; the terms involving the dependent variable, $y$, appear in the $\left(k+1\right)$th row and column.
(v) The $\left(k+1\right)$ by $\left(k+1\right)$ matrix $\left[{\stackrel{~}{R}}_{ij}\right]$ of correlation-like coefficients for all the variables in the regression; the correlations involving the dependent variable, $y$, appear in the $\left(k+1\right)$th row and column.
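
As an illustration of inputs (iv) and (v), the sketch below forms both matrices from raw data held in memory. It assumes the usual definition of the correlation-like coefficient, ${\stackrel{~}{R}}_{ij}={\stackrel{~}{S}}_{ij}/\sqrt{{\stackrel{~}{S}}_{ii}{\stackrel{~}{S}}_{jj}}$, which this page does not state. The class and method names are hypothetical and the data values are made up; in practice the matrices would normally be produced by a method such as g02bd.

```csharp
using System;

// Hypothetical helper; none of these names belong to the library.
static class ZeroMomentsSketch
{
    // data[c, v] holds case c of variable v; the dependent variable is the last column.
    // Returns the matrix of sums of squares and cross-products about zero.
    public static double[,] CrossProducts(double[,] data)
    {
        int n = data.GetLength(0), m = data.GetLength(1);
        var s = new double[m, m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                for (int c = 0; c < n; c++)
                    s[i, j] += data[c, i] * data[c, j];
        return s;
    }

    // Assumed definition: R[i, j] = S[i, j] / sqrt(S[i, i] * S[j, j]).
    public static double[,] CorrelationLike(double[,] s)
    {
        int m = s.GetLength(0);
        var r = new double[m, m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                r[i, j] = s[i, j] / Math.Sqrt(s[i, i] * s[j, j]);
        return r;
    }

    static void Main()
    {
        // Three cases of (x1, x2, y); made-up values.
        double[,] data = { { 1, 0, 2 }, { 0, 1, 3 }, { 1, 1, 5 } };
        double[,] sspz = CrossProducts(data);
        double[,] rz = CorrelationLike(sspz);
        Console.WriteLine("S(1,3) = {0}, S(3,3) = {1}", sspz[0, 2], sspz[2, 2]);
        Console.WriteLine("R(1,2) = {0}", rz[0, 1]);
    }
}
```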
The quantities calculated are:
(a) The inverse of the $k$ by $k$ partition of the matrix of correlation-like coefficients, $\left[{\stackrel{~}{R}}_{ij}\right]$, involving only the independent variables. The inverse is obtained using an accurate method which assumes that this sub-matrix is positive definite (see [Further Comments]).
(b) The modified matrix, $C=\left[{c}_{ij}\right]$, where
 ${c}_{ij}=\frac{{\stackrel{~}{R}}_{ij}{\stackrel{~}{r}}^{ij}}{{\stackrel{~}{S}}_{ij}}, \quad i,j=1,2,\dots ,k,$
where ${\stackrel{~}{r}}^{ij}$ is the $\left(i,j\right)$th element of the inverse matrix of $\left[{\stackrel{~}{R}}_{ij}\right]$ as described in (a) above. Each element of $C$ is thus the corresponding element of the matrix of correlation-like coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products about zero.
(c) The regression coefficients:
 ${b}_{i}=\sum _{j=1}^{k}{c}_{ij}{\stackrel{~}{S}}_{j\left(k+1\right)}, \quad i=1,2,\dots ,k,$
where ${\stackrel{~}{S}}_{j\left(k+1\right)}$ is the sum of cross-products about zero for the independent variable ${x}_{j}$ and the dependent variable $y$.
(d) The sum of squares attributable to the regression, $SSR$, the sum of squares of deviations about the regression, $SSD$, and the total sum of squares, $SST$:
• $SST={\stackrel{~}{S}}_{\left(k+1\right)\left(k+1\right)}$, the sum of squares about zero for the dependent variable, $y$;
• $SSR=\sum _{j=1}^{k}{b}_{j}{\stackrel{~}{S}}_{j\left(k+1\right)}\text{; }SSD=SST-SSR$.
(e) The degrees of freedom attributable to the regression, $DFR$, the degrees of freedom of deviations about the regression, $DFD$, and the total degrees of freedom, $DFT$:
 $DFR=k; DFD=n-k; DFT=n.$
(f) The mean square attributable to the regression, $MSR$, and the mean square of deviations about the regression, $MSD$:
 $MSR=SSR/DFR; MSD=SSD/DFD.$
(g) The $F$ value for the analysis of variance:
 $F=MSR/MSD.$
(h) The standard error estimate:
 $s=\sqrt{MSD}.$
(i) The coefficient of multiple correlation, $R$, the coefficient of multiple determination, ${R}^{2}$, and the coefficient of multiple determination corrected for the degrees of freedom, ${\stackrel{-}{R}}^{2}$:
 $R=\sqrt{1-\frac{SSD}{SST}}; \quad {R}^{2}=1-\frac{SSD}{SST}; \quad {\stackrel{-}{R}}^{2}=1-\frac{SSD\times DFT}{SST\times DFD}.$
(j) The standard error of the regression coefficients:
 $se\left({b}_{i}\right)=\sqrt{MSD\times {c}_{ii}}, \quad i=1,2,\dots ,k.$
(k) The $t$ values for the regression coefficients:
 $t\left({b}_{i}\right)=\frac{{b}_{i}}{se\left({b}_{i}\right)}, \quad i=1,2,\dots ,k.$
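
Quantities (a) to (k) can be traced in a few lines of code. The sketch below is an illustration only and is not how the library is implemented: it inverts the correlation-like block by plain Gauss-Jordan elimination without pivoting or refinement, whereas g02ch uses a more accurate method. The class name, method names and data values are hypothetical, and the code assumes that every element of the cross-product matrix it divides by is nonzero and that $SSD$ is positive.

```csharp
using System;

// Hypothetical helper; none of these names belong to the library.
static class NoConstantRegressionSketch
{
    // Step (a): invert the leading k-by-k block of a by Gauss-Jordan elimination.
    // No pivoting: adequate only for a well-conditioned positive definite block.
    public static double[,] Invert(double[,] a, int k)
    {
        var w = new double[k, 2 * k];
        for (int i = 0; i < k; i++)
        {
            for (int j = 0; j < k; j++) w[i, j] = a[i, j];
            w[i, k + i] = 1.0;
        }
        for (int p = 0; p < k; p++)
        {
            double pivot = w[p, p];
            for (int j = 0; j < 2 * k; j++) w[p, j] /= pivot;
            for (int i = 0; i < k; i++)
            {
                if (i == p) continue;
                double f = w[i, p];
                for (int j = 0; j < 2 * k; j++) w[i, j] -= f * w[p, j];
            }
        }
        var inv = new double[k, k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++) inv[i, j] = w[i, k + j];
        return inv;
    }

    // Steps (b) to (k). Fills result (length 13) and coef (k by 3) in the
    // layout described under Parameters.
    public static void Fit(int n, int k, double[,] sspz, double[,] rz,
                           double[] result, double[,] coef)
    {
        double[,] rznv = Invert(rz, k);
        var cz = new double[k, k];
        var b = new double[k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
            {
                cz[i, j] = rz[i, j] * rznv[i, j] / sspz[i, j];   // (b)
                b[i] += cz[i, j] * sspz[j, k];                   // (c)
            }
        double sst = sspz[k, k], ssr = 0.0;
        for (int j = 0; j < k; j++) ssr += b[j] * sspz[j, k];    // (d)
        double ssd = sst - ssr;
        double dfr = k, dfd = n - k, dft = n;                    // (e)
        double msr = ssr / dfr, msd = ssd / dfd;                 // (f)
        result[0] = ssr; result[1] = dfr; result[2] = msr;
        result[3] = msr / msd;                                   // (g)
        result[4] = ssd; result[5] = dfd; result[6] = msd;
        result[7] = sst; result[8] = dft;
        result[9] = Math.Sqrt(msd);                              // (h)
        result[11] = 1.0 - ssd / sst;                            // (i)
        result[10] = Math.Sqrt(result[11]);
        result[12] = 1.0 - (ssd * dft) / (sst * dfd);
        for (int i = 0; i < k; i++)
        {
            coef[i, 0] = b[i];
            coef[i, 1] = Math.Sqrt(msd * cz[i, i]);              // (j)
            coef[i, 2] = coef[i, 0] / coef[i, 1];                // (k)
        }
    }

    static void Main()
    {
        // Made-up data: x1 = (1, 0, 1), x2 = (0, 1, 1), y = (2, 3, 4).
        double[,] sspz = { { 2, 1, 6 }, { 1, 2, 7 }, { 6, 7, 29 } };
        var rz = new double[3, 3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                rz[i, j] = sspz[i, j] / Math.Sqrt(sspz[i, i] * sspz[j, j]);
        var result = new double[13];
        var coef = new double[2, 3];
        Fit(3, 2, sspz, rz, result, coef);
        Console.WriteLine("b1 = {0}, b2 = {1}", coef[0, 0], coef[1, 0]);
        Console.WriteLine("F = {0}, R^2 = {1}", result[3], result[11]);
    }
}
```

For this data the exact values are ${b}_{1}=5/3$, ${b}_{2}=8/3$, $SSD=1/3$ and $F=43$, which the sketch reproduces up to rounding error.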

# References

Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley

# Error Indicators and Warnings

Errors or warnings detected by the method:
Some error messages may refer to parameters that are dropped from this interface (LDSSPZ, LDRZ, LDCOEF, LDRZNV, LDCZ). In these cases, an error in another parameter has usually caused an incorrect value to be inferred.
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{k1}}<2$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{k1}}\ne \left({\mathbf{k}}+1\right)$.
${\mathbf{ifail}}=3$
 On entry, ${\mathbf{n}}<{\mathbf{k1}}$.
${\mathbf{ifail}}=4$
 On entry, ${\mathbf{ldwkz}}<{\mathbf{k}}$.
${\mathbf{ifail}}=5$
This indicates that the $k$ by $k$ partition of the matrix held in rz, which is to be inverted, is not positive definite.
${\mathbf{ifail}}=6$
This indicates that the refinement following the actual inversion fails, indicating that the $k$ by $k$ partition of the matrix held in rz, which is to be inverted, is ill-conditioned. The use of g02da, which employs a different numerical technique, may avoid the difficulty.
${\mathbf{ifail}}=7$
Unexpected error in (F04ABF not in this release).
${\mathbf{ifail}}=-9000$
An error occurred, see message report.
${\mathbf{ifail}}=-6000$
Invalid Parameters $〈\mathit{\text{value}}〉$
${\mathbf{ifail}}=-4000$
Invalid dimension for array $〈\mathit{\text{value}}〉$
${\mathbf{ifail}}=-8000$
Negative dimension for array $〈\mathit{\text{value}}〉$

# Accuracy

The accuracy of any regression method is almost entirely dependent on the accuracy of the matrix inversion method used. In g02ch, it is the matrix of correlation-like coefficients rather than that of the sums of squares and cross-products about zero that is inverted; this means that all terms in the matrix for inversion are of a similar order, and reduces the scope for computational error. For details on absolute accuracy, the relevant section of the document describing the inversion method used, (F04ABF not in this release), should be consulted. g02da uses a different method, based on (F04AMF not in this release), and that method may well prove more reliable numerically. It does not handle missing values, nor does it provide the same output as this method.
If, in calculating $F$ or any of the $t\left({b}_{i}\right)$  (see [Description]), the numbers involved are such that the result would be outside the range of numbers which can be stored by the machine, then the answer is set to the largest quantity which can be stored as a real variable, by means of a call to x02al.

# Parallelism and Performance

None.

# Further Comments

The time taken by g02ch depends on $k$.
This method assumes that the matrix of correlation-like coefficients for the independent variables in the regression is positive definite; it fails if this is not the case.
This correlation matrix will in fact be positive definite whenever the correlation-like matrix and the sums of squares and cross-products (about zero) matrix have been formed either without regard to missing values, or by eliminating completely any cases involving missing values for any variable. If, however, these matrices are formed by eliminating cases with missing values from only those calculations involving the variables for which the values are missing, no such statement can be made, and the correlation-like matrix may or may not be positive definite. You should be aware of the possible dangers of using correlation matrices formed in this way (see the G02 class), but if you nevertheless wish to carry out regressions using such matrices, this method is capable of handling the inversion of such matrices, provided they are positive definite.
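
Whether a given correlation-like block is positive definite can be tested in advance by attempting a Cholesky factorization, as in the sketch below. This is an illustration under stated assumptions only: the class and method names are hypothetical, the second matrix is a made-up example, and the method performs its own test during the inversion (reporting ${\mathbf{ifail}}=5$), which need not agree with this check for borderline matrices.

```csharp
using System;

// Hypothetical helper; none of these names belong to the library.
static class PositiveDefiniteSketch
{
    // Attempts a Cholesky factorization of the leading k-by-k block of a
    // (assumed symmetric). Returns false as soon as a pivot is not positive.
    public static bool IsPositiveDefinite(double[,] a, int k)
    {
        var l = new double[k, k];
        for (int j = 0; j < k; j++)
        {
            double d = a[j, j];
            for (int p = 0; p < j; p++) d -= l[j, p] * l[j, p];
            if (!(d > 0.0)) return false;   // also rejects NaN
            l[j, j] = Math.Sqrt(d);
            for (int i = j + 1; i < k; i++)
            {
                double s = a[i, j];
                for (int p = 0; p < j; p++) s -= l[i, p] * l[j, p];
                l[i, j] = s / l[j, j];
            }
        }
        return true;
    }

    static void Main()
    {
        double[,] good = { { 1.0, 0.5 }, { 0.5, 1.0 } };
        // Made-up coefficients of the kind pairwise deletion can produce:
        // each entry is a valid coefficient, but the matrix is not positive definite.
        double[,] bad = { { 1.0, 0.9, -0.9 }, { 0.9, 1.0, 0.9 }, { -0.9, 0.9, 1.0 } };
        Console.WriteLine(IsPositiveDefinite(good, 2));   // prints True
        Console.WriteLine(IsPositiveDefinite(bad, 3));    // prints False
    }
}
```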
If a matrix is positive definite, its subsequent re-organisation by either of g02ce or g02cf will not affect this property and the new matrix can safely be used in this method. Thus correlation matrices produced by any of g02bd, g02be, g02bk or g02bl, even if subsequently modified by either g02ce or g02cf, can be handled by this method.
It should be noted that the method requires the dependent variable to be the last of the $k+1$ variables whose statistics are provided as input to the method. If this variable is not correctly positioned in the original data, the means, standard deviations, sums of squares and cross-products about zero, and correlation-like coefficients can be manipulated by using g02ce or g02cf to reorder the variables as necessary.
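
The reordering amounts to applying the same permutation to the rows and columns of each matrix. The sketch below shows only that index manipulation; it is not the interface of g02ce or g02cf, and the class name, method name and data values are hypothetical.

```csharp
using System;

// Hypothetical helper; none of these names belong to the library.
static class ReorderSketch
{
    // Returns the matrix whose (i, j) element is m[order[i], order[j]], so
    // order[i] is the original position of the variable that ends up at i.
    public static double[,] Reorder(double[,] m, int[] order)
    {
        int k1 = order.Length;
        var r = new double[k1, k1];
        for (int i = 0; i < k1; i++)
            for (int j = 0; j < k1; j++)
                r[i, j] = m[order[i], order[j]];
        return r;
    }

    static void Main()
    {
        // Cross-products about zero with the variables stored as (y, x1, x2).
        double[,] s = { { 38, 7, 8 }, { 7, 2, 1 }, { 8, 1, 2 } };
        // Move the dependent variable to the end: (x1, x2, y).
        double[,] sspz = Reorder(s, new[] { 1, 2, 0 });
        Console.WriteLine(sspz[2, 2]);   // prints 38, the sum of squares of y
        Console.WriteLine(sspz[0, 2]);   // prints 7, the cross-product of x1 and y
    }
}
```

The same permutation would have to be applied to the correlation-like matrix so that both inputs describe the variables in the same order.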

# Example

This example reads in the sums of squares and cross-products about zero, and correlation-like coefficients for three variables. A multiple linear regression with no constant is then performed with the third and final variable as the dependent variable. Finally the results are printed.

Example program (C#): g02che.cs

Example program data: g02che.d

Example program results: g02che.r