g02cg performs a multiple linear regression on a set of variables whose means, sums of squares and cross-products of deviations from means, and Pearson product-moment correlation coefficients are given.
Syntax
C# 

public static void g02cg(
int n,
int k1,
int k,
double[] xbar,
double[,] ssp,
double[,] r,
double[] result,
double[,] coef,
double[] con,
double[,] rinv,
double[,] c,
out int ifail
) 
Visual Basic 

Public Shared Sub g02cg ( _
n As Integer, _
k1 As Integer, _
k As Integer, _
xbar As Double(), _
ssp As Double(,), _
r As Double(,), _
result As Double(), _
coef As Double(,), _
con As Double(), _
rinv As Double(,), _
c As Double(,), _
<OutAttribute> ByRef ifail As Integer _
) 
Visual C++ 

public:
static void g02cg(
int n,
int k1,
int k,
array<double>^ xbar,
array<double,2>^ ssp,
array<double,2>^ r,
array<double>^ result,
array<double,2>^ coef,
array<double>^ con,
array<double,2>^ rinv,
array<double,2>^ c,
[OutAttribute] int% ifail
) 
Parameters
 n
 Type: System.Int32
On entry: the number of cases $n$, used in calculating the sums of squares and cross-products and correlation coefficients.
 k1
 Type: System.Int32
On entry: the total number of variables, independent and dependent, $\left(k+1\right)$, in the regression.
Constraint:
$2\le {\mathbf{k1}}<{\mathbf{n}}$.
 k
 Type: System.Int32
On entry: the number of independent variables $k$ in the regression.
Constraint:
${\mathbf{k}}={\mathbf{k1}}-1$.
 xbar
 Type: System.Double[]
On entry: ${\mathbf{xbar}}\left[\mathit{i}-1\right]$ must be set to ${\bar{x}}_{\mathit{i}}$, the mean value of the $\mathit{i}$th variable, for $\mathit{i}=1,2,\dots ,k+1$; the mean of the dependent variable must be contained in ${\mathbf{xbar}}\left[k\right]$.
 ssp
 Type: System.Double[,]
An array of size [dim1, k1]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k1}}$
On entry: ${\mathbf{ssp}}[\mathit{i}-1,\mathit{j}-1]$ must be set to ${S}_{\mathit{i}\mathit{j}}$, the sum of cross-products of deviations from means for the $\mathit{i}$th and $\mathit{j}$th variables, for $\mathit{i}=1,2,\dots ,k+1$ and $\mathit{j}=1,2,\dots ,k+1$; terms involving the dependent variable appear in row $k+1$ and column $k+1$.
 r
 Type: System.Double[,]
An array of size [dim1, k1]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k1}}$
On entry: ${\mathbf{r}}[\mathit{i}-1,\mathit{j}-1]$ must be set to ${R}_{\mathit{i}\mathit{j}}$, the Pearson product-moment correlation coefficient for the $\mathit{i}$th and $\mathit{j}$th variables, for $\mathit{i}=1,2,\dots ,k+1$ and $\mathit{j}=1,2,\dots ,k+1$; terms involving the dependent variable appear in row $k+1$ and column $k+1$.
 result
 Type: System.Double[]
An array of size [$13$]
On exit: the following information:
${\mathbf{result}}\left[0\right]$  $SSR$, the sum of squares attributable to the regression; 
${\mathbf{result}}\left[1\right]$  $DFR$, the degrees of freedom attributable to the regression; 
${\mathbf{result}}\left[2\right]$  $MSR$, the mean square attributable to the regression; 
${\mathbf{result}}\left[3\right]$  $F$, the $F$ value for the analysis of variance; 
${\mathbf{result}}\left[4\right]$  $SSD$, the sum of squares of deviations about the regression; 
${\mathbf{result}}\left[5\right]$  $DFD$, the degrees of freedom of deviations about the regression; 
${\mathbf{result}}\left[6\right]$  $MSD$, the mean square of deviations about the regression; 
${\mathbf{result}}\left[7\right]$  $SST$, the total sum of squares; 
${\mathbf{result}}\left[8\right]$  $DFT$, the total degrees of freedom; 
${\mathbf{result}}\left[9\right]$  $s$, the standard error estimate; 
${\mathbf{result}}\left[10\right]$  $R$, the coefficient of multiple correlation; 
${\mathbf{result}}\left[11\right]$  ${R}^{2}$, the coefficient of multiple determination; 
${\mathbf{result}}\left[12\right]$  ${\bar{R}}^{2}$, the coefficient of multiple determination corrected for the degrees of freedom. 
 coef
 Type: System.Double[,]
An array of size [dim1, $3$]
Note: dim1 must satisfy the constraint:
$\mathrm{dim1}\ge {\mathbf{k}}$
On exit: for
$i=1,2,\dots ,k$, the following information:
 ${\mathbf{coef}}[i-1,0]$
 ${b}_{i}$, the regression coefficient for the $i$th variable.
 ${\mathbf{coef}}[i-1,1]$
 $se\left({b}_{i}\right)$, the standard error of the regression coefficient for the $i$th variable.
 ${\mathbf{coef}}[i-1,2]$
 $t\left({b}_{i}\right)$, the $t$ value of the regression coefficient for the $i$th variable.
 con
 Type: System.Double[]
An array of size [$3$]
On exit: the following information:
${\mathbf{con}}\left[0\right]$  $a$, the regression constant; 
${\mathbf{con}}\left[1\right]$  $se\left(a\right)$, the standard error of the regression constant; 
${\mathbf{con}}\left[2\right]$  $t\left(a\right)$, the $t$ value for the regression constant. 
 rinv
 Type: System.Double[,]
An array of size [dim1, k]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k}}$
On exit: the inverse of the matrix of correlation coefficients for the independent variables; that is, the inverse of the matrix consisting of the first
$k$ rows and columns of
r.
 c
 Type: System.Double[,]
An array of size [dim1, k]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{k}}$
On exit: the modified inverse matrix, where
 ${\mathbf{c}}[\mathit{i}-1,\mathit{j}-1]={\mathbf{r}}[\mathit{i}-1,\mathit{j}-1]\times {\mathbf{rinv}}[\mathit{i}-1,\mathit{j}-1]/{\mathbf{ssp}}[\mathit{i}-1,\mathit{j}-1]$, for $\mathit{i}=1,2,\dots ,k$ and $\mathit{j}=1,2,\dots ,k$.
 ifail
 Type: System.Int32
On exit:
${\mathbf{ifail}}={0}$ unless the method detects an error or a warning has been flagged (see
[Error Indicators and Warnings]).
Description
g02cg fits a curve of the form
 $y=a+{b}_{1}{x}_{1}+{b}_{2}{x}_{2}+\dots +{b}_{k}{x}_{k}$
to the data points
 $\left({x}_{11},{x}_{21},\dots ,{x}_{k1},{y}_{1}\right),\left({x}_{12},{x}_{22},\dots ,{x}_{k2},{y}_{2}\right),\dots ,\left({x}_{1n},{x}_{2n},\dots ,{x}_{kn},{y}_{n}\right)$
such that
 ${y}_{\mathit{i}}=a+{b}_{1}{x}_{1\mathit{i}}+{b}_{2}{x}_{2\mathit{i}}+\dots +{b}_{k}{x}_{k\mathit{i}}+{e}_{\mathit{i}}\text{,}\quad \mathit{i}=1,2,\dots ,n\text{.}$
The method calculates the regression coefficients,
${b}_{1},{b}_{2},\dots ,{b}_{k}$, the regression constant,
$a$, and various other statistical quantities by minimizing
 $\sum _{\mathit{i}=1}^{n}{e}_{\mathit{i}}^{2}\text{.}$
The actual data values
$\left({x}_{1i},{x}_{2i},\dots ,{x}_{ki},{y}_{i}\right)$ are not provided as input to the method. Instead, input consists of:
(i) 
The number of cases, $n$, on which the regression is based. 
(ii) 
The total number of variables, dependent and independent, in the regression, $\left(k+1\right)$. 
(iii) 
The number of independent variables in the regression, $k$. 
(iv) 
The means of all $k+1$ variables in the regression, both the independent variables $\left({x}_{1},{x}_{2},\dots ,{x}_{k}\right)$ and the dependent variable $\left(y\right)$, which is the $\left(k+1\right)$th variable: i.e., ${\bar{x}}_{1},{\bar{x}}_{2},\dots ,{\bar{x}}_{k},\bar{y}$. 
(v) 
The $\left(k+1\right)$ by $\left(k+1\right)$ matrix [${S}_{ij}$] of sums of squares and cross-products of deviations from means of all the variables in the regression; the terms involving the dependent variable, $y$, appear in the $\left(k+1\right)$th row and column. 
(vi) 
The $\left(k+1\right)$ by $\left(k+1\right)$ matrix [${R}_{ij}$] of the Pearson product-moment correlation coefficients for all the variables in the regression; the correlations involving the dependent variable, $y$, appear in the $\left(k+1\right)$th row and column. 
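To make the inputs concrete, the following pure-Python sketch (an illustration only, not the NAG .NET API; the raw data and variable names are invented for this example) forms the summary quantities (iv)–(vi) from a small data set with $n=5$ cases and $k=2$ independent variables:

```python
# Hypothetical raw data for k = 2 independent variables and n = 5 cases.
# g02cg itself never sees these values -- only the summaries computed below.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 4.0, 2.0, 5.0, 3.0]
y  = [3.0, 8.0, 8.0, 13.0, 14.0]
data = [x1, x2, y]                      # dependent variable is variable k + 1
n, k = len(y), 2

# (iv) means of all k + 1 variables
xbar = [sum(v) / n for v in data]

# (v) sums of squares and cross-products of deviations from means
S = [[sum((data[i][t] - xbar[i]) * (data[j][t] - xbar[j]) for t in range(n))
      for j in range(k + 1)] for i in range(k + 1)]

# (vi) Pearson product-moment correlation coefficients
R = [[S[i][j] / (S[i][i] * S[j][j]) ** 0.5 for j in range(k + 1)]
     for i in range(k + 1)]
```

g02cg would then receive n, k1 = k + 1, xbar, S (as ssp) and R (as r), never the raw observations.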
The quantities calculated are:
(a) 
The inverse of the $k$ by $k$ partition of the matrix of correlation coefficients, [${R}_{ij}$], involving only the independent variables. The inverse is obtained using an accurate method which assumes that this submatrix is positive definite. 
(b) 
The modified inverse matrix, $C=\left[{c}_{ij}\right]$, where
 ${c}_{ij}={R}_{ij}{r}_{ij}/{S}_{ij}\text{,}\quad i,j=1,2,\dots ,k\text{,}$
where ${r}_{ij}$ is the $\left(i,j\right)$th element of the inverse matrix of [${R}_{ij}$] as described in (a) above. Each element of $C$ is thus the corresponding element of the matrix of correlation coefficients multiplied by the corresponding element of the inverse of this matrix, divided by the corresponding element of the matrix of sums of squares and cross-products of deviations from means. 
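Steps (a) and (b) can be sketched in pure Python for $k=2$, using a closed-form 2-by-2 inverse in place of the library's general positive definite inversion (the summary values below are from a small hypothetical data set and are mutually consistent, i.e., $R_{ij}=S_{ij}/\sqrt{S_{ii}S_{jj}}$):

```python
# Hypothetical, mutually consistent summary inputs (n = 5, k = 2).
k = 2
S = [[10.0, 5.0, 27.0], [5.0, 10.0, 20.0], [27.0, 20.0, 78.8]]
R = [[1.0, 0.5], [0.5, 1.0]]   # k-by-k partition, independent variables only

# (a) invert the k-by-k correlation partition (closed form valid for k = 2)
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
rinv = [[R[1][1] / det, -R[0][1] / det],
        [-R[1][0] / det, R[0][0] / det]]

# (b) modified inverse: c_ij = R_ij * r_ij / S_ij
# (assumes no S_ij is exactly zero, which holds for this example)
C = [[R[i][j] * rinv[i][j] / S[i][j] for j in range(k)] for i in range(k)]
```

Because $R_{ij}=S_{ij}/\sqrt{S_{ii}S_{jj}}$, the matrix $C$ coincides with the inverse of the $k$ by $k$ sums-of-squares partition, which is why it yields the regression coefficients directly.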
(c) 
The regression coefficients:
 ${b}_{j}=\sum _{i=1}^{k}{c}_{ji}{S}_{i\left(k+1\right)}\text{,}\quad j=1,2,\dots ,k\text{,}$
where ${S}_{j\left(k+1\right)}$ is the sum of cross-products of deviations from means for the independent variable ${x}_{j}$ and the dependent variable $y$. 
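A sketch of step (c), using a hypothetical modified inverse $C$ and cross-products $S_{j(k+1)}$ (values consistent with a small $n=5$, $k=2$ example; $C$ is symmetric here, so the index order in the sum is immaterial):

```python
k = 2
# Hypothetical modified inverse C and cross-products with y.
C  = [[2.0 / 15.0, -1.0 / 15.0], [-1.0 / 15.0, 2.0 / 15.0]]
Sy = [27.0, 20.0]                   # S_{j(k+1)}, j = 1..k

# (c) regression coefficients: b_j = sum_i c_ji * S_{i(k+1)}
b = [sum(C[j][i] * Sy[i] for i in range(k)) for j in range(k)]
```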
(d) 
The sum of squares attributable to the regression, $SSR$, the sum of squares of deviations about the regression, $SSD$, and the total sum of squares, $SST$:
 $SST={S}_{\left(k+1\right)\left(k+1\right)}$, the sum of squares of deviations from the mean for the dependent variable, $y$;
 $SSR=\sum _{j=1}^{k}{b}_{j}{S}_{j\left(k+1\right)}\text{;}\quad SSD=SST-SSR.$

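Step (d) as a sketch, with hypothetical but mutually consistent values for a small $n=5$, $k=2$ fit:

```python
# Hypothetical, mutually consistent summary values (n = 5, k = 2).
b   = [34.0 / 15.0, 13.0 / 15.0]   # regression coefficients
Sy  = [27.0, 20.0]                 # S_{j(k+1)}, cross-products with y
SST = 78.8                         # S_{(k+1)(k+1)}, total sum of squares

SSR = sum(bj * sj for bj, sj in zip(b, Sy))   # attributable to the regression
SSD = SST - SSR                               # deviations about the regression
```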
(e) 
The degrees of freedom attributable to the regression, $DFR$, the degrees of freedom of deviations about the regression, $DFD$, and the total degrees of freedom, $DFT$:
 $DFR=k\text{;}\quad DFD=n-k-1\text{;}\quad DFT=DFR+DFD=n-1.$
(f) 
The mean square attributable to the regression, $MSR$, and the mean square of deviations about the regression, $MSD$:
 $MSR=SSR/DFR\text{;}\quad MSD=SSD/DFD.$
(g) 
The $F$ value for the analysis of variance:
 $F=MSR/MSD.$
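Steps (e)–(g) above can be sketched together (the sums of squares are hypothetical values for a fit with $n=5$ cases and $k=2$ variables):

```python
# Hypothetical sums of squares for a fit with n = 5 cases, k = 2 variables.
n, k = 5, 2
SSR, SSD = 1178.0 / 15.0, 4.0 / 15.0

DFR, DFD, DFT = k, n - k - 1, n - 1   # (e) degrees of freedom
MSR, MSD = SSR / DFR, SSD / DFD       # (f) mean squares
F = MSR / MSD                         # (g) analysis-of-variance F value
```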
(h) 
The standard error estimate:
 $s=\sqrt{MSD}.$
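As a one-line sketch of step (h), with a hypothetical mean square of deviations:

```python
import math

MSD = 2.0 / 15.0       # hypothetical mean square of deviations about the regression
s = math.sqrt(MSD)     # (h) standard error estimate
```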
(i) 
The coefficient of multiple correlation, $R$, the coefficient of multiple determination, ${R}^{2}$, and the coefficient of multiple determination corrected for the degrees of freedom, ${\bar{R}}^{2}$:
 $R=\sqrt{1-\frac{SSD}{SST}}\text{;}\quad {R}^{2}=1-\frac{SSD}{SST}\text{;}\quad {\bar{R}}^{2}=1-\frac{SSD}{SST}\times \frac{DFT}{DFD}.$
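Step (i) as a sketch, with hypothetical sums of squares and degrees of freedom ($n=5$, $k=2$):

```python
import math

# Hypothetical sums of squares and degrees of freedom (n = 5, k = 2).
SSD, SST = 4.0 / 15.0, 78.8
DFT, DFD = 4, 2

R2 = 1.0 - SSD / SST                       # coefficient of multiple determination
R = math.sqrt(R2)                          # coefficient of multiple correlation
R2_adj = 1.0 - (SSD / SST) * (DFT / DFD)   # corrected for degrees of freedom
```

Note that the corrected coefficient is always at most the uncorrected one, since $DFT/DFD\ge 1$.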
(j) 
The standard error of the regression coefficients:
 $se\left({b}_{j}\right)=\sqrt{{c}_{jj}MSD}\text{,}\quad j=1,2,\dots ,k\text{,}$
where ${c}_{jj}$ is the $j$th diagonal element of $C$.
(k) 
The $t$ values for the regression coefficients:
 $t\left({b}_{j}\right)={b}_{j}/se\left({b}_{j}\right)\text{,}\quad j=1,2,\dots ,k.$
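Steps (j) and (k) can be sketched together, using a hypothetical modified inverse $C$, coefficients $b$ and mean square $MSD$ (a mutually consistent set for $n=5$, $k=2$):

```python
import math

k = 2
# Hypothetical modified inverse C, coefficients b and mean square MSD.
C = [[2.0 / 15.0, -1.0 / 15.0], [-1.0 / 15.0, 2.0 / 15.0]]
b = [34.0 / 15.0, 13.0 / 15.0]
MSD = 2.0 / 15.0

se_b = [math.sqrt(C[j][j] * MSD) for j in range(k)]   # (j) standard errors
t_b = [bj / se for bj, se in zip(b, se_b)]            # (k) t values
```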
(l) 
The regression constant, $a$, its standard error, $se\left(a\right)$, and its $t$ value, $t\left(a\right)$:
 $a=\bar{y}-\sum _{j=1}^{k}{b}_{j}{\bar{x}}_{j}\text{;}\quad se\left(a\right)=\sqrt{MSD\times \left(\frac{1}{n}+\sum _{i=1}^{k}\sum _{j=1}^{k}{\bar{x}}_{i}{c}_{ij}{\bar{x}}_{j}\right)}\text{;}\quad t\left(a\right)=a/se\left(a\right).$
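Step (l) as a sketch, again with a hypothetical but mutually consistent set of means, coefficients, modified inverse and mean square ($n=5$, $k=2$):

```python
import math

n, k = 5, 2
# Hypothetical means, coefficients, modified inverse and mean square.
xbar, ybar = [3.0, 3.0], 9.2
b = [34.0 / 15.0, 13.0 / 15.0]
C = [[2.0 / 15.0, -1.0 / 15.0], [-1.0 / 15.0, 2.0 / 15.0]]
MSD = 2.0 / 15.0

a = ybar - sum(bj * xj for bj, xj in zip(b, xbar))    # regression constant
quad = sum(xbar[i] * C[i][j] * xbar[j] for i in range(k) for j in range(k))
se_a = math.sqrt(MSD * (1.0 / n + quad))              # standard error of a
t_a = a / se_a                                        # t value for a
```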
References
Draper N R and Smith H (1985) Applied Regression Analysis (2nd Edition) Wiley
Error Indicators and Warnings
Accuracy
The accuracy of any regression method is almost entirely dependent on the accuracy of the matrix inversion method used. In
g02cg, it is the matrix of correlation coefficients rather than that of the sums of squares and cross-products of deviations from means that is inverted; this means that all terms in the matrix for inversion are of a similar order, and reduces the scope for computational error. For details on absolute accuracy, the relevant section of the document describing the inversion method used,
(F04ABF not in this release), should be consulted.
g02da uses a different method, based on
(F04AMF not in this release), and that method may well prove more reliable numerically. It does not handle missing values, nor does it provide the same output as this method. (In particular it is necessary to include explicitly the constant in the regression equation as another ‘variable’.)
If, in calculating
$F$,
$t\left(a\right)$, or any of the
$t\left({b}_{i}\right)$
(see
[Description]), the numbers involved are such that the result would be outside the range of numbers which can be stored by the machine, then the answer is set to the largest quantity which can be stored as a real variable, by means of a call to
x02al.
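The overflow guard described above can be mimicked as follows; `capped_ratio` and `LARGEST` are invented names for this sketch, with `LARGEST` standing in for the value returned by x02al:

```python
import sys

LARGEST = sys.float_info.max   # the role played by x02al in the library

def capped_ratio(num, den):
    """Return num / den, but cap the result at the largest representable
    double when the true ratio would overflow (or den is zero)."""
    if den == 0.0 or abs(num) / LARGEST > abs(den):
        return LARGEST if num >= 0.0 else -LARGEST
    return num / den
```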
Parallelism and Performance
Further Comments
Example
See Also