g02bu calculates the sample means and sums of squares and cross-products, or sums of squares and cross-products of deviations from the mean, in a single pass for a set of data. The data may be weighted.

# Syntax

C#
public static void g02bu(
string mean,
string weight,
int n,
int m,
double[,] x,
double[] wt,
out double sw,
double[] wmean,
double[] c,
out int ifail
)
Visual Basic
Public Shared Sub g02bu ( _
mean As String, _
weight As String, _
n As Integer, _
m As Integer, _
x As Double(,), _
wt As Double(), _
<OutAttribute> ByRef sw As Double, _
wmean As Double(), _
c As Double(), _
<OutAttribute> ByRef ifail As Integer _
)
Visual C++
public:
static void g02bu(
String^ mean,
String^ weight,
int n,
int m,
array<double,2>^ x,
array<double>^ wt,
[OutAttribute] double% sw,
array<double>^ wmean,
array<double>^ c,
[OutAttribute] int% ifail
)
F#
static member g02bu :
mean : string *
weight : string *
n : int *
m : int *
x : float[,] *
wt : float[] *
sw : float byref *
wmean : float[] *
c : float[] *
ifail : int byref -> unit

#### Parameters

mean
Type: System.String
On entry: indicates whether g02bu is to calculate sums of squares and cross-products, or sums of squares and cross-products of deviations about the mean.
${\mathbf{mean}}=\text{"M"}$
The sums of squares and cross-products of deviations about the mean are calculated.
${\mathbf{mean}}=\text{"Z"}$
The sums of squares and cross-products are calculated.
Constraint: ${\mathbf{mean}}=\text{"M"}$ or $\text{"Z"}$.
weight
Type: System.String
On entry: indicates whether the data is weighted or not.
${\mathbf{weight}}=\text{"U"}$
The calculations are performed on unweighted data.
${\mathbf{weight}}=\text{"W"}$
The calculations are performed on weighted data.
Constraint: ${\mathbf{weight}}=\text{"W"}$ or $\text{"U"}$.
n
Type: System.Int32
On entry: $n$, the number of observations in the dataset.
Constraint: ${\mathbf{n}}\ge 1$.
m
Type: System.Int32
On entry: $m$, the number of variables.
Constraint: ${\mathbf{m}}\ge 1$.
x
Type: System.Double[,]
An array of size [dim1, m]
Note: dim1 must satisfy the constraint: $\mathrm{dim1}\ge {\mathbf{n}}$
On entry: ${\mathbf{x}}\left[\mathit{i}-1,\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation on the $\mathit{j}$th variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
wt
Type: System.Double[]
An array of size [dim1]
Note: the dimension of the array wt must be at least ${\mathbf{n}}$ if ${\mathbf{weight}}=\text{"W"}$, and at least $1$ otherwise.
On entry: the optional weights of each observation.
If ${\mathbf{weight}}=\text{"U"}$, wt is not referenced.
If ${\mathbf{weight}}=\text{"W"}$, ${\mathbf{wt}}\left[i-1\right]$ must contain the weight for the $i$th observation.
Constraint: if ${\mathbf{weight}}=\text{"W"}$, ${\mathbf{wt}}\left[\mathit{i}\right]\ge 0.0$, for $\mathit{i}=0,1,\dots ,n-1$.
sw
Type: System.Double%
On exit: the sum of weights.
If ${\mathbf{weight}}=\text{"U"}$, sw contains the number of observations, $n$.
wmean
Type: System.Double[]
An array of size [m]
On exit: the sample means. ${\mathbf{wmean}}\left[j-1\right]$ contains the mean for the $j$th variable.
c
Type: System.Double[]
An array of size [$\left({\mathbf{m}}\times {\mathbf{m}}+{\mathbf{m}}\right)/2$]
On exit: the cross-products.
If ${\mathbf{mean}}=\text{"M"}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products of deviations about the mean.
If ${\mathbf{mean}}=\text{"Z"}$, c contains the upper triangular part of the matrix of (weighted) sums of squares and cross-products.
These are stored packed by columns, i.e., the cross-product between the $j$th and $k$th variable, $k\ge j$, is stored in ${\mathbf{c}}\left[k\times \left(k-1\right)/2+j-1\right]$.
ifail
Type: System.Int32%
On exit: ${\mathbf{ifail}}={0}$ unless the method detects an error or a warning has been flagged (see [Error Indicators and Warnings]).
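The packed layout of c described above can be sketched with a small index helper. This is only an illustration of the documented rule ${\mathbf{c}}\left[k(k-1)/2+j-1\right]$ for $k\ge j$; the class and method names (`PackedUpper`, `Index`, `Unpack`) are hypothetical and not part of the library.

```csharp
using System;

// Hypothetical helper, not part of the library: illustrates the packed
// upper-triangular layout documented for the c parameter.
public static class PackedUpper
{
    // Zero-based position in c of the cross-product between variables j and k
    // (both 1-based), using the documented rule c[k(k-1)/2 + j - 1] for k >= j.
    public static int Index(int j, int k)
    {
        if (j > k) { int t = j; j = k; k = t; } // the matrix is symmetric
        return k * (k - 1) / 2 + j - 1;
    }

    // Expand a packed upper triangle of order m into a full symmetric matrix.
    public static double[,] Unpack(double[] c, int m)
    {
        var full = new double[m, m];
        for (int k = 1; k <= m; k++)
        {
            for (int j = 1; j <= k; j++)
            {
                double v = c[Index(j, k)];
                full[j - 1, k - 1] = v;
                full[k - 1, j - 1] = v;
            }
        }
        return full;
    }
}
```

For $m=3$ the packed order is $(1,1),(1,2),(2,2),(1,3),(2,3),(3,3)$, so `PackedUpper.Index(2, 3)` refers to the fifth element of c.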

# Description

g02bu is an adaptation of West's WV2 algorithm; see West (1979). This method calculates the (optionally weighted) sample means and (optionally weighted) sums of squares and cross-products or sums of squares and cross-products of deviations from the (weighted) mean for a sample of $n$ observations on $m$ variables ${X}_{j}$, for $\mathit{j}=1,2,\dots ,m$. The algorithm makes a single pass through the data.
For the first $i-1$ observations let the mean of the $j$th variable be ${\bar{x}}_{j}\left(i-1\right)$, the cross-product about the mean for the $j$th and $k$th variables be ${c}_{jk}\left(i-1\right)$ and the sum of weights be ${W}_{i-1}$. These are updated by the $i$th observation, ${x}_{ij}$, for $\mathit{j}=1,2,\dots ,m$, with weight ${w}_{i}$ as follows:
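 $W_i = W_{i-1} + w_i, \qquad \bar{x}_j(i) = \bar{x}_j(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right), \quad j = 1,2,\dots,m$
and
 $c_{jk}(i) = c_{jk}(i-1) + \frac{w_i}{W_i}\left(x_{ij} - \bar{x}_j(i-1)\right)\left(x_{ik} - \bar{x}_k(i-1)\right) W_{i-1}, \quad j = 1,2,\dots,m \text{ and } k = j,j+1,\dots,m.$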
The algorithm is initialized by taking ${\bar{x}}_{j}\left(1\right)={x}_{1j}$, the first observation, and ${c}_{jk}\left(1\right)=0.0$.
For the unweighted case ${w}_{i}=1$ and ${W}_{i}=i$ for all $i$.
Note that only the upper triangle of the matrix is calculated and returned packed by column.
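The update described above can be sketched in C# as follows. This is a minimal illustration of the single-pass recurrence for the deviations-about-the-mean case, not the library's implementation; the names `WestSketch` and `Accumulate` are hypothetical, a null weight array is assumed to mean unweighted data, and skipping zero-weight observations is an assumption of this sketch.

```csharp
using System;

// Hypothetical sketch of the single-pass update; not the library's code.
public static class WestSketch
{
    // Computes the sum of weights, the (weighted) means and the packed upper
    // triangle of cross-products of deviations about the mean in one pass.
    // wt may be null, in which case every weight is taken as 1 (assumption).
    public static void Accumulate(double[,] x, double[] wt,
                                  out double sw, double[] wmean, double[] c)
    {
        int n = x.GetLength(0), m = x.GetLength(1);
        Array.Clear(wmean, 0, m);
        Array.Clear(c, 0, m * (m + 1) / 2);
        sw = 0.0;
        var d = new double[m];
        for (int i = 0; i < n; i++)
        {
            double w = wt == null ? 1.0 : wt[i];
            if (w == 0.0) continue;      // zero weight contributes nothing
            double swPrev = sw;          // W_{i-1}
            sw += w;                     // W_i = W_{i-1} + w_i
            double r = w / sw;           // w_i / W_i
            for (int j = 0; j < m; j++)
                d[j] = x[i, j] - wmean[j]; // deviation from the previous mean
            // c_jk(i) = c_jk(i-1) + (w_i/W_i) d_j d_k W_{i-1}, packed by columns
            for (int k = 0; k < m; k++)
                for (int j = 0; j <= k; j++)
                    c[(k + 1) * k / 2 + j] += r * d[j] * d[k] * swPrev;
            // mean_j(i) = mean_j(i-1) + (w_i/W_i) d_j
            for (int j = 0; j < m; j++)
                wmean[j] += r * d[j];
        }
    }
}
```

Starting from a zero sum of weights makes the first observation with positive weight set the means to that observation and leave the cross-products at zero, which matches the initialization stated above.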

# References

Chan T F, Golub G H and Leveque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535

# Error Indicators and Warnings

Errors or warnings detected by the method:
Some error messages may refer to parameters that are dropped from this interface (LDX). In these cases, an error in another parameter has usually caused an incorrect value to be inferred.
${\mathbf{ifail}}=1$
 On entry, ${\mathbf{m}}<1$, or ${\mathbf{n}}<1$.
${\mathbf{ifail}}=2$
 On entry, ${\mathbf{mean}}\ne \text{"M"}$ or $\text{"Z"}$.
${\mathbf{ifail}}=3$
 On entry, ${\mathbf{weight}}\ne \text{"W"}$ or $\text{"U"}$.
${\mathbf{ifail}}=-9000$
An error occurred, see message report.
${\mathbf{ifail}}=-6000$
Invalid Parameters $〈\mathit{\text{value}}〉$
${\mathbf{ifail}}=-4000$
Invalid dimension for array $〈\mathit{\text{value}}〉$
${\mathbf{ifail}}=-8000$
Negative dimension for array $〈\mathit{\text{value}}〉$

# Accuracy

For a detailed discussion of the accuracy of this algorithm see Chan et al. (1982) or West (1979).

# Parallelism and Performance

None.

# Further Comments

g02bw may be used to calculate the correlation coefficients from the cross-products of deviations about the mean. The cross-products of deviations about the mean may be scaled using F06EDF (not in this release) or f06fd to give a variance-covariance matrix.
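To make the two operations above concrete, the following sketch scales packed cross-products of deviations into a variance-covariance matrix and derives correlation coefficients from them. It does not reproduce g02bw or f06fd; the names `CrossProductScaling`, `Covariance` and `Correlation` are hypothetical, and the divisor ${\mathbf{sw}}-1$ is an assumption (the usual choice for unweighted data; weighted data may call for a different divisor).

```csharp
using System;

// Hypothetical sketch; not the library's g02bw or f06fd routines.
public static class CrossProductScaling
{
    // Scale packed cross-products of deviations by 1/(sw - 1).
    // The divisor sw - 1 is an assumption of this sketch.
    public static double[] Covariance(double[] c, double sw)
    {
        if (sw <= 1.0) throw new ArgumentException("sw must exceed 1", "sw");
        var v = new double[c.Length];
        for (int i = 0; i < c.Length; i++)
            v[i] = c[i] / (sw - 1.0);
        return v;
    }

    // r_jk = c_jk / sqrt(c_jj * c_kk), returned in the same packed order as c.
    // A variable with zero sum of squares yields NaN entries.
    public static double[] Correlation(double[] c, int m)
    {
        var r = new double[c.Length];
        for (int k = 0; k < m; k++)
        {
            double ckk = c[(k + 1) * k / 2 + k];     // diagonal entry for variable k
            for (int j = 0; j <= k; j++)
            {
                double cjj = c[(j + 1) * j / 2 + j]; // diagonal entry for variable j
                int idx = (k + 1) * k / 2 + j;
                r[idx] = c[idx] / Math.Sqrt(cjj * ckk);
            }
        }
        return r;
    }
}
```

Both methods expect c to hold cross-products of deviations about the mean (${\mathbf{mean}}=\text{"M"}$); raw cross-products from ${\mathbf{mean}}=\text{"Z"}$ would not give covariances or correlations this way.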
The means and cross-products produced by g02bu may be updated by adding or removing observations using g02bt.
Two sets of means and cross-products, as produced by g02bu, can be combined using G02BZF (not in this release).
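Because the combining routine is not available in this release, the sketch below shows one way two sets of results could be merged by hand, using the standard pairwise update discussed by Chan et al. (1982). It is an illustration under stated assumptions, not the G02BZF implementation: the names `CombineSketch` and `Combine` are hypothetical, and both inputs are assumed to be cross-products of deviations about the mean (${\mathbf{mean}}=\text{"M"}$) on the same $m$ variables.

```csharp
using System;

// Hypothetical sketch of a pairwise combination; not the library's G02BZF.
public static class CombineSketch
{
    // Merge (swA, meanA, cA) and (swB, meanB, cB) into (sw, mean, c).
    // cA, cB and c hold packed upper triangles of deviations about the mean.
    public static void Combine(double swA, double[] meanA, double[] cA,
                               double swB, double[] meanB, double[] cB,
                               out double sw, double[] mean, double[] c)
    {
        int m = meanA.Length;
        sw = swA + swB;
        if (sw <= 0.0) throw new ArgumentException("combined sum of weights must be positive");
        var d = new double[m];
        for (int j = 0; j < m; j++)
            d[j] = meanB[j] - meanA[j];       // difference between the two means
        double f = swA * swB / sw;
        // c_jk = cA_jk + cB_jk + (swA * swB / sw) * d_j * d_k
        for (int k = 0; k < m; k++)
            for (int j = 0; j <= k; j++)
            {
                int idx = (k + 1) * k / 2 + j;
                c[idx] = cA[idx] + cB[idx] + f * d[j] * d[k];
            }
        // mean_j = meanA_j + (swB / sw) * d_j
        for (int j = 0; j < m; j++)
            mean[j] = meanA[j] + d[j] * swB / sw;
    }
}
```

Combining the results for the first two rows of a dataset with the results for the third row should reproduce, up to rounding, what a single pass over all three rows gives.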

# Example

A program to calculate the means, the required sums of squares and cross-products matrix, and the variance matrix for a set of $3$ observations of $3$ variables.

Example program (C#): g02bue.cs

Example program data: g02bue.d

Example program results: g02bue.r