NAG Library Function Document

nag_corr_cov (g02bxc)


Purpose


nag_corr_cov (g02bxc) calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.


Specification

#include <nag.h>
#include <nagg02.h>
void nag_corr_cov (Integer n, Integer m, const double x[], Integer tdx,
                   const Integer sx[], const double wt[], double *sw,
                   double wmean[], double std[], double r[], Integer tdr,
                   double v[], Integer tdv, NagError *fail)


Description

For $n$ observations on $m$ variables, the one-pass algorithm of West (1979), as implemented in nag_sum_sqs (g02buc), is used to compute the means, the standard deviations, the variance-covariance matrix, and the Pearson product-moment correlation matrix for $p$ selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
(a) The means
\[ \bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1, \ldots, p \]
(b) The variance-covariance matrix
\[ C_{jk} = \frac{\sum_{i=1}^{n} w_i \left( x_{ij} - \bar{x}_j \right) \left( x_{ik} - \bar{x}_k \right)}{\sum_{i=1}^{n} w_i - 1}, \quad j, k = 1, \ldots, p \]
(c) The standard deviations
\[ s_j = \sqrt{C_{jj}}, \quad j = 1, \ldots, p \]
(d) The Pearson product-moment correlation coefficients
\[ R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj} C_{kk}}}, \quad j, k = 1, \ldots, p \]
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_i$ is the weight for the $i$th observation, which will be 1 in the unweighted case.
Note that the denominator for the variance-covariance is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that the sum of weights reflects the true sample size.


References

Chan T F, Golub G H and LeVeque R J (1982) Updating Formulae and a Pairwise Algorithm for Computing Sample Variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: An improved method Comm. ACM 22 532–535


Arguments

1:     n    Integer    Input
On entry: the number of observations in the dataset, $n$.
Constraint: n > 1.
2:     m    Integer    Input
On entry: the total number of variables, $m$.
Constraint: m ≥ 1.
3:     x[n×tdx]    const double    Input
On entry: the data. x[(i-1)×tdx + j-1] must contain the $i$th observation on the $j$th variable, $x_{ij}$, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.
4:     tdx    Integer    Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
5:     sx[m]    const Integer    Input
On entry: indicates which $p$ variables to include in the analysis.
sx[j-1] > 0
The $j$th variable is to be included.
sx[j-1] = 0
The $j$th variable is not to be included.
sx is set to NULL
All variables are included in the analysis, i.e., $p = m$.
Constraint: sx[i-1] ≥ 0, for $i = 1, 2, \ldots, m$.
6:     wt[n]    const double    Input
On entry: $w$, the optional frequency weighting for each observation, with wt[i-1] = $w_i$. Usually $w_i$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then $w_i$ is set to 1 for all $i$.
Constraint: if wt is not NULL, $\sum_{i=1}^{n}$ wt[i-1] > 1.0 and wt[i-1] ≥ 0.0, for $i = 1, 2, \ldots, n$.
7:     sw    double *    Output
On exit: the sum of weights if wt is not NULL, otherwise sw contains the number of observations, $n$.
8:     wmean[m]    double    Output
On exit: the sample means. wmean[j-1] contains the mean for the $j$th variable.
9:     std[m]    double    Output
On exit: the standard deviations. std[j-1] contains the standard deviation for the $j$th variable.
10:    r[m×tdr]    double    Output
On exit: the matrix of Pearson product-moment correlation coefficients. r[(j-1)×tdr + k-1] contains the correlation between variables $j$ and $k$, for $j, k = 1, \ldots, p$.
11:    tdr    Integer    Input
On entry: the stride separating matrix column elements in the array r.
Constraint: tdr ≥ m.
12:    v[m×tdv]    double    Output
On exit: the variance-covariance matrix. v[(j-1)×tdv + k-1] contains the covariance between variables $j$ and $k$, for $j, k = 1, \ldots, p$.
13:    tdv    Integer    Input
On entry: the stride separating matrix column elements in the array v.
Constraint: tdv ≥ m.
14:    fail    NagError *    Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
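The storage convention and the constraints listed for the arguments can be illustrated with a short sketch. Everything here is hypothetical and written for illustration: elem, check_args and the arg_status codes are not NAG names, plain int stands in for the NAG Integer type, and the checks simply restate the constraints given above (n > 1, m ≥ 1, each stride at least m, no negative element of sx and at least one positive, non-negative weights whose sum exceeds 1.0).

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; they are not NAG error codes. */
enum arg_status { ARGS_OK = 0, BAD_N, BAD_M, BAD_STRIDE, BAD_SX, BAD_WT };

/* Offset of element (i, j), 1-based as in the text, in an array stored
   row by row with stride td: (i-1)*td + (j-1). */
size_t elem(int i, int j, int td)
{
    return (size_t)(i - 1) * (size_t)td + (size_t)(j - 1);
}

/* Restates the documented constraints. sx and wt may be NULL, meaning
   "all variables selected" and "all weights equal to 1" respectively. */
enum arg_status check_args(int n, int m, int tdx, int tdr, int tdv,
                           const int *sx, const double *wt)
{
    if (n <= 1)
        return BAD_N;
    if (m < 1)
        return BAD_M;
    if (tdx < m || tdr < m || tdv < m)
        return BAD_STRIDE;

    if (sx != NULL) {
        int p = 0; /* number of selected variables */
        for (int j = 0; j < m; ++j) {
            if (sx[j] < 0)
                return BAD_SX;
            if (sx[j] > 0)
                ++p;
        }
        if (p == 0)
            return BAD_SX;
    }

    if (wt != NULL) {
        double sw = 0.0;
        for (int i = 0; i < n; ++i) {
            if (wt[i] < 0.0)
                return BAD_WT;
            sw += wt[i];
        }
        if (!(sw > 1.0)) /* constraint as stated: sum of weights > 1.0 */
            return BAD_WT;
    }
    return ARGS_OK;
}
```

With tdx = 4, for example, elem(2, 3, 4) gives offset 6, so the second observation on the third variable would be read from x[6].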

Error Indicators and Warnings

On entry, tdr = value while m = value. These arguments must satisfy tdr ≥ m.
On entry, tdv = value while m = value. These arguments must satisfy tdv ≥ m.
On entry, tdx = value while m = value. These arguments must satisfy tdx ≥ m.
Dynamic memory allocation failed.
On entry, n must be greater than 1: n = value.
On entry, m = value.
Constraint: m ≥ 1.
On entry, at least one element of sx is negative.
On entry, at least one of the weights is negative.
On entry, no element of sx is positive.
On entry, the sum of weights is less than 1.0.
A variable has zero variance.
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.


Accuracy

For a discussion of the accuracy of the one-pass algorithm, see Chan et al. (1982) and West (1979).
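As an illustration of what a one-pass method looks like, the following sketch applies a weighted running update of the mean and the sum of squared deviations for a single variable, in the form commonly attributed to West (1979). It is an assumption-laden sketch, not the NAG implementation: the running_stats type and the three functions are hypothetical names, and only one variable is tracked (cross-products for covariances are omitted).

```c
#include <stddef.h>

/* Hypothetical running state for one variable (not a NAG type). */
typedef struct {
    double sw;   /* sum of weights so far */
    double mean; /* weighted mean so far */
    double ss;   /* weighted sum of squared deviations about the mean */
} running_stats;

void running_init(running_stats *s)
{
    s->sw = 0.0;
    s->mean = 0.0;
    s->ss = 0.0;
}

/* Fold in one observation x with weight w in a single pass. */
void running_update(running_stats *s, double x, double w)
{
    if (w <= 0.0)
        return; /* a zero weight means the observation is ignored */

    double sw_new = s->sw + w;
    double d = x - s->mean;       /* deviation from the old mean */
    double incr = d * w / sw_new; /* change in the mean */

    s->mean += incr;
    s->ss += s->sw * d * incr;    /* uses the old sum of weights */
    s->sw = sw_new;
}

/* Variance with denominator (sum of weights - 1), as in definition (b). */
double running_variance(const running_stats *s)
{
    return s->ss / (s->sw - 1.0);
}
```

Because each update works with deviations from the current mean rather than with raw sums of squares, this style of update avoids subtracting two large, nearly equal quantities at the end.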

Parallelism and Performance

nag_corr_cov (g02bxc) is not threaded in any implementation.

Further Comments

Correlation coefficients based on ranks can be computed using nag_ken_spe_corr_coeff (g02brc).


Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations on 3 variables.

Program Text (g02bxce.c)

Program Data (g02bxce.d)

Program Results (g02bxce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017