NAG Library Function Document

nag_corr_cov (g02bxc)

1
Purpose

nag_corr_cov (g02bxc) calculates the Pearson product-moment correlation coefficients and the variance-covariance matrix for a set of data. Weights may be used.

2
Specification

#include <nag.h>
#include <nagg02.h>
void  nag_corr_cov (Integer n, Integer m, const double x[], Integer tdx, const Integer sx[], const double wt[], double *sw, double wmean[], double std[], double r[], Integer tdr, double v[], Integer tdv, NagError *fail)

3
Description

For n observations on m variables, the one-pass algorithm of West (1979), as implemented in nag_sum_sqs (g02buc), is used to compute the means, the standard deviations, the variance-covariance matrix and the Pearson product-moment correlation matrix for p selected variables. Suitable weights may be used to indicate multiple observations and to remove missing values. The quantities are defined by:
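(a) The means
$$\bar{x}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{\sum_{i=1}^{n} w_i}, \quad j = 1, \ldots, p$$
(b) The variance-covariance matrix
$$C_{jk} = \frac{\sum_{i=1}^{n} w_i \left(x_{ij} - \bar{x}_j\right)\left(x_{ik} - \bar{x}_k\right)}{\sum_{i=1}^{n} w_i - 1}, \quad j, k = 1, \ldots, p$$
(c) The standard deviations
$$s_j = \sqrt{C_{jj}}, \quad j = 1, \ldots, p$$
(d) The Pearson product-moment correlation coefficients
$$R_{jk} = \frac{C_{jk}}{\sqrt{C_{jj}\, C_{kk}}}, \quad j, k = 1, \ldots, p$$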
where $x_{ij}$ is the value of the $i$th observation on the $j$th variable and $w_i$ is the weight for the $i$th observation, which is 1 in the unweighted case.
Note that the denominator of the variance-covariance matrix is $\sum_{i=1}^{n} w_i - 1$, so the weights should be scaled so that their sum reflects the true sample size.

4
References

Chan T F, Golub G H and LeVeque R J (1982) Updating formulae and a pairwise algorithm for computing sample variances Compstat, Physica-Verlag
West D H D (1979) Updating mean and variance estimates: an improved method Comm. ACM 22 532–535

5
Arguments

1:     n – Integer, Input
On entry: the number of observations in the dataset, n.
Constraint: n > 1.
2:     m – Integer, Input
On entry: the total number of variables, m.
Constraint: m ≥ 1.
3:     x[n×tdx] – const double, Input
On entry: the data; x[(i-1)×tdx+j-1] must contain the $i$th observation on the $j$th variable, $x_{ij}$, for i=1,2,…,n and j=1,2,…,m.
4:     tdx – Integer, Input
On entry: the stride separating matrix column elements in the array x.
Constraint: tdx ≥ m.
5:     sx[m] – const Integer, Input
On entry: indicates which p variables to include in the analysis.
sx[j-1] > 0
The $j$th variable is to be included.
sx[j-1] = 0
The $j$th variable is not to be included.
sx is NULL
All variables are included in the analysis, i.e., p = m.
Constraint: sx[j-1] ≥ 0, for j=1,2,…,m.
6:     wt[n] – const double, Input
On entry: $w$, the optional frequency weighting for each observation, with wt[i-1] = $w_i$. Usually $w_i$ will be an integral value corresponding to the number of observations associated with the $i$th data value, or zero if the $i$th data value is to be ignored. If wt is NULL then $w_i$ is set to 1 for all $i$.
Constraint: if wt is not NULL, $\sum_{i=1}^{n}$ wt[i-1] > 1.0 and wt[i-1] ≥ 0.0, for i=1,2,…,n.
7:     sw – double *, Output
On exit: the sum of weights if wt is not NULL; otherwise sw contains the number of observations, n.
8:     wmean[m] – double, Output
On exit: the sample means. wmean[j-1] contains the mean for the $j$th variable.
9:     std[m] – double, Output
On exit: the standard deviations. std[j-1] contains the standard deviation for the $j$th variable.
10:   r[m×tdr] – double, Output
On exit: the matrix of Pearson product-moment correlation coefficients. r[(j-1)×tdr+k-1] contains the correlation between variables j and k, for j,k=1,…,p.
11:   tdr – Integer, Input
On entry: the stride separating matrix column elements in the array r.
Constraint: tdr ≥ m.
12:   v[m×tdv] – double, Output
On exit: the variance-covariance matrix. v[(j-1)×tdv+k-1] contains the covariance between variables j and k, for j,k=1,…,p.
13:   tdv – Integer, Input
On entry: the stride separating matrix column elements in the array v.
Constraint: tdv ≥ m.
14:   fail – NagError *, Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).
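The storage and NULL conventions for x, sx and wt can be illustrated with a few small C helpers. These are a hypothetical sketch, not part of the NAG Library: the names elem, count_selected and sum_weights are invented here, and int stands in for Integer.

```c
#include <stddef.h>

/* Hypothetical helpers illustrating the argument conventions above;
 * they are not part of the NAG Library. */

/* x_ij (1-based i and j) in a row-major array whose rows are tdx apart. */
double elem(const double *x, int tdx, int i, int j)
{
    return x[(size_t)(i - 1) * tdx + (j - 1)];
}

/* p, the number of variables selected by sx; sx == NULL selects all m. */
int count_selected(int m, const int *sx)
{
    int j, p = 0;
    if (sx == NULL)
        return m;
    for (j = 0; j < m; j++)
        if (sx[j] > 0)
            p++;
    return p;
}

/* The value returned in sw: the sum of the weights, or n if wt == NULL. */
double sum_weights(int n, const double *wt)
{
    int i;
    double s = 0.0;
    if (wt == NULL)
        return (double)n;
    for (i = 0; i < n; i++)
        s += wt[i];
    return s;
}
```

With tdx larger than m, the trailing tdx - m entries of each row of x are simply skipped, which is why the constraint is tdx ≥ m rather than tdx = m.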

6
Error Indicators and Warnings

NE_2_INT_ARG_LT
On entry, tdr=value while m=value. These arguments must satisfy tdr ≥ m.
On entry, tdv=value while m=value. These arguments must satisfy tdv ≥ m.
On entry, tdx=value while m=value. These arguments must satisfy tdx ≥ m.
NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_INT_ARG_LE
On entry, n must be greater than 1: n=value .
NE_INT_ARG_LT
On entry, m=value.
Constraint: m ≥ 1.
NE_NEG_SX
On entry, at least one element of sx is negative.
NE_NEG_WEIGHT
On entry, at least one of the weights is negative.
NE_POS_SX
On entry, no element of sx is positive.
NE_SW_LT_ONE
On entry, the sum of weights is less than 1.0.
NE_VAR_EQ_ZERO
A variable has zero variance.
At least one variable has zero variance. In this case v and std are as calculated, but r will contain zero for any correlation involving a variable with zero variance.
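A caller may find it useful to screen arguments before the call. The sketch below is hypothetical: precheck is not a NAG routine, it returns the documented error code names as strings rather than filling in a NagError structure, and the order of the checks is an assumption. It does not cover NE_ALLOC_FAIL or NE_VAR_EQ_ZERO, which can only arise during the computation.

```c
#include <stddef.h>

/* Hypothetical pre-check mirroring the documented constraints. Returns
 * the name of the error code the constraint maps to; the order of the
 * checks is an assumption, not taken from the NAG Library. */
const char *precheck(int n, int m, int tdx, const int *sx,
                     const double *wt, int tdr, int tdv)
{
    int i, any_pos = 0;
    double sw = 0.0;

    if (n <= 1)
        return "NE_INT_ARG_LE";
    if (m < 1)
        return "NE_INT_ARG_LT";
    if (tdx < m || tdr < m || tdv < m)
        return "NE_2_INT_ARG_LT";
    if (sx != NULL) {
        for (i = 0; i < m; i++) {
            if (sx[i] < 0)
                return "NE_NEG_SX";
            if (sx[i] > 0)
                any_pos = 1;
        }
        if (!any_pos)
            return "NE_POS_SX";   /* no variable selected */
    }
    if (wt != NULL) {
        for (i = 0; i < n; i++) {
            if (wt[i] < 0.0)
                return "NE_NEG_WEIGHT";
            sw += wt[i];
        }
        if (sw < 1.0)
            return "NE_SW_LT_ONE";
    }
    return "NE_NOERROR";
}
```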

7
Accuracy

For a discussion of the accuracy of the one-pass algorithm see Chan et al. (1982) and West (1979).
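The reason for preferring an updating algorithm can be seen on data with a large common offset. The following C sketch, which is illustrative and not taken from the NAG Library or from the cited papers, computes the corrected sum of squares, the sum of (x_i - mean)^2, in two ways: with the textbook formula sum of x_i^2 minus n times mean^2, which subtracts two nearly equal large numbers, and with a West/Welford-style update. The function names naive_ss and update_ss are hypothetical.

```c
/* Hypothetical comparison, not NAG code. Both functions return the
 * corrected sum of squares sum_i (x_i - mean)^2 for n values. */

/* Textbook one-pass formula: sum x_i^2 - n * mean^2 (prone to cancellation). */
double naive_ss(int n, const double *x)
{
    double s = 0.0, ss = 0.0, mean;
    int i;
    for (i = 0; i < n; i++) {
        s += x[i];
        ss += x[i] * x[i];
    }
    mean = s / n;
    return ss - n * mean * mean;
}

/* Updating formula: only deviations from the running mean are accumulated. */
double update_ss(int n, const double *x)
{
    double mean = 0.0, ss = 0.0, delta;
    int i;
    for (i = 0; i < n; i++) {
        delta = x[i] - mean;          /* deviation from the old mean */
        mean += delta / (i + 1);      /* running mean */
        ss += delta * (x[i] - mean);  /* old deviation times new deviation */
    }
    return ss;
}
```

For the values 10^9 + 4, 10^9 + 7, 10^9 + 13 and 10^9 + 16 the exact result is 90. In IEEE double precision update_ss returns 90 exactly, whereas naive_ss cannot: squares near 10^18 are only representable to a spacing of 128, so the small true difference is lost in the subtraction.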

8
Parallelism and Performance

nag_corr_cov (g02bxc) is not threaded in any implementation.

9
Further Comments

Correlation coefficients based on ranks can be computed using nag_ken_spe_corr_coeff (g02brc).

10
Example

A program to calculate the means, standard deviations, variance-covariance matrix and a matrix of Pearson product-moment correlation coefficients for a set of 3 observations of 3 variables.

10.1
Program Text

Program Text (g02bxce.c)

10.2
Program Data

Program Data (g02bxce.d)

10.3
Program Results

Program Results (g02bxce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017