NAG Library Function Document

nag_approx_quantiles_fixed (g01anc)

1
Purpose

nag_approx_quantiles_fixed (g01anc) finds approximate quantiles from a data stream of known size using an out-of-core algorithm.

2
Specification

#include <nag.h>
#include <nagg01.h>
void  nag_approx_quantiles_fixed (Integer *ind, Integer n, const double rv[], Integer nb, double eps, Integer *np, const double q[], double qv[], Integer nq, double rcomm[], Integer lrcomm, Integer icomm[], Integer licomm, NagError *fail)

3
Description

A quantile is a value which divides a frequency distribution such that there is a given proportion of data values below the quantile. For example, the median of a dataset is the 0.5 quantile because half the values are less than or equal to it.
nag_approx_quantiles_fixed (g01anc) uses a slightly modified version of an algorithm described in a paper by Zhang and Wang (2007) to determine ε-approximate quantiles of a data stream of n real values, where n is known. Given any quantile q ∈ [0.0, 1.0], an ε-approximate quantile is defined as an element in the data stream whose rank falls within [(q - ε)n, (q + ε)n]. If more than one ε-approximate quantile is available, the one closest to qn is returned.
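As an illustration of the definition above, the following C sketch computes the rank window [(q - ε)n, (q + ε)n] and tests whether a given rank lies inside it. It is not part of the NAG Library: the names RankWindow, rank_window and rank_is_eps_approximate are hypothetical, and ranks are assumed here to be counted from 1, which the text above does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Closed interval of ranks [(q - eps) * n, (q + eps) * n]. */
typedef struct {
    double lo;
    double hi;
} RankWindow;

/* Rank window for quantile q, tolerance eps and stream length n. */
RankWindow rank_window(double q, double eps, size_t n)
{
    RankWindow w;
    assert(n > 0 && q >= 0.0 && q <= 1.0 && eps > 0.0);
    w.lo = (q - eps) * (double)n;
    w.hi = (q + eps) * (double)n;
    return w;
}

/* Return 1 if an element of the given rank (assumed 1-based) would
   qualify as an eps-approximate q-quantile, otherwise 0. */
int rank_is_eps_approximate(size_t rank, double q, double eps, size_t n)
{
    RankWindow w = rank_window(q, eps, n);
    double r = (double)rank;
    return r >= w.lo && r <= w.hi;
}
```

For example, with n=60, q=0.5 and ε=0.25 the window is [15, 45], so any element whose rank lies between 15 and 45 inclusive qualifies.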

4
References

Zhang Q and Wang W (2007) A fast algorithm for approximate quantiles in high speed data streams Proceedings of the 19th International Conference on Scientific and Statistical Database Management IEEE Computer Society 29

5
Arguments

1:     ind Integer * Input/Output
On entry: indicates the action required in the current call to nag_approx_quantiles_fixed (g01anc).
ind=0
Return the required length of rcomm and icomm in icomm[0] and icomm[1] respectively. n and eps must be set and licomm must be at least 2.
ind=1
Initialise the communication arrays and process the first nb values from the data stream as supplied in rv.
ind=2
Process the next block of nb values from the data stream. The calling program must update rv and (if required) nb, and re-enter nag_approx_quantiles_fixed (g01anc) with all other parameters unchanged.
ind=3
Calculate the nq ε-approximate quantiles specified in q. The calling program must set q and nq and re-enter nag_approx_quantiles_fixed (g01anc) with all other parameters unchanged. This option can be chosen only when np ≥ exp(1.0)/eps.
On exit: indicates output from a successful call.
ind=1
Lengths of rcomm and icomm have been returned in icomm[0] and icomm[1] respectively.
ind=2
nag_approx_quantiles_fixed (g01anc) has processed np data points and expects to be called again with additional data (i.e., np<n).
ind=3
nag_approx_quantiles_fixed (g01anc) has returned the requested ε-approximate quantiles in qv. These quantiles are based on np data points.
ind=4
nag_approx_quantiles_fixed (g01anc) has processed all n data points (i.e., np=n).
Constraint: on entry ind=0, 1, 2 or 3.
2:     n Integer Input
On entry: n, the total number of values in the data stream.
Constraint: n>0.
3:     rv[dim] const double Input
Note: the dimension, dim, of the array rv must be at least nb when ind=1 or 2.
On entry: if ind=1 or 2, the vector containing the current block of data, otherwise rv is not referenced.
4:     nb Integer Input
On entry: if ind=1 or 2, the size of the current block of data. The size of blocks of data in array rv can vary; therefore nb can change between calls to nag_approx_quantiles_fixed (g01anc).
Constraint: if ind=1 or 2, nb>0.
5:     eps double Input
On entry: approximation factor ε.
Constraint: eps ≥ exp(1.0)/n and eps ≤ 1.0.
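The constraint on eps, and the related requirement that a quantile query (ind=3) needs np ≥ exp(1.0)/eps, can be checked before calling the function. The following is a minimal sketch only: the helper names eps_is_valid and min_points_for_query are hypothetical, long stands in for the library's Integer type, and exp(1.0) is written out as a constant.

```c
#include <assert.h>

/* Value of exp(1.0), written out so the sketch needs no maths library. */
#define EXP_ONE 2.718281828459045

/* Return 1 if exp(1.0)/n <= eps <= 1.0 for a stream of n values, else 0. */
int eps_is_valid(double eps, long n)
{
    if (n <= 0) {
        return 0;
    }
    return eps >= EXP_ONE / (double)n && eps <= 1.0;
}

/* Smallest integer np satisfying np >= exp(1.0)/eps, i.e. the fewest
   processed values for which a query with ind=3 is permitted. */
long min_points_for_query(double eps)
{
    double bound;
    long np;
    assert(eps > 0.0);
    bound = EXP_ONE / eps;
    np = (long)bound;            /* truncates toward zero */
    if ((double)np < bound) {
        np += 1;                 /* round up to the next integer */
    }
    return np;
}
```

With n=60, for instance, eps=0.2 is acceptable (exp(1.0)/60 is roughly 0.045) and at least 14 values must have been processed before quantiles can be requested.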
6:     np Integer * Output
On exit: the number of elements processed so far.
7:     q[dim] const double Input
Note: the dimension, dim, of the array q must be at least nq when ind=3.
On entry: if ind=3, the quantiles to be calculated, otherwise q is not referenced. Note that q[i]=0.0 corresponds to the minimum value and q[i]=1.0 to the maximum value.
Constraint: if ind=3, 0.0 ≤ q[i-1] ≤ 1.0, for i=1,2,…,nq.
8:     qv[dim] double Output
Note: the dimension, dim, of the array qv must be at least nq when ind=3.
On exit: if ind=3, qv[i] contains the ε-approximate quantile specified by the value provided in q[i].
9:     nq Integer Input
On entry: if ind=3, the number of quantiles requested, otherwise nq is not referenced.
Constraint: if ind=3, nq>0.
10:   rcomm[lrcomm] double Communication Array
11:   lrcomm Integer Input
On entry: the dimension of the array rcomm.
Constraint: if ind ≠ 0, lrcomm must be at least equal to the value returned in icomm[0] by a call to nag_approx_quantiles_fixed (g01anc) with ind=0. This will not be more than x + 2 × min(x, ⌈x/2.0⌉ + 1) × log2(n/x + 1.0) + 1, where x = max(1, ⌊log(eps × n)/eps⌋).
12:   icomm[licomm] Integer Communication Array
13:   licomm Integer Input
On entry: the dimension of the array icomm.
Constraints:
  • if ind=0, licomm ≥ 2;
  • otherwise licomm must be at least equal to the value returned in icomm[1] by a call to nag_approx_quantiles_fixed (g01anc) with ind=0. This will not be more than 2 × (x + 2 × min(x, ⌈x/2.0⌉ + 1) × y) + y + 6, where x = max(1, ⌊log(eps × n)/eps⌋) and y = log2(n/x + 1.0) + 1.
14:   fail NagError * Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6
Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
See Section 2.3.1.2 in How to Use the NAG Library and its Documentation for further information.
NE_ARRAY_SIZE
On entry, licomm is too small: licomm=value.
On entry, lrcomm is too small: lrcomm=value.
NE_BAD_PARAM
On entry, argument value had an illegal value.
NE_INT
On entry, ind=1 or 2 and nb=value.
Constraint: if ind=1 or 2 then nb>0.
On entry, ind=3 and nq=value.
Constraint: if ind=3 then nq>0.
On entry, ind=value.
Constraint: ind=0, 1, 2 or 3.
On entry, n=value.
Constraint: n>0.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
See Section 2.7.6 in How to Use the NAG Library and its Documentation for further information.
NE_NO_LICENCE
Your licence key may have expired or may not have been installed correctly.
See Section 2.7.5 in How to Use the NAG Library and its Documentation for further information.
NE_Q_OUT_OF_RANGE
On entry, ind=3 and q[value]=value.
Constraint: if ind=3 then 0.0 ≤ q[i] ≤ 1.0 for all i.
NE_REAL
On entry, eps=value.
Constraint: exp(1.0)/n ≤ eps ≤ 1.0.
NE_TOO_SMALL
Number of data elements streamed, value, is not sufficient for a quantile query when eps=value.
Supply more data or reprocess the data with a higher eps value.

7
Accuracy

Not applicable.

8
Parallelism and Performance

nag_approx_quantiles_fixed (g01anc) is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the x06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this function. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9
Further Comments

The average time taken by nag_approx_quantiles_fixed (g01anc) is n log((1/ε) log(εn)).

10
Example

This example calculates the ε-approximate quantiles for q=0.25, 0.5 and 1.0 of a data stream of 60 values. The stream is read in four blocks of varying size.
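For a stream this small, results can be compared against an exact in-memory baseline. The following sketch is independent of the NAG Library and is not the example program g01ance.c: it sorts a copy of the data and picks the element of rank ⌈qn⌉, clamped to [1, n]. That rank convention is an assumption chosen so that q=0.0 gives the minimum and q=1.0 the maximum, and the names cmp_double and exact_quantile are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison of doubles for qsort. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Store the exact q-quantile of data[0..n-1] in *out.
   Returns 0 on success, -1 if memory could not be allocated. */
int exact_quantile(const double *data, size_t n, double q, double *out)
{
    double *sorted;
    double t;
    size_t rank;
    assert(data != NULL && out != NULL && n > 0 && q >= 0.0 && q <= 1.0);

    sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) {
        return -1;
    }
    memcpy(sorted, data, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    /* rank = ceil(q * n), clamped to [1, n] (assumed convention). */
    t = q * (double)n;
    rank = (size_t)t;
    if ((double)rank < t) {
        rank += 1;
    }
    if (rank < 1) {
        rank = 1;
    }
    if (rank > n) {
        rank = n;
    }

    *out = sorted[rank - 1];     /* convert 1-based rank to index */
    free(sorted);
    return 0;
}
```

An ε-approximate result for the same q should then be an element whose rank lies within εn of qn.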

10.1
Program Text

Program Text (g01ance.c)

10.2
Program Data

Program Data (g01ance.d)

10.3
Program Results

Program Results (g01ance.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017