NAG Library Function Document

nag_robust_m_estim_1var (g07dbc)

 Contents

    1  Purpose
    7  Accuracy

1
Purpose

nag_robust_m_estim_1var (g07dbc) computes an M -estimate of location with (optional) simultaneous estimation of the scale using Huber's algorithm.

2
Specification

#include <nag.h>
#include <nagg07.h>
void  nag_robust_m_estim_1var (Nag_SigmaSimulEst sigma_est, Integer n, const double x[], Nag_PsiFun psifun, double c, double h1, double h2, double h3, double dchi, double *theta, double *sigma, Integer maxit, double tol, double rs[], Integer *nit, double sorted_x[], NagError *fail)

3
Description

The data consists of a sample of size n , denoted by x 1 , x 2 , , x n , drawn from a random variable X .
The x i  are assumed to be independent with an unknown distribution function of the form
F x i - θ / σ  
where θ  is a location argument, and σ  is a scale argument. M -estimators of θ  and σ  are given by the solution to the following system of equations:
i=1 n ψ x i - θ ^ / σ ^ = 0 (1)
i=1 n χ x i - θ ^ / σ ^ = n-1 β (2)
where ψ  and χ  are given functions, and β  is a constant, such that σ ^  is an unbiased estimator when x i , for i=1,2,,n, has a normal distribution. Optionally, the second equation can be omitted and the first equation is solved for θ ^  using an assigned value of σ = σ c .
The values of ψ x i - θ ^ σ ^ σ ^  are known as the Winsorized residuals.
The following functions are available for ψ  and χ  in nag_robust_m_estim_1var (g07dbc):
(a) Null Weights
ψ t =t   χ t = t 2 2
Use of these null functions leads to the mean and standard deviation of the data.
(b) Huber's Function
ψ t = max-c,minc,t   χ t = t 2 2 t d
      χ t = d 2 2 t > d
(c) Hampel's Piecewise Linear Function
ψ h 1 , h 2 , h 3 t = - ψ h 1 , h 2 , h 3 -t    
  =t 0 t h 1 χ t = t 2 2 t d
  = h 1 h 1 t h 2  
  = h 1 h 3 - t / h 3 - h 2 h 2 t h 3 χ t = d 2 2 t > d
  =0 t > h 3  
(d) Andrew's Sine Wave Function
ψ t = sint -π t π χ t = t 2 2 t d
  =0 otherwise χ t = d 2 2 t > d
(e) Tukey's Bi-weight
ψ t = t 1 - t 2 2 t 1 χ t = t 2 2 t d
  =0 otherwise χ t = d 2 2 t > d
where c , h 1 , h 2 , h 3  and d  are constants.
Equations (1) and (2) are solved by a simple iterative procedure suggested by Huber:
σ ^ k = 1 β n-1 i=1 n χ x i - θ ^ k-1 σ ^ k-1 σ ^ k-1 2  
and
θ ^ k = θ ^ k-1 + 1 n i=1 n ψ x i - θ ^ k-1 σ ^ k σ ^ k  
or
σ ^ k = σ c , if ​ σ ​ is fixed.  
The initial values for θ ^  and σ ^  may either be user-supplied or calculated within nag_robust_m_estim_1var (g07dbc) as the sample median and an estimate of σ  based on the median absolute deviation respectively.
nag_robust_m_estim_1var (g07dbc) is based upon subroutine LYHALG within the ROBETH library, see Marazzi (1987).

4
References

Hampel F R, Ronchetti E M, Rousseeuw P J and Stahel W A (1986) Robust Statistics. The Approach Based on Influence Functions Wiley
Huber P J (1981) Robust Statistics Wiley
Marazzi A (1987) Subroutines for robust estimation of location and scale in ROBETH Cah. Rech. Doc. IUMSP, No. 3 ROB 1 Institut Universitaire de Médecine Sociale et Préventive, Lausanne

5
Arguments

1:     sigma_est Nag_SigmaSimulEstInput
On entry: the value assigned to sigma_est determines whether σ ^  is to be simultaneously estimated.
sigma_est=Nag_SigmaBypas
The estimation of σ ^  is bypassed and sigma is set equal to σ c ;
sigma_est=Nag_SigmaSimul
σ ^  is estimated simultaneously.
Constraint: sigma_est=Nag_SigmaBypas or Nag_SigmaSimul.
2:     n IntegerInput
On entry: the number of observations, n .
Constraint: n>1 .
3:     x[n] const doubleInput
On entry: the vector of observations, x 1 , x 2 , , x n .
4:     psifun Nag_PsiFunInput
On entry: which ψ  function is to be used.
psifun=Nag_Lsq
ψ t = t .  
psifun=Nag_HuberFun
Huber's function.
psifun=Nag_HampelFun
Hampel's piecewise linear function.
psifun=Nag_AndrewFun
Andrew's sine wave.
psifun=Nag_TukeyFun
Tukey's bi-weight.
Constraint: psifun=Nag_Lsq, Nag_HuberFun, Nag_HampelFun, Nag_AndrewFun or Nag_TukeyFun.
5:     c doubleInput
On entry: must specify the argument, c , of Huber's ψ  function, if psifun=Nag_HuberFun. c is not referenced if psifunNag_HuberFun.
Constraint: c>0.0  if psifun=Nag_HuberFun.
6:     h1 doubleInput
7:     h2 doubleInput
8:     h3 doubleInput
On entry: h1, h2, and h3 must specify the arguments h 1 , h 2 , and h 3 , of Hampel's piecewise linear ψ  function, if psifun=Nag_HampelFun. h1, h2, and h3 are not referenced if psifunNag_HampelFun.
Constraint: 0 h1 h2 h3  and h3>0.0  if psifun=Nag_HampelFun.
9:     dchi doubleInput
On entry: the argument, d , of the χ  function. dchi is not referenced if psifun=Nag_Lsq.
Constraint: dchi>0.0  if psifunNag_Lsq.
10:   theta double *Input/Output
On entry: if sigma>0  then theta must be set to the required starting value of the estimation of the location argument θ ^ . A reasonable initial value for θ ^  will often be the sample mean or median.
On exit: the M -estimate of the location argument, θ ^ .
11:   sigma double *Input/Output
The role of sigma depends on the value assigned to sigma_est (see above) as follows.
If sigma_est=Nag_SigmaSimul:
On entry: sigma must be assigned a value which determines the values of the starting points for the calculations of θ ^  and σ ^ . If sigma0.0  then nag_robust_m_estim_1var (g07dbc) will determine the starting points of θ ^  and σ ^ . Otherwise the value assigned to sigma will be taken as the starting point for σ ^ , and theta must be assigned a value before entry, see above.
If sigma_est=Nag_SigmaBypas:
On entry: sigma must be assigned a value which determines the value of σ c , which is held fixed during the iterations, and the starting value for the calculation of θ ^ . If sigma0 , then nag_robust_m_estim_1var (g07dbc) will determine the value of σ c  as the median absolute deviation adjusted to reduce bias (see G07DAF) and the starting point for θ ^ . Otherwise, the value assigned to sigma will be taken as the value of σ c  and theta must be assigned a relevant value before entry, see above.
On exit: sigma contains the M  – estimate of the scale argument, σ ^ , if sigma_est=Nag_SigmaSimul on entry, otherwise sigma will contain the initial fixed value σ c .
12:   maxit IntegerInput
On entry: the maximum number of iterations that should be used during the estimation.
Suggested value: p maxit=50 .
Constraint: maxit>0 .
13:   tol doubleInput
On entry: the relative precision for the final estimates. Convergence is assumed when the increments for theta, and sigma are less than tol × max1.0, σ k-1 .
Constraint: tol>0.0 .
14:   rs[n] doubleOutput
On exit: the Winsorized residuals.
15:   nit Integer *Output
On exit: the number of iterations that were used during the estimation.
16:   sorted_x[n] doubleOutput
On exit: if sigma0.0  on entry, sorted_x will contain the n  observations in ascending order.
17:   fail NagError *Input/Output
The NAG error argument (see Section 3.7 in How to Use the NAG Library and its Documentation).

6
Error Indicators and Warnings

NE_2_REAL_ENUM_ARG_CONS
On entry, h1=value , h2=value  and psifun=value . These arguments must satisfy h1h2 , psifun=Nag_HampelFun.
On entry, h1=value , h3=value  and psifun=value . These arguments must satisfy h1h3 , psifun=Nag_HampelFun.
On entry, h2=value , h3=value  and psifun=value . These arguments must satisfy h2h3 , psifun=Nag_HampelFun.
NE_3_REAL_ENUM_ARG_CONS
On entry, h1=value , h2=value , h3=value , psifun=value . These arguments must satisfy h1=h2 = h30.0 , psifun=Nag_HampelFun.
NE_ALL_ELEMENTS_EQUAL
On entry, all the values in the array x must not be equal.
NE_BAD_PARAM
On entry, argument psifun had an illegal value.
On entry, argument sigma_est had an illegal value.
NE_ESTIM_SIGMA_ZERO
The estimated value of sigma was 0.0  during an iteration.
NE_INT_ARG_LE
On entry, maxit=value.
Constraint: maxit>0.
On entry, n=value.
Constraint: n>1.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_REAL_ARG_LE
On entry, tol must not be less than or equal to 0.0: tol=value .
NE_REAL_ENUM_ARG_CONS
On entry, c=value , psifun=value . These arguments must satisfy c>0.0 , psifun=Nag_HuberFun.
On entry, dchi=value , psifun=value . These arguments must satisfy dchi>0.0 , psifunNag_Lsq.
On entry, h1=value , psifun=value . These arguments must satisfy h10.0 , psifun=Nag_HampelFun.
NE_TOO_MANY
Too many iterations (value ).
NE_WINS_RES_ZERO
The Winsorized residuals are zero.
On completion of the iterations, the Winsorized residuals were all zero. This may occur when using the sigma_est=Nag_SigmaBypas option with a redescending ψ  function, i.e., Hampel's piecewise linear function, Andrew's sine wave, and Tukey's biweight.
If the given value of σ  is too small, then the standardized residuals x i - θ ^ k σ c , will be large and all the residuals may fall into the region for which ψ t = 0 . This may incorrectly terminate the iterations thus making theta and sigma invalid.
Re-enter the function with a larger value of σ c  or with sigma_est=Nag_SigmaSimul.

7
Accuracy

On successful exit the accuracy of the results is related to the value of TOL, see Section 4.

8
Parallelism and Performance

nag_robust_m_estim_1var (g07dbc) is not threaded in any implementation.

9
Further Comments

When you supply the initial values, care has to be taken over the choice of the initial value of σ . If too small a value of σ  is chosen then initial values of the standardized residuals x i - θ ^ k σ  will be large. If the redescending ψ  functions are used, i.e., Hampel's piecewise linear function, Andrew's sine wave, or Tukey's bi-weight, then these large values of the standardized residuals are Winsorized as zero. If a sufficient number of the residuals fall into this category then a false solution may be returned, see Hampel et al. (1986).

10
Example

The following program reads in a set of data consisting of eleven observations of a variable X .
For this example, Hampels's Piecewise Linear Function is used (psifun=Nag_HampelFun), values for h 1 , h 2  and h 3  along with d  for the χ  function, being read from the data file.
Using the following starting values various estimates of θ  and σ  are calculated and printed along with the number of iterations used:
(a) nag_robust_m_estim_1var (g07dbc) determines the starting values, σ  is estimated simultaneously.
(b) You supply the starting values, σ  is estimated simultaneously.
(c) nag_robust_m_estim_1var (g07dbc) determines the starting values, σ  is fixed.
(d) You supply the starting values, σ  is fixed.

10.1
Program Text

Program Text (g07dbce.c)

10.2
Program Data

Program Data (g07dbce.d)

10.3
Program Results

Program Results (g07dbce.r)

© The Numerical Algorithms Group Ltd, Oxford, UK. 2017