NAG Library Routine Document

g02mcf (lars_param)

1
Purpose

g02mcf calculates additional parameter estimates following Least Angle Regression (LARS), forward stagewise linear regression or Least Absolute Shrinkage and Selection Operator (LASSO) as performed by g02maf and g02mbf.

2
Specification

Fortran Interface
Subroutine g02mcf ( nstep, ip, b, ldb, fitsum, ktype, nk, lnk, nb, ldnb, ifail)
Integer, Intent (In):: nstep, ip, ldb, ktype, lnk, ldnb
Integer, Intent (Inout):: ifail
Real (Kind=nag_wp), Intent (In):: b(ldb,*), fitsum(6,nstep+1), nk(lnk)
Real (Kind=nag_wp), Intent (Inout):: nb(ldnb,*)
C Header Interface
#include <nagmk26.h>
void  g02mcf_ (const Integer *nstep, const Integer *ip, const double b[], const Integer *ldb, const double fitsum[], const Integer *ktype, const double nk[], const Integer *lnk, double nb[], const Integer *ldnb, Integer *ifail)

3
Description

g02maf and g02mbf fit either a LARS, forward stagewise linear regression, LASSO or positive LASSO model to a vector of n observed values, y = {y_i : i = 1, 2, …, n}, and an n×p design matrix X, where the jth column of X is given by the jth independent variable x_j. The models are fit using the LARS algorithm of Efron et al. (2004).
[Figure 1: plot of the parameter estimates (β_kj) against ||β_k||_1, with one curve for each of β_k1, β_k2, …, β_k6]
The full solution path for all four of these models follows a similar pattern, in which the parameter estimate for a given variable is piecewise linear. One such path, for a LARS model with six variables (p = 6), can be seen in Figure 1. Both g02maf and g02mbf return the vector of p parameter estimates, β_k, at K points along this path (so k = 1, 2, …, K). Each point corresponds to a step of the LARS algorithm. The number of steps taken depends on the model being fitted. In the case of a LARS model, K = p and each step corresponds to a new variable being included in the model. In the case of the LASSO models, each step corresponds to either a new variable being included in the model or an existing variable being removed from the model; the value of K is therefore no longer bound by the number of parameters. For forward stagewise linear regression, each step no longer corresponds to the addition or removal of a variable; therefore the number of possible steps is often markedly greater than for a corresponding LASSO model.
g02mcf uses the piecewise linear nature of the solution path to predict the parameter estimates, β̃, at a different point on this path. The location of the solution can be defined either in terms of a (fractional) step number or as a function of the L1 norm of the parameter estimates.
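The interpolation described above can be sketched as follows for a target given as a fractional step number. This is a minimal illustration in Python, not the implementation used by g02mcf: the function name interpolate_at_step and the list path are hypothetical, and the sketch assumes that path[0] holds the all-zero estimates at step 0 and that path[k] holds β_k.

```python
import math


def interpolate_at_step(path, s):
    """Return the parameter estimates at fractional step s.

    path[k] is the list of p estimates after step k; path[0] is the
    all-zero starting point (an assumption of this sketch).
    """
    n_steps = len(path) - 1
    if not 0.0 <= s <= n_steps:
        raise ValueError("step number must lie in [0, nstep]")
    lo = int(math.floor(s))
    if lo == n_steps:
        # Target is exactly the final step; nothing to interpolate.
        return list(path[lo])
    frac = s - lo
    # Move a fraction frac of the way from step lo to step lo + 1.
    return [a + frac * (b - a) for a, b in zip(path[lo], path[lo + 1])]


# Hypothetical two-variable path with two steps.
example_path = [[0.0, 0.0], [2.0, 0.0], [3.0, 1.0]]
halfway = interpolate_at_step(example_path, 1.5)
```

Here halfway is the point midway between the estimates at step 1 and those at step 2.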

4
References

Efron B, Hastie T, Johnstone I and Tibshirani R (2004) Least Angle Regression The Annals of Statistics (Volume 32) 2 407–499
Hastie T, Tibshirani R and Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction Springer (New York)
Tibshirani R (1996) Regression Shrinkage and Selection via the Lasso Journal of the Royal Statistical Society, Series B (Methodological) (Volume 58) 1 267–288
Weisberg S (1985) Applied Linear Regression Wiley

5
Arguments

1:     nstep – Integer     Input
On entry: K, the number of steps carried out in the model fitting process, as returned by g02maf and g02mbf.
Constraint: nstep ≥ 0.
2:     ip – Integer     Input
On entry: p, the number of parameter estimates, as returned by g02maf and g02mbf.
Constraint: ip ≥ 1.
3:     b(ldb,*) – Real (Kind=nag_wp) array     Input
Note: the second dimension of the array b must be at least nstep+1.
On entry: β, the parameter estimates, as returned by g02maf and g02mbf, with b(j,k) = β_kj, the parameter estimate for the jth variable, for j = 1, 2, …, p, at the kth step of the model fitting process.
Constraint: b should be unchanged since the last call to g02maf or g02mbf.
4:     ldb – Integer     Input
On entry: the first dimension of the array b as declared in the (sub)program from which g02mcf is called.
Constraint: ldb ≥ ip.
5:     fitsum(6,nstep+1) – Real (Kind=nag_wp) array     Input
On entry: summaries of the model fitting process, as returned by g02maf and g02mbf.
Constraint: fitsum should be unchanged since the last call to g02maf or g02mbf.
6:     ktype – Integer     Input
On entry: indicates what target values are held in nk.
ktype=1
nk holds (fractional) LARS step numbers.
ktype=2
nk holds values for the L1 norm of the (scaled) parameters.
ktype=3
nk holds ratios with respect to the largest (scaled) L1 norm.
ktype=4
nk holds values for the L1 norm of the (unscaled) parameters.
ktype=5
nk holds ratios with respect to the largest (unscaled) L1 norm.
If g02maf was called with pred=0 or 1 or g02mbf was called with pred=0 then the model fitting routine did not rescale the independent variables, X, prior to fitting the model and therefore there is no difference between ktype=2 or 3 and ktype=4 or 5.
Constraint: ktype=1, 2, 3, 4 or 5.
7:     nk(lnk) – Real (Kind=nag_wp) array     Input
On entry: target values used for predicting the new set of parameter estimates.
Constraints:
  • if ktype=1, 0 ≤ nk(i) ≤ nstep, for i = 1, 2, …, lnk;
  • if ktype=2, 0 ≤ nk(i) ≤ fitsum(1,nstep), for i = 1, 2, …, lnk;
  • if ktype=3 or 5, 0 ≤ nk(i) ≤ 1, for i = 1, 2, …, lnk;
  • if ktype=4, 0 ≤ nk(i) ≤ ||β_K||_1, for i = 1, 2, …, lnk.
8:     lnk – Integer     Input
On entry: the number of values supplied in nk.
Constraint: lnk ≥ 1.
9:     nb(ldnb,*) – Real (Kind=nag_wp) array     Output
Note: the second dimension of the array nb must be at least lnk.
On exit: β̃, the predicted parameter estimates, with nb(j,i) = β̃_ij, the parameter estimate for variable j, for j = 1, 2, …, p, at the point in the fitting process associated with nk(i), for i = 1, 2, …, lnk.
10:   ldnb – Integer     Input
On entry: the first dimension of the array nb as declared in the (sub)program from which g02mcf is called.
Constraint: ldnb ≥ ip.
11:   ifail – Integer     Input/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, because for this routine the values of the output arguments may be useful even if ifail ≠ 0 on exit, the recommended value is -1. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).
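
When nk holds L1 norms or ratios (ktype=2, 3, 4 or 5), one way to locate a target on the solution path is to interpolate on the L1 norm of the estimates at each step. The sketch below is an assumption about how such a lookup could work, not a description of g02mcf itself. It assumes that the L1 norm does not decrease from step to step and changes linearly within a step (which holds when no coefficient changes sign within the step), and it ignores the distinction between scaled and unscaled parameters. The names l1_norm, interpolate_at_l1 and path are hypothetical, and path[0] is assumed to hold the all-zero estimates.

```python
def l1_norm(beta):
    """Sum of absolute values of the estimates."""
    return sum(abs(v) for v in beta)


def interpolate_at_l1(path, target, as_ratio=False):
    """Return the estimates whose L1 norm equals the target.

    With as_ratio=True the target is a fraction of the largest L1 norm
    (the ktype=3 or 5 style of target); otherwise it is the norm itself
    (the ktype=2 or 4 style).
    """
    norms = [l1_norm(b) for b in path]
    largest = norms[-1]  # assumes the norm is largest at the final step
    if as_ratio:
        target = target * largest
    if not 0.0 <= target <= largest:
        raise ValueError("target lies outside the solution path")
    for k in range(len(path) - 1):
        lo, hi = norms[k], norms[k + 1]
        if hi > lo and lo <= target <= hi:
            frac = (target - lo) / (hi - lo)
            return [a + frac * (b - a) for a, b in zip(path[k], path[k + 1])]
    return list(path[-1])


# Hypothetical two-variable path; the L1 norms at each step are 0, 2 and 4.
example_path = [[0.0, 0.0], [2.0, 0.0], [3.0, 1.0]]
by_norm = interpolate_at_l1(example_path, 3.0)
by_ratio = interpolate_at_l1(example_path, 0.75, as_ratio=True)
```

In this sketch a norm of 3.0 and a ratio of 0.75 identify the same point, because the largest L1 norm on the hypothetical path is 4.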

6
Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Note: g02mcf may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
ifail=11
On entry, nstep=value.
Constraint: nstep ≥ 0.
ifail=21
On entry, ip=value.
Constraint: ip ≥ 1.
ifail=31
b has been corrupted since the last call to g02maf or g02mbf.
ifail=41
On entry, ldb=value and ip=value.
Constraint: ldb ≥ ip.
ifail=51
fitsum has been corrupted since the last call to g02maf or g02mbf.
ifail=61
On entry, ktype=value.
Constraint: ktype=1, 2, 3, 4 or 5.
ifail=71
On entry, ktype=1, nk(value)=value and nstep=value.
Constraint: 0 ≤ nk(i) ≤ nstep, for all i.
ifail=72
On entry, ktype=2, nk(value)=value, nstep=value and fitsum(1,nstep)=value.
Constraint: 0 ≤ nk(i) ≤ fitsum(1,nstep), for all i.
ifail=73
On entry, ktype=3 or 5, nk(value)=value.
Constraint: 0 ≤ nk(i) ≤ 1, for all i.
ifail=74
On entry, ktype=4, nk(value)=value and ||β_K||_1=value.
Constraint: 0 ≤ nk(i) ≤ ||β_K||_1, for all i.
ifail=81
On entry, lnk=value.
Constraint: lnk ≥ 1.
ifail=101
On entry, ldnb=value and ip=value.
Constraint: ldnb ≥ ip.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.
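
The range checks on ktype, lnk and nk reported above (ifail=61, 71, 72, 73, 74 and 81) can be mirrored in calling code before the routine is invoked. The following Python sketch is illustrative only: check_targets and its arguments are hypothetical names, max_scaled_l1 stands for fitsum(1,nstep), max_unscaled_l1 stands for the L1 norm of the final unscaled estimates, and the order in which the checks are made is an assumption rather than documented behaviour.

```python
def check_targets(ktype, nk, nstep, max_scaled_l1, max_unscaled_l1):
    """Return 0 if the targets are acceptable, else a documented ifail value."""
    # Upper bound on each nk value and the ifail value reported per ktype.
    limits = {
        1: (float(nstep), 71),
        2: (max_scaled_l1, 72),
        3: (1.0, 73),
        4: (max_unscaled_l1, 74),
        5: (1.0, 73),
    }
    if ktype not in limits:
        return 61  # ktype must be 1, 2, 3, 4 or 5
    if len(nk) < 1:
        return 81  # at least one target value is required
    upper, code = limits[ktype]
    if any(v < 0.0 or v > upper for v in nk):
        return code
    return 0
```

For instance, in this sketch check_targets(1, [6.5], 6, 0.0, 0.0) returns 71, because a step number of 6.5 exceeds nstep=6.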

7
Accuracy

Not applicable.

8
Parallelism and Performance

g02mcf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
g02mcf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9
Further Comments

None.

10
Example

This example performs a LARS on a simulated dataset with 20 observations and 6 independent variables.
Additional parameter estimates are obtained corresponding to LARS step numbers of 0.2, 1.2, 3.2, 4.5 and 5.2, where, for example, 4.5 corresponds to the solution halfway between that obtained at step 4 and that obtained at step 5.
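As a worked instance of the halfway case, and assuming the estimates move linearly between consecutive steps as described in Section 3, the solution at step 4.5 can be written as:

```latex
\tilde{\beta}_j = \beta_{4j} + (4.5 - 4)\,(\beta_{5j} - \beta_{4j})
               = \tfrac{1}{2}\,(\beta_{4j} + \beta_{5j}),
\qquad j = 1, 2, \ldots, 6.
```

By the same reasoning, a step number of 3.2 lies one fifth of the way from the solution at step 3 to the solution at step 4.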

10.1
Program Text

Program Text (g02mcfe.f90)

10.2
Program Data

Program Data (g02mcfe.d)

10.3
Program Results

Program Results (g02mcfe.r)