The main aim of this chapter is to assist you in finding a function which approximates a set of data points. Typically the data contain random errors, as of experimental measurement, which need to be smoothed out. To seek an approximation to the data, it is first necessary to specify for the approximating function a mathematical form (a polynomial, for example) which contains a number of unspecified coefficients: the appropriate fitting method then derives for the coefficients the values which provide the best fit of that particular form. The chapter deals mainly with curve and surface fitting (i.e., fitting with functions of one and of two variables) when a polynomial or a cubic spline is used as the fitting function, since these cover the most common needs. However, fitting with other functions and/or more variables can be undertaken by means of general linear or nonlinear methods (some of which are contained in other chapters) depending on whether the coefficients in the function occur linearly or nonlinearly. Cases where a graph rather than a set of data points is given can be treated simply by first reading a suitable set of points from the graph.

The chapter also contains methods for evaluating, differentiating and integrating polynomial and spline curves and surfaces, once the numerical values of their coefficients have been determined.

There is, too, a method for computing a Padé approximant of a mathematical function (see [Padé Approximants] and [Padé Approximants]).

# Syntax

C# |
---|

public static class E02 |

Visual Basic |
---|

Public NotInheritable Class E02 |

Visual C++ |
---|

public ref class E02 abstract sealed |

F# |
---|

[<AbstractClassAttribute>] [<SealedAttribute>] type E02 = class end |

# Background to the Problems

# Preliminary Considerations

In the curve-fitting problems considered in this chapter, we have a dependent variable $y$ and an independent variable $x$, and we are given a set of data points $\left({x}_{\mathit{r}},{y}_{\mathit{r}}\right)$, for $\mathit{r}=1,2,\dots ,m$. The preliminary matters to be considered in this section will, for simplicity, be discussed in this context of curve-fitting problems. In fact, however, these considerations apply equally well to surface and higher-dimensional problems. Indeed, the discussion presented carries over essentially as it stands if, for these cases, we interpret $x$ as a vector of several independent variables and correspondingly each ${x}_{r}$ as a vector containing the $r$th data value of each independent variable.

We wish, then, to approximate the set of data points as closely as possible with a specified function, $f\left(x\right)$ say, which is as smooth as possible: $f\left(x\right)$ may, for example, be a polynomial. The requirements of smoothness and closeness conflict, however, and a balance has to be struck between them. Most often, the smoothness requirement is met simply by limiting the number of coefficients allowed in the fitting function – for example, by restricting the degree in the case of a polynomial. Given a particular number of coefficients in the function in question, the fitting methods of this chapter determine the values of the coefficients such that the ‘distance’ of the function from the data points is as small as possible. The necessary balance is struck when you compare a selection of such fits having different numbers of coefficients. If the number of coefficients is too low, the approximation to the data will be poor. If the number is too high, the fit will be too close to the data, essentially following the random errors and tending to have unwanted fluctuations between the data points. Between these extremes, there is often a group of fits all similarly close to the data points and then, particularly when least squares polynomials are used, the choice is clear: it is the fit from this group having the smallest number of coefficients.

You are in effect minimizing the smoothness measure (i.e., the number of coefficients) subject to the distance from the data points being acceptably small. Some of the methods, however, do this task themselves. They use a different measure of smoothness (in each case one that is continuous) and minimize it subject to the distance being less than a threshold specified by you. This is a much more automatic process, requiring only some experimentation with the threshold.

# Fitting criteria: norms

A measure of the above ‘distance’ between the set of data points and the function $f\left(x\right)$ is needed. The distance from a single data point $\left({x}_{r},{y}_{r}\right)$ to the function can simply be taken as

and is called the

and

respectively the ${\ell}_{1}$ norm, the ${\ell}_{2}$ norm and the ${\ell}_{\infty}$ norm.

$${\epsilon}_{r}={y}_{r}-f\left({x}_{r}\right)\text{,}$$ | (1) |

**residual**of the point. (With this definition, the residual is regarded as a function of the coefficients contained in $f\left(x\right)$; however, the term is also used to mean the particular value of ${\epsilon}_{r}$ which corresponds to the fitted values of the coefficients.) However, we need a measure of distance for the set of data points as a whole. Three different measures are used in the different methods (which measure to select, according to circumstances, is discussed later in this sub-section). With ${\epsilon}_{r}$ defined in (1), these measures, or**norms**, are$$\sum _{r=1}^{m}\left|{\epsilon}_{r}\right|\text{,}$$ | (2) |

$$\sqrt{\sum _{r=1}^{m}{\epsilon}_{r}^{2}}\text{,}$$ | (3) |

$$\underset{r}{\mathrm{max}}\phantom{\rule{0.25em}{0ex}}\left|{\epsilon}_{r}\right|\text{,}$$ | (4) |

Minimization of one or other of these norms usually provides the fitting criterion, the minimization being carried out with respect to the coefficients in the mathematical form used for $f\left(x\right)$: with respect to the ${b}_{i}$ for example if the mathematical form is the power series in (8) below. The fit which results from minimizing (2) is known as the ${\ell}_{1}$ fit, or the fit in the ${\ell}_{1}$ norm: that which results from minimizing (3) is the ${\ell}_{2}$ fit, the well-known least squares fit (minimizing (3) is equivalent to minimizing the square of (3), i.e., the sum of squares of residuals, and it is the latter which is used in practice), and that from minimizing (4) is the ${\ell}_{\infty}$, or minimax, fit.

Strictly speaking, implicit in the use of the above norms are the statistical assumptions that the random errors in the ${y}_{r}$ are independent of one another and that any errors in the ${x}_{r}$ are negligible by comparison. From this point of view, the use of the ${\ell}_{2}$ norm is appropriate when the random errors in the ${y}_{r}$ have a Normal distribution, and the ${\ell}_{\infty}$ norm is appropriate when they have a rectangular distribution, as when fitting a table of values rounded to a fixed number of decimal places. The ${\ell}_{1}$ norm is appropriate when the error distribution has its frequency function proportional to the negative exponential of the modulus of the normalized error – not a common situation.

However, it may be that you are indifferent to these statistical considerations, and simply seek a fit which can be assessed by inspection, perhaps visually from a graph of the results. In this event, the ${\ell}_{1}$ norm is particularly appropriate when the data are thought to contain some ‘wild’ points (since fitting in this norm tends to be unaffected by the presence of a small number of such points), though of course in simple situations you may prefer to identify and reject these points. The ${\ell}_{\infty}$ norm should be used only when the maximum residual is of particular concern, as may be the case for example when the data values have been obtained by accurate computation, as of a mathematical function. Generally, however, a method based on least squares should be preferred, as being computationally faster and usually providing more information on which to assess the results. In many problems the three fits will not differ significantly for practical purposes.

Some of the methods based on the ${\ell}_{2}$ norm do not minimize the norm itself but instead minimize some (intuitively acceptable) measure of smoothness subject to the norm being less than a user-specified threshold. These methods fit with cubic or bicubic splines (see (10) and (14) below) and the smoothing measures relate to the size of the discontinuities in their third derivatives. A much more automatic fitting procedure follows from this approach.

# Weighting of data points

The use of the above norms also assumes that the data values ${y}_{r}$ are of equal (absolute) accuracy. Some of the methods enable an allowance to be made to take account of differing accuracies. The allowance takes the form of ‘weights’ applied to the $y$ values so that those values known to be more accurate have a greater influence on the fit than others. These weights, to be supplied by you, should be calculated from estimates of the absolute accuracies of the $y$ values, these estimates being expressed as standard deviations, probable errors or some other measure which has the same dimensions as $y$. Specifically, for each ${y}_{r}$ the corresponding weight ${w}_{r}$ should be inversely proportional to the accuracy estimate of ${y}_{r}$. For example, if the percentage accuracy is the same for all ${y}_{r}$, then the absolute accuracy of ${y}_{r}$ is proportional to ${y}_{r}$ (assuming ${y}_{r}$ to be positive, as it usually is in such cases) and so ${w}_{\mathit{r}}=K/{y}_{\mathit{r}}$, for $\mathit{r}=1,2,\dots ,m$, for an arbitrary positive constant $K$. (This definition of weight is stressed because often weight is defined as the square of that used here.) The norms (2), (3) and (4) above are then replaced respectively by

and

Again it is the square of (6) which is used in practice rather than (6) itself.

$$\sum _{r=1}^{m}\left|{w}_{r}{\epsilon}_{r}\right|\text{,}$$ | (5) |

$$\sqrt{\sum _{r=1}^{m}{w}_{r}^{2}{\epsilon}_{r}^{2}}\text{,}$$ | (6) |

$$\underset{r}{\mathrm{max}}\phantom{\rule{0.25em}{0ex}}\left|{w}_{r}{\epsilon}_{r}\right|\text{.}$$ | (7) |

# Curve Fitting

When, as is commonly the case, the mathematical form of the fitting function is immaterial to the problem, polynomials and cubic splines are to be preferred because their simplicity and ease of handling confer substantial benefits. The

**cubic spline**is the more versatile of the two. It consists of a number of cubic polynomial segments joined end to end with continuity in first and second derivatives at the joins. The third derivative at the joins is in general discontinuous. The $x$ values of the joins are called**knots**, or, more precisely, interior knots. Their number determines the number of coefficients in the spline, just as the degree determines the number of coefficients in a polynomial.# Representation of polynomials

Two different forms for representing a polynomial are used in different methods. One is the usual power-series form

The other is the Chebyshev series form

where ${T}_{i}\left(x\right)$ is the Chebyshev polynomial of the first kind of degree $i$ (see page 9 of Cox and Hayes (1973)), and where the range of $x$ has been normalized to run from $-1$ to $+1$. The use of either form leads theoretically to the same fitted polynomial, but in practice results may differ substantially because of the effects of rounding error. The Chebyshev form is to be preferred, since it leads to much better accuracy in general, both in the computation of the coefficients and in the subsequent evaluation of the fitted polynomial at specified points. This form also has other advantages: for example, since the later terms in (9) generally decrease much more rapidly from left to right than do those in (8), the situation is more often encountered where the last terms are negligible and it is obvious that the degree of the polynomial can be reduced (note that on the interval $-1\le x\le 1$ for all $i$, ${T}_{i}\left(x\right)$ attains the value unity but never exceeds it, so that the coefficient ${a}_{i}$ gives directly the maximum value of the term containing it). If the power-series form is used it is most advisable to work with the variable $x$ normalized to the range $-1$ to $+1$, carrying out the normalization before entering the relevant method. This will often substantially improve computational accuracy.

$$f\left(x\right)\equiv {b}_{0}+{b}_{1}x+{b}_{2}{x}^{2}+\cdots +{b}_{k}{x}^{k}\text{.}$$ | (8) |

$$f\left(x\right)\equiv \frac{1}{2}{a}_{0}{T}_{0}\left(x\right)+{a}_{1}{T}_{1}\left(x\right)+{a}_{2}{T}_{2}\left(x\right)+\cdots +{a}_{k}{T}_{k}\left(x\right)\text{,}$$ | (9) |

# Representation of cubic splines

A cubic spline is represented in the form

where ${N}_{\mathit{i}}\left(x\right)$, for $\mathit{i}=1,2,\dots ,p$, is a normalized cubic B-spline (see Hayes (1974)). This form, also, has advantages of computational speed and accuracy over alternative representations.

$$f\left(x\right)\equiv {c}_{1}{N}_{1}\left(x\right)+{c}_{2}{N}_{2}\left(x\right)+\cdots +{c}_{p}{N}_{p}\left(x\right)\text{,}$$ | (10) |

# Surface Fitting

There are now two independent variables, and we shall denote these by $x$ and $y$. The dependent variable, which was denoted by $y$ in the curve-fitting case, will now be denoted by $f$. (This is a rather different notation from that indicated for the general-dimensional problem in the first paragraph of [Preliminary Considerations], but it has some advantages in presentation.)

Again, in the absence of contrary indications in the particular application being considered, polynomials and splines are the approximating functions most commonly used.

# Representation of bivariate polynomials

The type of bivariate polynomial currently considered in the chapter can be represented in either of the two forms

and

where ${T}_{i}\left(x\right)$ is the Chebyshev polynomial of the first kind of degree $i$ in the argument $x$ (see page 9 of Cox and Hayes (1973)), and correspondingly for ${T}_{j}\left(y\right)$. The prime on the two summation signs, following standard convention, indicates that the first term in each sum is halved, as shown for one variable in equation (9). The two forms (11) and (12) are mathematically equivalent, but again the Chebyshev form is to be preferred on numerical grounds, as discussed in [Representation of polynomials].

$$f\left(x,y\right)\equiv \sum _{i=0}^{k}\sum _{j=0}^{\ell}{b}_{ij}{x}^{i}{y}^{j}\text{,}$$ | (11) |

$$f\left(x,y\right)\equiv \underset{i=0}{\overset{k}{{\sum}^{\prime}}}\underset{j=0}{\overset{\ell}{{\sum}^{\prime}}}{a}_{ij}{T}_{i}\left(x\right){T}_{j}\left(y\right)\text{,}$$ | (12) |

# Bicubic splines: definition and representation

The bicubic spline is defined over a rectangle $R$ in the $\left(x,y\right)$ plane, the sides of $R$ being parallel to the $x$ and $y$ axes. $R$ is divided into rectangular panels, again by lines parallel to the axes. Over each panel the bicubic spline is a bicubic polynomial, that is it takes the form

Each of these polynomials joins the polynomials in adjacent panels with continuity up to the second derivative. The constant $x$ values of the dividing lines parallel to the $y$-axis form the set of interior knots for the variable $x$, corresponding precisely to the set of interior knots of a cubic spline. Similarly, the constant $y$ values of dividing lines parallel to the $x$-axis form the set of interior knots for the variable $y$. Instead of representing the bicubic spline in terms of the above set of bicubic polynomials, however, it is represented, for the sake of computational speed and accuracy, in the form

where ${M}_{\mathit{i}}\left(x\right)$, for $\mathit{i}=1,2,\dots ,p$, and ${N}_{\mathit{j}}\left(y\right)$, for $\mathit{j}=1,2,\dots ,q$, are normalized B-splines (see Hayes and Halliday (1974) for further details of bicubic splines and Hayes (1974) for normalized B-splines).

$$\sum _{i=0}^{3}\sum _{j=0}^{3}{a}_{ij}{x}^{i}{y}^{j}\text{.}$$ | (13) |

$$f\left(x,y\right)=\sum _{i=1}^{p}\sum _{j=1}^{q}{c}_{ij}{M}_{i}\left(x\right){N}_{j}\left(y\right)\text{,}$$ | (14) |

# General Linear and Nonlinear Fitting Functions

We have indicated earlier that, unless the data-fitting application under consideration specifically requires some other type of fit, a polynomial or a spline is usually to be preferred. Special methods for these types, in one and in two variables, are provided in this chapter. When the application does specify some other form of fitting, however, it may be treated by a method which deals with a general linear function, or by one for a general nonlinear function, depending on whether the coefficients occur linearly or nonlinearly.

The general linear fitting function can be written in the form

where $x$ is a vector of one or more independent variables, and the ${\varphi}_{i}$ are any given functions of these variables (though they must be linearly independent of one another if there is to be the possibility of a unique solution to the fitting problem). This is not intended to imply that each ${\varphi}_{i}$ is necessarily a function of all the variables: we may have, for example, that each ${\varphi}_{i}$ is a function of a different single variable, and even that one of the ${\varphi}_{i}$ is a constant. All that is required is that a value of each ${\varphi}_{i}\left(x\right)$ can be computed when a value of each independent variable is given.

$$f\left(x\right)\equiv {c}_{1}{\varphi}_{1}\left(x\right)+{c}_{2}{\varphi}_{2}\left(x\right)+\cdots +{c}_{p}{\varphi}_{p}\left(x\right)\text{,}$$ | (15) |

When the fitting function $f\left(x\right)$ is not linear in its coefficients, no more specific representation is available in general than $f\left(x\right)$ itself. However, we shall find it helpful later on to indicate the fact that $f\left(x\right)$ contains a number of coefficients (to be determined by the fitting process) by using instead the notation $f\left(x;c\right)$, where $c$ denotes the vector of coefficients. An example of a nonlinear fitting function is

which is in one variable and contains five coefficients. Note that here, as elsewhere in this Chapter Introduction, we use the term ‘coefficients’ to include all the quantities whose values are to be determined by the fitting process, not just those which occur linearly. We may observe that it is only the presence of the coefficients ${c}_{4}$ and ${c}_{5}$ which makes the form (16) nonlinear. If the values of these two coefficients were known beforehand, (16) would instead be a linear function which, in terms of the general linear form (15), has $p=3$ and

We may note also that polynomials and splines, such as (9) and (14), are themselves linear in their coefficients. Thus if, when fitting with these functions, a suitable special method is not available (as when more than two independent variables are involved or when fitting in the ${\ell}_{1}$ norm), it is appropriate to use a method designed for a general linear function.

$$f\left(x;c\right)\equiv {c}_{1}+{c}_{2}\mathrm{exp}\left(-{c}_{4}x\right)+{c}_{3}\mathrm{exp}\left(-{c}_{5}x\right)\text{,}$$ | (16) |

$${\varphi}_{1}\left(x\right)\equiv 1,{\varphi}_{2}\left(x\right)\equiv \mathrm{exp}\left(-{c}_{4}x\right),\text{and}{\varphi}_{3}\left(x\right)\equiv \mathrm{exp}\left(-{c}_{5}x\right)\text{.}$$ |

# Constrained Problems

So far, we have considered only fitting processes in which the values of the coefficients in the fitting function are determined by an unconstrained minimization of a particular norm. Some fitting problems, however, require that further restrictions be placed on the determination of the coefficient values. Sometimes these restrictions are contained explicitly in the formulation of the problem in the form of equalities or inequalities which the coefficients, or some function of them, must satisfy. For example, if the fitting function contains a term $A\mathrm{exp}\left(-kx\right)$, it may be required that $k\ge 0$. Often, however, the equality or inequality constraints relate to the value of the fitting function or its derivatives at specified values of the independent variable(s), but these too can be expressed in terms of the coefficients of the fitting function, and it is appropriate to do this if a general linear or nonlinear method is being used. For example, if the fitting function is that given in (10), the requirement that the first derivative of the function at $x={x}_{0}$ be non-negative can be expressed as

where the prime denotes differentiation with respect to $x$ and each derivative is evaluated at $x={x}_{0}$. On the other hand, if the requirement had been that the derivative at $x={x}_{0}$ be exactly zero, the inequality sign in (17) would be replaced by an equality.

$${c}_{1}{N}_{1}^{\prime}\left({x}_{0}\right)+{c}_{2}{N}_{2}^{\prime}\left({x}_{0}\right)+\cdots +{c}_{p}{N}_{p}^{\prime}\left({x}_{0}\right)\ge 0\text{,}$$ | (17) |

# Padé Approximants

A Padé approximant to a function $f\left(x\right)$ is a rational function (ratio of two polynomials) whose Maclaurin series expansion is the same as that of $f\left(x\right)$ up to and including the term in ${x}^{k}$, where $k$ is the sum of the degrees of the numerator and denominator of the approximant. Padé approximation can be a useful technique when values of a function are to be obtained from its Maclaurin series but convergence of the series is unacceptably slow or even nonexistent.

# References

Baker G A (1975)

*Essentials of Padé Approximants*Academic Press, New YorkCox M G and Hayes J G (1973) Curve fitting: a guide and suite of algorithms for the non-specialist user

*NPL Report NAC26*National Physical LaboratoryHayes J G (ed.) (1970)

*Numerical Approximation to Functions and Data*Athlone Press, LondonHayes J G (1974) Numerical methods for curve and surface fitting

*Bull. Inst. Math. Appl.***10**144–152Hayes J G and Halliday J (1974) The least squares fitting of cubic spline surfaces to general data sets

*J. Inst. Math. Appl.***14**89–103