This chapter provides facilities for investigating and modelling the statistical structure of series of observations collected at points in time. The models may then be used to forecast the series.
The chapter covers the following models and approaches.
1.  Univariate time series analysis, including autocorrelation functions and autoregressive moving average (ARMA) models. 
2.  Univariate spectral analysis. 
3.  Transfer function (multi-input) modelling, in which one time series is dependent on other time series. 
4.  Bivariate spectral methods including coherency, gain and input response functions. 
5.  Vector ARMA models for multivariate time series. 
6.  Kalman filter models. 
7.  GARCH models for volatility. 
8.  Inhomogeneous Time Series. 
Syntax
C# 

public static class G13 
Visual Basic 

Public NotInheritable Class G13 
Visual C++ 

public ref class G13 abstract sealed 
F# 

[<AbstractClassAttribute>] [<SealedAttribute>] type G13 = class end 
Background to the Problems
Univariate Analysis
Let the given time series be ${x}_{1},{x}_{2},\dots ,{x}_{n}$, where $n$ is its length. The structure which is intended to be investigated, and which may be most evident to the eye in a graph of the series, can be broadly described as:
(a)  trends, linear or possibly higher-order polynomial; 
(b)  seasonal patterns, associated with fixed integer seasonal periods. The presence of such seasonality and the period will normally be known a priori. The pattern may be fixed, or slowly varying from one season to another; 
(c)  cycles or waves of stable amplitude and period $p$ (from peak to peak). The period is not necessarily integer, the corresponding absolute frequency (cycles/time unit) being $f=1/p$ and angular frequency $\omega =2\pi f$. The cycle may be of pure sinusoidal form like $\mathrm{sin}\left(\omega t\right)$, or the presence of higher harmonic terms may be indicated, e.g., by asymmetry in the wave form; 
(d)  quasi-cycles, i.e., waves of fluctuating period and amplitude; and 
(e)  irregular statistical fluctuations and swings about the overall mean or trend. 
Trends, seasonal patterns, and cycles might be regarded as deterministic components following fixed mathematical equations, and the quasi-cycles and other statistical fluctuations as stochastic and describable by short-term correlation structure. For a finite dataset it is not always easy to discriminate between these two types, and a common description using the class of autoregressive integrated moving-average (ARIMA) models is now widely used. The form of these models is that of difference equations (or recurrence relations) relating present and past values of the series. You are referred to Box and Jenkins (1976) for a thorough account of these models and how to use them. We follow their notation and outline the recommended steps in ARIMA model building for which methods are available.
Transformations
If the variance of the observations in the series is not constant across the range of observations it may be useful to apply a variance-stabilizing transformation to the series. A common situation is for the variance to increase with the magnitude of the observations and in this case typical transformations used are the log or square root transformation. A range-mean plot or standard deviation-mean plot provides a quick and easy way of detecting non-constant variance and of choosing, if required, a suitable transformation. These are plots of either the range or standard deviation of successive groups of observations against their means.
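The points of a range-mean plot can be computed in a few lines. The following is a minimal sketch, not a library routine: the function name `range_mean_points`, the use of non-overlapping groups of equal size, and the toy data are all assumptions made for illustration.

```python
import math


def range_mean_points(x, group_size):
    """Return (mean, range) pairs for successive non-overlapping groups.

    Any incomplete group at the end of the series is ignored (an assumption).
    """
    points = []
    for start in range(0, len(x) - group_size + 1, group_size):
        group = x[start:start + group_size]
        points.append((sum(group) / group_size, max(group) - min(group)))
    return points


# Illustrative data whose spread grows in proportion to its level.
series = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 100.0, 200.0, 300.0]
points = range_mean_points(series, group_size=3)

# The same calculation after a log transformation.
log_points = range_mean_points([math.log(v) for v in series], group_size=3)
```

For this toy series the range grows in proportion to the group mean, whereas after taking logs every group has the same range, which is the pattern that would suggest a log transformation.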
Differencing operations
These may be used to simplify the structure of a time series.
First-order differencing, i.e., forming the new series
$$\nabla {x}_{t}={x}_{t}-{x}_{t-1}$$ 
will remove a linear trend. First-order seasonal differencing
$${\nabla}_{s}{x}_{t}={x}_{t}-{x}_{t-s}$$ 
eliminates a fixed seasonal pattern.
These operations reflect the fact that it is often appropriate to model a time series in terms of changes from one value to another. Differencing is also therefore appropriate when the series has something of the nature of a random walk, which is by definition the accumulation of independent changes.
Differencing may be applied repeatedly to a series, giving
$${w}_{t}={\nabla}^{d}{\nabla}_{s}^{D}{x}_{t}$$ 
where $d$ and $D$ are the orders of differencing. The derived series ${w}_{t}$ will be shorter, of length $N=n-d-s\times D$, and extend for $t=1+d+s\times D,\dots ,n$.
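The differencing operations can be sketched as follows. This is an illustrative implementation only; the function names `difference` and `difference_series` and the test series are assumptions, not part of the library.

```python
def difference(x, lag=1):
    """Return the series x[t] - x[t - lag], which is `lag` terms shorter."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def difference_series(x, d=0, D=0, s=1):
    """Apply d non-seasonal differences and D seasonal differences of period s."""
    w = list(x)
    for _ in range(d):
        w = difference(w, 1)
    for _ in range(D):
        w = difference(w, s)
    return w


# A linear trend plus a fixed seasonal pattern of period 4.
pattern = [3.0, -1.0, 0.5, -2.5]
x = [2.0 * t + pattern[t % 4] for t in range(24)]
w = difference_series(x, d=1, D=1, s=4)
```

Here `w` has length $24-1-4\times 1=19$, and every term is zero because the series contains nothing beyond a linear trend and a fixed seasonal pattern.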
Sample autocorrelations
Given that a series has (possibly as a result of simplifying by differencing operations) a homogeneous appearance throughout its length, fluctuating with approximately constant variance about an overall mean level, it is appropriate to assume that its statistical properties are stationary. For most purposes the correlations ${\rho}_{k}$ between terms ${x}_{t},{x}_{t+k}$ or ${w}_{t},{w}_{t+k}$ separated by lag $k$ give an adequate description of the statistical structure and are estimated by the sample autocorrelation function (ACF) ${r}_{\mathit{k}}$, for $\mathit{k}=1,2,\dots $.
As described by Box and Jenkins (1976), these may be used to indicate which particular ARIMA model may be appropriate.
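As an illustration, the sample ACF can be computed as below. The text does not state the estimator, so this sketch assumes the commonly used form ${r}_{k}={c}_{k}/{c}_{0}$ with ${c}_{k}=\frac{1}{n}\sum_{t=1}^{n-k}\left({x}_{t}-\bar{x}\right)\left({x}_{t+k}-\bar{x}\right)$; the function name `sample_acf` is hypothetical.

```python
def sample_acf(x, max_lag):
    """Return [r_1, ..., r_max_lag] using autocovariances with divisor n."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# A strictly alternating series has strong negative lag-1 correlation.
r = sample_acf([1.0, -1.0] * 10, max_lag=2)
```

For this alternating series of length 20 the result is ${r}_{1}=-0.95$ and ${r}_{2}=0.9$; the values fall short of $\mp 1$ because the divisor is $n$ rather than $n-k$.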
Partial autocorrelations
The information in the autocorrelations, ${\rho}_{k}$, may be presented in a different light by deriving from them the coefficients of the partial autocorrelation function (PACF) ${\varphi}_{k,k}$, for $k=1,2,\dots $. The coefficient ${\varphi}_{k,k}$ measures the correlation between ${x}_{t}$ and ${x}_{t+k}$ conditional upon the intermediate values ${x}_{t+1},{x}_{t+2},\dots ,{x}_{t+k-1}$. The corresponding sample values ${\hat{\varphi}}_{k,k}$ give further assistance in the selection of ARIMA models.
Both autocorrelation function (ACF) and PACF may be rapidly computed, particularly in comparison with the time taken to estimate ARIMA models.
Finite lag predictor coefficients and error variances
The partial autocorrelation coefficient ${\varphi}_{k,k}$ is determined as the final parameter in the minimum variance predictor of ${x}_{t}$ in terms of ${x}_{t-1},{x}_{t-2},\dots ,{x}_{t-k}$,
$${x}_{t}={\varphi}_{k,1}{x}_{t-1}+{\varphi}_{k,2}{x}_{t-2}+\cdots +{\varphi}_{k,k}{x}_{t-k}+{e}_{k,t}$$ 
where ${e}_{k,t}$ is the prediction error, and the first subscript $k$ of ${\varphi}_{k,i}$ and ${e}_{k,t}$ emphasizes the fact that the parameters will alter as $k$ increases. Moderately good estimates ${\hat{\varphi}}_{k,i}$ of ${\varphi}_{k,i}$ are obtained from the sample autocorrelation function (ACF), and after calculating the partial autocorrelation function (PACF) up to lag $L$, the successive values ${v}_{1},{v}_{2},\dots ,{v}_{L}$ of the prediction error variance estimates, ${v}_{k}=\mathrm{var}\left({e}_{k,t}\right)$, are available, together with the final values of the coefficients ${\hat{\varphi}}_{k,1},{\hat{\varphi}}_{k,2},\dots ,{\hat{\varphi}}_{k,L}$. If ${x}_{t}$ has nonzero mean, $\bar{x}$, it is adequate to use ${x}_{t}-\bar{x}$ in place of ${x}_{t}$ in the prediction equation.
Although Box and Jenkins (1976) do not place great emphasis on these prediction coefficients, their use is advocated, for example, by Akaike (1971), who recommends selecting an optimal order of the predictor as the lag for which the final prediction error (FPE) criterion $\left(1+k/n\right){\left(1-k/n\right)}^{-1}{v}_{k}$ is a minimum.
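One standard way to obtain the predictor coefficients, the PACF and the prediction error variances from a set of autocorrelations is the Durbin-Levinson recursion. The sketch below assumes that recursion (the text does not say which algorithm the library uses), and the names `durbin_levinson` and `fpe_order` are hypothetical.

```python
def durbin_levinson(r, c0=1.0):
    """Return (pacf, final coefficients, error variances) from r = [r_1, ..., r_L].

    c0 is the lag-0 autocovariance; with c0 = 1 the variances v_k are
    expressed as fractions of the series variance.
    """
    rho = [1.0] + list(r)          # rho[k] is the lag-k autocorrelation
    phi_prev = []                  # coefficients of the order k-1 predictor
    pacf, v = [], []
    v_k = c0
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}
        phi = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)]
        phi.append(phi_kk)
        v_k *= 1.0 - phi_kk ** 2
        pacf.append(phi_kk)
        v.append(v_k)
        phi_prev = phi
    return pacf, phi_prev, v


def fpe_order(v, n):
    """Return the lag k in 1..L minimizing (1 + k/n) / (1 - k/n) * v_k."""
    fpe = [(1 + k / n) / (1 - k / n) * v_k for k, v_k in enumerate(v, start=1)]
    return 1 + fpe.index(min(fpe))


# Theoretical ACF of an AR(1) process with parameter 0.6.
acf = [0.6 ** k for k in range(1, 5)]
pacf, coeffs, v = durbin_levinson(acf)
best = fpe_order(v, n=100)
```

With the AR(1) autocorrelations used here the PACF is $0.6$ at lag 1 and zero (to rounding error) thereafter, the error variance stays at $0.64$ of the series variance, and the FPE criterion selects order 1.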
ARIMA models
The correlation structure in stationary time series may often be represented by a model with a small number of parameters belonging to the autoregressive moving-average (ARMA) class. If the stationary series ${w}_{t}$ has been derived by differencing from the original series ${x}_{t}$, then ${x}_{t}$ is said to follow an ARIMA model. Taking ${w}_{t}={\nabla}^{d}{x}_{t}$, the (non-seasonal) ARIMA $\left(p,d,q\right)$ model with $p$ autoregressive parameters ${\varphi}_{1},{\varphi}_{2},\dots ,{\varphi}_{p}$ and $q$ moving-average parameters ${\theta}_{1},{\theta}_{2},\dots ,{\theta}_{q}$, represents the structure of ${w}_{t}$ by the equation
$${w}_{t}={\varphi}_{1}{w}_{t-1}+\cdots +{\varphi}_{p}{w}_{t-p}+{a}_{t}-{\theta}_{1}{a}_{t-1}-\cdots -{\theta}_{q}{a}_{t-q}\text{,}$$  (1) 
where ${a}_{t}$ is an uncorrelated series (white noise) with mean $0$ and constant variance ${\sigma}_{a}^{2}$. If ${w}_{t}$ has a nonzero mean $c$, then this is allowed for by replacing ${w}_{t},{w}_{t-1},\dots $ by ${w}_{t}-c,{w}_{t-1}-c,\dots $ in the model. Although $c$ is often estimated by the sample mean of ${w}_{t}$ this is not always optimal.
A series generated by this model will only be stationary provided restrictions are placed on ${\varphi}_{1},{\varphi}_{2},\dots ,{\varphi}_{p}$ to avoid unstable growth of ${w}_{t}$. These are called stationarity constraints. The series ${a}_{t}$ may also be usefully interpreted as the linear innovations in ${x}_{t}$ (and in ${w}_{t}$), i.e., the error if ${x}_{t}$ were to be predicted using the information in all past values ${x}_{t-1},{x}_{t-2},\dots $, provided also that ${\theta}_{1},{\theta}_{2},\dots ,{\theta}_{q}$ satisfy invertibility constraints. This allows the series ${a}_{t}$ to be regenerated by rewriting the model equation as
$${a}_{t}={w}_{t}-{\varphi}_{1}{w}_{t-1}-\cdots -{\varphi}_{p}{w}_{t-p}+{\theta}_{1}{a}_{t-1}+\cdots +{\theta}_{q}{a}_{t-q}\text{.}$$  (2) 
For a series with short-term correlation only, i.e., one for which ${r}_{k}$ is not significant beyond some low lag $q$ (see Box and Jenkins (1976) for the statistical test), the pure moving-average model $\text{MA}\left(q\right)$ is appropriate, with no autoregressive parameters, i.e., $p=0$.
Autoregressive parameters are appropriate when the autocorrelation function (ACF) pattern decays geometrically, or with a damped sinusoidal pattern which is associated with quasi-periodic behaviour in the series. If the sample partial autocorrelation function (PACF) ${\hat{\varphi}}_{k,k}$ is significant only up to some low lag $p$, then a pure autoregressive model $\text{AR}\left(p\right)$ is appropriate, with $q=0$. Otherwise moving-average terms will need to be introduced, as well as autoregressive terms.
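The relationship between the model equation (1) and its rearrangement (2) can be sketched as follows. This is a simplified illustration which assumes that all values of ${w}_{t}$ and ${a}_{t}$ before the start of the series are zero; the function names are hypothetical.

```python
def _lagged(coeffs, values, t):
    """Sum coeffs[i] * values[t-1-i], treating pre-sample values as zero."""
    return sum(c * values[t - 1 - i] for i, c in enumerate(coeffs) if t - 1 - i >= 0)


def arma_generate(a, phi, theta):
    """Equation (1): build w_t from the innovations a_t."""
    w = []
    for t in range(len(a)):
        w.append(_lagged(phi, w, t) + a[t] - _lagged(theta, a, t))
    return w


def arma_innovations(w, phi, theta):
    """Equation (2): regenerate the innovations a_t from w_t."""
    a = []
    for t in range(len(w)):
        a.append(w[t] - _lagged(phi, w, t) + _lagged(theta, a, t))
    return a


# Round trip for an ARMA(1,1) model with illustrative parameter values.
a_true = [1.0, -0.5, 0.25, 0.8, -1.2, 0.3]
w = arma_generate(a_true, phi=[0.5], theta=[0.4])
a_back = arma_innovations(w, phi=[0.5], theta=[0.4])
```

Because both functions use the same zero starting values, `a_back` reproduces `a_true` up to rounding error. With real data the starting values are unknown, so this simple initialisation introduces transient errors at the beginning of the regenerated series.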
The seasonal ARIMA $\left(p,d,q,P,D,Q,s\right)$ model allows for correlation at lags which are multiples of the seasonal period $s$. Taking ${w}_{t}={\nabla}^{d}{\nabla}_{s}^{D}{x}_{t}$, the series is represented in a two-stage manner via an intermediate series ${e}_{t}$:
$${w}_{t}={\Phi}_{1}{w}_{t-s}+\cdots +{\Phi}_{P}{w}_{t-s\times P}+{e}_{t}-{\Theta}_{1}{e}_{t-s}-\cdots -{\Theta}_{Q}{e}_{t-s\times Q}$$  (3) 
$${e}_{t}={\varphi}_{1}{e}_{t-1}+\cdots +{\varphi}_{p}{e}_{t-p}+{a}_{t}-{\theta}_{1}{a}_{t-1}-\cdots -{\theta}_{q}{a}_{t-q}$$  (4) 
where ${\Phi}_{i}$, ${\Theta}_{i}$ are the seasonal parameters and $P$ and $Q$ are the corresponding orders. Again, ${w}_{t}$ may be replaced by ${w}_{t}-c$.
ARIMA model estimation
In theory, the parameters of an ARIMA model are determined by a sufficient number of autocorrelations ${\rho}_{1},{\rho}_{2},\dots \text{}$. Using the sample values ${r}_{1},{r}_{2},\dots \text{}$ in their place it is usually (but not always) possible to solve for the corresponding ARIMA parameters.
These are rapidly computed but are not fully efficient estimates, particularly if moving-average parameters are present. They do provide useful preliminary values for an efficient but relatively slow iterative method of estimation. This is based on the least squares principle by which parameters are chosen to minimize the sum of squares of the innovations ${a}_{t}$, which are regenerated from the data using (2), or the reverse of (3) and (4) in the case of seasonal models.
Lack of knowledge of terms on the right-hand side of (2), when $t=1,2,\dots ,\max\left(p,q\right)$, is overcome by introducing $q$ unknown series values ${w}_{0},{w}_{-1},\dots ,{w}_{1-q}$ which are estimated as nuisance parameters, and using correction for transient errors due to the autoregressive terms. If the data ${w}_{1},{w}_{2},\dots ,{w}_{N}=w$ is viewed as a single sample from a multivariate Normal density whose covariance matrix $V$ is a function of the ARIMA model parameters, then the exact log-likelihood of the parameters is, up to an additive constant,
$$-\frac{1}{2}\mathrm{log}\left|V\right|-\frac{1}{2}{w}^{\mathrm{T}}{V}^{-1}w\text{.}$$ 
The least squares criterion as outlined above is equivalent to using the quadratic form
$$QF={w}^{\mathrm{T}}{V}^{-1}w$$ 
as an objective function to be minimized. Neglecting the term $-\frac{1}{2}\mathrm{log}\left|V\right|$ yields estimates which differ very little from the exact likelihood estimates except in small samples, or in seasonal models with a small number of whole seasons contained in the data. In these cases bias in moving-average parameters may cause them to stick at the boundary of their constraint region, resulting in failure of the estimation method.
Approximate standard errors of the parameter estimates and the correlations between them are available after estimation.
The model residuals, ${\hat{a}}_{t}$, are the innovations resulting from the estimation and are usually examined for the presence of autocorrelation as a check on the adequacy of the model.
ARIMA model forecasting
An ARIMA model is particularly suited to extrapolation of a time series. The model equations are simply used for $t=n+1,n+2,\dots $ replacing the unknown future values of ${a}_{t}$ by zero. This produces future values of ${w}_{t}$, and if differencing has been used this process is reversed (the so-called integration part of ARIMA models) to construct future values of ${x}_{t}$.
Forecast error limits are easily deduced.
This process requires knowledge only of the model orders and parameters together with a limited set of the terms ${a}_{t-i},{e}_{t-i},{w}_{t-i},{x}_{t-i}$ which appear on the right-hand side of the models (3) and (4) (and the differencing equations) when $t=n$. It does not require knowledge of the whole series.
We call this the state set. It is conveniently constituted after model estimation. Moreover, if new observations ${x}_{n+1},{x}_{n+2},\dots \text{}$ come to hand, then the model equations can easily be used to update the state set before constructing forecasts from the end of the new observations. This is particularly useful when forecasts are constructed on a regular basis. The new innovations ${a}_{n+1},{a}_{n+2},\dots \text{}$ may be compared with the residual standard deviation, ${\sigma}_{a}$, of the model used for forecasting, as a check that the model is continuing to forecast adequately.
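The forecasting recursion can be sketched for a non-seasonal model with a single order of differencing. This is an illustration under stated assumptions rather than the library's method: the function names `arma_forecast` and `undifference` are hypothetical, and the "state" passed in is simply the most recent values of ${w}_{t}$ and ${a}_{t}$ (at least $p$ and $q$ of them respectively).

```python
def arma_forecast(w_state, a_state, phi, theta, steps):
    """Forecast w_{n+1}, ..., w_{n+steps} with future innovations set to zero.

    w_state must hold at least len(phi) recent values of w (oldest first),
    a_state at least len(theta) recent innovations (oldest first).
    """
    w_ext = list(w_state)
    a_ext = list(a_state)
    n = len(w_ext)
    for _ in range(steps):
        ar = sum(c * w_ext[-1 - i] for i, c in enumerate(phi))
        ma = sum(c * a_ext[-1 - j] for j, c in enumerate(theta))
        w_ext.append(ar - ma)   # the future a_t itself is replaced by zero
        a_ext.append(0.0)
    return w_ext[n:]


def undifference(last_x, w_future):
    """Reverse one order of differencing, starting from the last observed x."""
    out, level = [], last_x
    for v in w_future:
        level += v
        out.append(level)
    return out


# ARIMA(1,1,0) with phi_1 = 0.5, last difference w_n = 2 and last value x_n = 10.
w_future = arma_forecast([2.0], [], phi=[0.5], theta=[], steps=3)
x_future = undifference(10.0, w_future)
```

In this example the forecast differences are $1$, $0.5$ and $0.25$, so the forecasts of ${x}_{t}$ are $11$, $11.5$ and $11.75$. Only the last few terms of the series are used, which is the point of keeping a state set rather than the whole series.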
Exponential Smoothing
Exponential smoothing is a relatively simple method of short-term forecasting for a time series. A variety of different smoothing methods are possible, including single exponential, Brown's double exponential, linear Holt (also called double exponential smoothing in some references), additive Holt–Winters and multiplicative Holt–Winters. The choice of smoothing method used depends on the characteristics of the time series. If the mean of the series is only slowly changing then single exponential smoothing may be suitable. If there is a trend in the time series, which itself may be slowly changing, then linear Holt smoothing may be suitable. If there is a seasonal component to the time series, e.g., daily or monthly data, then one of the two Holt–Winters methods may be suitable.
For a time series ${y}_{\mathit{t}}$, for $\mathit{t}=1,2,\dots ,n$, the five smoothing functions are defined by the following:
Single Exponential Smoothing
$$\begin{array}{rcl}{m}_{t}& =& \alpha {y}_{t}+\left(1-\alpha \right){m}_{t-1}\\ {\hat{y}}_{t+f}& =& {m}_{t}\\ \mathrm{var}\left({\hat{y}}_{t+f}\right)& =& \mathrm{var}\left({\epsilon}_{t}\right)\left(1+\left(f-1\right){\alpha}^{2}\right)\end{array}$$
Brown Double Exponential Smoothing
$$\begin{array}{rcl}{m}_{t}& =& \alpha {y}_{t}+\left(1-\alpha \right){m}_{t-1}\\ {r}_{t}& =& \alpha \left({m}_{t}-{m}_{t-1}\right)+\left(1-\alpha \right){r}_{t-1}\\ {\hat{y}}_{t+f}& =& {m}_{t}+\left(\left(f-1\right)+1/\alpha \right){r}_{t}\\ \mathrm{var}\left({\hat{y}}_{t+f}\right)& =& \mathrm{var}\left({\epsilon}_{t}\right)\left(1+\sum _{i=0}^{f-1}{\left(2\alpha +\left(i-1\right){\alpha}^{2}\right)}^{2}\right)\end{array}$$
Linear Holt Smoothing
$$\begin{array}{rcl}{m}_{t}& =& \alpha {y}_{t}+\left(1-\alpha \right)\left({m}_{t-1}+\varphi {r}_{t-1}\right)\\ {r}_{t}& =& \gamma \left({m}_{t}-{m}_{t-1}\right)+\left(1-\gamma \right)\varphi {r}_{t-1}\\ {\hat{y}}_{t+f}& =& {m}_{t}+\sum _{i=1}^{f}{\varphi}^{i}{r}_{t}\\ \mathrm{var}\left({\hat{y}}_{t+f}\right)& =& \mathrm{var}\left({\epsilon}_{t}\right)\left(1+\sum _{i=1}^{f-1}{\left(\alpha +\frac{\alpha \gamma \varphi \left({\varphi}^{i}-1\right)}{\left(\varphi -1\right)}\right)}^{2}\right)\end{array}$$
Additive Holt–Winters Smoothing
$$\begin{array}{rcl}{m}_{t}& =& \alpha \left({y}_{t}-{s}_{t-p}\right)+\left(1-\alpha \right)\left({m}_{t-1}+\varphi {r}_{t-1}\right)\\ {r}_{t}& =& \gamma \left({m}_{t}-{m}_{t-1}\right)+\left(1-\gamma \right)\varphi {r}_{t-1}\\ {s}_{t}& =& \beta \left({y}_{t}-{m}_{t}\right)+\left(1-\beta \right){s}_{t-p}\\ {\hat{y}}_{t+f}& =& {m}_{t}+\left(\sum _{i=1}^{f}{\varphi}^{i}{r}_{t}\right)+{s}_{t-p}\\ \mathrm{var}\left({\hat{y}}_{t+f}\right)& =& \mathrm{var}\left({\epsilon}_{t}\right)\left(1+\sum _{i=1}^{f-1}{\psi}_{i}^{2}\right)\\ {\psi}_{i}& =& \left\{\begin{array}{ll}0& \text{if } i\ge f\\ \alpha +\frac{\alpha \gamma \varphi \left({\varphi}^{i}-1\right)}{\left(\varphi -1\right)}& \text{if } i \bmod p\ne 0\\ \alpha +\frac{\alpha \gamma \varphi \left({\varphi}^{i}-1\right)}{\left(\varphi -1\right)}+\beta \left(1-\alpha \right)& \text{otherwise}\end{array}\right.\end{array}$$
Multiplicative Holt–Winters Smoothing
$$\begin{array}{rcl}{m}_{t}& =& \alpha {y}_{t}/{s}_{t-p}+\left(1-\alpha \right)\left({m}_{t-1}+\varphi {r}_{t-1}\right)\\ {r}_{t}& =& \gamma \left({m}_{t}-{m}_{t-1}\right)+\left(1-\gamma \right)\varphi {r}_{t-1}\\ {s}_{t}& =& \beta {y}_{t}/{m}_{t}+\left(1-\beta \right){s}_{t-p}\\ {\hat{y}}_{t+f}& =& \left({m}_{t}+\sum _{i=1}^{f}{\varphi}^{i}{r}_{t}\right)\times {s}_{t-p}\\ \mathrm{var}\left({\hat{y}}_{t+f}\right)& =& \mathrm{var}\left({\epsilon}_{t}\right)\left(\sum _{i=0}^{\infty}\sum _{j=0}^{p-1}{\left({\psi}_{j+ip}\frac{{s}_{t+f}}{{s}_{t+f-j}}\right)}^{2}\right)\end{array}$$
and $\psi $ is defined as in the additive Holt–Winters smoothing.
The parameters $\alpha $, $\beta $ and $\gamma $ control the amount of smoothing. The nearer these parameters are to one, the greater the emphasis on the current data point. Generally these parameters take values in the range $0.1$ to $0.3$. The linear Holt and two Holt–Winters smoothers include an additional parameter, $\varphi $, which acts as a trend dampener. For $0.0<\varphi <1.0$ the trend is dampened and for $\varphi >1.0$ the forecast function has an exponential trend; $\varphi =0.0$ removes the trend term from the forecast function and $\varphi =1.0$ does not dampen the trend.
For all methods, values for $\alpha $, $\beta $, $\gamma $ and $\varphi $ can be chosen by trying different values and then visually comparing the results by plotting the fitted values alongside the original data. Alternatively, for single exponential smoothing a suitable value for $\alpha $ can be obtained by fitting an $\mathrm{ARIMA}\left(0,1,1\right)$ model. For Brown's double exponential smoothing and linear Holt smoothing with no dampening (i.e., $\varphi =1.0$), suitable values for $\alpha $ and, in the case of linear Holt smoothing, $\gamma $ can be obtained by fitting an $\mathrm{ARIMA}\left(0,2,2\right)$ model. Similarly, the linear Holt method, with $\varphi \ne 1.0$, can be expressed as an $\mathrm{ARIMA}\left(1,2,2\right)$ model and the additive Holt–Winters method with no dampening ($\varphi =1.0$) can be expressed as a seasonal ARIMA model with order $p$ of the form $\mathrm{ARIMA}\left(0,1,p+1\right)\left(0,1,0\right)$. There is no similar procedure for obtaining parameter values for the multiplicative Holt–Winters method, or the additive Holt–Winters method with $\varphi \ne 1.0$. In these cases parameters could be selected by minimizing a measure of fit using nonlinear optimization.
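The single exponential and linear Holt recursions can be sketched directly from their defining equations. The text does not say how the initial values ${m}_{0}$ and ${r}_{0}$ are chosen, so in this illustration they are supplied by the caller; the function names and the test data are assumptions.

```python
def single_exponential(y, alpha, m0):
    """Return the final smoothed level m_n, which is the forecast for every lead f."""
    m = m0
    for value in y:
        m = alpha * value + (1 - alpha) * m
    return m


def linear_holt(y, alpha, gamma, phi, m0, r0, horizon):
    """Return forecasts for leads f = 1, ..., horizon from the end of y."""
    m, r = m0, r0
    for value in y:
        m_new = alpha * value + (1 - alpha) * (m + phi * r)
        r = gamma * (m_new - m) + (1 - gamma) * phi * r
        m = m_new
    forecasts, trend = [], 0.0
    for i in range(1, horizon + 1):
        trend += phi ** i * r          # running sum of phi^i * r_n
        forecasts.append(m + trend)
    return forecasts


# An exact linear trend y_t = 5 + 2t, started from the true level and slope.
y = [5.0 + 2.0 * t for t in range(1, 11)]
holt = linear_holt(y, alpha=0.3, gamma=0.2, phi=1.0, m0=5.0, r0=2.0, horizon=3)
level = single_exponential([10.0], alpha=0.2, m0=0.0)
```

With an undamped trend ($\varphi =1.0$) and exact starting values, the Holt smoother tracks the line without error and the forecasts continue it: $27$, $29$ and $31$.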
Univariate Spectral Analysis
In describing a time series using spectral analysis the fundamental components are taken to be sinusoidal waves of the form $R\mathrm{cos}\left(\omega t+\varphi \right)$, each of which, for a given angular frequency $\omega $, $0\le \omega \le \pi $, is specified by its amplitude $R>0$ and phase $\varphi $, $0\le \varphi <2\pi $. Thus in a time series of $n$ observations it is not possible to distinguish more than $n/2$ independent sinusoidal components. The frequency range $0\le \omega \le \pi $ is limited to the shortest wavelength of two sampling units because any wave of higher frequency is indistinguishable upon sampling from (or is aliased with) a wave within this range. Spectral analysis follows the idea that for a series made up of a finite number of sine waves the amplitude of any component at frequency $\omega $ is given to order $1/n$ by
$${R}^{2}=\left(\frac{1}{{n}^{2}}\right){\left|\sum _{t=1}^{n}{x}_{t}{e}^{i\omega t}\right|}^{2}\text{.}$$ 
The sample spectrum
For a series ${x}_{1},{x}_{2},\dots ,{x}_{n}$ this is defined as
$${f}^{*}\left(\omega \right)=\left(\frac{1}{2n\pi}\right){\left|\sum _{t=1}^{n}{x}_{t}{e}^{i\omega t}\right|}^{2}\text{,}$$ 
the scaling factor now being chosen in order that
$$2\underset{0}{\overset{\pi}{\int}}{f}^{*}\left(\omega \right)d\omega ={\sigma}_{x}^{2}\text{,}$$ 
i.e., the spectrum indicates how the sample variance (${\sigma}_{x}^{2}$) of the series is distributed over components in the frequency range $0\le \omega \le \pi $.
It may be demonstrated that ${f}^{*}\left(\omega \right)$ is equivalently defined in terms of the sample ACF ${r}_{k}$ of the series as
$${f}^{*}\left(\omega \right)=\left(\frac{1}{2\pi}\right)\left({c}_{0}+2\sum _{k=1}^{n-1}{c}_{k}\mathrm{cos}\left(k\omega \right)\right)\text{,}$$ 
where ${c}_{k}={\sigma}_{x}^{2}{r}_{k}$ are the sample autocovariance coefficients.
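The equivalence of the two forms of the sample spectrum can be checked numerically. The sketch below assumes that the series is mean-corrected before either formula is applied and that the autocovariances use divisor $n$; the function names and data are illustrative only.

```python
import cmath
import math


def autocovariances(x):
    """Return [c_0, ..., c_{n-1}] of the mean-corrected series, divisor n."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n for k in range(n)]


def spectrum_direct(x, omega):
    """Sample spectrum from the squared modulus of the Fourier sum."""
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) * cmath.exp(1j * omega * (t + 1)) for t, v in enumerate(x))
    return abs(total) ** 2 / (2 * math.pi * n)


def spectrum_from_acov(x, omega):
    """Sample spectrum from the cosine transform of the autocovariances."""
    c = autocovariances(x)
    tail = sum(c[k] * math.cos(k * omega) for k in range(1, len(c)))
    return (c[0] + 2 * tail) / (2 * math.pi)


x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.5, 2.5]
pairs = [(spectrum_direct(x, w), spectrum_from_acov(x, w)) for w in (0.3, 1.0, 2.5)]
```

Each pair in `pairs` agrees to rounding error, since expanding the squared modulus of the Fourier sum gives exactly $n$ times the bracketed cosine series.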
If the series ${x}_{t}$ does contain a deterministic sinusoidal component of amplitude $R$, this will be revealed in the sample spectrum as a sharp peak of approximate width $\pi /n$ and height $\left(n/2\pi \right){R}^{2}$. This is called the discrete part of the spectrum, the variance ${R}^{2}$ associated with this component being in effect concentrated at a single frequency.
If the series ${x}_{t}$ has no deterministic components, i.e., is purely stochastic being stationary with autocorrelation function (ACF) ${r}_{k}$, then with increasing sample size the expected value of ${f}^{*}\left(\omega \right)$ converges to the theoretical spectrum (the continuous part)
$$f\left(\omega \right)=\left(\frac{1}{2\pi}\right)\left({\gamma}_{0}+2\sum _{k=1}^{\infty}{\gamma}_{k}\mathrm{cos}\left(\omega k\right)\right)\text{,}$$ 
where ${\gamma}_{k}$ are the theoretical autocovariances.
The sample spectrum does not however converge to this value but at each frequency point fluctuates about the theoretical spectrum with an exponential distribution, being independent at frequencies separated by an interval of $2\pi /n$ or more. Various devices are therefore employed to smooth the sample spectrum and reduce its variability. Much of the strength of spectral analysis derives from the fact that the error limits are multiplicative so that features may still show up as significant in a part of the spectrum which has a generally low level, whereas they are completely masked by other components in the original series. The spectrum can help to distinguish deterministic cyclical components from the stochastic quasi-cycle components which produce a broader peak in the spectrum. (The deterministic components can be removed by regression and the remaining part represented by an ARIMA model.)
A large discrete component in a spectrum can distort the continuous part over a large frequency range surrounding the corresponding peak. This may be alleviated at the cost of slightly broadening the peak by tapering a portion of the data at each end of the series with weights which decay smoothly to zero. It is usual to correct for the mean of the series and for any linear trend by simple regression, since they would similarly distort the spectrum.
Spectral smoothing by lag window
The estimate is calculated directly from the sample autocovariances ${c}_{k}$ as
$$f\left(\omega \right)=\left(\frac{1}{2\pi}\right)\left({c}_{0}+2\sum _{k=1}^{M-1}{w}_{k}{c}_{k}\mathrm{cos}\left(k\omega \right)\right)\text{,}$$ 
the smoothing being induced by the lag window weights ${w}_{k}$ which extend up to a truncation lag $M$ which is generally much less than $n$. The smaller the value of $M$, the greater the degree of smoothing, the spectrum estimates being independent only at a wider frequency separation indicated by the bandwidth $b$ which is proportional to $1/M$. It is wise, however, to calculate the spectrum at intervals appreciably less than this. Although greater smoothing narrows the error limits, it can also distort the spectrum, particularly by flattening peaks and filling in troughs.
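A lag window estimate can be sketched as follows. The text does not specify the weights ${w}_{k}$, so this illustration assumes the Bartlett (triangular) window ${w}_{k}=1-k/M$, chosen only because it is simple; the function name and data are likewise assumptions.

```python
import math


def lag_window_spectrum(x, omega, M):
    """Smoothed spectrum estimate at frequency omega with truncation lag M.

    Uses the Bartlett window w_k = 1 - k/M (an illustrative choice) and
    autocovariances of the mean-corrected series with divisor n.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c = [sum(dev[t] * dev[t + k] for t in range(n - k)) / n for k in range(M)]
    tail = sum((1 - k / M) * c[k] * math.cos(k * omega) for k in range(1, M))
    return (c[0] + 2 * tail) / (2 * math.pi)


# An alternating series has all its power at the highest frequency, omega = pi.
x = [1.0, -1.0] * 8
f_low = lag_window_spectrum(x, 0.0, M=4)
f_high = lag_window_spectrum(x, math.pi, M=4)
f_flat = lag_window_spectrum(x, 1.0, M=1)
```

With $M=4$ the estimate at $\omega =\pi $ is far larger than at $\omega =0$, as expected for an alternating series. With $M=1$ no autocovariances beyond ${c}_{0}$ are used, so the estimate is the constant ${c}_{0}/2\pi $ at every frequency, the extreme case of over-smoothing.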
Direct spectral smoothing
The unsmoothed sample spectrum is calculated for a fine division of frequencies, then averaged over intervals centred on each frequency point for which the smoothed spectrum is required. This is usually at a coarser frequency division. The bandwidth corresponds to the width of the averaging interval.
Linear Lagged Relationships Between Time Series
We now consider the context in which one time series, called the dependent or output series, ${y}_{1},{y}_{2},\dots ,{y}_{n}$, is believed to depend on one or more explanatory or input series, e.g., ${x}_{1},{x}_{2},\dots ,{x}_{n}$. This dependency may follow a simple linear regression, e.g.,
$${y}_{t}=v{x}_{t}+{n}_{t}$$ 
or more generally may involve lagged values of the input
$${y}_{t}={v}_{0}{x}_{t}+{v}_{1}{x}_{t-1}+{v}_{2}{x}_{t-2}+\cdots +{n}_{t}\text{.}$$ 
The sequence ${v}_{0},{v}_{1},{v}_{2},\dots $ is called the impulse response function (IRF) of the relationship. The term ${n}_{t}$ represents that part of ${y}_{t}$ which cannot be explained by the input, and it is assumed to follow a univariate ARIMA model. We call ${n}_{t}$ the (output) noise component of ${y}_{t}$, and it includes any constant term in the relationship. It is assumed that the input series, ${x}_{t}$, and the noise component, ${n}_{t}$, are independent.
The part of ${y}_{t}$ which is explained by the input is called the input component ${z}_{t}$:
$${z}_{t}={v}_{0}{x}_{t}+{v}_{1}{x}_{t-1}+{v}_{2}{x}_{t-2}+\cdots $$ 
so ${y}_{t}={z}_{t}+{n}_{t}$.
The eventual aim is to model both these components of ${y}_{t}$ on the basis of observations of ${y}_{1},{y}_{2},\dots ,{y}_{n}$ and ${x}_{1},{x}_{2},\dots ,{x}_{n}$. In applications to forecasting or control both components are important. In general there may be more than one input series, e.g., ${x}_{1,t}$ and ${x}_{2,t}$, which are assumed to be independent, with corresponding components ${z}_{1,t}$ and ${z}_{2,t}$, so
$${y}_{t}={z}_{1,t}+{z}_{2,t}+{n}_{t}\text{.}$$ 
Transfer function models
In a similar manner to that in which the structure of a univariate series may be represented by a finite-parameter ARIMA model, the structure of an input component may be represented by a transfer function (TF) model with delay time $b$, $p$ autoregressive-like parameters ${\delta}_{1},{\delta}_{2},\dots ,{\delta}_{p}$ and $q+1$ moving-average-like parameters ${\omega}_{0},{\omega}_{1},\dots ,{\omega}_{q}$:
$${z}_{t}={\delta}_{1}{z}_{t-1}+{\delta}_{2}{z}_{t-2}+\cdots +{\delta}_{p}{z}_{t-p}+{\omega}_{0}{x}_{t-b}-{\omega}_{1}{x}_{t-b-1}-\cdots -{\omega}_{q}{x}_{t-b-q}\text{.}$$  (5) 
If $p>0$ this represents an impulse response function (IRF) which is infinite in extent and decays with geometric and/or sinusoidal behaviour. The parameters ${\delta}_{1},{\delta}_{2},\dots ,{\delta}_{p}$ are constrained to satisfy a stability condition identical to the stationarity condition of autoregressive models. There is no constraint on ${\omega}_{0},{\omega}_{1},\dots ,{\omega}_{q}$.
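The transfer function recursion (5) and the impulse response it implies can be sketched as follows. The sketch assumes that all values of ${x}_{t}$ and ${z}_{t}$ before the start of the series are zero; the function names and parameter values are illustrative.

```python
def transfer_response(x, delta, omega, b):
    """Input component z_t from equation (5), with zero pre-sample values."""
    z = []
    for t in range(len(x)):
        ar = sum(d * z[t - 1 - i] for i, d in enumerate(delta) if t - 1 - i >= 0)
        ma = 0.0
        for j, w_j in enumerate(omega):
            idx = t - b - j
            if idx >= 0:
                # omega_0 enters with a plus sign, omega_1, ..., omega_q with a minus
                ma += (w_j if j == 0 else -w_j) * x[idx]
        z.append(ar + ma)
    return z


def impulse_response(delta, omega, b, length):
    """IRF v_0, ..., v_{length-1}: the response to a unit impulse at t = 0."""
    impulse = [1.0] + [0.0] * (length - 1)
    return transfer_response(impulse, delta, omega, b)


# Delay b = 1 with one autoregressive-like parameter: geometric decay.
v_geometric = impulse_response(delta=[0.5], omega=[2.0], b=1, length=5)
# Adding omega_1 = 1 cancels the decay after the first non-zero weight.
v_cancelled = impulse_response(delta=[0.5], omega=[2.0, 1.0], b=1, length=4)
```

The first IRF is $0, 2, 1, 0.5, 0.25$: nothing happens until the delay has elapsed, after which the weights decay geometrically. In the second case ${v}_{2}=0.5\times 2-1=0$, so all later weights vanish.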
Cross-correlations
An important tool for investigating how an input series ${x}_{t}$ affects an output series ${y}_{t}$ is the sample cross-correlation function (CCF) ${r}_{xy}\left(k\right)$, for $k=0,1,\dots $ between the series. If ${x}_{t}$ and ${y}_{t}$ are (jointly) stationary time series this is an estimator of the theoretical quantity
$${\rho}_{xy}\left(k\right)=\mathrm{corr}\left({x}_{t},{y}_{t+k}\right)\text{.}$$ 
The sequence ${r}_{yx}\left(k\right)$, for $k=0,1,\dots $, is distinct from ${r}_{xy}\left(k\right)$, though it is possible to interpret
$${r}_{yx}\left(k\right)={r}_{xy}\left(-k\right)\text{.}$$ 
When the series ${y}_{t}$ and ${x}_{t}$ are believed to be related by a transfer function (TF) model, the CCF is determined by the impulse response function (IRF) ${v}_{0},{v}_{1},{v}_{2},\dots $ and the autocorrelation function (ACF) of the input ${x}_{t}$.
In the particular case when ${x}_{t}$ is an uncorrelated series or white noise (and is uncorrelated with any other inputs):
$${\rho}_{xy}\left(k\right)\propto {v}_{k}$$ 
and the sample CCF can provide an estimate of ${v}_{k}$:
$${\tilde{v}}_{k}=\left({s}_{y}/{s}_{x}\right){r}_{xy}\left(k\right)$$ 
where ${s}_{y}$ and ${s}_{x}$ are the sample standard deviations of ${y}_{t}$ and ${x}_{t}$, respectively.
In theory the IRF coefficients ${v}_{b},\dots ,{v}_{b+p+q}$ determine the parameters in the TF model, and using ${\tilde{v}}_{k}$ to estimate ${v}_{k}$ it is possible to solve for preliminary estimates of ${\delta}_{1},{\delta}_{2},\dots ,{\delta}_{p}$, ${\omega}_{0},{\omega}_{1},\dots ,{\omega}_{q}$.
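The sample CCF and the corresponding IRF estimate can be sketched as below. The estimator is not given in the text, so this illustration assumes sample moments with divisor $n$ and lags $k\ge 0$; the function names and data are hypothetical.

```python
import math


def _mean_and_std(x):
    """Sample mean and standard deviation (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in x) / n)


def sample_ccf(x, y, k):
    """Sample cross-correlation r_xy(k) between x_t and y_{t+k}, for k >= 0."""
    n = len(x)
    mx, sx = _mean_and_std(x)
    my, sy = _mean_and_std(y)
    cxy = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
    return cxy / (sx * sy)


def irf_estimate(x, y, k):
    """Estimate of v_k as (s_y / s_x) * r_xy(k); sensible when x is white noise."""
    _, sx = _mean_and_std(x)
    _, sy = _mean_and_std(y)
    return (sy / sx) * sample_ccf(x, y, k)


# An output that is exactly twice the input, with no lag and no noise.
x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
y = [2.0 * v for v in x]
r0 = sample_ccf(x, y, 0)
v0 = irf_estimate(x, y, 0)
```

For this noise-free example ${r}_{xy}\left(0\right)=1$ and the estimate of ${v}_{0}$ recovers the multiplier $2$. To obtain ${r}_{yx}\left(k\right)$, the same function can be called with the arguments interchanged.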
Prewhitening or filtering by an ARIMA model
In general an input series ${x}_{t}$ is not white noise, but may be represented by an ARIMA model with innovations or residuals ${a}_{t}$ which are white noise. If precisely the same operations by which ${a}_{t}$ is generated from ${x}_{t}$ are applied to the output ${y}_{t}$ to produce a series ${b}_{t}$, then the transfer function relationship between ${y}_{t}$ and ${x}_{t}$ is preserved between ${b}_{t}$ and ${a}_{t}$. It is then possible to estimate
$${\tilde{v}}_{k}=\left({s}_{b}/{s}_{a}\right){r}_{ab}\left(k\right)\text{.}$$ 
The procedure of generating ${a}_{t}$ from ${x}_{t}$ (and ${b}_{t}$ from ${y}_{t}$) is called prewhitening or filtering by an ARIMA model. Although ${a}_{t}$ is necessarily white noise, this is not generally true of ${b}_{t}$.
Multi-input model estimation
The term multi-input model is used for the situation when one output series ${y}_{t}$ is related to one or more input series ${x}_{j,t}$, as described in [Linear Lagged Relationships Between Time Series]. If for a given input the relationship is a simple linear regression, it is called a simple input; otherwise it is a transfer function input. The error or noise term follows an ARIMA model.
Given that the orders of all the transfer function models and the ARIMA model of a multi-input model have been specified, the various parameters in those models may be (simultaneously) estimated.
The procedure used is closely related to the least squares principle applied to the innovations in the ARIMA noise model.
The innovations are derived for any proposed set of parameter values by calculating the response of each input to the transfer functions and then evaluating the noise ${n}_{t}$ as the difference between this response (combined for all the inputs) and the output. The innovations are derived from the noise using the ARIMA model in the same manner as for a univariate series, and as described in [ARIMA models].
In estimating the parameters, consideration has to be given to the lagged terms in the various model equations which are associated with times prior to the observation period, and are therefore unknown. The method descriptions provide the necessary detail as to how this problem is treated.
Also, as described in [ARIMA model estimation] the sum of squares criterion
$$S=\sum {a}_{t}^{2}$$ 
is related to the quadratic form in the exact log-likelihood of the parameters:
$$-\frac{1}{2}\mathrm{log}\left|V\right|-\frac{1}{2}{w}^{\mathrm{T}}{V}^{-1}w\text{.}$$ 
Here $w$ is the vector of appropriately differenced noise terms, and
$${w}^{\mathrm{T}}{V}^{-1}w=S/{\sigma}_{a}^{2}\text{,}$$ 
where ${\sigma}_{a}^{2}$ is the innovation variance parameter.
The least squares criterion is therefore identical to minimization of the quadratic form, but is not identical to exact likelihood. Because $V$ may be expressed as $M{\sigma}_{a}^{2}$, where $M$ is a function of the ARIMA model parameters, substitution of ${\sigma}_{a}^{2}$ by its maximum likelihood (ML) estimator yields a concentrated (or profile) likelihood which is a function of
$${\left|M\right|}^{1/N}S\text{.}$$ 
$N$ is the length of the differenced noise series $w$, and $\left|M\right|=\det M$.
Use of the above quantity, called the deviance, $D$, as an objective function is preferable to the use of $S$ alone, on the grounds that it is equivalent to exact likelihood, and yields estimates with better properties. However, there is an appreciable computational penalty in calculating $D$, and in large samples it differs very little from $S$, except in the important case of seasonal ARIMA models where the number of whole seasons within the data length must also be large.
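The difference between $S$ and $D$ can be made concrete for the simplest noise model. The sketch below assumes a stationary AR(1) noise model with parameter `phi`, for which a standard result (not stated in this chapter) gives $\left|M\right|=1/\left(1-{\varphi}^{2}\right)$ and an exact sum of squares whose first term allows for the unknown pre-sample value. The function names and data are hypothetical.

```python
def sum_of_squares(w, phi):
    """Exact S for AR(1) noise: (1 - phi^2) w_1^2 + sum (w_t - phi w_{t-1})^2."""
    s = (1.0 - phi * phi) * w[0] ** 2
    s += sum((w[t] - phi * w[t - 1]) ** 2 for t in range(1, len(w)))
    return s

def deviance(w, phi):
    """D = |M|^(1/N) * S, using |M| = 1 / (1 - phi^2) for an AR(1) model."""
    n = len(w)
    det_m = 1.0 / (1.0 - phi * phi)
    return det_m ** (1.0 / n) * sum_of_squares(w, phi)

w = [0.5, 1.2, 0.3, -0.8, -0.1, 0.9, 1.1, 0.2]
phi = 0.6
ratio_short = deviance(w, phi) / sum_of_squares(w, phi)            # N = 8
ratio_long = deviance(w * 50, phi) / sum_of_squares(w * 50, phi)   # N = 400
```

The ratio $D/S$ depends only on $\left|M\right|$ and $N$ here, so it moves towards 1 as the series lengthens, which is the sense in which $D$ differs very little from $S$ in large samples.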
You are given the option of taking the objective function to be either $S$ or $D$, or a third possibility, the marginal likelihood. This is similar to exact likelihood but can counteract bias in the ARIMA model due to the fitting of a large number of simple inputs.
Approximate standard errors of the parameter estimates and the correlations between them are available after estimation.
The model residuals ${\hat{a}}_{t}$ are the innovations resulting from the estimation, and they are usually examined for the presence of either autocorrelation or cross-correlation with the inputs. Absence of such correlation provides some confirmation of the adequacy of the model.
Multi-input model forecasting
A multi-input model may be used to forecast the output series provided future values (possibly forecasts) of the input series are supplied.
Construction of the forecasts requires knowledge only of the model orders and parameters, together with a limited set of the most recent variables which appear in the model equations. This is called the state set. It is conveniently constituted after model estimation. Moreover, if new observations ${y}_{n+1},{y}_{n+2},\dots $ of the output series and ${x}_{n+1},{x}_{n+2},\dots $ of (all) the independent input series become available, then the model equations can easily be used to update the state set before constructing forecasts from the end of the new observations. The new innovations ${a}_{n+1},{a}_{n+2},\dots $ generated in this updating may be used to monitor the continuing adequacy of the model.
Transfer function model filtering
In many time series applications it is desired to calculate the response (or output) of a transfer function (TF) model for a given input series.
Smoothing, detrending, and seasonal adjustment are typical applications. You must specify the orders and parameters of a TF model for the purpose being considered. This may then be applied to the input series.
Again, problems may arise due to ignorance of the input series values prior to the observation period. The transient errors which can arise from this may be substantially reduced by using ‘backforecasts’ of these unknown observations.
Multivariate Time Series
Multi-input modelling represents one output time series in terms of one or more input series. Although there are circumstances in which it may be more appropriate to analyse a set of time series by modelling each one in turn as the output series with the remainder as inputs, there is a more symmetric approach in such a context, in which all the series are modelled jointly. The models used are known as vector autoregressive moving-average (VARMA) models.
Differencing and transforming a multivariate time series
As in the case of a univariate time series, it may be useful to simplify the series by differencing operations which may be used to remove linear or seasonal trends, thus ensuring that the resulting series to be used in the model estimation is stationary. It may also be necessary to apply transformations to the individual components of the multivariate series in order to stabilize the variance. Commonly used transformations are the log and square root transformations.
Model identification for a multivariate time series
Multivariate analogues of the autocorrelation and partial autocorrelation functions are available for analysing a set of $k$ time series, ${x}_{\mathit{i},1},{x}_{\mathit{i},2},\dots ,{x}_{\mathit{i},n}$, for $\mathit{i}=1,2,\dots ,k$, thereby making it possible to obtain some understanding of a suitable VARMA model for the observed series.
It is assumed that the time series have been differenced if necessary, and that they are jointly stationary. The lagged correlations between all possible pairs of series, i.e.,
$${\rho}_{ijl}=\mathrm{corr}\left({x}_{i,t},{x}_{j,t+l}\right)$$ 
are then taken to provide an adequate description of the statistical relationships between the series. These quantities are estimated by sample auto- and cross-correlations ${r}_{ijl}$. For each $l$ these may be viewed as elements of a (lagged) autocorrelation matrix.
Thus consider the vector process ${x}_{t}$ (with elements ${x}_{it}$) and lagged autocovariance matrices ${\Gamma}_{l}$ with elements ${\sigma}_{i}{\sigma}_{j}{\rho}_{ijl}$ where ${\sigma}_{i}^{2}=\mathrm{var}\left({x}_{i,t}\right)$. Correspondingly, ${\Gamma}_{l}$ is estimated by the matrix ${C}_{l}$ with elements ${s}_{i}{s}_{j}{r}_{ijl}$ where ${s}_{i}^{2}$ is the sample variance of ${x}_{it}$.
For a series with short-term cross-correlation only, i.e., ${r}_{ijl}$ is not significant beyond some low lag $q$, the pure vector $\text{MA}\left(q\right)$ model, with no autoregressive parameters, i.e., $p=0$, is appropriate.
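The sample cross-correlation matrices can be sketched as follows. This is an illustrative implementation only, assuming the divisor $n$ and mean correction over the full series; the function name and the two short series (the second is the first delayed by one step) are hypothetical.

```python
import math

def cross_correlation_matrix(series, lag):
    """Return the k x k matrix of r_ijl = corr(x_{i,t}, x_{j,t+l}) for lag l >= 0."""
    k, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    dev = [[v - m for v in s] for s, m in zip(series, means)]
    sd = [math.sqrt(sum(d * d for d in row) / n) for row in dev]
    return [[sum(dev[i][t] * dev[j][t + lag] for t in range(n - lag))
             / (n * sd[i] * sd[j])
             for j in range(k)]
            for i in range(k)]

x1 = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0, 1.0, -2.0, 2.0, -1.0]
x2 = [0.0] + x1[:-1]                       # x2 is x1 delayed by one step
r0 = cross_correlation_matrix([x1, x2], 0)
r1 = cross_correlation_matrix([x1, x2], 1)
```

Here the element `r1[0][1]`, which relates the first series to the second one step later, is close to 1, while the lag 0 matrix `r0` has unit diagonal and is symmetric.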
The correlation matrices provide a description of the joint statistical properties of the series. It is also possible to calculate matrix quantities which are closely analogous to the partial autocorrelations of univariate series (see [Partial autocorrelations]). Wei (1990) discusses both the partial autoregression matrices proposed by Tiao and Box (1981) and partial lag correlation matrices.
In the univariate case the partial autocorrelation function (PACF) between ${x}_{t}$ and ${x}_{t+l}$ is the correlation coefficient between the two after removing the linear dependence on each of the intervening variables ${x}_{t+1},{x}_{t+2},\dots ,{x}_{t+l-1}$. This partial autocorrelation may also be obtained as the last regression coefficient associated with ${x}_{t}$ when regressing ${x}_{t+l}$ on its $l$ lagged variables ${x}_{t+l-1},{x}_{t+l-2},\dots ,{x}_{t}$. Tiao and Box (1981) extended this method to the multivariate case to define the partial autoregression matrix. Heyse and Wei (1985) also extended the univariate definition of the PACF to derive the correlation matrix between the vectors ${x}_{t}$ and ${x}_{t+l}$ after removing the linear dependence on each of the intervening vectors ${x}_{t+1},{x}_{t+2},\dots ,{x}_{t+l-1}$; this is the partial lag correlation matrix.
Note that the partial lag correlation matrix is a correlation coefficient matrix since each of its elements is a properly normalized correlation coefficient. This is not true of the partial autoregression matrices (except in the univariate case for which the two types of matrix are the same). The partial lag correlation matrix at lag $1$ also reduces to the regular correlation matrix at lag $1$; this is not true of the partial autoregression matrices (again except in the univariate case).
Both the above share the same cut-off property for autoregressive processes; that is, for an autoregressive process of order $p$, the terms of the matrix at lags $p+1$ and greater are zero. Thus if the sample partial cross-correlations are significant only up to some low lag $p$ then a pure vector $\text{AR}\left(p\right)$ model is appropriate with $q=0$. Otherwise moving-average terms will need to be introduced as well as autoregressive terms.
Under the hypothesis that ${x}_{t}$ is an autoregressive process of order $l-1$, $n$ times the sum of the squared elements of the partial lag correlation matrix at lag $l$ is asymptotically distributed as a ${\chi}^{2}$ variable with ${k}^{2}$ degrees of freedom where $k$ is the dimension of the multivariate time series. This provides a diagnostic aid for determining the order of an autoregressive model.
The partial autoregression matrices may be found by solving a multivariate version of the Yule–Walker equations to find the autoregression matrices, using the final regression matrix coefficient as the partial autoregression matrix at that particular lag.
The basis of these calculations is a multivariate autoregressive model:
$${x}_{t}={\varphi}_{l,1}{x}_{t-1}+\cdots +{\varphi}_{l,l}{x}_{t-l}+{e}_{l,t}$$ 
where ${\varphi}_{l,1},{\varphi}_{l,2},\dots ,{\varphi}_{l,l}$ are matrix coefficients, and ${e}_{l,t}$ is the vector of errors in the prediction. These coefficients may be rapidly computed using a recursive technique which requires, and simultaneously furnishes, a backward prediction equation:
$${x}_{t-l-1}={\psi}_{l,1}{x}_{t-l}+{\psi}_{l,2}{x}_{t-l+1}+\cdots +{\psi}_{l,l}{x}_{t-1}+{f}_{l,t}$$ 
(in the univariate case ${\psi}_{l,i}={\varphi}_{l,i}$).
The forward prediction equation coefficients, ${\varphi}_{l,i}$, are of direct interest, together with the covariance matrix ${D}_{l}$ of the prediction errors ${e}_{l,t}$. The calculation of these quantities for a particular maximum equation lag $l=L$ involves calculation of the same quantities for increasing values of $l=1,2,\dots ,L$.
The quantities ${v}_{l}=\det {D}_{l}/\det {\Gamma}_{0}$ may be viewed as generalized variance ratios, and provide a measure of the efficiency of prediction (the smaller the better). The reduction from ${v}_{l-1}$ to ${v}_{l}$ which occurs on extending the order of the predictor to $l$ may be represented as
$${v}_{l}={v}_{l-1}\left(1-{\rho}_{l}^{2}\right)$$ 
where ${\rho}_{l}^{2}$ is a multiple squared partial autocorrelation coefficient associated with ${k}^{2}$ degrees of freedom.
Sample estimates of all the above quantities may be derived by using the series covariance matrices ${C}_{\mathit{l}}$, for $\mathit{l}=1,2,\dots ,L$, in place of ${\Gamma}_{l}$. The best lag for prediction purposes may be chosen as that which yields the minimum final prediction error (FPE) criterion:
$$\mathrm{FPE}\left(l\right)={v}_{l}\times \frac{\left(1+l{k}^{2}/n\right)}{\left(1-l{k}^{2}/n\right)}\text{.}$$ 
An alternative method of estimating the sample partial autoregression matrices is by using multivariate least squares to fit a series of multivariate autoregressive models of increasing order.
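For a univariate series ($k=1$) the recursion reduces to the familiar Levinson-Durbin scheme, which gives a compact illustration of the variance ratios and the FPE criterion. The sketch below makes that simplifying assumption and uses the theoretical autocorrelations of an AR(1) process with parameter 0.6 and a nominal series length of 100; the function names are hypothetical and the matrix-valued case handled by the library is not reproduced.

```python
def variance_ratios(r, max_lag):
    """Levinson-Durbin recursion on autocorrelations r[0..max_lag], with r[0] = 1.

    Returns the partial autocorrelations and the ratios v_0, v_1, ..., v_L.
    """
    phi, pacf, v = [], [], [1.0]
    for l in range(1, max_lag + 1):
        num = r[l] - sum(phi[j] * r[l - 1 - j] for j in range(l - 1))
        rho = num / v[-1]                        # partial autocorrelation at lag l
        phi = [phi[j] - rho * phi[l - 2 - j] for j in range(l - 1)] + [rho]
        pacf.append(rho)
        v.append(v[-1] * (1.0 - rho * rho))      # v_l = v_{l-1} (1 - rho_l^2)
    return pacf, v

def fpe(v, n, k=1):
    """FPE(l) = v_l (1 + l k^2 / n) / (1 - l k^2 / n) for l = 0, 1, ..."""
    return [vl * (1 + l * k * k / n) / (1 - l * k * k / n)
            for l, vl in enumerate(v)]

r = [0.6 ** lag for lag in range(5)]    # autocorrelations of an AR(1) process
pacf, v = variance_ratios(r, 4)
criteria = fpe(v, n=100)
best_lag = criteria.index(min(criteria))
```

Because the partial autocorrelations vanish beyond lag 1 in this case, the variance ratio stops falling after the first step and the penalty factor makes lag 1 the minimizer of the FPE.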
VARMA model estimation
The cross-correlation structure of a stationary multivariate time series may often be represented by a model with a small number of parameters belonging to the VARMA class. If the stationary series ${w}_{t}$ has been derived by transforming and/or differencing the original series ${x}_{t}$, then ${w}_{t}$ is said to follow the VARMA model:
$${w}_{t}={\varphi}_{1}{w}_{t-1}+\cdots +{\varphi}_{p}{w}_{t-p}+{\epsilon}_{t}-{\theta}_{1}{\epsilon}_{t-1}-\cdots -{\theta}_{q}{\epsilon}_{t-q}\text{,}$$ 
where ${\epsilon}_{t}$ is a vector of uncorrelated residual series (white noise) with zero mean and constant covariance matrix $\Sigma $, ${\varphi}_{1},{\varphi}_{2},\dots ,{\varphi}_{p}$ are the $p$ autoregressive (AR) parameter matrices and ${\theta}_{1},{\theta}_{2},\dots ,{\theta}_{q}$ are the $q$ moving-average (MA) parameter matrices. If ${w}_{t}$ has a nonzero mean $\mu $, then this can be allowed for by replacing ${w}_{t},{w}_{t-1},\dots $ by ${w}_{t}-\mu ,{w}_{t-1}-\mu ,\dots $ in the model.
A series generated by this model will only be stationary provided restrictions are placed on ${\varphi}_{1},{\varphi}_{2},\dots ,{\varphi}_{p}$ to avoid unstable growth of ${w}_{t}$. These are stationarity constraints. The series ${\epsilon}_{t}$ may also be usefully interpreted as the linear innovations in ${w}_{t}$, i.e., the error if ${w}_{t}$ were to be predicted using the information in all past values ${w}_{t-1},{w}_{t-2},\dots $, provided also that ${\theta}_{1},{\theta}_{2},\dots ,{\theta}_{q}$ satisfy what are known as invertibility constraints. This allows the series ${\epsilon}_{t}$ to be generated by rewriting the model equation as
$${\epsilon}_{t}={w}_{t}-{\varphi}_{1}{w}_{t-1}-\cdots -{\varphi}_{p}{w}_{t-p}+{\theta}_{1}{\epsilon}_{t-1}+\cdots +{\theta}_{q}{\epsilon}_{t-q}\text{.}$$ 
The method of maximum likelihood (ML) may be used to estimate the parameters of a specified VARMA model from the observed multivariate time series together with their standard errors and correlations.
The residuals from the model may be examined for the presence of autocorrelations as a check on the adequacy of the fitted model.
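The way residuals are generated from the rewritten model equation can be sketched for a bivariate VARMA(1,1) model. This is a minimal illustration, assuming zero pre-sample values for both the series and the innovations (the library treats start-up values more carefully); the parameter matrices, the innovation values and the function names are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def simulate(phi, theta, eps):
    """w_t = phi w_{t-1} + eps_t - theta eps_{t-1}, with zero pre-sample values."""
    k = len(eps[0])
    w_prev, e_prev, out = [0.0] * k, [0.0] * k, []
    for e in eps:
        ar, ma = matvec(phi, w_prev), matvec(theta, e_prev)
        w = [ar[i] + e[i] - ma[i] for i in range(k)]
        out.append(w)
        w_prev, e_prev = w, e
    return out

def innovations(phi, theta, w):
    """eps_t = w_t - phi w_{t-1} + theta eps_{t-1}, with zero pre-sample values."""
    k = len(w[0])
    w_prev, e_prev, out = [0.0] * k, [0.0] * k, []
    for obs in w:
        ar, ma = matvec(phi, w_prev), matvec(theta, e_prev)
        e = [obs[i] - ar[i] + ma[i] for i in range(k)]
        out.append(e)
        w_prev, e_prev = obs, e
    return out

phi = [[0.5, 0.1], [0.0, 0.3]]
theta = [[0.4, 0.0], [0.2, 0.2]]
eps = [[1.0, -0.5], [0.3, 0.8], [-1.2, 0.1], [0.7, -0.4], [0.0, 0.6]]
w = simulate(phi, theta, eps)
recovered = innovations(phi, theta, w)
max_error = max(abs(a - b)
                for row_a, row_b in zip(recovered, eps)
                for a, b in zip(row_a, row_b))
```

With the true parameters the recursion returns the innovations that generated the series, so `max_error` is at rounding level; with estimated parameters the same recursion yields the residuals that are examined for autocorrelation.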
VARMA model forecasting
Forecasts of the series may be constructed using a multivariate version of the univariate method. Efficient methods are available for updating the forecasts each time new observations become available.
Cross-spectral Analysis
The relationship between two time series may be investigated in terms of their sinusoidal components at different frequencies. At frequency $\omega $ a component of ${y}_{t}$ of the form
$${R}_{y}\left(\omega \right)\mathrm{cos}\left(\omega t-{\varphi}_{y}\left(\omega \right)\right)$$ 
has its amplitude ${R}_{y}\left(\omega \right)$ and phase lag ${\varphi}_{y}\left(\omega \right)$ estimated by
$${R}_{y}\left(\omega \right){e}^{i{\varphi}_{y}\left(\omega \right)}=\frac{1}{n}\sum _{t=1}^{n}{y}_{t}{e}^{i\omega t}$$ 
and similarly for ${x}_{t}$. In the univariate analysis only the amplitude was important; in the cross analysis the phase is also important.
The sample cross-spectrum
This is defined by
$${f}_{xy}^{*}\left(\omega \right)=\frac{1}{2\pi n}\left(\sum _{t=1}^{n}{y}_{t}{e}^{i\omega t}\right)\left(\sum _{t=1}^{n}{x}_{t}{e}^{-i\omega t}\right)\text{.}$$ 
It may be demonstrated that this is equivalently defined in terms of the sample cross-correlation function (CCF), ${r}_{xy}\left(k\right)$, of the series as
$${f}_{xy}^{*}\left(\omega \right)=\frac{1}{2\pi}\sum _{-\left(n-1\right)}^{\left(n-1\right)}{c}_{xy}\left(k\right){e}^{i\omega k}$$ 
where ${c}_{xy}\left(k\right)={s}_{x}{s}_{y}{r}_{xy}\left(k\right)$ is the cross-covariance function.
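The equivalence of the two definitions can be checked numerically. The sketch below assumes that both series are mean-corrected once, before either formula is applied, and that the cross-covariance uses the divisor $n$; the function names, the data and the chosen frequency are hypothetical.

```python
import cmath
import math

def cross_periodogram(x, y, omega):
    """(1 / (2 pi n)) * (sum y_t e^{i w t}) * (sum x_t e^{-i w t}), t = 1..n."""
    n = len(x)
    sum_y = sum(y[t] * cmath.exp(1j * omega * (t + 1)) for t in range(n))
    sum_x = sum(x[t] * cmath.exp(-1j * omega * (t + 1)) for t in range(n))
    return sum_y * sum_x / (2 * math.pi * n)

def cross_cov(x, y, k):
    """c_xy(k) = (1/n) sum x_t y_{t+k} over the overlapping range; k may be negative."""
    n = len(x)
    return sum(x[t] * y[t + k] for t in range(n) if 0 <= t + k < n) / n

def cross_spectrum_from_ccf(x, y, omega):
    """(1 / (2 pi)) * sum over k = -(n-1)..(n-1) of c_xy(k) e^{i w k}."""
    n = len(x)
    total = sum(cross_cov(x, y, k) * cmath.exp(1j * omega * k)
                for k in range(-(n - 1), n))
    return total / (2 * math.pi)

def centre(series):
    mean = sum(series) / len(series)
    return [v - mean for v in series]

x = centre([1.0, 4.0, 2.0, 6.0, 3.0, 5.0, 2.0, 7.0])
y = centre([2.0, 1.0, 5.0, 3.0, 6.0, 2.0, 4.0, 3.0])
omega = 0.9
direct = cross_periodogram(x, y, omega)
via_ccf = cross_spectrum_from_ccf(x, y, omega)
amplitude, phase = abs(direct), cmath.phase(direct)
```

The two complex values agree to rounding error, and the modulus and argument of either give the sample cross-amplitude and phase at that frequency.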
The amplitude and phase spectrum
The cross-spectrum is specified by its real part or co-spectrum $c{f}^{*}\left(\omega \right)$ and imaginary part or quadrature spectrum $q{f}^{*}\left(\omega \right)$, but for the purpose of interpretation the cross-amplitude spectrum and phase spectrum are useful:
$${A}^{*}\left(\omega \right)=\left|{f}_{xy}^{*}\left(\omega \right)\right|\text{, \hspace{1em}}{\varphi}^{*}\left(\omega \right)=\mathrm{arg}\left({f}_{xy}^{*}\left(\omega \right)\right)\text{.}$$ 
If the series ${x}_{t}$ and ${y}_{t}$ contain deterministic sinusoidal components of amplitudes ${R}_{y},{R}_{x}$ and phases ${\varphi}_{y},{\varphi}_{x}$ at frequency $\omega $, then ${A}^{*}\left(\omega \right)$ will have a peak of approximate width $\pi /n$ and height $\left(n/2\pi \right){R}_{y}{R}_{x}$ at that frequency, with corresponding phase ${\varphi}^{*}\left(\omega \right)={\varphi}_{y}-{\varphi}_{x}$. This supplies no information that cannot be obtained from the two series separately. The statistical relationship between the series is better revealed when the series are purely stochastic and jointly stationary, in which case the expected value of ${f}_{xy}^{*}\left(\omega \right)$ converges with increasing sample size to the theoretical cross-spectrum
$${f}_{xy}\left(\omega \right)=\frac{1}{2\pi}\sum _{-\infty}^{\infty}{\gamma}_{xy}\left(k\right){e}^{i\omega k}$$ 
where ${\gamma}_{xy}\left(k\right)=\mathrm{cov}\left({x}_{t},{y}_{t+k}\right)$. The sample spectrum, as in the univariate case, does not converge to the theoretical spectrum without some form of smoothing which either implicitly (using a lag window) or explicitly (using a frequency window) averages the sample spectrum ${f}_{xy}^{*}\left(\omega \right)$ over wider bands of frequency to obtain a smoothed estimate ${\hat{f}}_{xy}\left(\omega \right)$.
The coherency spectrum
If there is no statistical relationship between the series at a given frequency, then ${f}_{xy}\left(\omega \right)=0$, and the smoothed estimate ${\hat{f}}_{xy}\left(\omega \right)$ will be close to $0$. This is assessed by the squared coherency between the series:
$$\hat{W}\left(\omega \right)=\frac{{\left|{\hat{f}}_{xy}\left(\omega \right)\right|}^{2}}{{\hat{f}}_{xx}\left(\omega \right){\hat{f}}_{yy}\left(\omega \right)}$$ 
where ${\hat{f}}_{xx}\left(\omega \right)$ is the corresponding smoothed univariate spectrum estimate for ${x}_{t}$, and similarly for ${y}_{t}$. The coherency can be treated as a squared multiple correlation. It is similarly invariant in theory not only to simple scaling of ${x}_{t}$ and ${y}_{t}$, but also to filtering of the two series, and provides a useful test statistic for the relationship between autocorrelated series. Note that without smoothing,
$${\left|{f}_{xy}^{*}\left(\omega \right)\right|}^{2}={f}_{xx}^{*}\left(\omega \right){f}_{yy}^{*}\left(\omega \right)\text{,}$$ 
so the coherency is $1$ at all frequencies, just as a correlation is $1$ for a sample of size $1$. Thus smoothing is essential for cross-spectrum analysis.
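The need for smoothing can be seen in a small experiment. The sketch below is illustrative only: it assumes two independently generated white noise series, evaluates the sample spectra at the Fourier frequencies, and smooths them with a simple circular moving average (a rectangular frequency window) rather than any window offered by the library. All names and the seed are hypothetical.

```python
import cmath
import math
import random

def dft(series):
    """Sum of series[t] * e^{i w_j t} at the Fourier frequencies w_j = 2 pi j / n."""
    n = len(series)
    return [sum(series[t] * cmath.exp(2j * math.pi * j * t / n) for t in range(n))
            for j in range(n)]

def smooth(values, m):
    """Circular moving average over 2m + 1 neighbouring Fourier frequencies."""
    n = len(values)
    return [sum(values[(j + d) % n] for d in range(-m, m + 1)) / (2 * m + 1)
            for j in range(n)]

def squared_coherency(x, y, m):
    n = len(x)
    fx, fy = dft(x), dft(y)
    scale = 2 * math.pi * n
    f_xy = smooth([b * a.conjugate() / scale for a, b in zip(fx, fy)], m)
    f_xx = smooth([abs(a) ** 2 / scale for a in fx], m)
    f_yy = smooth([abs(b) ** 2 / scale for b in fy], m)
    return [abs(c) ** 2 / (p * q) for c, p, q in zip(f_xy, f_xx, f_yy)]

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(64)]
y = [random.gauss(0.0, 1.0) for _ in range(64)]   # generated independently of x
raw = squared_coherency(x, y, m=0)        # no smoothing
smoothed = squared_coherency(x, y, m=4)   # average over 9 frequencies
```

Without smoothing every value in `raw` equals 1 up to rounding, whatever the data; after smoothing the values in `smoothed` lie between 0 and 1 and can be used to assess the relationship.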
The gain and noise spectrum
If ${y}_{t}$ is believed to be related to ${x}_{t}$ by a linear lagged relationship as in [Linear Lagged Relationships Between Time Series], i.e.,
$${y}_{t}={v}_{0}{x}_{t}+{v}_{1}{x}_{t-1}+{v}_{2}{x}_{t-2}+\cdots +{n}_{t}\text{,}$$ 
then the theoretical cross-spectrum is
$${f}_{xy}\left(\omega \right)=V\left(\omega \right){f}_{xx}\left(\omega \right)$$ 
where
$$V\left(\omega \right)=G\left(\omega \right){e}^{i\varphi \left(\omega \right)}=\sum _{k=0}^{\infty}{v}_{k}{e}^{ik\omega}$$ 
is called the frequency response of the relationship.
Thus if ${x}_{t}$ were a sinusoidal wave at frequency $\omega $ (and ${n}_{t}$ were absent), ${y}_{t}$ would be similar but multiplied in amplitude by $G\left(\omega \right)$ and shifted in phase by $\varphi \left(\omega \right)$. Furthermore, the theoretical univariate spectrum of ${y}_{t}$ is
$${f}_{yy}\left(\omega \right)=G{\left(\omega \right)}^{2}{f}_{xx}\left(\omega \right)+{f}_{n}\left(\omega \right)$$ 
where ${n}_{t}$, with spectrum ${f}_{n}\left(\omega \right)$, is assumed independent of the input ${x}_{t}$.
Cross-spectral analysis thus furnishes estimates of the gain
$$\hat{G}\left(\omega \right)=\left|{\hat{f}}_{xy}\left(\omega \right)\right|/{\hat{f}}_{xx}\left(\omega \right)$$ 
and the phase
$$\hat{\varphi}\left(\omega \right)=\mathrm{arg}\left({\hat{f}}_{xy}\left(\omega \right)\right)\text{.}$$ 
From these representations of the estimated frequency response $\hat{V}\left(\omega \right)$, parametric transfer function (TF) models may be recognized and selected. The noise spectrum may also be estimated as
$${\hat{f}}_{y\mid x}\left(\omega \right)={\hat{f}}_{yy}\left(\omega \right)\left(1-\hat{W}\left(\omega \right)\right)\text{,}$$ 
a formula which reflects the fact that in essence a regression is being performed of the sinusoidal components of ${y}_{t}$ on those of ${x}_{t}$ over each frequency band.
Interpretation of the frequency response may be aided by extracting from $\hat{V}\left(\omega \right)$ estimates of the impulse response function (IRF) ${\hat{v}}_{k}$. It is assumed that there is no anticipatory response between ${y}_{t}$ and ${x}_{t}$, i.e., no coefficients ${v}_{k}$ with $k=-1$ or $-2$ are needed (their presence might indicate feedback between the series).
Cross-spectrum smoothing by lag window
The estimate of the cross-spectrum is calculated from the sample cross-covariances as
$${\hat{f}}_{xy}\left(\omega \right)=\frac{1}{2\pi}\sum _{-M+S}^{M+S}{w}_{k-S}{c}_{xy}\left(k\right){e}^{i\omega k}\text{.}$$ 
The lag window ${w}_{k}$ extends up to a truncation lag $M$ as in the univariate case, but its centre is shifted by an alignment lag $S$ usually chosen to coincide with the peak cross-correlation. This is equivalent to an alignment of the series for peak cross-correlation at lag $0$, and reduces bias in the phase estimation.
The selection of the truncation lag $M$, which fixes the bandwidth of the estimate, is based on the same criteria as for univariate series, and the same choice of $M$ and window shape should be used as in univariate spectrum estimation to obtain valid estimates of the coherency, gain, etc., and test statistics.
Direct smoothing of the cross-spectrum
The computations are exactly as for smoothing of the univariate spectrum except that allowance is made for an implicit alignment shift $S$ between the series.
Kalman Filters
Kalman filtering provides a method for the analysis of multidimensional time series. The underlying model is:
$${X}_{t+1}={A}_{t}{X}_{t}+{B}_{t}{W}_{t}$$  (6) 
$${Y}_{t}={C}_{t}{X}_{t}+{V}_{t}$$  (7) 
where ${X}_{t}$ is the unobserved state vector, ${Y}_{t}$ is the observed measurement vector, ${W}_{t}$ is the state noise, ${V}_{t}$ is the measurement noise, ${A}_{t}$ is the state transition matrix, ${B}_{t}$ is the noise coefficient matrix and ${C}_{t}$ is the measurement coefficient matrix at time $t$. The state noise and the measurement noise are assumed to be uncorrelated with zero mean and covariance matrices:
$$E\left\{{W}_{t}{W}_{t}^{\mathrm{T}}\right\}={Q}_{t}\text{\hspace{1em} and \hspace{1em}}E\left\{{V}_{t}{V}_{t}^{\mathrm{T}}\right\}={R}_{t}\text{.}$$ 
If the system matrices ${A}_{t}$, ${B}_{t}$, ${C}_{t}$ and the covariance matrices ${Q}_{t},{R}_{t}$ are known then Kalman filtering can be used to compute the minimum variance estimate of the stochastic variable ${X}_{t}$.
The estimate of ${X}_{t}$ given observations ${Y}_{1}$ to ${Y}_{t-1}$ is denoted by ${\hat{X}}_{t\mid t-1}$ with state covariance matrix $E\left\{{\hat{X}}_{t\mid t-1}{\hat{X}}_{t\mid t-1}^{\mathrm{T}}\right\}={P}_{t\mid t-1}$, while the estimate of ${X}_{t}$ given observations ${Y}_{1}$ to ${Y}_{t}$ is denoted by ${\hat{X}}_{t\mid t}$ with covariance matrix $E\left\{{\hat{X}}_{t\mid t}{\hat{X}}_{t\mid t}^{\mathrm{T}}\right\}={P}_{t\mid t}$.
The update of the estimate, ${\hat{X}}_{t+1\mid t}$, from time $t$ to time $t+1$, is computed in two stages.
First, the update equations are
$${\hat{X}}_{t\mid t}={\hat{X}}_{t\mid t-1}+{K}_{t}{r}_{t}\text{, \hspace{1em}}{P}_{t\mid t}=\left(I-{K}_{t}{C}_{t}\right){P}_{t\mid t-1}$$ 
where the residual ${r}_{t}={Y}_{t}-{C}_{t}{\hat{X}}_{t\mid t-1}$ has an associated covariance matrix ${H}_{t}={C}_{t}{P}_{t\mid t-1}{C}_{t}^{\mathrm{T}}+{R}_{t}$, and ${K}_{t}$ is the Kalman gain matrix with
$${K}_{t}={P}_{t\mid t-1}{C}_{t}^{\mathrm{T}}{H}_{t}^{-1}\text{.}$$ 
The second stage is the one-step-ahead prediction equations given by
$${\hat{X}}_{t+1\mid t}={A}_{t}{\hat{X}}_{t\mid t}\text{, \hspace{1em}}{P}_{t+1\mid t}={A}_{t}{P}_{t\mid t}{A}_{t}^{\mathrm{T}}+{B}_{t}{Q}_{t}{B}_{t}^{\mathrm{T}}\text{.}$$ 
These two stages can be combined to give the one-step-ahead update-prediction equations
$${\hat{X}}_{t+1\mid t}={A}_{t}{\hat{X}}_{t\mid t-1}+{A}_{t}{K}_{t}{r}_{t}\text{.}$$ 
The above equations thus provide a method for recursively calculating the estimates of the state vectors ${\hat{X}}_{t\mid t}$ and ${\hat{X}}_{t+1\mid t}$ and their covariance matrices ${P}_{t\mid t}$ and ${P}_{t+1\mid t}$ from their previous values. This recursive procedure can be viewed in a Bayesian framework as being the updating of the prior by the data ${Y}_{t}$.
The initial values ${\hat{X}}_{1\mid 0}$ and ${P}_{1\mid 0}$ are required to start the recursion. For stationary systems, ${P}_{1\mid 0}$ can be computed from the following equation:
$${P}_{1\mid 0}={A}_{1}{P}_{1\mid 0}{A}_{1}^{\mathrm{T}}+{B}_{1}{Q}_{1}{B}_{1}^{\mathrm{T}}\text{,}$$ 
which can be solved by iterating on the equation. For ${\hat{X}}_{1\mid 0}$ the value $E\left\{X\right\}$ can be used if it is available.
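The two stages and the start-up calculation can be sketched for a model with a scalar state and a scalar measurement, where every matrix reduces to a number. This is an illustration of the plain recursion only, not of the square root algorithm used by the library; the function names and numerical values are hypothetical.

```python
def kalman_step(x_pred, p_pred, y, a, b, c, q, r):
    """One update followed by one-step-ahead prediction, scalar case."""
    resid = y - c * x_pred                  # r_t = Y_t - C X_{t|t-1}
    h = c * p_pred * c + r                  # H_t = C P_{t|t-1} C^T + R
    gain = p_pred * c / h                   # K_t = P_{t|t-1} C^T H_t^{-1}
    x_filt = x_pred + gain * resid          # X_{t|t}
    p_filt = (1.0 - gain * c) * p_pred      # P_{t|t}
    x_next = a * x_filt                     # X_{t+1|t}
    p_next = a * p_filt * a + b * q * b     # P_{t+1|t}
    return x_filt, p_filt, x_next, p_next

def stationary_variance(a, b, q, iterations=200):
    """Iterate P = A P A^T + B Q B^T from P = 0; converges when |a| < 1."""
    p = 0.0
    for _ in range(iterations):
        p = a * p * a + b * q * b
    return p

p0 = stationary_variance(a=0.5, b=1.0, q=1.0)
x_filt, p_filt, x_next, p_next = kalman_step(
    x_pred=0.0, p_pred=1.0, y=2.0, a=0.5, b=1.0, c=1.0, q=1.0, r=1.0)
```

With these values the iteration settles at $4/3$, the solution of $P=0.25P+1$, and a single step from a prior variance of 1 gives a gain of 0.5, a filtered estimate of 1 and a predicted estimate of 0.5, which also matches the combined update-prediction equation.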
Computational methods
To improve the stability of the computations the square root algorithm is used. One recursion of the square root covariance filter algorithm can be summarised as follows:
$$\left(\begin{array}{ccc}{R}_{t}^{1/2}& {C}_{t}{S}_{t}& 0\\ 0& {A}_{t}{S}_{t}& {B}_{t}{Q}_{t}^{1/2}\end{array}\right)U=\left(\begin{array}{ccc}{H}_{t}^{1/2}& 0& 0\\ {G}_{t}& {S}_{t+1}& 0\end{array}\right)$$ 
where $U$ is an orthogonal transformation triangularizing the left-hand pre-array to produce the right-hand post-array, ${S}_{t}$ is the lower triangular Cholesky factor of the state covariance matrix ${P}_{t\mid t-1}$ (so that ${S}_{t+1}$ is the factor of ${P}_{t+1\mid t}$), ${Q}_{t}^{1/2}$ and ${R}_{t}^{1/2}$ are the lower triangular Cholesky factors of the covariance matrices ${Q}_{t}$ and ${R}_{t}$, and ${H}_{t}^{1/2}$ is the lower triangular Cholesky factor of the covariance matrix of the residuals. The relationship between the Kalman gain matrix, ${K}_{t}$, and ${G}_{t}$ is given by
$${A}_{t}{K}_{t}={G}_{t}{\left({H}_{t}^{1/2}\right)}^{-1}\text{.}$$ 
To improve the efficiency of the computations when the matrices ${A}_{t},{B}_{t}$ and ${C}_{t}$ do not vary with time the system can be transformed to give a simpler structure. The transformed state vector is ${U}^{*}X$ where ${U}^{*}$ is the transformation that reduces the matrix pair $\left(A,C\right)$ to lower observer Hessenberg form. That is, the matrix ${U}^{*}$ is computed such that the compound matrix
$$\left(\begin{array}{c}C{U}^{*\mathrm{T}}\\ {U}^{*}A{U}^{*\mathrm{T}}\end{array}\right)$$ 
is a lower trapezoidal matrix. The transformations need only be computed once at the start of a series, and the covariance matrices ${Q}_{t}$ and ${R}_{t}$ can still be time-varying.
Model fitting and forecasting
If the state space model contains unknown parameters, $\theta $, these can be estimated using maximum likelihood (ML). Assuming that ${W}_{t}$ and ${V}_{t}$ are normal variates the log-likelihood for observations ${Y}_{\mathit{t}}$, for $\mathit{t}=1,2,\dots ,n$, is given by
$$\mathrm{constant}-\frac{1}{2}\sum _{t=1}^{n}\mathrm{ln}\left(\mathrm{det}\left({H}_{t}\right)\right)-\frac{1}{2}\sum _{t=1}^{n}{r}_{t}^{\mathrm{T}}{H}_{t}^{-1}{r}_{t}\text{.}$$ 
Optimal estimates for the unknown model parameters $\theta $ can then be obtained by using a suitable optimizer method to maximize the likelihood function.
Once the model has been fitted, forecasting can be performed by using the one-step-ahead prediction equations. The one-step-ahead prediction equations can also be used to ‘jump over’ any missing values in the series.
Kalman filter and time series models
Many commonly used time series models can be written as state space models. A univariate $\mathrm{ARMA}\left(p,q\right)$ model can be cast into the following state space form:
$$\begin{array}{lll}{x}_{t}& =& A{x}_{t-1}+B{\epsilon}_{t}\\ {w}_{t}& =& C{x}_{t}\end{array}$$ 
$$A=\left(\begin{array}{ccccc}{\varphi}_{1}& 1& & & \\ {\varphi}_{2}& & 1& & \\ \vdots & & & \ddots & \\ {\varphi}_{r-1}& & & & 1\\ {\varphi}_{r}& 0& 0& \cdots & 0\end{array}\right)\text{, \hspace{1em}}B=\left(\begin{array}{c}\phantom{-}1\\ -{\theta}_{1}\\ -{\theta}_{2}\\ \vdots \\ -{\theta}_{r-1}\end{array}\right)\text{\hspace{1em} and \hspace{1em}}{C}^{\mathrm{T}}=\left(\begin{array}{c}1\\ 0\\ 0\\ \vdots \\ 0\end{array}\right)\text{,}$$ 
where $r=\max \left(p,q+1\right)$.
The representation for a $k$-variate $\mathrm{ARMA}\left(p,q\right)$ series (VARMA) is very similar to that given above, except now the state vector is of length $kr$ and the $\varphi $ and $\theta $ are now $k\times k$ matrices and the 1s in $A$, $B$ and $C$ are now the identity matrix of order $k$. If $p<r$ or $q+1<r$ then the appropriate $\varphi $ or $\theta $ matrices are set to zero, respectively.
Since the compound matrix
$$\left(\begin{array}{c}C\\ A\end{array}\right)$$ 
is already in lower observer Hessenberg form (i.e., it is lower trapezoidal with zeros in the top right-hand triangle) the invariant Kalman filter algorithm can be used directly without the need to generate a transformation matrix ${U}^{*}$.
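The construction of $A$, $B$ and $C$ for a univariate model can be sketched as below, together with a check that the state space form reproduces the direct ARMA recursion. The sketch assumes the sign convention ${w}_{t}={\varphi}_{1}{w}_{t-1}+\cdots +{\epsilon}_{t}-{\theta}_{1}{\epsilon}_{t-1}-\cdots $ and zero starting values; the function names and parameter values are hypothetical.

```python
def arma_state_space(phi, theta):
    """Build A, B, C for an ARMA(p, q) model with r = max(p, q + 1) states."""
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    phi_pad = list(phi) + [0.0] * (r - p)            # missing phi set to zero
    theta_pad = list(theta) + [0.0] * (r - 1 - q)    # missing theta set to zero
    a = [[0.0] * r for _ in range(r)]
    for i in range(r):
        a[i][0] = phi_pad[i]        # first column holds phi_1..phi_r
        if i + 1 < r:
            a[i][i + 1] = 1.0       # ones on the superdiagonal
    b = [1.0] + [-t for t in theta_pad]
    c = [1.0] + [0.0] * (r - 1)
    return a, b, c

def simulate_state_space(a, b, c, eps):
    """x_t = A x_{t-1} + B eps_t, w_t = C x_t, starting from a zero state."""
    r = len(b)
    state, out = [0.0] * r, []
    for e in eps:
        state = [sum(a[i][j] * state[j] for j in range(r)) + b[i] * e
                 for i in range(r)]
        out.append(sum(ci * si for ci, si in zip(c, state)))
    return out

def simulate_arma(phi, theta, eps):
    """Direct ARMA recursion with zero pre-sample values."""
    w = []
    for t, e in enumerate(eps):
        ar = sum(p * w[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(th * eps[t - 1 - i] for i, th in enumerate(theta) if t - 1 - i >= 0)
        w.append(ar + e - ma)
    return w

a11, b11, c11 = arma_state_space([0.5], [0.3])         # ARMA(1,1), so r = 2
eps = [1.0, -0.4, 0.7, 0.2, -1.1, 0.5]
a21, b21, c21 = arma_state_space([0.5, -0.2], [0.3])   # ARMA(2,1), so r = 2
w_state = simulate_state_space(a21, b21, c21, eps)
w_direct = simulate_arma([0.5, -0.2], [0.3], eps)
```

For the ARMA(1,1) case the sketch gives a $2\times 2$ matrix $A$ with a zero second row apart from the padded ${\varphi}_{2}=0$, illustrating how missing parameters are set to zero.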
GARCH Models
ARCH models and their generalizations
Rather than modelling the mean (for example using regression models) or the autocorrelation (by using ARMA models) there are circumstances in which the variance of a time series needs to be modelled. This is common in financial data modelling where the variance (or standard deviation) is known as volatility. The ability to forecast volatility is a vital part in deciding the risk attached to financial decisions like portfolio selection. The basic model for relating the variance at time $t$ to the variance at previous times is the autoregressive conditional heteroskedastic (ARCH) model. The standard ARCH model is defined as
$$\begin{array}{c}{y}_{t}\mid {\psi}_{t-1}\sim N\left(0,{h}_{t}\right)\text{,}\\ \\ {h}_{t}={\alpha}_{0}+\sum _{i=1}^{q}{\alpha}_{i}{\epsilon}_{t-i}^{2}\text{,}\end{array}$$ 
where ${\psi}_{t}$ is the information up to time $t$ and ${h}_{t}$ is the conditional variance.
In a similar way to that in which autoregressive (AR) models were generalized to ARMA models the ARCH models have been generalized to a GARCH model; see Engle (1982), Bollerslev (1986) and Hamilton (1994):
$${h}_{t}={\alpha}_{0}+\sum _{i=1}^{q}{\alpha}_{i}{\epsilon}_{t-i}^{2}+\sum _{i=1}^{p}{\beta}_{i}{h}_{t-i}\text{.}$$ 
This can be combined with a regression model:
$${y}_{t}={b}_{0}+\sum _{i=1}^{k}{b}_{i}{x}_{it}+{\epsilon}_{t}\text{,}$$ 
where ${\epsilon}_{t}\mid {\psi}_{t-1}\sim N\left(0,{h}_{t}\right)$ and where ${x}_{\mathit{i}t}$, for $\mathit{i}=1,2,\dots ,k$, are the exogenous variables.
The above models assume that the change in variance, ${h}_{t}$, is symmetric with respect to the shocks, that is, that a large negative value of ${\epsilon}_{t-1}$ has the same effect as a large positive value of ${\epsilon}_{t-1}$. A frequently observed effect is that a large negative value ${\epsilon}_{t-1}$ often leads to a greater variance than a large positive value. The following three asymmetric models represent this effect in different ways using the parameter $\gamma $ as a measure of the asymmetry.
Type I AGARCH($p,q$)
$${h}_{t}={\alpha}_{0}+\sum _{i=1}^{q}{\alpha}_{i}{\left({\epsilon}_{t-i}+\gamma \right)}^{2}+\sum _{i=1}^{p}{\beta}_{i}{h}_{t-i}\text{.}$$ 
Type II AGARCH($p,q$)
$${h}_{t}={\alpha}_{0}+\sum _{i=1}^{q}{\alpha}_{i}{\left(\left|{\epsilon}_{t-i}\right|+\gamma {\epsilon}_{t-i}\right)}^{2}+\sum _{i=1}^{p}{\beta}_{i}{h}_{t-i}\text{.}$$ 
GJR-GARCH($p,q$), or Glosten, Jagannathan and Runkle GARCH (see Glosten et al. (1993))
$${h}_{t}={\alpha}_{0}+\sum _{i=1}^{q}\left({\alpha}_{i}+\gamma {I}_{t-i}\right){\epsilon}_{t-i}^{2}+\sum _{i=1}^{p}{\beta}_{i}{h}_{t-i}\text{,}$$ 
where ${I}_{t}=1$ if ${\epsilon}_{t}<0$ and ${I}_{t}=0$ if ${\epsilon}_{t}\ge 0$.
The first assumes that the effects of the shocks are symmetric about $\gamma $ rather than zero, so that for $\gamma <0$ the effect of negative shocks is increased and the effect of positive shocks is decreased. Both the Type II AGARCH and the GJR GARCH (see Glosten et al. (1993)) models introduce asymmetry by increasing the value of the coefficient of ${\epsilon}_{t-1}^{2}$ for negative values of ${\epsilon}_{t-1}$. In the case of the Type II AGARCH the effect is multiplicative while for the GJR GARCH the effect is additive.
Coefficient  ${\epsilon}_{t-1}<0$  ${\epsilon}_{t-1}>0$ 
Type II AGARCH  ${\alpha}_{i}{\left(1-\gamma \right)}^{2}$  ${\alpha}_{i}{\left(1+\gamma \right)}^{2}$ 
GJR GARCH  ${\alpha}_{i}+\gamma $  ${\alpha}_{i}$ 
(Note that in the case of GJR GARCH, $\gamma $ needs to be positive to inflate variance after negative shocks while for Type I and Type II AGARCH, $\gamma $ needs to be negative.)
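The effect of the sign of a shock under each variance equation can be sketched for the $(1,1)$ case. The code below is illustrative only: the model labels, the function name and the parameter values are hypothetical, with $\gamma $ chosen negative for the two AGARCH types and positive for GJR GARCH, in line with the note above.

```python
def next_variance(model, eps_prev, h_prev, alpha0, alpha1, beta1, gamma=0.0):
    """Conditional variance h_t for the (1,1) case of each variance equation."""
    if model == "garch":
        shock = alpha1 * eps_prev ** 2
    elif model == "agarch1":
        shock = alpha1 * (eps_prev + gamma) ** 2
    elif model == "agarch2":
        shock = alpha1 * (abs(eps_prev) + gamma * eps_prev) ** 2
    elif model == "gjr":
        indicator = 1.0 if eps_prev < 0 else 0.0
        shock = (alpha1 + gamma * indicator) * eps_prev ** 2
    else:
        raise ValueError("unknown model: " + model)
    return alpha0 + shock + beta1 * h_prev

params = dict(h_prev=1.0, alpha0=0.1, alpha1=0.2, beta1=0.7)
gammas = {"garch": 0.0, "agarch1": -0.5, "agarch2": -0.5, "gjr": 0.1}
after_fall = {m: next_variance(m, -2.0, gamma=g, **params) for m, g in gammas.items()}
after_rise = {m: next_variance(m, 2.0, gamma=g, **params) for m, g in gammas.items()}
```

For the symmetric model the two dictionaries agree, while for each asymmetric model the variance following the negative shock is the larger. For Type II AGARCH the ratio of the two shock terms is ${\left(1-\gamma \right)}^{2}/{\left(1+\gamma \right)}^{2}$, as in the table.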
A third type of GARCH model is the exponential GARCH (EGARCH). In this model the variance relationship is on the log scale and hence asymmetric.
$$\mathrm{ln}\left({h}_{t}\right)={\alpha}_{0}+\sum _{i=1}^{q}{\alpha}_{i}{z}_{t-i}+\sum _{i=1}^{q}{\varphi}_{i}\left(\left|{z}_{t-i}\right|-E\left[\left|{z}_{t-i}\right|\right]\right)+\sum _{i=1}^{p}{\beta}_{i}\mathrm{ln}\left({h}_{t-i}\right)\text{,}$$ 
where ${z}_{t}=\frac{{\epsilon}_{t}}{\sqrt{{h}_{t}}}$ and $E\left[\left|{z}_{t-i}\right|\right]$ denotes the expected value of $\left|{z}_{t-i}\right|$.
Note that the ${\varphi}_{i}$ terms represent a symmetric contribution to the variance while the ${\alpha}_{i}$ terms give an asymmetric contribution.
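A minimal Python sketch of the EGARCH recursion may help to show how the two kinds of term act. It is not the library's implementation: the function name and the starting value for $\mathrm{ln}\left({h}_{t}\right)$ are assumptions, and $E\left[\left|z\right|\right]$ is taken to be $\sqrt{2/\pi}$, which holds only when $z$ is standard Normal.

```python
import math
import numpy as np

def egarch_log_variance(eps, alpha0, alpha, phi, beta, ln_h_init=0.0):
    """Return ln(h_t) for shocks eps (hypothetical helper).

    alpha and phi both have length q; beta has length p.
    Assumptions: ln(h_t) = ln_h_init for t < max(p, q), and
    E|z| = sqrt(2/pi), i.e. Normally distributed z.
    """
    eps = np.asarray(eps, dtype=float)
    q, p = len(alpha), len(beta)
    e_abs_z = math.sqrt(2.0 / math.pi)
    ln_h = np.full(len(eps), float(ln_h_init))
    for t in range(max(p, q), len(eps)):
        total = alpha0
        for i in range(1, q + 1):
            z = eps[t - i] / math.exp(0.5 * ln_h[t - i])   # standardized shock
            total += alpha[i - 1] * z                       # asymmetric part
            total += phi[i - 1] * (abs(z) - e_abs_z)        # symmetric part
        for i in range(1, p + 1):
            total += beta[i - 1] * ln_h[t - i]
        ln_h[t] = total
    return ln_h
```

With a negative ${\alpha}_{1}$, a negative shock raises the next log variance by more than a positive shock of the same size, while the ${\varphi}_{1}$ term contributes equally in both cases.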
Another common characteristic of financial data is that it is heavier in the tails (leptokurtic) than the Normal distribution. To model this the Normal distribution is replaced by a scaled Student's $t$-distribution (that is, a Student's $t$-distribution standardized to have variance ${h}_{t}$). For degrees of freedom greater than $4$, the smaller the degrees of freedom, the higher the kurtosis of the Student's $t$-distribution.
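The relationship between degrees of freedom and kurtosis can be checked numerically. The sketch below relies on two standard properties of the Student's $t$-distribution with `df` degrees of freedom: its variance is `df / (df - 2)` for `df > 2`, and its kurtosis is `3 + 6 / (df - 4)` for `df > 4`. The function names are illustrative and are not part of the library.

```python
import math

def student_t_kurtosis(df):
    """Kurtosis of a Student's t-distribution; finite only for df > 4."""
    if df <= 4:
        raise ValueError("kurtosis is finite only for df > 4")
    return 3.0 + 6.0 / (df - 4.0)

def scale_for_variance(h, df):
    """Scale s such that s * T has variance h, where T is a standard
    Student's t variate with df > 2 degrees of freedom."""
    if df <= 2:
        raise ValueError("variance is finite only for df > 2")
    return math.sqrt(h * (df - 2.0) / df)
```

As `df` grows, the kurtosis falls towards 3, the value for the Normal distribution.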
Fitting GARCH models
The models are fitted by maximizing the conditional log-likelihood. For the Normal distribution the conditional log-likelihood is
$$-\frac{1}{2}\sum _{i=1}^{T}\left(\mathrm{log}\left({h}_{i}\right)+\frac{{\epsilon}_{i}^{2}}{{h}_{i}}\right)\text{.}$$ 
For the Student's $t$-distribution the function is more complex. An approximation to the standard errors of the parameter estimates is computed from the Fisher information matrix.
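For the Normal case, the objective can be evaluated directly once the shocks and conditional variances are available. The following Python sketch computes the conditional log-likelihood with the leading factor $-\frac{1}{2}$ and, like the expression above, omits the additive constant. The function name is an assumption, and the sketch does not perform the maximization itself.

```python
import numpy as np

def gaussian_conditional_loglik(eps, h):
    """Conditional log-likelihood (constant term omitted) for shocks eps
    and strictly positive conditional variances h of the same length."""
    eps = np.asarray(eps, dtype=float)
    h = np.asarray(h, dtype=float)
    return -0.5 * float(np.sum(np.log(h) + eps ** 2 / h))
```

In a fitting procedure, this value would be recomputed for each trial parameter vector, with `h` regenerated from the chosen variance recursion.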
Inhomogeneous Time Series
If we denote a generic univariate time series as a sequence of pairs of values $\left({z}_{\mathit{i}},{t}_{\mathit{i}}\right)$, for $\mathit{i}=1,2,\dots $, where the $z$'s represent an observed scalar value and the $t$'s the time that the value was observed, then in a standard time series analysis, as discussed in other sections of this introduction, it is assumed that the series being analysed is homogeneous, that is, the sampling times are regularly spaced with ${t}_{i}-{t}_{i-1}=\delta $ for some value $\delta $. In many real world applications this assumption does not hold, that is, the series is inhomogeneous.
Standard time series analysis techniques cannot be used on an inhomogeneous series without first preprocessing the series to construct an artificial homogeneous series, for example by resampling the series at regular intervals. Zumbach and Müller (2001) introduced a series of operators that can be used to extract robust information directly from the inhomogeneous time series. In this context, robust information means that the results should be essentially independent of minor changes to the sampling mechanism used when collecting the data, for example, changing a number of time stamps or adding or removing a few observations.
The basic operator available for inhomogeneous time series is the exponential moving average (EMA). This operator has a single parameter, $\tau $, and is an average operator with an exponentially decaying kernel given by:
$$\frac{{e}^{-t/\tau}}{\tau}\text{.}$$ 
This gives rise to the following iterative formula:
$$\text{EMA}\left[\tau ;z\right]\left({t}_{i}\right)=\mu \text{EMA}\left[\tau ;z\right]\left({t}_{i-1}\right)+\left(\nu -\mu \right){z}_{i-1}+\left(1-\nu \right){z}_{i}$$ 
where
$$\mu ={e}^{-\alpha}\text{\hspace{1em} and \hspace{1em}}\alpha =\frac{{t}_{i}-{t}_{i-1}}{\tau}\text{.}$$ 
The value of $\nu $ depends on the method of interpolation chosen. Three interpolation methods are available:
1.  Previous point:  $\nu =1$. 
2.  Linear:  $\nu =\left(1-\mu \right)/\alpha $. 
3.  Next point:  $\nu =\mu $. 
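The iteration and the three interpolation choices can be sketched in a few lines of Python. This is an illustration rather than the library's routine: the function name, the `interpolation` labels and the starting value (the EMA at the first time point is set to the first observation unless `ema_init` is supplied) are assumptions, and the time stamps are assumed to be strictly increasing so that $\alpha >0$.

```python
import math

def ema(z, t, tau, interpolation="linear", ema_init=None):
    """Exponential moving average of an inhomogeneous series (z_i, t_i).

    Assumptions: t is strictly increasing, and the EMA at t[0] is
    ema_init, defaulting to z[0].
    """
    out = [z[0] if ema_init is None else ema_init]
    for i in range(1, len(z)):
        alpha = (t[i] - t[i - 1]) / tau
        mu = math.exp(-alpha)
        if interpolation == "previous":
            nu = 1.0
        elif interpolation == "linear":
            nu = (1.0 - mu) / alpha
        elif interpolation == "next":
            nu = mu
        else:
            raise ValueError(f"unknown interpolation: {interpolation}")
        out.append(mu * out[-1] + (nu - mu) * z[i - 1] + (1.0 - nu) * z[i])
    return out
```

Because the three weights $\mu $, $\nu -\mu $ and $1-\nu $ sum to one, a constant series is left unchanged whatever the spacing of the time stamps, which is a convenient sanity check.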
Given the EMA, a number of other operators can be defined, including:
(i)  $m$-Iterated Exponential Moving Average; 
(ii)  Moving Average (MA); 
(iii)  Moving Norm (MNorm); 
(iv)  Moving Variance (MVar); 
(v)  Moving Standard Deviation (MSD); 
(vi)  Differential ($\mathbf{\Delta}$); 
(vii)  Volatility. 
The definition of each of these operators, a discussion of their use and, in some cases, alternative definitions are given in Zumbach and Müller (2001).
References
Akaike H (1971) Autoregressive model fitting for control Ann. Inst. Statist. Math. 23 163–180
Bollerslev T (1986) Generalised autoregressive conditional heteroskedasticity Journal of Econometrics 31 307–327
Box G E P and Jenkins G M (1976) Time Series Analysis: Forecasting and Control (Revised Edition) Holden–Day
Engle R (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation Econometrica 50 987–1008
Gentleman W S and Sande G (1966) Fast Fourier transforms for fun and profit Proc. Joint Computer Conference, AFIPS 29 563–578
Glosten L, Jagannathan R and Runkle D (1993) Relationship between the expected value and the volatility of nominal excess return on stocks Journal of Finance 48 1779–1801
Hamilton J (1994) Time Series Analysis Princeton University Press
Heyse J F and Wei W W S (1985) The partial lag autocorrelation function Technical Report No. 32 Department of Statistics, Temple University, Philadelphia
Tiao G C and Box G E P (1981) Modelling multiple time series with applications J. Am. Stat. Assoc. 76 802–816
Wei W W S (1990) Time Series Analysis: Univariate and Multivariate Methods Addison–Wesley
Zumbach G O and Müller U A (2001) Operators on inhomogeneous time series International Journal of Theoretical and Applied Finance 4(1) 147–178