'Are we nearly there?' Alice managed to pant out at last.
'Nearly there!' the Queen repeated. 'Why, we passed it ten minutes ago! Faster!'
Through the Looking-Glass, Lewis Carroll.

12. SUMMARY, DISCUSSION, MAIN CONCLUSIONS AND RECOMMENDATIONS

12.1 SUMMARY

In the early 1960s, Kalman made an important contribution to the constant struggle to extract signal from noise (in other words, to extract information from data). This paper shows that the Kalman filter is a generalization of recursive Bayesian estimation, which is itself the repeated application of Bayes' theorem. It also shows that recursive Bayesian estimation is a generalization of ordinary least squares regression, so the Kalman filter has its roots very deep in the past.

Five problems have been worrying me for several years, so much so that for a while I believed that two of them (what to do when the last few residuals are autocorrelated, and how to treat preliminary data) had no fully satisfactory resolution, even though they are very important to the practical business forecaster. The five problems are described in chapter one, and the way that recursive Bayesian estimation can cope with them is described in chapter three.

Recursive Bayesian estimation was used to estimate a very simple model in chapter five, in which only one dependent and one explanatory variable could be treated. The results were encouraging enough to justify treating multiple dependent and multiple explanatory variables, and the theory for this is developed in chapter six. At this point the notation and theory are sufficiently well explained to make it possible to examine past work in this area. Although much use has been made of the Kalman filter in process control, it has been used only a little by statisticians, and barely at all by econometricians.
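The claim that recursive Bayesian estimation generalizes ordinary least squares can be checked numerically. The sketch below is in Python/NumPy rather than the FORTRAN of appendix D, and it assumes a single dependent variable, a known error variance `v`, and a very vague normal prior (mean zero, variance 10^6 on each parameter). The function name `recursive_bayes` and the simulated data are hypothetical and chosen only for illustration. Processing the observations one at a time with Bayes' theorem should end up at essentially the OLS estimate and the OLS covariance v(X'X)^-1.

```python
import numpy as np


def recursive_bayes(X, y, m0, c0, v):
    """Recursive Bayesian estimation of y_t = x_t'b + e_t, e_t ~ N(0, v),
    with fixed parameters b (a Kalman filter with W = 0 and H = I)."""
    m = np.array(m0, dtype=float)   # current estimate (posterior mean) of b
    c = np.array(c0, dtype=float)   # current posterior covariance of b
    for x_t, y_t in zip(X, y):
        q = x_t @ c @ x_t + v           # variance of the one-step-ahead forecast
        k = c @ x_t / q                 # gain applied to the forecast error
        m = m + k * (y_t - x_t @ m)     # Bayes' theorem: prior mean -> posterior mean
        c = c - np.outer(k, k) * q      # with W = 0 the covariance can only shrink
    return m, c


# Hypothetical simulated data: a constant plus one explanatory variable.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 0.5]) + rng.normal(scale=0.3, size=n)

v = 0.09                                                   # assumed known error variance
m_rb, c_rb = recursive_bayes(X, y, np.zeros(2), 1e6 * np.eye(2), v)  # vague prior
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("recursive Bayes:", m_rb)
print("OLS:            ", b_ols)
```

As the prior variance grows, its influence vanishes and the recursive estimate converges on the batch OLS estimate; a tighter prior would pull the estimate towards `m0`, which is how prior information enters.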
I know of some 20 econometric and time series analysis packages, and none of them offers Kalman filtering as an option except those implementing the Harrison/Stevens technique, which inhibits causal modelling.

But does the Kalman filter estimate better models than ordinary least squares? Chapter eight addresses this question by generating artificial data with known properties and then estimating the same model with both techniques. The conclusion is that the Kalman filter does give better models, but not by much for the range of models studied. Chapter nine is a practical application of the Kalman filter to an energy demand model, comparing Kalman filter and ordinary least squares estimates of two fairly straightforward models.

The longest chapter of this paper is chapter ten, which deals with the estimation of a wool consumption model. Most of the features available with Kalman filtering are used, and the effects of the following are all demonstrated:
- zero, non-zero or variable W;
- giving older and preliminary data less precision;
- prior information;
- incorporating information from another source (in this case, a time series of total fibre consumption).

The dynamic sum of squared errors (DSSE) of the best model is less than a fifth of the OLS DSSE. A wide variety of assumptions about V, W and the prior data are tried; the conclusion is that the improvements offered by the Kalman filter are robust to these assumptions.

12.2 DISCUSSION

The Kalman filter is difficult to grasp intuitively; at first I found it quite hard to see what all those matrices were for and what they were doing. Approaching it via recursive Bayesian estimation makes it much easier to see what is going on (I re-invented recursive Bayesian estimation before I found that the Kalman filter had already been invented).
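Seen from the recursive Bayesian side, one Kalman filter step is a recursive Bayesian update preceded by a "drift" step in which the parameters are allowed to move. The minimal sketch below assumes the state-space form y = x'b_t + e, e ~ N(0, V), and b_t = H b_(t-1) + u, u ~ N(0, W), with H defaulting to the identity as in this paper; the function name `kalman_step` and the numbers are hypothetical. With W = 0 the step reduces exactly to Bayes' theorem for a normal prior; with W > 0 the prior is widened first, so the posterior is less precise.

```python
import numpy as np


def kalman_step(m, c, x, y, v, w, h=None):
    """One Kalman filter step for y = x'b_t + e, e ~ N(0, v), b_t = H b_(t-1) + u, u ~ N(0, w).

    Drift step: the prior for this period has mean H m and covariance H C H' + W.
    Bayes step: update that prior with the new observation, exactly as in
    recursive Bayesian estimation.
    """
    h = np.eye(len(m)) if h is None else h   # H = identity unless supplied
    a = h @ m                                # prior mean for this period
    r = h @ c @ h.T + w                      # prior covariance, widened by W
    q = x @ r @ x + v                        # one-step-ahead forecast variance
    k = r @ x / q                            # gain
    m_new = a + k * (y - x @ a)              # posterior mean
    c_new = r - np.outer(k, k) * q           # posterior covariance
    return m_new, c_new


m0 = np.array([1.0, 0.0])
c0 = 0.5 * np.eye(2)
x, y, v = np.array([1.0, 2.0]), 3.0, 1.0

m_rbe, c_rbe = kalman_step(m0, c0, x, y, v, w=np.zeros((2, 2)))  # W = 0: plain recursive Bayes
m_kf, c_kf = kalman_step(m0, c0, x, y, v, w=1e-4 * np.eye(2))    # W > 0: drifting parameters
print("posterior variances, W = 0:   ", np.diag(c_rbe))
print("posterior variances, W = 1e-4:", np.diag(c_kf))
```

Apart from W (and H), nothing new is involved: the remaining matrices are the prior and posterior moments that recursive Bayesian estimation already carries from one period to the next.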
I have not yet found a textbook (Chow, 1981; Harvey, 1981a; Harvey, 1981b; and Maddala, 1977, included) that explains the Kalman filter in terms familiar to econometricians, and this is probably one reason why so few papers in the econometric literature use it, although in the last few years this has begun to change. Another reason must be the lack of available software. It is hoped that this paper goes some way towards remedying both problems: FORTRAN Kalman filter software is given in section 11.1 above and in appendix D, and appendix G contains a program for a Texas Instruments programmable calculator that copes with one dependent variable and two explanatory variables.

The Kalman filter offers a modest improvement in forecasting ability over OLS-estimated models in most cases, a small worsening in some, and a dramatic improvement in others. It makes it easier to feed any available prior information into the estimation process, to use information available from other time series (in chapter ten, the total fibre consumption series is used in this way), and to give much less weight to data that are unreliable because they are preliminary or very old. It can also allow parameters to random-walk over time, with some parameters walking faster than others. These advantages make it possible for the Kalman filter to estimate the same model as ordinary least squares but with a smaller dynamic sum of squared errors (although for UK industrial energy demand the Kalman filter was worse than OLS, despite many trials).

The DSSE used throughout this paper has one major limitation: it is calculated from one-step-ahead forecasts. This means that errors are not allowed to accumulate in the Kalman filter, but are corrected in the period after the forecast is made. The same is true for OLS; in the simulations of chapter eight, the forecasts made with both techniques were only one step ahead.
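A DSSE comparison of this kind can be sketched as follows. The sketch assumes that the DSSE is the sum of squared one-step-ahead forecast errors from some period `start` onwards, and that the OLS forecast for period t comes from re-estimating on data up to t-1; the exact conventions used in this paper may differ. The names `kalman_dsse` and `ols_dsse`, the simulated data, the drift variance 0.05^2 for the slope only, and the factor of 10 on V for the five oldest observations are all hypothetical, chosen to show per-parameter drift rates and per-period data precision rather than to reproduce any result reported here. H is taken as the identity, as in this paper.

```python
import numpy as np


def kalman_dsse(X, y, m0, c0, v, w, start):
    """Kalman filter (H = I) over the data; returns the sum of squared
    one-step-ahead forecast errors from period `start` onwards.
    v may be a scalar or an array of per-period observation variances."""
    m, c = np.array(m0, dtype=float), np.array(c0, dtype=float)
    v = np.broadcast_to(v, len(y))
    dsse = 0.0
    for t, (x_t, y_t) in enumerate(zip(X, y)):
        r = c + w                          # random-walk drift: prior widens by W
        e = y_t - x_t @ m                  # forecast error, made before y_t is used
        if t >= start:
            dsse += e ** 2
        q = x_t @ r @ x_t + v[t]           # a larger v[t] means less weight on y_t
        k = r @ x_t / q
        m, c = m + k * e, r - np.outer(k, k) * q
    return dsse


def ols_dsse(X, y, start):
    """OLS counterpart: re-estimate on periods 0..t-1, then forecast period t."""
    dsse = 0.0
    for t in range(start, len(y)):
        b, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        dsse += (y[t] - X[t] @ b) ** 2
    return dsse


# Hypothetical data: intercept fixed at 2, slope following a random walk.
rng = np.random.default_rng(1)
n, start = 60, 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
slope = 0.5 + np.cumsum(rng.normal(scale=0.05, size=n))
y = 2.0 + slope * X[:, 1] + rng.normal(scale=0.1, size=n)

m0, c0 = np.zeros(2), 1e6 * np.eye(2)     # vague prior: no real prior information
w = np.diag([0.0, 0.05 ** 2])             # only the slope is allowed to walk
v = np.full(n, 0.1 ** 2)
v[:5] *= 10                               # five oldest observations given less precision

print("Kalman DSSE:", kalman_dsse(X, y, m0, c0, v, w, start))
print("OLS DSSE:   ", ols_dsse(X, y, start))
```

With W = 0, a constant V and a vague prior, the filter's one-step-ahead forecasts essentially coincide with those of OLS re-estimated each period, so the two DSSEs agree; the differences between the techniques come entirely from W, the per-period V and any prior information.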
It could be, however, that the Kalman filter benefits more than OLS from the restriction to one-step-ahead forecasts, so the results must be viewed in this light. Further work in this field could use a more distant forecast horizon to compare the forecasting ability of the two techniques.

The statistical significance of parameter estimates from the Kalman filter with non-zero W is much lower than we have come to expect from OLS-estimated models. This is not intrinsic to the Kalman filter, nor does it mean that the Kalman filter produces inferior (because less precise) parameter estimates. It simply reflects the fact that if we admit that the parameters of our models are not totally rigid but somewhat movable, we can be less precise about where they are. The more movable we allow the parameters to be, the less we can say about them. The contention is that parameters do drift, and that the exactitude with which econometricians are wont to quote their parameter estimates is usually unwarranted, because it rests on the assumption that they do not drift. This contention is unprovable, as we do not know how the real (as opposed to the estimated) parameters behave, but it is quite plausible, and is commonly held by non-econometricians.

The values used for W (which governs the rate at which old data become irrelevant) are not very critical to the parameter estimates (an order-of-magnitude change in W seems to have little effect), but W should not be too large, or the Kalman filter cannot carry much information forward from year to year, as it "forgets" too fast what it has learned. The values used for V are likewise not critical, although it is important that non-zero values be used, especially if W is very small or zero. In this paper, suitable W's have been found to lie in the range 10^-3 to 10^-5 for a wide variety of models. This may be a consequence of the kind of modelling being done (i.e. annual data, demand models), but it at least suggests that future work could start with a W of 10^-4.

Selecting the W's to use in the Kalman filter is rather difficult. In a four-parameter regression there are 10 numbers to be selected for the W-matrix (the other 6 follow from its symmetry). It will clearly not often be possible to estimate these 10 elements simultaneously with the four parameters, so the W-matrix will usually have to be specified by the analyst, from considerations of how fast he thinks the parameters are likely to drift. The experimentation with W in chapter ten, however, does show that the choice of W is not overly critical. Specifying the off-diagonal elements of W is even more problematic than specifying the diagonal elements, and this paper has treated that problem only superficially. The same problem applies to H. Throughout this paper H has been assumed to be the identity matrix, but McWhorter et al. (1976) show that the effects of misspecifying H are serious.

The contribution of the Kalman filter to forecasting has so far been slight. This is partly because the technique is new, but also because econometricians and statisticians lack packages that make using the Kalman filter as painless as using OLS currently is. It may also be that a number of researchers have tried the Kalman filter, found it inferior to OLS, and simply not reported their results. Some work has been done, however, and this is reported in chapter seven. Opinion is divided as to the efficacy of the Kalman filter, but one clear message is that the analyst must specify more than with OLS, and hence has more to misspecify. Such misspecification, if gross, can lead to very bad forecasts.
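The effect of W on parameter precision described above can be illustrated with a single parameter. The sketch assumes the simplest case, y_t = b_t + e_t (so x_t = 1), with e_t ~ N(0, V), V = 1 and b_t following a random walk with variance W; the W values are measured relative to this V, which is an assumption of the sketch. The function names are hypothetical. With W = 0 the posterior variance keeps falling roughly as V/n; with W > 0 it settles at a floor that solves C = (C + W)V/(C + W + V), so C = (sqrt(W^2 + 4WV) - W)/2.

```python
import math


def filtered_variance(w, v, c0, n):
    """Posterior variance of b after n observations of y_t = b_t + e_t,
    e_t ~ N(0, v), b_t = b_(t-1) + u_t, u_t ~ N(0, w), starting from variance c0."""
    c = c0
    for _ in range(n):
        r = c + w                 # drift step: uncertainty grows by w
        c = r * v / (r + v)       # Bayes step: combine with one observation of variance v
    return c


def steady_state_variance(w, v):
    """Limit of filtered_variance as n grows, for w > 0."""
    return (math.sqrt(w * w + 4 * w * v) - w) / 2


v, c0, n = 1.0, 1e6, 5000
for w in (0.0, 1e-5, 1e-4, 1e-3, 1e-2):
    c = filtered_variance(w, v, c0, n)
    print(f"W = {w:g}: posterior s.d. of b after {n} periods = {math.sqrt(c):.4f}")
```

Because the posterior standard deviation stops shrinking once W > 0, t-statistics stop growing with the sample size, which is the loss of apparent significance discussed above; a large W raises the floor further, which is the filter "forgetting" old data too fast.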
The work of chapter eight showed that the Kalman filter copes better than ordinary least squares when parameters drift with time, and also that it stands up well when parameters do not drift. This is satisfying, as it means that the Kalman filter can be used when parameters are suspected of drifting and will not give absurd answers when they are in fact constant. Weighted least squares or discounted least squares (sometimes suggested as a way of coping with one or more of the problems listed in chapter one) was examined, but was found to perform worse than recursive Bayesian estimation and to be highly sensitive to the choice of weights. It also requires a single weight per time period, so it cannot allow the different parameters of the model to be affected differently.

12.3 MAIN CONCLUSIONS

The Kalman filter can be understood and used fairly easily by econometricians, provided it is explained in terms that they understand, and especially if it is incorporated into an easy-to-use package.

More effort is required to estimate a Kalman filter model, as the econometrician has to set up various matrices (such as V, W and prior information where available). There is more that must be specified, and therefore more that can be misspecified.

The Kalman filter will cope with the five problems described in section 1.1, and will do so more easily and naturally than conventional methods.

The Kalman filter is more appropriate where data are of highly variable quality, where there is strong prior information on the parameters, or where old data are considerably less relevant to the current values of the parameters than recent data are.

The Kalman filter's recognition of the variable precision of the data can lead to better models.

The Kalman filter will often estimate a model that forecasts better than OLS. Sometimes the forecasts will be much better; sometimes they will be slightly worse.
A good choice for W might lie in the range 10^-3 to 10^-5.

12.4 RECOMMENDATIONS

1. The Kalman filter should be incorporated into existing econometric packages.
2. The Kalman filter should be covered in econometrics courses.
3. Econometricians should re-estimate some of the models that they currently estimate with conventional techniques, using whatever prior information they have available. They can then decide for themselves whether the Kalman filter works better for their particular applications, and whether it is worth the extra trouble.
4. When Kalman filter models are written up, the values of V and W and the starting values for M and C should be reported, whether these are estimated, assumed, or taken from prior information.