1. INTRODUCTION
For many economic indicator series, modeling requires specification of both a regression function and an autocovariance structure for the disturbance process. Suppose that, possibly after a variance-stabilizing transformation (e.g., the logarithm), one has data Wt, 1 ≤ t ≤ T of the form
$$W_t = AX_t + y_t, \qquad 1 \le t \le T, \tag{1.1}$$
where the Xt are column vectors and the yt are real variates that are asymptotically orthogonal to the Xt in a sense to be defined, whose lagged sample second moments converge as T → ∞. With monthly or quarterly seasonal economic data, AXt might describe a linear or higher degree trend, stable seasonal effects, moving holiday effects (Bell and Hillmer, 1983), trading day effects (Findley, Monsell, Bell, Otto, and Chen, 1998), or other periodic effects. The term Xt might also include values of related stochastic variables, perhaps at leads or lags. We address the situation in which the modeler considers a model
$$W_t = A^MX^M_t + y^M_t \tag{1.2}$$
whose regressor vector XtM cannot reproduce AXt for all t, because of known or unknown omissions, approximations, simplifications, etc. We assume that the modeler, perhaps starting from the ordinary least squares (OLS) estimate for AM given by (1.5) later in this section, has decided upon an autoregressive moving average (ARMA) model family, not necessarily correct, for the disturbance (or residual) process ytM = Wt − AMXtM. The model (1.2), together with such an ARMA model for ytM, is called a regARMA model.
Generalized least squares (GLS) estimation of AM occurs simultaneously with ARMA estimation. The simplest definition of (feasible) GLS estimates of AM, given by (1.3), makes use of the ARMA model's innovation filter that is defined as follows. With L denoting the lag operator, let φ(L) be the autoregressive polynomial (AR) and α(L) the moving average (MA) polynomial of a (perhaps incorrect) candidate ARMA model for ytM and let θ = (1,θ1,θ2,…) denote the coefficient sequence of the power series expansion
$$\theta(z) = \frac{\varphi(z)}{\alpha(z)} = \sum_{j=0}^{\infty}\theta_j z^j.$$
When yt in (1.1) and the regressors missing from XtM are weakly (i.e., first and second moment) stationary with mean zero, then ytM will be weakly stationary with mean zero. In this case, assuming that values of ytM are available at all past times,
$$y^M_{t|t-1}(\theta) = -\sum_{j=1}^{\infty}\theta_j\,y^M_{t-j}$$
is the model's linear forecast of ytM from ysM, −∞ < s ≤ t − 1; see Section 5.3.3 of Box and Jenkins (1976) or Hannan (1970, p. 147). The forecast errors
$$a_t(\theta) = y^M_t - y^M_{t|t-1}(\theta) = \sum_{j=0}^{\infty}\theta_j\,y^M_{t-j}$$
are called the model's innovations series, and the coefficient sequence θ is its innovation filter. If the ARMA model is correct, then for each t, at(θ) is uncorrelated with ysM, −∞ < s ≤ t − 1, and it follows that yt|t−1M(θ) has minimum mean square error among all such linear forecasts of ytM and that the innovations at(θ) are uncorrelated (white noise). However, we do not assume that a correct ARMA model exists or that ytM is weakly stationary. For example, when a missing regressor is deterministic, e.g., periodic, ytM will not be weakly stationary even when yt is but will instead be asymptotically stationary, meaning that its lagged sample second moments will converge as T increases. Their limits form the autocovariance sequence of a weakly stationary process. In effect, it is this autocovariance sequence for which an ARMA model is sought. All ARMA model–related quantities of interest in this paper depend only on θ and on the Wt and XtM. Thus we can express model dependence in terms of θ, as we do throughout the paper. Further motivation for this “parameterization” is given in Section 3. We refer to each θ as a model.
For given Wt,XtM, 1 ≤ t ≤ T and θ, define
$$W_t[\theta] = \sum_{j=0}^{t-1}\theta_j W_{t-j}$$
and
$$X^M_t[\theta] = \sum_{j=0}^{t-1}\theta_j X^M_{t-j}$$
for 1 ≤ t ≤ T and let ′ denote transpose. Following Pierce (1971), we define the θ-model's GLS estimator of AM to be
$$A^M_T(\theta) = \left(\sum_{t=1}^{T}W_t[\theta]\,X^M_t[\theta]'\right)\left(\sum_{t=1}^{T}X^M_t[\theta]\,X^M_t[\theta]'\right)^{-1}. \tag{1.3}$$
(We discuss another GLS estimator in Section 8.) With these ATM(θ), an estimate of θ (and of the ARMA coefficients determining θ when they are identified) can be obtained by conditional or unconditional maximum likelihood estimation (MLE). (As usual, Gaussian likelihood functions are used without requiring the data to be Gaussian.) For the conditional MLE estimates on which we focus for simplicity (see Box and Jenkins, 1976, Sect. 7.1.2), for each 1 ≤ t ≤ T, one defines the θ-model's forecast of Wt from Ws, 1 ≤ s ≤ t − 1 to be
$$W^M_{t|t-1}(\theta,\theta,T) = W_t - \left(W_t[\theta] - A^M_T(\theta)\,X^M_t[\theta]\right)$$
, with the convention
$$W^M_{1|0}(\theta,\theta,T) = A^M_T(\theta)\,X^M_1.$$
This is the special case Wt|t−1M(θ,θ,T) of the more general forecast function Wt|t−1M(θ,θ*,T) defined in (1.6), which follows. Conditional MLE estimates θT leading to GLS estimates ATM(θT) are the minimizers
$$\theta_T = \operatorname*{arg\,min}_{\theta\in\Theta}\; T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta,\theta,T)\right)^2, \tag{1.4}$$
where Θ is a compact set of θ specified by ARMA(p,q) models whose AR and MA polynomials have all zeroes in {|z| ≥ 1 + ε}, for some ε > 0.
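To make these definitions concrete, here is a minimal numpy sketch (our own illustration; the function names and simulated data are hypothetical, and the disturbance family is restricted to AR(1), so that θ = (1,−φ,0,…)) of the GLS estimate (1.3) computed by truncated filtering, and of profiling the conditional-MLE objective (1.4) over a grid of AR coefficients:

```python
import numpy as np

def theta_filter(x, theta):
    """Truncated innovation filtering: x_t[theta] = sum_{0<=j<=t-1} theta_j x_{t-j}."""
    T = len(x)
    out = np.zeros(T)
    for t in range(T):
        m = min(t + 1, len(theta))
        out[t] = np.dot(theta[:m], x[t - np.arange(m)])
    return out

def gls_estimate(W, X, theta):
    """GLS estimate (1.3): OLS applied to the theta-filtered data."""
    Wf = theta_filter(W, theta)
    Xf = np.column_stack([theta_filter(X[:, i], theta) for i in range(X.shape[1])])
    return np.linalg.solve(Xf.T @ Xf, Xf.T @ Wf)

def objective(W, X, phi):
    """The average squared one-step error in (1.4) for theta = (1, -phi)."""
    theta = np.array([1.0, -phi])
    A = gls_estimate(W, X, theta)
    e = theta_filter(W, theta) - theta_filter(X @ A, theta)  # W_t[th] - A X_t^M[th]
    return np.mean(e ** 2)

# Illustration on simulated data: periodic regressor, MA(1)-like disturbance.
rng = np.random.default_rng(0)
T = 400
X = np.column_stack([np.ones(T), np.cos(np.pi * np.arange(T) / 2)])
eps = rng.standard_normal(T)
y = eps + 0.7 * np.concatenate([[0.0], eps[:-1]])
W = X @ np.array([1.0, 2.0]) + y
phis = np.linspace(-0.9, 0.9, 37)
phi_T = phis[np.argmin([objective(W, X, p) for p in phis])]
print(phi_T, gls_estimate(W, X, np.array([1.0, -phi_T])))
```

The grid minimizer plays the role of θT here; in applications a numerical optimizer over the ARMA coefficients would replace the grid.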
Responding to the extensive literature comparing GLS with OLS, we also consider model estimates and forecasts based on the OLS estimate of AM,
$$A^M_T = \left(\sum_{t=1}^{T}W_tX^{M\prime}_t\right)\left(\sum_{t=1}^{T}X^M_tX^{M\prime}_t\right)^{-1}. \tag{1.5}$$
This is the special case ATM(θ*) of (1.3) with θ* = (1,0,0,…), the white noise model for ytM. The forecast function of Wt associated with ATM is obtained by using this choice of θ* in
$$W^M_{t|t-1}(\theta,\theta^*,T) = W_t - \left(W_t[\theta] - A^M_T(\theta^*)\,X^M_t[\theta]\right). \tag{1.6}$$
With this formula, for any fixed θ*, conditional MLE yields a specification
$$\theta^*_T = \operatorname*{arg\,min}_{\theta\in\Theta}\; T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta,\theta^*,T)\right)^2.$$
In this paper, we obtain formulas for the limiting values of average squared one-step-ahead prediction errors obtained from these two types of MLEs,
$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta_T,\theta_T,T)\right)^2 \tag{1.7}$$
and, for fixed θ*,
$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta^*_T,\theta^*,T)\right)^2. \tag{1.8}$$
With Theorems 5.1 and 7.1, which are given later in the paper, we show, under general assumptions on Xt and XtM given subsequently, that (1.7) is always less than or equal to (1.8), typically less. This is the optimality property of GLS referred to in the title of this paper. (By contrast, in the correct regressor case, when all our assumptions hold except (2.9) requiring asymptotic nonnegligibility of the omitted regressors, the two limits are equal.) Further, using OLS with the white noise model θ* = (1,0,0,…) for ytM, as is often done, usually leads to even worse forecasts, in the sense that
$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(W_t - A^M_TX^M_t\right)^2$$
has a larger value than (1.8); see Section 7.1.1.
1.1. Overview of the Paper
The regressor sequence Xt, t ≥ 1 is required to satisfy the conditions of Grenander (1954), which define a property we call scalable asymptotic stationarity; see Section 2 and Appendix B. Grenander introduced this generalization of stationarity to investigate the efficiency of OLS estimates for a large class of nonstochastic regressors in models with a broad range of weakly stationary disturbances. We indicate in Section 7.2 why efficiency in Grenander's sense is rarely applicable in the context of misspecified nonstochastic regressors. For the models we consider, the regressor XtM in (1.2), which can be stochastic, is taken to be a proper subvector of Xt. The remaining entries of Xt can be those of any vector XtN, compatible with our assumptions, whose variables compensate for the inadequacies of XtM in such a way that, for some AM and AN, the regression function in (1.1) can be decomposed as
$$AX_t = A^MX^M_t + A^NX^N_t. \tag{1.9}$$
Then, in (1.2), ytM = ANXtN + yt.
Our requirements for XtM, XtN, and yt are comprehensively stated in Section 2 and verified for some important classes of models in Sections 2.1 and 6.1.1. More information about ARMA model parameterization with innovations coefficient sequences θ = (1,θ1,θ2,…) is provided in Section 3, which includes some elementary examples. For diagonal scaling matrices DM,T such that
$$\lim_{T\to\infty} D_{M,T}\left(\sum_{t=1}^{T}X^M_tX^{M\prime}_t\right)D_{M,T}$$
is nonsingular, Theorem 4.1 gives a formula for limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 and establishes that convergence is uniform on the compact sets Θ defined in Appendix A. For a given θ, this limit is called the asymptotic bias characteristic of ATM(θ) for AM. Section 5 obtains formulas for the limits of the sample second moments of the forecast errors Wt − Wt|t−1M(θ,θ,T) and Wt − Wt|t−1M(θ,θ*,T). The analogous results for regARIMA-type nonstationary models, for situations in which the disturbance process requires a differencing transformation prior to ARMA modeling, are discussed in Section 6. We describe, in Theorem 7.1 in Section 7, how the optimality property of GLS mentioned previously arises: the better performance of GLS relative to OLS occurs when the OLS estimate has an asymptotic bias characteristic different from that of the GLS estimate. These results provide support for an imputation procedure used by Statistics Netherlands (Aelen, 2004), which uses one-step-ahead forecasts from regARIMA models with stochastic distributed lag regressors to impute the net contribution of late-reporting firms to economic time series from certain monthly surveys; see Section 6.1. Section 7.1 provides elementary expressions for some asymptotic quantities associated with GLS and OLS estimation when ytM is modeled as a first-order autoregression. These are used to illustrate the generality of GLS's optimality. Section 8 discusses related results and extensions.
Proofs of the theorems are given in Appendix E. They use the auxiliary results of Appendix D obtained mainly from Findley, Pötscher, and Wei (2001).
2. THE DATA AND REGRESSOR ASSUMPTIONS
In (1.1), we require yt, t ≥ 1 to be asymptotically stationary (A.S.) in the sense of Pötscher (1987), meaning that for each k = 0,±1,…, the lag k sample second moments have asymptotic limits almost surely (i.e., with probability one), denoted a.s. That is, the limits
$$\gamma^y_k = \lim_{T\to\infty} T^{-1}\sum_{t=k+1}^{T}y_t\,y_{t-k} \quad\text{a.s.} \tag{2.1}$$
exist. (By convention, $\sum_{t=b}^{a} = 0$ if a < b.) From a well-known result of Herglotz, the sequence of asymptotic lag k second moments γky has a spectral distribution function Gy(λ) such that
$$\gamma^y_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_y(\lambda)$$
for k = 0,±1,….
We require Xt, t ≥ 1 in (1.1) to be scalably asymptotically stationary (S.A.S.), meaning that the limits
$$\Gamma^X_k = \lim_{T\to\infty} D_{X,T}\left(\sum_{t=k+1}^{T}X_tX'_{t-k}\right)D_{X,T} \quad\text{a.s.} \tag{2.2}$$
exist, where the DX,T are diagonal scaling matrices, DX,T = diag(d1,T,…,ddim X,T), which are positive definite, decrease to zero (DX,T ↘ 0), and satisfy $\lim_{T\to\infty} D_{X,T+k}^{-1}D_{X,T} = I_X$ for each k ≥ 0. Here IX is the identity matrix of order dim X. (Ordinary convergence is meant in (2.2) if no coordinate of Xt is stochastic.) The resulting sequence ΓkX has a spectral distribution matrix function
$$\Gamma^X_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_X(\lambda)$$
for k = 0,±1,…; see Appendix B for further background, including examples.
Partition Xt as
$$X_t = \begin{bmatrix}X^M_t\\ X^N_t\end{bmatrix}, \tag{2.3}$$
where, as in the Introduction, the superscript N designates regressors not in the model (1.2). Let the corresponding partition of A in (1.1) be A = [AM AN] and let those of DX,T, ΓkX, and GX(λ) be, respectively,
$$D_{X,T} = \begin{bmatrix}D_{M,T} & 0\\ 0 & D_{N,T}\end{bmatrix},\qquad \Gamma^X_k = \begin{bmatrix}\Gamma^{MM}_k & \Gamma^{MN}_k\\ \Gamma^{NM}_k & \Gamma^{NN}_k\end{bmatrix},\qquad G_X(\lambda) = \begin{bmatrix}G_{MM}(\lambda) & G_{MN}(\lambda)\\ G_{NM}(\lambda) & G_{NN}(\lambda)\end{bmatrix}. \tag{2.4}$$
From DX,T ↘ 0, we have
$$D_{M,T}\searrow 0 \quad\text{and}\quad D_{N,T}\searrow 0. \tag{2.5}$$
We require Γ0MM to be positive definite,
$$\Gamma^{MM}_0 > 0, \tag{2.6}$$
and restrict XtN to being A.S.,
$$\Gamma^{NN}_k = \lim_{T\to\infty}T^{-1}\sum_{t=k+1}^{T}X^N_tX^{N\prime}_{t-k} \quad\text{a.s.}, \qquad k = 0,\pm1,\ldots. \tag{2.7}$$
Of course, (2.7) is equivalent to DN,T = T−1/2IN, with IN the identity matrix of order dim XN. We exclude omitted regressors of larger order, e.g., tp with p > 0, because they yield unbounded ytM dominated by ANXtN, which would clearly reveal the inadequacy of XtM with large enough T.
Further, the two series yt and Xt must be asymptotically orthogonal, meaning that
$$\lim_{T\to\infty}T^{-1/2}D_{X,T}\sum_{t=k+1}^{T}X_t\,y_{t-k} = 0 \quad\text{a.s.}, \qquad k = 0,\pm1,\ldots. \tag{2.8}$$
Finally, to keep the focus on the incorrect regressor situation, we assume that
$$A^N\Gamma^{NN}_0A^{N\prime} > 0. \tag{2.9}$$
In summary, our assumptions concerning (1.1) are (2.1), (2.2), and (2.5)–(2.9).
2.1. Consequences of (2.1), (2.8), and (2.9) for yt and ytM
First we note that, when Xt contains an entry equal to 1 for all t, then the corresponding scaling factor in DX,T can be taken to be T−1/2, and so (2.8) yields
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}y_t = 0 \quad\text{a.s.}$$
In this sense, yt in (1.1) can be thought of as an asymptotically mean zero process. A similar result holds for the disturbances ytM = ANXtN + yt of the misspecified model (1.2); see Section 4.
Now we establish the asymptotic stationarity of the ytM. From the requirement (2.7) that XtN be A.S. and from (2.1) and (2.8), for each k,
$$\gamma^M_k = \lim_{T\to\infty}T^{-1}\sum_{t=k+1}^{T}y^M_t\,y^M_{t-k}$$
is given by
$$\gamma^M_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_{y^M}(\lambda), \tag{2.10}$$
where GyM(λ) = ANGNN(λ)AN′ + Gy(λ). From (2.9), we have γ0M > 0. (The term γ0y can be zero.)
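Explicitly, taking k = 0 in (2.10) gives

$$\gamma^M_0 = \int_{-\pi}^{\pi}dG_{y^M}(\lambda) = A^N\Gamma^{NN}_0A^{N\prime} + \gamma^y_0,$$

so (2.9) alone makes γ0M positive, whether or not γ0y = 0.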
Finally, we note that, except in special situations such as that of Section 7.2, the disturbances and regressors in (1.2) will be asymptotically correlated, meaning
$$\lim_{T\to\infty}T^{-1/2}D_{M,T}\sum_{t=k+1}^{T}X^M_t\,y^M_{t-k} \ne 0$$
for some k, which will usually cause ATM(θ) defined in (1.3) to be biased asymptotically for some θ; see Theorem 4.1.
2.2. Sufficient Conditions for (2.1) and (2.8)
The properties (2.1) and (2.8) hold under reasonably general assumptions on yt and Xt. The verification of (2.8) for a common type of stochastic regression model is discussed in Section 6.1.1. Here we consider the case in which yt is weakly stationary with mean zero and Xt is nonstochastic with Γ0X > 0. Then, for almost sure convergence in (2.1) and (2.8), it suffices to have
$$y_t = \sum_{j=0}^{\infty}c_j\,\varepsilon_{t-j}, \qquad \sum_{j=0}^{\infty}c_j^2 < \infty,$$
for some independent white noise process εt such that supt E|εt|r < ∞, with r > 2 if yt has a bounded spectral density or, if the spectral density of yt is unbounded but square integrable, with r > 4; see Section 3.1 of Findley et al. (2001).
3. THE θ-PARAMETERIZATION OF ARMA MODELS
Three features of our ARMA model situation may be new to readers not familiar with the vein of research literature of which the papers by Pötscher (1987, 1991) are representative: (a) the disturbances ytM, 1 ≤ t ≤ T are not required to have means or covariances but only the asymptotic stationarity property; (b) no ARMA model is assumed to be correct in the sense of being able to exactly model the asymptotic lagged second moment sequence (2.10); (c) the ARMA coefficients of a model envisioned as φ(L)ytM = α(L)at are replaced by the innovations filter θ = (1,θ1,θ2,…) defined by the property that
$$\theta(z) = \sum_{j=0}^{\infty}\theta_j z^j$$
satisfies θ(z) = φ(z)/α(z) for |z| < 1. In this section, we provide some orienting discussion and examples.
We assume that α(z) ≠ 0 for all |z| ≤ 1, i.e., that the model is invertible. When ytM is weakly stationary with mean zero and defined for all t, then there always exists a weakly stationary series at = at(θ) such that the preceding ARMA model formula holds, namely,
$$a_t(\theta) = \sum_{j=0}^{\infty}\theta_j\,y^M_{t-j}.$$
When ytM is only A.S. and defined only for t ≥ 1, we define
$$a_t(\theta) = \sum_{j=0}^{t-1}\theta_j\,y^M_{t-j}, \qquad t \ge 1.$$
This series is A.S. with asymptotic lag k second moment given by
$$\gamma^a_k(\theta) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_{y^M}(\lambda),$$
with GyM(λ) as in (2.10); see (ii) of Proposition D.1 in Appendix D. We would call the θ-model correct if the white noise property, γka(θ) = 0 for k ≠ 0, obtains or, equivalently, if
$$\gamma^a_k(\theta) = \begin{cases}\sigma^2, & k = 0,\\ 0, & k \ne 0,\end{cases}$$
for some σ2 > 0. However, our theorems do not require any model for ytM, t ≥ 1 to be correct in this sense.
For subsequent discussions, it will be useful to have in mind the θ's of some simple ARMA models. As was indicated in Section 1, a white noise model has θ = (1,0,0,…). For the invertible ARMA(1,1) model, (1 − φL)ytM = (1 − αL)at, with |α|,|φ| < 1, one has $\theta_j = \alpha^{j-1}(\alpha - \varphi)$, j ≥ 1. For AR(1) and MA(1) models, we have θ = (1,−φ,0,0,…) and $\theta = (1,\alpha,\alpha^2,\ldots)$, respectively.
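The following small numpy sketch (ours; the function name is hypothetical) recovers these θ's by long division of φ(z) by α(z) and checks the ARMA(1,1) formula above:

```python
import numpy as np

def innovation_filter(phi, alpha, n):
    """First n+1 coefficients of theta(z) = phi(z)/alpha(z), where
    phi(z) = 1 - phi_1 z - ... and alpha(z) = 1 - alpha_1 z - ... (invertible)."""
    phi_poly = np.concatenate([[1.0], -np.asarray(phi, dtype=float)])
    alpha_poly = np.concatenate([[1.0], -np.asarray(alpha, dtype=float)])
    theta = np.zeros(n + 1)
    for j in range(n + 1):
        num = phi_poly[j] if j < len(phi_poly) else 0.0
        # matching coefficients in theta(z) * alpha(z) = phi(z)
        conv = sum(alpha_poly[i] * theta[j - i]
                   for i in range(1, min(j, len(alpha_poly) - 1) + 1))
        theta[j] = num - conv
    return theta

phi, alpha = 0.5, 0.8
print(np.round(innovation_filter([phi], [alpha], 5), 4))
print(np.round([1.0] + [alpha ** (j - 1) * (alpha - phi) for j in range(1, 6)], 4))
```

Both printed sequences agree, confirming θj = α^{j−1}(α − φ) for j ≥ 1 in the ARMA(1,1) case.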
Model parameterization by θ is useful because the θ's that are determined by likelihood-maximizing ARMA coefficients have uniquely defined large-sample limits in situations where the ARMA coefficients themselves do not, because of common zeroes in limiting AR and MA polynomials. For example, when an ARMA(1,1) model is fitted to white noise, the sequence of maximum likelihood pairs (φT,αT) has multiple limit (or cluster) points, all on the line {(α,α) : |α| ≤ 1}; see Hannan (1982). However, when φ = α for an ARMA(1,1) model, then θ = (1,0,0,…), and so this is the only limit point of the filter sequence θT defined by the maximum likelihood estimates φT, αT. That is, θT → θ a.s. coordinatewise, i.e., θjT → θj a.s., j ≥ 0.
As in the preceding examples, the coordinates of θ are always continuous functions of the ARMA coefficients. The converse holds only if the ARMA model is identifiable, i.e., the AR and MA polynomials have no common zero; also see the Appendix of Pötscher (1991) for additional background on the θ-parameterization. (Pötscher's parameter is the coefficient sequence of the expansion of $\theta(z)^{-1} = \alpha(z)/\varphi(z)$. The relationship between θ and this coefficient sequence is continuous and invertible; see Section 3 of Findley, Pötscher, and Wei, 2004.)
To obtain the uniform convergence and continuity properties needed to establish the results indicated in the Introduction, ARMA(p,q) model coefficient estimation is restricted to compact sets of AR and MA coefficient vectors whose polynomials have all zeroes in {|z| ≥ 1 + ε} for some ε > 0. Such sets specify compact sets Θ of the type discussed in Appendix A.
4. UNIFORM CONVERGENCE OF GLS ESTIMATES
We now present a fundamental convergence property of the ATM(θ) defined in (1.3). A generalized inverse is to be used in (1.3) when the inverse matrix fails to exist. This can (with probability one when XtM is stochastic) only happen for a finite number of T values, because of (2.6) and (iv) of Proposition D.1 in Appendix D. For any matrix M, define $\|M\| = \lambda_{\max}^{1/2}(MM')$, with λmax(·) denoting the maximum eigenvalue. If M is a vector with real coordinates m1,…,mn, then $\|M\| = \left(m_1^2 + \cdots + m_n^2\right)^{1/2}$.
Partition
$$\Gamma^X_k(\theta) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_X(\lambda)$$
analogously to (2.4), i.e.,
$$\Gamma^X_k(\theta) = \begin{bmatrix}\Gamma^{MM}_k(\theta) & \Gamma^{MN}_k(\theta)\\ \Gamma^{NM}_k(\theta) & \Gamma^{NN}_k(\theta)\end{bmatrix},$$
with
$$\Gamma^{MM}_k(\theta) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_{MM}(\lambda),$$
etc. For θ from an invertible model, define
$$C^{NM}(\theta) = \Gamma^{NM}_0(\theta)\left(\Gamma^{MM}_0(\theta)\right)^{-1}. \tag{4.1}$$
In Appendix E, we prove the theorem that follows.
THEOREM 4.1. Let Θ be a compact set of models as described in Appendix A. Under the assumptions (2.1), (2.2), and (2.5)–(2.8), we have, uniformly on Θ,
$$\lim_{T\to\infty}\left(A^M_T(\theta) - A^M\right)T^{-1/2}D^{-1}_{M,T} = A^NC^{NM}(\theta) \quad\text{a.s.} \tag{4.2}$$
The function CNM(θ) is continuous on Θ and thus bounded there, maxθ∈Θ∥CNM(θ)∥ < ∞.
For a given θ, limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 = ANCNM(θ) is called the asymptotic bias characteristic of ATM(θ) for AM. It is nonzero for some θ if ΓkNM ≠ 0 for some k, i.e., if the series ANXtN and XtM are asymptotically correlated. When DM,T = T−1/2IM, then ANCNM(θ) is the asymptotic bias of ATM(θ) for AM. Omitted variable bias is a fundamental modeling issue; see, e.g., Stock and Watson (2002, pp. 143–149). Section 7 will show that, when ANCNM(θ) varies with θ, there is usually an optimal value of ANCNM(θ) for one-step-ahead forecasting that is determined by the θT sequence of (1.4).
If XtM has one or more coordinates that are A.S., then for any $\bar A^M$ that differs from AM only in these coordinates we have, uniformly on Θ,
$$\lim_{T\to\infty}\left(A^M_T(\theta) - \bar A^M\right)T^{-1/2}D^{-1}_{M,T} = \left(A^M - \bar A^M\right) + A^NC^{NM}(\theta) \quad\text{a.s.} \tag{4.3}$$
This reveals the important fact that the asymptotic bias characteristic associated with an alternative omitted-regressor decomposition, $AX_t = \bar A^MX^M_t + \bar A^N\bar X^N_t$, with $\bar y^M_t = \bar A^N\bar X^N_t + y_t$, differs from the right-hand side of (4.2) by a term that is independent of θ.
Except in special situations, e.g., when the omitted regressors are precisely known, there is always ambiguity concerning XtN and AM. However, it is useful to note that if a coordinate Xi,tM of XtM is constant with value one, then the asymptotic mean $\mu^N = \lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}X^N_t$ can be assumed to be zero: by defining $\bar A^M$ to differ from AM in that $A^M_i + A^N\mu^N$ replaces AiM, and by defining $\bar X^N_t = X^N_t - \mu^N$, one obtains $AX_t = \bar A^MX^M_t + A^N\bar X^N_t$ with $\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\bar X^N_t = 0$. Then, for $\bar y^M_t = A^N\bar X^N_t + y_t$ we have $\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\bar y^M_t = 0$ a.s.
5. UNIFORM ASYMPTOTIC STATIONARITY OF FORECAST ERRORS
We consider sample second moments of the errors of the one-step-ahead forecasts Wt|t−1M(θ,θ*,T) from (1.6). For 1 ≤ t ≤ T, the forecast errors Wt − Wt|t−1M(θ,θ*,T) are observable and equal to Wt[θ] − ATM(θ*)XtM[θ], which yields
$$W_t - W^M_{t|t-1}(\theta,\theta^*,T) = y_t[\theta] + \left(A^M - A^M_T(\theta^*)\right)X^M_t[\theta] + A^NX^N_t[\theta]. \tag{5.1}$$
Thus, setting Ut(T) = [yt T1/2DM,TXtM XtN]′, 1 ≤ t ≤ T and βT(θ*) = [1 (AM − ATM(θ*))T−1/2DM,T−1 AN], we have
$$W_t - W^M_{t|t-1}(\theta,\theta^*,T) = \beta_T(\theta^*)\,U_t(T)[\theta]. \tag{5.2}$$
Let Θ* be a compact set in the sense of Appendix A. For β(θ*) = [1 −ANCNM(θ*) AN], Theorem 4.1 yields
$$\lim_{T\to\infty}\max_{\theta^*\in\Theta^*}\left\|\beta_T(\theta^*) - \beta(\theta^*)\right\| = 0 \quad\text{a.s.} \tag{5.3}$$
This fact and the properties of the Ut(T) array described in Appendix C lead to the following theorem, which is proved in Appendix E. Define
$$G_U(\lambda) = \begin{bmatrix}G_y(\lambda) & 0\\ 0 & G_X(\lambda)\end{bmatrix} \tag{5.4}$$
and
$$G_{M,\theta^*}(\lambda) = \beta(\theta^*)\,G_U(\lambda)\,\beta(\theta^*)'. \tag{5.5}$$
For any Θ,Θ*, let Θ × Θ* denote the Cartesian product set {(θ,θ*) : θ ∈ Θ,θ* ∈ Θ*} and define convergence (θT,θ*T) → (θ,θ*) in Θ × Θ* to mean θjT → θj and θj*T → θj* for all j ≥ 0.
THEOREM 5.1. Let Θ and Θ* be compact sets of models as described in Appendix A. Under the assumptions (2.1), (2.2), and (2.5)–(2.8), the forecast-error arrays Wt − Wt|t−1M(θ,θ*,T), 1 ≤ t ≤ T are continuous on Θ × Θ* and also jointly uniformly A.S. there. Specifically, for each k = 0,±1,…, as T → ∞, with
$$\Gamma^M_k(\theta,\theta^*) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_{M,\theta^*}(\lambda) \tag{5.6}$$
for GM,θ*(λ) as in (5.5), the limits
$$\lim_{T\to\infty}T^{-1}\sum_{t=k+1}^{T}\left(W_t - W^M_{t|t-1}(\theta,\theta^*,T)\right)\left(W_{t-k} - W^M_{t-k|t-k-1}(\theta,\theta^*,T)\right) = \Gamma^M_k(\theta,\theta^*) \tag{5.7}$$
hold uniformly a.s. on Θ × Θ*. Further, the functions ΓkM(θ,θ*) are continuous and uniformly bounded on Θ × Θ*. Also, from (5.7) and (5.1), for given θ and θ*, the values of ΓkM(θ,θ*) depend only on the values of the series AXt, XtM and yt = Wt − AXt, not on the specification of the compensating regressor XtN in decompositions AXt = AMXtM + ANXtN (see Sect. 4).
Theorem 5.1 shows that the quantities Γ0M(θ,θ*) are of special interest because they describe limiting average squared one-step-ahead forecast errors. With
$$\gamma^y_0(\theta) = \int_{-\pi}^{\pi}\left|\theta(e^{-i\lambda})\right|^2dG_y(\lambda) \quad\text{and}\quad B^{NM}(\theta^*) = \left[-C^{NM}(\theta^*)\;\; I_N\right], \tag{5.8}$$
(5.5) yields the decomposition
$$\Gamma^M_0(\theta,\theta^*) = \gamma^y_0(\theta) + \int_{-\pi}^{\pi}\left|\theta(e^{-i\lambda})\right|^2d\left[A^NB^{NM}(\theta^*)\,G_X(\lambda)\,B^{NM}(\theta^*)'A^{N\prime}\right]. \tag{5.9}$$
By specializing the argument used to establish Theorem 5.1, γ0y(θ) is seen to be the limiting average squared error of the θ-model's one-step-ahead forecast of Wt when XtM = Xt. Similarly, using (4.2), the final quantity in (5.9) is seen to be the limit of the average of the squares of one-step-ahead forecast errors of the regression-function error array AXt − ATM(θ*)XtM, 1 ≤ t ≤ T,
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\left(\left(AX_t - A^M_T(\theta^*)X^M_t\right)[\theta]\right)^2 = \int_{-\pi}^{\pi}\left|\theta(e^{-i\lambda})\right|^2d\left[A^NB^{NM}(\theta^*)\,G_X(\lambda)\,B^{NM}(\theta^*)'A^{N\prime}\right]. \tag{5.10}$$
It follows from the results for k = 0 in Theorem 5.1 by standard arguments (see Pötscher and Prucha, 1997, Ch. 3 and Lem. 4.2) that the conditional maximum likelihood estimators θT of (1.4) converge a.s. to the compact set Θ0 of minimizers of Γ0M(θ,θ) over Θ,
$$\Theta_0 = \left\{\theta\in\Theta:\ \Gamma^M_0(\theta,\theta) = \min_{\vartheta\in\Theta}\Gamma^M_0(\vartheta,\vartheta)\right\}. \tag{5.11}$$
That is, on a set of realizations of the random variables in (1.1) with probability one, the limit point of each (coordinatewise) convergent subsequence of θT,T ≥ 1 belongs to Θ0. (So if there is a unique minimizer θ, then θT → θ a.s.) Equivalently, in terms of the l1-norm (see Appendix A), limT→∞ minθ∈Θ0∥θT − θ∥1 = 0 a.s.
Similarly, the conditional maximum likelihood estimators θ*T associated with ATM(θ*) for fixed θ* ∈ Θ converge a.s. to the set of minimizers of Γ0M(θ,θ*), which usually does not include θ*; see Section 7.1.1.
6. EXTENSION TO ARIMA DISTURBANCE MODELS
Now suppose the observed data are $w_t$, 1 − d ≤ t ≤ T, from a time series of the form
$$w_t = Ax_t + u_t$$
to which a model of the form
$$w_t = A^Mx^M_t + u^M_t$$
is being fit. Suppose also that it has been correctly determined that the disturbances $u^M_t$ require "differencing" with an operator
$$\delta(L) = 1 + \delta_1L + \cdots + \delta_dL^d,$$
whose zeroes are on the unit circle, to obtain residuals for which an ARMA model can be considered. The resulting model is called a regARIMA model for $w_t$. Such models are extensively used for seasonal time series in the context of seasonal adjustment (see Findley et al., 1998; Peña, Tiao, and Tsay, 2001), often with $\delta(L) = (1 - L)(1 - L^s)$, s = 4,12. We assume that (2.1), (2.2), and (2.5)–(2.8) hold for $W_t = \delta(L)w_t$, $X_t = \delta(L)x_t$, $X^M_t = \delta(L)x^M_t$, and $y_t = \delta(L)u_t$ and that XtM is a subvector of Xt. For any 1 ≤ t ≤ T, because
$$w_t = W_t - \sum_{j=1}^{d}\delta_jw_{t-j},$$
for given θ and θ* a natural one-step-ahead forecast for $w_t$ is
$$\hat w_{t|t-1}(\theta,\theta^*,T) = W^M_{t|t-1}(\theta,\theta^*,T) - \sum_{j=1}^{d}\delta_jw_{t-j},$$
with Wt|t−1M(θ,θ*,T) defined by (1.6). This leads to
$$w_t - \hat w_{t|t-1}(\theta,\theta^*,T) = W_t - W^M_{t|t-1}(\theta,\theta^*,T)$$
for 1 ≤ t ≤ T and therefore to forecast-error limiting results as in Theorem 5.1 with the same functions ΓkM(θ,θ*).
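Spelled out, in the notation just adopted, the equality of the two forecast errors is immediate because $w_{t-1},\ldots,w_{t-d}$ are already observed when the forecast is made, so the same known sum is subtracted from the target and from the forecast alike:

$$w_t - \hat w_{t|t-1}(\theta,\theta^*,T) = \Big(W_t - \sum_{j=1}^{d}\delta_jw_{t-j}\Big) - \Big(W^M_{t|t-1}(\theta,\theta^*,T) - \sum_{j=1}^{d}\delta_jw_{t-j}\Big) = W_t - W^M_{t|t-1}(\theta,\theta^*,T).$$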
6.1. Forecasting a Stochastic Regressor to Impute Values for Late Survey Responders
We briefly consider an application involving regARIMA models with stochastic regressors. Section 3.3 of Aelen (2004) provides an interesting one-step-ahead forecasting application involving a variety of seasonal time series $w_t$ whose values come from enterprises that report economic data to Statistics Netherlands a month late, and whose regressor vector $x^M_t$ includes the sum of the values for month t from all enterprises of the same type that report on time, i.e., in the desired month, and sometimes also lagged values of these sums. Thus $x^M_t$ is stochastic. In conjunction with the following discussion of distributed lag models, Theorem 5.1 and Theorem 7.1 in Section 7 provide theoretical support for Aelen's use of the regARIMA model GLS estimation and one-step-ahead forecasting procedures of X-12-ARIMA (Findley et al., 1998) to obtain Statistics Netherlands' imputed value for $w_t$ in the month in which $x^M_t$ becomes available.
6.1.1. A Class of Distributed Lag Models Satisfying the Assumptions of Theorem 5.1.
After differencing, Aelen's model becomes a distributed lag model with regressors and correlated disturbances that are both treated as stationary. We consider a broad class of such models. Suppose that Wt and Zt are jointly covariance stationary variates with zero means and that the spectral density matrix of Zt is Hermitian positive definite at all frequencies. Then, when the autocovariance sequence ΓkV of Vt = [Wt Zt′]′ satisfies $\sum_{k=-\infty}^{\infty}\|\Gamma^V_k\| < \infty$, there exist coefficients Ak satisfying $\sum_{k=-\infty}^{\infty}\|A_k\| < \infty$ such that
$$W_t = \sum_{k=-\infty}^{\infty}A_kZ_{t-k} + y_t$$
holds, with EytZt−k′ = 0, k = 0,±1,…; see Theorem 8.3.1 of Brillinger (1975). For any m,n ≥ 0, setting XtM = [Zt+n′ … Zt−m′]′, $X^N_t = \sum_{k\notin\{-n,\ldots,m\}}A_kZ_{t-k}$, AM = [A−n … Am], and AN = 1 leads to (1.1) and (1.2) having the form of a distributed lag model with stationary disturbances (see, e.g., Stock and Watson, 2002) and to the assumptions of Theorem 5.1 holding under Gaussianity or weaker assumptions on Vt; see Theorem IV.3.6 of Hannan (1970).
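For implementation-minded readers, a minimal numpy helper (our own construction, not Aelen's or X-12-ARIMA's code) that assembles the lead–lag regressor XtM from a scalar series Zt might look as follows:

```python
import numpy as np

def lead_lag_design(Z, n, m):
    """Rows are X_t^M = [Z_{t+n}, ..., Z_t, ..., Z_{t-m}] for the t (0-based:
    m <= t <= T-1-n) at which every lead and lag is available."""
    T = len(Z)
    cols = [Z[m + k: T - n + k] for k in range(n, -m - 1, -1)]
    return np.column_stack(cols)

Z = np.arange(10.0)
X = lead_lag_design(Z, n=1, m=2)
print(X[0])  # row for t = 2: [Z_3, Z_2, Z_1, Z_0]
```

Trimming the endpoints this way loses only finitely many observations and so does not affect the limits in Theorem 5.1.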
7. OPTIMALITY OF GLS
Because of the uniform convergence and continuity results established in Theorem 5.1, for any compact Θ as described in Appendix A, we have
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta_T,\theta_T,T)\right)^2 = \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta) \quad\text{a.s.} \tag{7.1}$$
and, for any fixed θ* ∈ Θ,
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta^*_T,\theta^*,T)\right)^2 = \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta^*) \quad\text{a.s.} \tag{7.2}$$
In Appendix E, we establish the theorem that follows.
THEOREM 7.1. Let Θ be a compact set as described in Appendix A and suppose that (2.1), (2.2), and (2.5)–(2.8) hold. Then for any fixed θ* ∈ Θ,
$$\min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta) \le \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta^*), \tag{7.3}$$
with equality holding if and only if a minimizer $\bar\theta^*$ of Γ0M(θ,θ*) over Θ is always a minimizer of Γ0M(θ,θ),
$$\Gamma^M_0(\bar\theta^*,\bar\theta^*) = \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta), \tag{7.4}$$
and, simultaneously, the asymptotic bias characteristic of ATM($\bar\theta^*$) as an estimator of AM coincides with that of ATM(θ*),
$$A^NC^{NM}(\bar\theta^*) = A^NC^{NM}(\theta^*). \tag{7.5}$$
As a consequence, strict inequality obtains in (7.3) if and only if
$$A^NC^{NM}(\bar\theta) \ne A^NC^{NM}(\theta^*) \tag{7.6}$$
holds for every minimizer $\bar\theta$ of Γ0M(θ,θ) over Θ. For the maximum likelihood estimators θT of (1.4), this condition implies
$$\lim_{T\to\infty}\left(A^M_T(\theta_T) - A^M\right)T^{-1/2}D^{-1}_{M,T} \ne A^NC^{NM}(\theta^*) \quad\text{a.s.} \tag{7.7}$$
Conversely, if Γ0M(θ,θ) has a unique minimizer $\bar\theta$, then (7.7) implies (7.6).
Unless θ* is a minimizer of Γ0M(θ,θ), we expect that both minθ∈Θ Γ0M(θ,θ) < Γ0M($\bar\theta^*$,θ*) and ANCNM($\bar\theta^*$) ≠ ANCNM(θ*) will hold except in quite special situations, the only one known to us being when ANCNM(θ*), and therefore also Γ0M(θ,θ*), does not depend on θ*. In Section 7.1, this is shown to occur with AR(1) models for ytM only in a singular situation. Otherwise $\bar\theta^*$ is unique. Whenever $\bar\theta^*$ is unique, failure of (7.5), which implies minθ∈Θ Γ0M(θ,θ) < Γ0M($\bar\theta^*$,θ*) and $\bar\theta^*$ ≠ θ*, also yields Γ0M($\bar\theta^*$,$\bar\theta^*$) < Γ0M($\bar\theta^*$,θ*).
Model sets Θ usually include the white noise model θ* = (1,0,0,…) as a degenerate case. Hence the conclusions of Theorem 7.1 are generally applicable to OLS as an alternative to GLS. They indicate the following optimality property of GLS: in conjunction with maximum likelihood estimation of θ, asymptotically, OLS estimation is never better than GLS estimation for one-step-ahead forecasting. When the regressor is underspecified and ANCNM(θ) is nonconstant, OLS will typically produce a greater limiting average squared one-step-ahead forecast error than GLS, for large enough T, because its asymptotic bias characteristic differs from that of GLS.
Thursby (1987) provides comparisons of OLS and GLS biases when yt is known to be independent and identically distributed (i.i.d.) (white noise), dim XtM = 2, dim XtN = 1, the coordinates of Xt are correlated first-order AR processes, and the loss function is the posterior mean squared bias associated with a prior for the parameters that determine the covariance structure between XtN and XtM. With the aid of numerical integrations for the GLS quantities, he establishes that, depending on the choice of the autocovariance structure of XtM, the mean squared asymptotic bias of GLS is sometimes less and sometimes greater than that of OLS. Theorem 7.1 shows that, for either outcome, GLS has an asymptotic advantage over OLS for one-step-ahead forecasting.
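The following self-contained simulation (a toy design of our own, in the spirit of Section 7.1.2, with yt white noise and a periodic omitted regressor) illustrates the comparison: it profiles the average squared one-step-ahead error over AR(1) filters, once with matching GLS estimates as in (1.7) and once with the fixed white noise model θ* as in (1.8). Theorem 7.1 predicts that the first printed value will not exceed the second for large T:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
t = np.arange(T)
xM = np.cos(np.pi * t / 2) + (-1.0) ** t            # included regressor
xN = np.cos(np.pi * t / 2) - np.sin(np.pi * t / 2)  # omitted regressor
W = 2.0 * xM + 1.0 * xN + rng.standard_normal(T)    # y_t is white noise

def filt(x, phi):
    """Truncated AR(1) innovation filter, theta = (1, -phi)."""
    out = x.astype(float).copy()
    out[1:] -= phi * x[:-1]
    return out

def one_step_mse(phi_fore, phi_gls):
    """Average squared one-step error: forecast filter theta(phi_fore),
    regression coefficient estimated by GLS with theta(phi_gls)."""
    Xf, Wf = filt(xM, phi_gls), filt(W, phi_gls)
    A = (Xf @ Wf) / (Xf @ Xf)            # scalar GLS estimate, cf. (1.3)
    e = filt(W, phi_fore) - A * filt(xM, phi_fore)
    return np.mean(e ** 2)

phis = np.linspace(-0.95, 0.95, 77)
gls = min(one_step_mse(p, p) for p in phis)    # cf. (1.7)
ols = min(one_step_mse(p, 0.0) for p in phis)  # cf. (1.8), theta* = white noise
print(f"GLS limit estimate: {gls:.3f}   OLS limit estimate: {ols:.3f}")
```

Because the omitted periodic regressor is correlated with xM, the two estimation schemes have different asymptotic bias characteristics, which is exactly the situation in which Theorem 7.1 yields a strict GLS advantage.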
7.1. Examples Involving AR(1) Models and dim XtM = dim XtN = 1
The condition (7.5) is the easiest to investigate, because, for AR models, $\bar\theta^*$ is the solution of a linear system of equations. For simplicity, we consider only the case in which dim XtM = dim XtN = 1 and a first-order AR model, i.e., θ = θ(φ) = (1,−φ,0,0,…), is used for the disturbance series ytM in (1.2). From (5.8) and (5.9), this leads to
$$\Gamma^M_0(\theta(\varphi),\theta^*) = (1+\varphi^2)\,c_0(\theta^*) - 2\varphi\,c_1(\theta^*), \tag{7.8}$$
where
$$c_0(\theta^*) = \int_{-\pi}^{\pi}dG_{M,\theta^*}(\lambda)$$
and
$$c_1(\theta^*) = \int_{-\pi}^{\pi}\cos\lambda\;dG_{M,\theta^*}(\lambda).$$
Also, with θ* = (1,−φ*,0,…), the CNM(θ*) component of BNM(θ*) is
$$C^{NM}(\theta^*) = \frac{(1+\varphi^{*2})\,\Gamma^{NM}_0 - \varphi^*\left(\Gamma^{NM}_1 + \Gamma^{NM}_{-1}\right)}{(1+\varphi^{*2})\,\Gamma^{MM}_0 - 2\varphi^*\,\Gamma^{MM}_1}.$$
When
$$2\Gamma^{MM}_1\Gamma^{NM}_0 - \Gamma^{MM}_0\left(\Gamma^{NM}_1 + \Gamma^{NM}_{-1}\right) \ne 0, \tag{7.9}$$
the derivative of CNM(θ*) is nonzero on −1 < φ* < 1 and CNM(θ*) is strictly monotonic; see Section 6.3 in the paper by Findley (2005), whose derivation also shows that the unique minimizer $\bar\theta^*$ = (1,−$\bar\varphi^*$,0,…) of (7.8) has $\bar\varphi^*$ equal to the lag one autocorrelation of GM,θ*(λ) in (5.5),
$$\bar\varphi^* = \frac{c_1(\theta^*)}{c_0(\theta^*)} = \frac{\int_{-\pi}^{\pi}\cos\lambda\;dG_{M,\theta^*}(\lambda)}{\int_{-\pi}^{\pi}dG_{M,\theta^*}(\lambda)}. \tag{7.10}$$
There is no such simple formula for the $\bar\varphi$ minimizing Γ0M(θ(φ),θ(φ)) because the critical point equation provides $\bar\varphi$ as a zero of a polynomial of degree five in general. However, from strict monotonicity of CNM(θ(φ*)), if $\bar\varphi^*$ ≠ φ* then (7.5) fails, and therefore strict inequality holds in (7.3) by Theorem 7.1. For the OLS choice φ* = 0, for which CNM(θ*) = CNM, (7.10) shows that $\bar\varphi^*$ ≠ 0 (except possibly at a single value of (AN)2) when either γ1y or ΔNM = Γ1NN + (CNM)2Γ1MM − CNM(Γ1NM + Γ−1NM) is nonzero, which will usually be the case. A periodic Xt satisfying (7.9) and ΔNM ≠ 0 is given in Section 7.1.2.
When (7.9) fails, CNM(θ*) = CNM = Γ0NM/Γ0MM for all θ*, and so equality holds in (7.3).
7.1.1. The Inferiority of White Noise Modeling with OLS when $\bar\varphi^*$ ≠ 0.
If Θ is a compact model set containing the AR(1) models θ = θ(φ), then Γ0M($\bar\theta^*$,θ*) ≤ Γ0M(θ($\bar\varphi^*$),θ*). So, under (7.9) and $\bar\varphi^*$ ≠ 0, we have, from (7.3), that minθ∈Θ Γ0M(θ,θ) ≤ Γ0M($\bar\theta^*$,θ*) < Γ0M(θ*,θ*). Thus, for θ* = (1,0,0,…), it follows from (7.1) and (7.2) that when $\bar\varphi^*$ ≠ 0, using OLS estimation of AM with the white noise model for ytM leads to asymptotically worse one-step-ahead forecasts than GLS with (1.4), for any such model set Θ.
7.1.2. Periodic Xt and an Example of ΔNM.
The trading day and holiday regressors discussed in Findley et al. (1998), Bell and Hillmer (1983), and Findley and Soukup (2000) are effectively periodic functions; i.e., Xt+PM = XtM holds for all t, for rather large periods P (e.g., 12 × 28 = 336 months for trading day regressors, 12 × 19 = 228 months for some lunar holiday regressors, more for other holidays, e.g., Easter). The simplest holiday regressors are one-dimensional and specify that the effect of the holiday is the same for each day in some interval near the holiday, a dubious but simplifying assumption. For such regressors, the compensating XtN can be assumed to be one-dimensional and have the same period.
Every regressor of period P has a Fourier representation $\sum_j\{\alpha_j\cos(2\pi jt/P) + \beta_j\sin(2\pi jt/P)\}$ with at most P nonzero coefficients, which are uniquely determined linear functions of P consecutive values of the regressor; see Section 4.2.3 of Anderson (1971). To give a more complete analysis of (7.3) for the function (7.8), we consider a simplified period P = 4 regressor XtM having the representation XtM = a1M cos(πt/2) + a2M(−1)^t, with a1M, a2M ≠ 0, for which XtN = a1N cos(πt/2) + b1N sin(πt/2), with a1N, b1N ≠ 0. Thus Xt = [XtM XtN]′ = α1′cos(πt/2) + α2′(−1)^t + β1′sin(πt/2), where α1 = [a1M a1N], α2 = [a2M 0], and β1 = [0 b1N]. Consequently,
$$\Gamma^X_k = \tfrac{1}{2}\left(\alpha_1'\alpha_1 + \beta_1'\beta_1\right)\cos(\pi k/2) + \alpha_2'\alpha_2(-1)^k + \tfrac{1}{2}\left(\beta_1'\alpha_1 - \alpha_1'\beta_1\right)\sin(\pi k/2), \qquad k = 0,\pm1,\ldots,$$
and GX(λ) is piecewise constant with upward jumps at λ = ±π/2, π; see Anderson (1971, p. 581).
For this Xt, the left-hand side of (7.9) has the value −a1Ma1N(a2M)^2, and so (7.9) holds. Further, CNM = a1Ma1N{(a1M)^2 + 2(a2M)^2}^{−1} and ΔNM = −(a2MCNM)^2. Strict inequality holds in (7.3) for OLS estimation except when γ1y > 0 and (AN)^2 = γ1y(a2MCNM)^{−2}, in which case $\bar\varphi^*$ = 0 = φ*.
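These closed forms can be checked numerically by direct averaging of the periodic series; the following sketch (our own check, using CNM = Γ0NM/Γ0MM from Section 7.1 and the definition of ΔNM above) does so:

```python
import numpy as np

a1M, a2M, a1N, b1N = 1.0, 0.5, 0.8, 0.6
t = np.arange(4 * 2500)  # an integral number of periods
xM = a1M * np.cos(np.pi * t / 2) + a2M * (-1.0) ** t
xN = a1N * np.cos(np.pi * t / 2) + b1N * np.sin(np.pi * t / 2)

def gam(u, v, k):
    """Lag-k average second moment, approximating lim T^{-1} sum u_t v_{t-k}."""
    if k < 0:
        return gam(v, u, -k)
    return np.mean(u[k:] * v[:len(v) - k])

CNM = gam(xN, xM, 0) / gam(xM, xM, 0)
DNM = gam(xN, xN, 1) + CNM ** 2 * gam(xM, xM, 1) \
      - CNM * (gam(xN, xM, 1) + gam(xN, xM, -1))
print(CNM, a1M * a1N / (a1M ** 2 + 2 * a2M ** 2))  # agree up to O(1/T)
print(DNM, -(a2M * CNM) ** 2)                      # agree up to O(1/T)
```

Each printed pair matches to within the O(1/T) truncation error of the finite averages.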
7.2. Regarding Asymptotic Efficiency in the Sense of Grenander (1954)
Here we restrict attention to nonrandom regressors Xt in (1.1) whose components are polynomials, periodic functions, or realizations of stationary processes with continuous spectral densities and with convergent sample second moments. The disturbance process yt is assumed to be a mean zero stationary process with the last-mentioned properties. Grenander (1954) considers the correct regressor case and calls the OLS estimates
$$A_T = \left(\sum_{t=1}^{T}W_tX_t'\right)\left(\sum_{t=1}^{T}X_tX_t'\right)^{-1}$$
asymptotically efficient if limT→∞ DT−1E{(AT − A)′(AT − A)}DT−1 is minimal (in the ordering of symmetric matrices) among all linear, unbiased estimates AT of A. For this situation, his result, given on p. 244 of Grenander and Rosenblatt (1984), is that OLS is efficient if and only if the spectral distribution function GX(λ) has at most dim Xt jumps and the sum of the ranks of the jumps GX(λ+) − GX(λ−), 0 ≤ λ ≤ π, is equal to dim Xt. These conditions are not satisfied, and OLS is not efficient, for most of the regressors discussed in Section 7.1.2, including the calendar effect regressors and the period four regressor with b1N ≠ 0; see Chapter 7.7 and case (1) on p. 253 of Grenander and Rosenblatt (1984): usually, the number of terms in the Fourier representation of Xt, and thus also the number of jumps in GX(λ), exceeds dim Xt.
To be able to apply Grenander's result to our underspecified regression situation, assume that XtM and ytM have the properties hypothesized previously for Xt and yt. Thus XtN has a continuous spectral density and so cannot have periodic components. If we consider XtM having only polynomial and periodic components, then XtN and XtM are asymptotically orthogonal; see Section 6.1 of Findley (2005). This implies ANCNM(θ*) = 0 for all θ*, resulting in equality in (7.3) always, because Γ0M(θ,θ*) does not depend on θ*.
On the other hand, with regressors in XtM that are realizations of stationary processes, if ANCNM(θ*) is nonzero, then the analogue for ATM(θ*) of Grenander's efficiency measure fails by being infinite, because some entries of (ATM(θ*) − AM)DM,T−1 will have order T1/2; see (4.2).
Thus this concept of efficiency is not useful in our context.
8. EXTENSIONS AND RELATED RESULTS
From their connection to one-step-ahead forecast error filters, it is not very surprising that GLS estimates of regARMA and regARIMA models have an optimality property for one-step-ahead forecasting. Yet a systematic investigation of the topic has been lacking. A pleasingly simple result, such as Theorem 7.1's connection of optimality with asymptotic bias characteristics, seems possible only for the incorrect regressor case. Indeed, if asymptotic efficiency results are indicative, the correct regressor case will be quite complex. In this case, when the ARMA model for yt is incorrect, GLS can be more or less efficient than OLS; see Koreisha and Fang (2001). Even when the ARMA model is also correct, the analysis and examples of Grenander and Rosenblatt (1984) and of Section 7.2 show, for nonstochastic regressors, that OLS is asymptotically efficient only for a limited range of relatively simple regressors.
For any fixed θ*, in the incorrect nonstochastic regressor case, a referee conjectures that, under additional assumptions and with the aid of a result like Theorem 4.1 of West (1996), it can be shown that the limit as T → ∞ of the variance of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm067.gif?pub-status=live)
does not depend on θ*.
So far, we have only provided asymptotic results for the most simply defined GLS estimates, which are obtained by truncating the infinite-past forecast error filters and using conditional maximum likelihood estimation of the ARMA model. Section 2.4 and Lemma 10(d) of Findley (2005) reveal that the same limits are obtained if the errors of the finite-past one-step-ahead forecasts discussed in Newton and Pagano (1983) are used to define GLS estimates in conjunction with unconditional maximum likelihood estimation of the ARMA model. (Analogous GLS estimates from AR models were considered in Amemiya, 1973.) See Section 9 of the technical report Findley (2003) for additional details, including details about how to weaken the assumptions on XtM to include the frequently used intervention variables of Box and Tiao (1975). These decay exponentially to zero and so have weight one in DM,T, causing (2.5) to fail. Also, with the restriction to measurable minimizers θT discussed in Findley et al. (2001, 2004), in the case of nonstochastic Xt, all almost sure convergence results hold with convergence in probability when convergence in (2.1) holds only in this weaker sense.
Findley (2003) also shows how to use the results of Appendix D to generalize Theorem 5.1 to the case of multi-step-ahead forecast errors and to establish the convergence of θ-parameter estimates that minimize average squared multi-step-ahead forecast errors (allowing for ytM the more comprehensive model classes of Findley et al., 2004).
Findley (2005) uses the results of Theorems 4.1 and 7.1 to obtain formulas and GLS optimality results for the limiting average of squared out-of-sample (real time) forecast errors of regARIMA models under assumptions on the regressors Xt that are slightly more restrictive than those of Section 2 but are satisfied by all of the specific regressor types we have mentioned. The limit formulas are the same as those of the present paper when XtM is A.S. Empirical results are available from the author showing that GLS usually leads to better one-step-ahead out-of-sample forecasting performance than OLS for a suite of monthly series that are modeled with trading day and Easter holiday regressors by the U.S. Census Bureau for the purpose of seasonal adjustment.
APPENDIX A. Compact θ-Sets for Estimation
For each ε > 0 and integer pair p,q ≥ 0, we define Θp,q,ε to be the set of all θ = (1,θ1,θ2,…) from invertible ARMA(r,s) models with r ≤ p, s ≤ q for which the zeroes of the minimal-degree AR and MA polynomials φ(z) and α(z) satisfying θ(z) = φ(z)/α(z) all belong to {|z| ≥ 1 + ε}. Every sequence θT = (1,θ1T,θ2T,…), T = 1,2,… in Θp,q,ε has a subsequence θS(T) that converges coordinatewise to some θ ∈ Θp,q,ε, i.e., θjS(T) → θj, j ≥ 1. Thus Θp,q,ε is compact for coordinatewise convergence. Further, for 0 ≤ ε0 < ε, the sums
$$\sum_{j=0}^{\infty}|\theta_j|\,(1+\varepsilon_0)^j$$
converge uniformly on Θp,q,ε; i.e.,
$$\max_{\theta\in\Theta_{p,q,\varepsilon}}\sum_{j=0}^{\infty}|\theta_j|\,(1+\varepsilon_0)^j < \infty$$
and
$$\lim_{J\to\infty}\max_{\theta\in\Theta_{p,q,\varepsilon}}\sum_{j=J}^{\infty}|\theta_j|\,(1+\varepsilon_0)^j = 0.$$
See Lemmas 2 and 10 of Findley (2005) for these and other properties mentioned. Our uniform convergence results that are presented subsequently follow from these facts as do some other important properties. First, the functions
$\left|\theta(e^{-i\lambda})\right|^2$ are continuous on −π ≤ λ ≤ π and uniformly bounded and bounded away from zero on Θp,q,ε:
$$0 < \min_{\theta\in\Theta_{p,q,\varepsilon}}\;\min_{-\pi\le\lambda\le\pi}\left|\theta(e^{-i\lambda})\right|^2 \le \max_{\theta\in\Theta_{p,q,\varepsilon}}\;\max_{-\pi\le\lambda\le\pi}\left|\theta(e^{-i\lambda})\right|^2 < \infty.$$
Second, if a sequence θT, T = 1,2,… in Θp,q,ε converges coordinatewise to some θ, then it also converges in the stronger sense that
$$\lim_{T\to\infty}\sum_{j=0}^{\infty}\left|\theta^T_j - \theta_j\right|(1+\varepsilon_0)^j = 0$$
whenever 0 ≤ ε0 < ε. In particular, the topology of coordinatewise convergence on Θp,q,ε coincides with that of the l1-norm $\|\theta\|_1 = \sum_{j=0}^{\infty}|\theta_j|$.
Our theorems apply to any compact Θ for which Θ ⊆ Θp,q,ε holds, for some ε > 0 and p,q ≥ 0. A typical Θ would arise from constraints on the zeroes of the AR and MA polynomials of the kind of ARMA model of interest.
APPENDIX B. Scalable Asymptotic Stationarity
Under the data assumptions made in Section 2, Xt and yt in (1.1) together form a multivariate sequence that is S.A.S., a property we now consider in some detail. Let Ut, t ≥ 1 be a real-valued column vector sequence that is S.A.S. and let IU denote the identity matrix of order dim U, the dimension of Ut. Thus there is a decreasing sequence D1 ≥ D2 ≥ … of positive definite diagonal matrices, for which DT ↘ 0 and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm044.gif?pub-status=live)
hold, such that, for each k = 0,±1,…, the limits
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm045.gif?pub-status=live)
exist (finitely). The properties (B.1) and (B.2) yield limT→∞ DTUT−j = 0 a.s., j ≥ 0. For example, when j = 0, as T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm075.gif?pub-status=live)
converges a.s. to Γ0U − Γ0U = 0, whence DTUT → 0 a.s. Further, DT ↘ 0 leads to limT→∞ DTU1+j = 0 a.s. for all j ≥ 0.
This generalization of stationarity was introduced, without a formal name, in Grenander (1954) to encompass a variety of nonstochastic regressors, including polynomials. (Our notation is the inverse of his: we use DT where he uses DT−1. He requires only that the diagonal elements of Γ0U be positive, which is the nature of (2.9) for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm076.gif?pub-status=live)
. Our requirement (2.10) for XtM is stronger.) Grenander shows that the real matrix sequence ΓkU, k = 0,±1,… has a representation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm077.gif?pub-status=live)
in which GU(λ) is a Hermitian-matrix-valued function whose increments GU(λ2) − GU(λ1), λ2 ≥ λ1, have nonnegative eigenvalues, i.e., are Hermitian nonnegative; see also Grenander and Rosenblatt (1984), Chapter II of Hannan (1970), and Chapter 10 of Anderson (1971). For example, if Ut = tp, p ≥ 0, then, with DT = T−(p+1/2), one obtains ΓkU = (2p + 1)−1 for each k, and so GU(λ) can be taken to be 0 for λ < 0 and (2p + 1)−1 for λ ≥ 0 (the sketch at the end of this appendix checks this limit numerically). Grenander (1954) and Grenander and Rosenblatt (1984, Ch. 7) verify the joint scalable asymptotic stationarity property for regressors whose entries Xi,t are polynomials; linear combinations (perhaps infinite) of sinusoids, i.e., of cos ωjt and/or sin ωjt, for various 0 ≤ ωj ≤ π (scaling sequence T−1/2); and products of polynomials tp with linear combinations of sinusoids (scaling sequence T−p−1/2). By contrast, exponentially increasing regressors, e.g., Ut = ebt with b > 0, are not S.A.S. because (B.1) fails for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm078.gif?pub-status=live)
; see Hannan (1970, p. 77).
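The following numerical check (the choices of p, k, and T are illustrative) confirms Grenander's polynomial example: with Ut = tp and DT = T−(p+1/2), the scaled lagged second moments T−(2p+1) Σt UtUt−k approach (2p + 1)−1 for every fixed k.

```python
# A sketch verifying Gamma_k^U = 1/(2p+1) for U_t = t^p, D_T = T^-(p+1/2).
import numpy as np

def scaled_lagged_moment(p, k, T):
    t = np.arange(k + 1, T + 1, dtype=float)
    return np.sum(t**p * (t - k) ** p) / T ** (2 * p + 1)

for T in (100, 1_000, 10_000):
    print(T, scaled_lagged_moment(p=2, k=3, T=T))   # -> 1/5 = 0.2
```

The printed values approach 0.2 = (2·2 + 1)−1; the same scaling gives DTUT = T−1/2 → 0, in line with the property derived above.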
APPENDIX C. Vector Array Reformulation of Assumptions
The following reformulation of our assumptions (2.1), (2.2), and (2.5)–(2.9) concerning yt and Xt will enable us to make use of the results of Findley et al. (2001, 2004). The vector array
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm046.gif?pub-status=live)
is A.S. More specifically, for each k = 0,±1,…,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm047.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm048.gif?pub-status=live)
with Γ0MM > 0 and ANΓ0NNAN′ > 0. Further, from Appendix B,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm049.gif?pub-status=live)
Because of (C.3), the spectral distribution matrix of the ΓkU sequence has the block diagonal form GU(λ) = blockdiag(Gy(λ),GX(λ)).
APPENDIX D. Uniform Convergence Results for Filtered A.S. Arrays
The proposition and lemma that follow are formulated for proving some of the more general results indicated in Section 8.
PROPOSITION D.1. Let Ut(T), 1 ≤ t ≤ T be an A.S. column vector array satisfying (C.4) and let GU(λ) denote the spectral distribution matrix of the asymptotic lagged second moments matrices ΓkU defined by (C.2). Let H and Z be sets of filters η = (η0,η1,…) and ζ = (ζ0,ζ1,…) such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm079.gif?pub-status=live)
resp.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm080.gif?pub-status=live)
converges uniformly on H resp. Z. Then the filter output arrays
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm081.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm082.gif?pub-status=live)
, 1 ≤ t ≤ T, η ∈ H, ζ ∈ Z have the following properties:
(i) limT→∞ supη∈H ∥T−1/2U1+j,T[η]∥ = limT→∞ supη∈H ∥T−1/2UT−j,T[η]∥ = 0 a.s. for all j ≥ 0, and analogously for Ut[ζ](T).
(ii) As T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm083.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm084.gif?pub-status=live)
, for k = 0,±1,….
(iii) The functions ΓkU(η,ζ) are bounded on H × Z,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm085.gif?pub-status=live)
and are jointly continuous in η,ζ in the sense that, if ηT ∈ H, ζT ∈ Z are such that ηT → η and ζT → ζ (coordinatewise convergence) with η ∈ H, ζ ∈ Z, then ΓkU(ηT,ζT) → ΓkU(η,ζ). Also, if Z = H, then infη∈H,−π≤λ≤π|η(eiλ)|2Γ0U ≤ Γ0U(η, η) ≤ supη∈H,−π≤λ≤π|η(eiλ)|2Γ0U.
(iv) Let H be an index set for a family of arrays Ut(η,T), 1 ≤ t ≤ T, η ∈ H such that, as T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm050.gif?pub-status=live)
where the Γ0(η) are positive definite matrices whose minimum eigenvalues are bounded away from zero; i.e.,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm051.gif?pub-status=live)
holds for some mH > 0. Then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm052.gif?pub-status=live)
Proof. Parts (i)–(iii) are straightforward vector extensions of special cases of Theorem 2.1 and Proposition 2.1 of Findley et al. (2001). For (iv), it follows from (D.1) and (D.2) that, given ε > 0, for every realization outside an event of probability zero there is a Tε such that, for T ≥ Tε, the inequalities
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm086.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm087.gif?pub-status=live)
hold. Hence for these T and all η ∈ H,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408041402-92128-mediumThumb-S0266466607070430ffm088.jpg?pub-status=live)
which establishes (D.3).
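The convergence in part (ii) and the continuity in part (iii) can be observed numerically. The following sketch (all choices are ad hoc illustrations, not constructions from the paper) filters the S.A.S. sequence Ut = cos ωt, which has scaling T−1/2, through a small set H of geometric filters ηj = aj and shows that the scaled lagged second moments of the outputs stabilize as T grows:

```python
# A sketch: scaled lagged second moments of filtered cosine data settle
# to limits as T grows, uniformly over a small filter set H = {a = .3,.5,.7}.
import numpy as np

def filtered(u, a):
    """Output of the filter eta = (1, a, a^2, ...) applied to u_1..u_T."""
    out = np.empty_like(u)
    acc = 0.0
    for t, ut in enumerate(u):
        acc = ut + a * acc          # recursive form of sum_j a^j u_{t-j}
        out[t] = acc
    return out

omega, k = 2 * np.pi / 12, 2
for T in (200, 2_000, 20_000):
    u = np.cos(omega * np.arange(1, T + 1))
    vals = []
    for a in (0.3, 0.5, 0.7):       # the filter set H
        v = filtered(u, a)
        vals.append(np.dot(v[k:], v[:-k]) / T)   # T^{-1} sum_t v_t v_{t-k}
    print(T, np.round(vals, 4))
```

The three values settle to limits that vary continuously with a, illustrating the boundedness and joint continuity asserted in (iii).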
We also need the following lemma, whose proof can be obtained by standard arguments, as in the proof of (5.18) of Findley et al. (2004).
LEMMA D.2. Suppose that, on a set Θ*, the sequence βT(θ*), T = 1,2,… of row vector functions converges uniformly a.s. to a bounded function β(θ*), i.e., (5.3) holds, and similarly for τT(θ*), T ≥ 1 and its limit τ(θ*). Let Ut(η,T), η ∈ H and Wt(ζ,T), ζ ∈ Z, 1 ≤ t ≤ T be families of column vector arrays of the same dimension as β(θ*) and τ(θ*), respectively, such that, for k = 0,±1,…,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm089.gif?pub-status=live)
holds for functions Γk(η,ζ) with supη∈H,ζ∈Z∥Γ0(η, ζ)∥ < ∞. Then, as T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm090.gif?pub-status=live)
APPENDIX E. Proofs
Proof of Theorem 4.1. We have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408041402-26927-mediumThumb-S0266466607070430ffm091.jpg?pub-status=live)
By (ii) and (iii) of Proposition D.1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm092.gif?pub-status=live)
converges uniformly a.s. to 0 and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm093.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm094.gif?pub-status=live)
converge uniformly a.s. to the continuous limits Γ0NM(θ) and Γ0MM(θ), respectively, with Γ0MM(θ) bounded below by the positive definite matrix mΘ2Γ0MM, where mΘ = min−π≤λ≤π,θ∈Θ|θ(eiλ)| > 0; see Appendix A. It follows from (iv) of Proposition D.1 that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm095.gif?pub-status=live)
converges uniformly to Γ0MM(θ)−1, which is therefore continuous (and bounded above by mΘ−2(Γ0MM)−1). Hence (ATM(θ) − AM)T−1/2DM,T−1 converges uniformly a.s. to ANCNM(θ), which is continuous on Θ and also bounded. █
Proof of Theorem 5.1. The assertions follow from (5.2) and Lemma D.2 with τT(θ*) = βT(θ*), H = Z = Θ, and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm096.gif?pub-status=live)
, for Ut−j(T) defined by (C.1), because the uniform convergence of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm097.gif?pub-status=live)
to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm098.gif?pub-status=live)
and the boundedness of ∥Γ0U(θ)∥ on Θ, which are required to apply Lemma D.2, follow from (ii) and (iii), respectively, of Proposition D.1. The uniform convergence of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm099.gif?pub-status=live)
required by the proposition is the special case ε0 = 0 of the uniform convergence established in Appendix A. The fact that GU(λ) = blockdiag(Gy(λ),GX(λ)), which holds because of (C.3), yields the form of GM,θ*(λ) in (5.5). █
Proof of Theorem 7.1. We start by establishing that, for any invertible θ and θ*, we have Γ0M(θ,θ) ≤ Γ0M(θ,θ*) with equality holding if and only if ANCNM(θ*) = ANCNM(θ). Indeed, the component of Γ0M(θ,θ*) that depends on θ* can be reexpressed in terms of the analogues of CNM(θ*) and Γ0X(θ) obtained by replacing XtN with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm100.gif?pub-status=live)
. Denoting these analogues by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm101.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm102.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm103.gif?pub-status=live)
By a standard calculation, for any C with the dimensions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm104.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm053.gif?pub-status=live)
with equality holding in (E.1) if and only if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm105.gif?pub-status=live)
.
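Although (E.1) is displayed only in the image above, the "standard calculation" invoked here is presumably the usual completion-of-squares bound; in generic notation (ours, not the paper's), for Hermitian positive definite Γ, a fixed matrix Ĉ, and any conformable C,

$$
C\,\Gamma\,C' - C\,\Gamma\,\hat{C}' - \hat{C}\,\Gamma\,C' + \hat{C}\,\Gamma\,\hat{C}'
= \bigl(C - \hat{C}\bigr)\,\Gamma\,\bigl(C - \hat{C}\bigr)' \;\ge\; 0 ,
$$

so that C ↦ CΓC′ − CΓĈ′ − ĈΓC′ attains its minimum −ĈΓĈ′ at C = Ĉ, with equality if and only if C = Ĉ. Inequality (E.1) and its equality condition are an instance of this template.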
Next, note that because Γ0M(θ,θ) and Γ0M(θ,θ*) are continuous functions of θ on Θ, they have minimizers θ̄, resp. θ̄*, over Θ. From the result just established, we obtain
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm054.gif?pub-status=live)
Thus Γ0M(θ̄,θ̄) = Γ0M(θ̄*,θ*) holds, i.e., equality in (7.3), if and only if (7.4) and Γ0M(θ̄*,θ̄*) = Γ0M(θ̄*,θ*) do, and the latter is equivalent to (7.5), as was just shown.
In particular, equality in (7.3) implies the failure of (7.6) for θ̄ = θ̄* satisfying (7.4). Conversely, failure of (7.6) for some minimizer θ̄, i.e., ANCNM(θ*) = ANCNM(θ̄), implies Γ0M(θ̄*,θ*) ≤ Γ0M(θ̄,θ*) = Γ0M(θ̄,θ̄), which, from (E.2), yields Γ0M(θ̄*,θ̄*) = Γ0M(θ̄,θ̄) = Γ0M(θ̄*,θ*), i.e., equality in (7.3). Therefore (7.6) for all θ̄ minimizing Γ0M(θ,θ) is necessary and sufficient for strict inequality in (7.3).
From Theorem 4.1 and (5.11), it follows that the left-hand side of (7.7) is equal a.s. to the left-hand side of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm055.gif?pub-status=live)
The assertions concerning (7.7) follow from (E.3) and the fact that, when Θ0 = {θ}, equality holds in (E.3) because θT → θ a.s., from (5.11). █