
OPTIMALITY OF GLS FOR ONE-STEP-AHEAD FORECASTING WITH REGARIMA AND RELATED MODELS WHEN THE REGRESSION IS MISSPECIFIED

Published online by Cambridge University Press:  06 September 2007

David F. Findley
Affiliation: U.S. Census Bureau

Abstract

We consider the modeling of a time series described by a linear regression component whose regressor sequence satisfies the generalized asymptotic sample second moment stationarity conditions of Grenander (1954, Annals of Mathematical Statistics 25, 252–272). The associated disturbance process is only assumed to have sample second moments that converge with increasing series length, perhaps after a differencing operation. The model's regression component, which can be stochastic, is taken to be underspecified, perhaps as a result of simplifications, approximations, or parsimony. Also, the autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) model used for the disturbances need not be correct. Both ordinary least squares (OLS) and generalized least squares (GLS) estimates of the mean function are considered. An optimality property of GLS relative to OLS is obtained for one-step-ahead forecasting. Asymptotic bias characteristics of the regression estimates are shown to distinguish the forecasting performance. The results provide theoretical support for a procedure used by Statistics Netherlands to impute the values of late reporters in some economic surveys.

The author thanks two referees and the co-editor for comments and suggestions that led to substantial improvements in the exposition and also thanks John Aston and Tucker McElroy for helpful comments on an earlier draft. Any views expressed are the author's and not necessarily those of the U.S. Census Bureau.

Research Article
© 2007 Cambridge University Press

1. INTRODUCTION

For many economic indicator series, modeling requires specification of both a regression function and an autocovariance structure for the disturbance process. Suppose that, possibly after a variance stabilizing transformation (e.g., differencing), one has data Wt, 1 ≤ t ≤ T, of the form

Wt = AXt + yt, 1 ≤ t ≤ T,  (1.1)

where the Xt are column vectors and the yt are real variates that are asymptotically orthogonal to the Xt in a sense to be defined, whose lagged sample second moments converge as T → ∞. With monthly or quarterly seasonal economic data, AXt might describe a linear or higher degree trend, stable seasonal effects, moving holiday effects (Bell and Hillmer, 1983), trading day effects (Findley, Monsell, Bell, Otto, and Chen, 1998), or other periodic effects. The term Xt might also include values of related stochastic variables, perhaps at leads or lags. We address the situation in which the modeler considers a model

Wt = AMXtM + ytM  (1.2)

whose regressor XtM is not able to reproduce AXt for all t because of known or unknown omissions, approximations, simplifications, etc. We assume that the modeler, perhaps starting from the ordinary least squares (OLS) estimate for AM given by (1.5) later in this section, has decided upon an autoregressive moving average (ARMA) model family, not necessarily correct, for the disturbance (or residual) process ytM = Wt − AMXtM. Such a model for (1.2) is called a regARMA model.

Generalized least squares (GLS) estimation of AM occurs simultaneously with ARMA estimation. The simplest definition of (feasible) GLS estimates of AM, given by (1.3), makes use of the ARMA model's innovation filter, which is defined as follows. With L denoting the lag operator, let φ(L) be the autoregressive (AR) polynomial and α(L) the moving average (MA) polynomial of a (perhaps incorrect) candidate ARMA model for ytM, and let θ = (1,θ1,θ2,…) denote the coefficient sequence of the power series expansion

θ(z) = φ(z)/α(z) = 1 + θ1z + θ2z2 + …, |z| < 1.

When yt in (1.1) and the regressors missing from XtM are weakly (i.e., first and second moment) stationary with mean zero, then ytM will be weakly stationary with mean zero. In this case, assuming that values of ytM are available at all past times,

yt|t−1M(θ) = −Σj=1∞ θjyt−jM

is the model's linear forecast of ytM from ysM, −∞ < s ≤ t − 1; see Section 5.3.3 of Box and Jenkins (1976) or Hannan (1970, p. 147). The forecast errors

at(θ) = ytM − yt|t−1M(θ) = Σj=0∞ θjyt−jM

are called the model's innovations series, and the coefficient sequence θ is its innovation filter. If the ARMA model is correct, then for each t, at(θ) is uncorrelated with ysM, −∞ < s ≤ t − 1, and it follows that yt|t−1M(θ) has minimum mean square error among all such linear forecasts of ytM and that the innovations at(θ) are uncorrelated (white noise). However, we do not assume that a correct ARMA model exists or that ytM is weakly stationary. For example, when a missing regressor is deterministic, e.g., periodic, ytM will not be weakly stationary even when yt is but will instead be asymptotically stationary, meaning that its lagged sample second moments will converge as T increases. Their limits form the autocovariance sequence of a weakly stationary process. In effect, it is this autocovariance sequence for which an ARMA model is sought. All ARMA model–related quantities of interest in this paper depend only on θ and on the Wt and XtM. Thus we can express model dependence in terms of θ, as we do throughout the paper. Further motivation for this "parameterization" is given in Section 3. We refer to each θ as a model.
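For instance (a small illustrative case, not from the paper): suppose yt is weakly stationary with mean zero and autocovariances γky, and the missing regressor term is c(−1)t with c ≠ 0, so that ytM = c(−1)t + yt. Then EytM = c(−1)t oscillates, so ytM is not weakly stationary; but whenever the sample second moments of yt converge as in (2.1) below and T−1 Σt=k+1T (−1)tyt−k → 0 a.s. (an instance of the asymptotic orthogonality required by (2.8)), the lag k sample second moments of ytM converge a.s. to γkM = c2(−1)k + γky, the autocovariance sequence of a weakly stationary process. This is the special case XtN = (−1)t, AN = c of (2.10) in Section 2.1.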

For given Wt, XtM, 1 ≤ t ≤ T, and θ, define

Wt[θ] = Σj=0t−1 θjWt−j

and

XtM[θ] = Σj=0t−1 θjXt−jM

for 1 ≤ t ≤ T, and let ′ denote transpose. Following Pierce (1971), we define the θ-model's GLS estimator of AM to be

ATM(θ) = (Σt=1T Wt[θ]XtM[θ]′)(Σt=1T XtM[θ]XtM[θ]′)−1.  (1.3)

(We discuss another GLS estimator in Section 8.) With these ATM(θ), an estimate of θ (and of the ARMA coefficients determining θ when they are identified) can be obtained by conditional or unconditional maximum likelihood estimation (MLE). (As usual, Gaussian likelihood functions are used without requiring the data to be Gaussian.) For the conditional MLE estimates on which we focus for simplicity (see Box and Jenkins, 1976, Sect. 7.1.2), for each 1 ≤ t ≤ T, one defines the θ-model's forecast of Wt from Ws, 1 ≤ s ≤ t − 1, to be

Wt|t−1M(θ,T) = ATM(θ)XtM − Σj=1t−1 θj(Wt−j − ATM(θ)Xt−jM),

with the convention

W1|0M(θ,T) = ATM(θ)X1M.

This is the special case Wt|t−1M(θ,θ,T) of the more general forecast function Wt|t−1M(θ,θ*,T) defined in (1.6), which follows. Conditional MLE estimates θT leading to GLS estimates ATMT) are the minimizers over Θ of

T−1 Σt=1T (Wt − Wt|t−1M(θ,θ,T))2,  (1.4)

where Θ is a compact set of θ specified by ARMA(p,q) models whose AR and MA polynomials have all zeroes in {|z| ≥ 1 + ε}, for some ε > 0.

Responding to the extensive literature comparing GLS with OLS, we also consider model estimates and forecasts based on the OLS estimate of AM,

ATM = (Σt=1T WtXtM′)(Σt=1T XtMXtM′)−1.  (1.5)

This is the special case ATM(θ*) of (1.3) with θ* = (1,0,0,…), the white noise model for ytM. The forecast function of Wt associated with ATM is obtained by using this choice of θ* in

Wt|t−1M(θ,θ*,T) = ATM(θ*)XtM − Σj=1t−1 θj(Wt−j − ATM(θ*)Xt−jM), 1 ≤ t ≤ T.  (1.6)

With this formula, for any fixed θ*, conditional MLE yields the specification

θ*T = argminθ∈Θ T−1 Σt=1T (Wt − Wt|t−1M(θ,θ*,T))2.

In this paper, we obtain formulas for the limiting values of average squared one-step-ahead prediction errors obtained from these two types of MLEs,

limT→∞ T−1 Σt=1T (Wt − Wt|t−1MT,θT,T))2  (1.7)

and, for fixed θ*,

limT→∞ T−1 Σt=1T (Wt − Wt|t−1M(θ*T,θ*,T))2.  (1.8)

With Theorems 5.1 and 7.1, which are given later in the paper, we show, under general assumptions on Xt and XtM given subsequently, that (1.7) is always less than or equal to (1.8), typically less. This is the optimality property of GLS referred to in the title of this paper. (By contrast, in the correct regressor case, when all our assumptions hold except (2.9) requiring asymptotic nonnegligibility of the omitted regressors, the two limits are equal.) Further, using OLS with the white noise model θ* = (1,0,0,…) for ytM, as is often done, usually leads to even worse forecasts, in the sense that

limT→∞ T−1 Σt=1T (Wt − ATMXtM)2

has a larger value than (1.8); see Section 7.1.1.
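To fix ideas, here is a minimal numerical sketch in Python (the simulated series, coefficient values, and the AR(1) grid standing in for Θ are all hypothetical illustrations, not part of the paper): it computes the truncated-filter GLS estimate (1.3), minimizes the conditional-MLE criterion of (1.4) by grid search over AR(1) filters θ(φ) = (1,−φ,0,0,…), and recovers the OLS estimate (1.5) as the special case θ* = (1,0,0,…).

import numpy as np

rng = np.random.default_rng(0)
T = 400

# Hypothetical data: W_t = 2 X_t^M + 1 X_t^N + y_t, with X^N omitted from the
# fitted model and y_t an AR(1) disturbance, so the regression is misspecified.
t = np.arange(1, T + 1)
XM = np.cos(2 * np.pi * t / 12)          # modeled regressor X_t^M
XN = np.sin(2 * np.pi * t / 12)          # omitted regressor X_t^N
y = np.zeros(T)
e = rng.standard_normal(T)
for i in range(1, T):
    y[i] = 0.5 * y[i - 1] + e[i]
W = 2.0 * XM + 1.0 * XN + y

def filt(x, theta):
    # Truncated innovation filter: x_t[theta] = sum_{j=0}^{t-1} theta_j x_{t-j}.
    out = np.zeros(len(x))
    for j in range(min(len(theta), len(x))):
        out[j:] += theta[j] * x[: len(x) - j]
    return out

def gls(theta):
    # GLS estimate (1.3): least squares regression of W_t[theta] on X_t^M[theta].
    Wf, Xf = filt(W, theta), filt(XM, theta)
    return np.sum(Wf * Xf) / np.sum(Xf * Xf)

def criterion(phi):
    # Average squared one-step-ahead error; the error W_t - W_{t|t-1}^M(theta,theta,T)
    # equals W_t[theta] - A_T^M(theta) X_t^M[theta].
    theta = np.array([1.0, -phi])
    return np.mean(filt(W - gls(theta) * XM, theta) ** 2)

grid = np.linspace(-0.95, 0.95, 381)     # stand-in for the compact set Theta
phi_hat = grid[np.argmin([criterion(p) for p in grid])]
print("conditional-MLE phi:", round(phi_hat, 3))
print("GLS estimate of A^M:", gls(np.array([1.0, -phi_hat])))
print("OLS estimate of A^M:", gls(np.array([1.0])))   # theta* = (1,0,0,...)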

1.1. Overview of the Paper

The regressor sequence Xt, t ≥ 1 is required to satisfy the conditions of Grenander (1954), which define a property we call scalable asymptotic stationarity; see Section 2 and Appendix B. Grenander introduced this generalization of stationarity to investigate the efficiency of OLS estimates for a large class of nonstochastic regressors in models with a broad range of weakly stationary disturbances. We indicate in Section 7.2 why efficiency in Grenander's sense is rarely applicable in the context of misspecified nonstochastic regressors. For the models we consider, the regressor XtM in (1.2), which can be stochastic, is taken to be a proper subvector of Xt. The remaining entries of Xt can be those of any vector XtN, compatible with our assumptions, whose variables compensate for the inadequacies of XtM in such a way that, for some AM and AN, the regression function in (1.1) can be decomposed as

AXt = AMXtM + ANXtN.

Then, in (1.2), ytM = ANXtN + yt.

Our requirements for XtM, XtN, and yt are comprehensively stated in Section 2 and verified for some important classes of models in Sections 2.1 and 6.1.1. More information about ARMA model parameterization with innovations coefficient sequences θ = (1,θ1,θ2,…) is provided in Section 3, which includes some elementary examples. For diagonal scaling matrices DM,T such that

limT→∞ DM,T(Σt=1T XtMXtM′)DM,T

is nonsingular, Theorem 4.1 gives a formula for limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 and establishes that convergence is uniform on the compact sets Θ defined in Appendix A. For a given θ, this limit is called the asymptotic bias characteristic of ATM(θ) for AM. Section 5 obtains formulas for the limits of the sample second moments of the forecast errors Wt − Wt|t−1M(θ,θ,T) and Wt − Wt|t−1M(θ,θ*,T). The analogous results for regARIMA-type nonstationary models, for situations in which the disturbance process requires a differencing transformation prior to ARMA modeling, are discussed in Section 6. We describe, in Theorem 7.1 in Section 7, how the optimality property of GLS mentioned previously arises: the better performance of GLS relative to OLS occurs when the OLS estimate has an asymptotic bias characteristic different from that of the GLS estimate. These results provide support for an imputation procedure used by Statistics Netherlands (Aelen, 2004), which uses one-step-ahead forecasts from regARIMA models with stochastic distributed lag regressors to impute the net contribution of late-reporting firms to economic time series from certain monthly surveys; see Section 6.1. Section 7.1 provides elementary expressions for some asymptotic quantities associated with GLS and OLS estimation when ytM is modeled as a first-order autoregression. These are used to illustrate the generality of GLS's optimality. Section 8 discusses related results and extensions.

Proofs of the theorems are given in Appendix E. They use the auxiliary results of Appendix D obtained mainly from Findley, Pötscher, and Wei (2001).

2. THE DATA AND REGRESSOR ASSUMPTIONS

In (1.1), we require yt, t ≥ 1 to be asymptotically stationary (A.S.) in the sense of Pötscher (1987), meaning that for each k = 0,±1,…, the lag k sample second moments have limits almost surely (i.e., with probability one), denoted a.s. That is, the limits

γky = limT→∞ T−1 Σt=k+1T ytyt−k a.s., k = 0,±1,…,  (2.1)

exist. (By convention, Σt=ba = 0 if a < b.) From a well-known result of Herglotz, the sequence of asymptotic lag k second moments γky has a spectral distribution function Gy(λ) such that

γky = ∫−ππ eiλk dGy(λ)

for k = 0,±1,….

We require Xt, t ≥ 1 in (1.1) to be scalably asymptotically stationary (S.A.S.), meaning that the limits

ΓkX = limT→∞ DX,T(Σt=k+1T XtXt−k′)DX,T a.s., k = 0,±1,…,  (2.2)

exist, where the DX,T are diagonal scaling matrices, DX,T = diag(d1,T,…,ddim X,T), which are positive definite, decrease to zero (DX,T ↓ 0), and satisfy limT→∞ DX,T+k−1DX,T = IX for each k ≥ 0. Here IX is the identity matrix of order dim X. (Ordinary convergence is meant in (2.2) if no coordinate of Xt is stochastic.) The resulting sequence ΓkX has a spectral distribution matrix function GX(λ),

ΓkX = ∫−ππ eiλk dGX(λ),  (2.3)

for k = 0,±1,…; see Appendix B for further background, including examples.

Partition Xt as

Xt = [XtM′ XtN′]′,  (2.4)

where, as in the Introduction, the superscript N designates regressors not in the model (1.2). Let the corresponding partition of A in (1.1) be A = [AM AN] and let those of DX,T, ΓkX, and GX(λ) be, respectively,

DX,T = diag(DM,T, DN,T), ΓkX = [ΓkMM ΓkMN; ΓkNM ΓkNN], and GX(λ) = [GMM(λ) GMN(λ); GNM(λ) GNN(λ)].
From DX,T ↓ 0, we have

limT→∞ DM,T = 0 and limT→∞ DN,T = 0.  (2.5)
We require Γ0MM to be positive definite,

Γ0MM > 0,  (2.6)

and restrict XtN to being A.S.,

ΓkNN = limT→∞ T−1 Σt=k+1T XtNXt−kN′ a.s., k = 0,±1,….  (2.7)

Of course, (2.7) is equivalent to DN,T = T−1/2IN, with IN the identity matrix of order dim XN. We exclude omitted regressors of larger order, e.g., tp with p > 0, because they yield unbounded ytM dominated by ANXtN, which would clearly reveal the inadequacy of XtM with large enough T.

Further, the two series yt and Xt must be asymptotically orthogonal, meaning that

limT→∞ T−1/2DX,T Σt=k+1T Xtyt−k = 0 a.s., k = 0,±1,….  (2.8)

Finally, to keep the focus on the incorrect regressor situation, we assume that

ANΓ0NNAN′ > 0.  (2.9)

In summary, our assumptions concerning (1.1) are (2.1), (2.2), and (2.5)–(2.9).

2.1. Consequences of (2.1), (2.8), and (2.9) for yt and ytM

First we note that, when Xt contains an entry equal to 1 for all t, then the corresponding scaling factor in DX,T can be taken to be T−1/2, and so (2.8) yields

limT→∞ T−1 Σt=1T yt = 0 a.s.

In this sense, yt in (1.1) can be thought of as an asymptotically mean zero process. A similar result holds for the disturbances ytM = ANXtN + yt of the misspecified model (1.2); see Section 4.

Now we establish the asymptotic stationarity of the ytM. From the requirement (2.7) that XtN be A.S. and from (2.1) and (2.8), for each k,

γkM = limT→∞ T−1 Σt=k+1T ytMyt−kM a.s.

is given by

γkM = ANΓkNNAN′ + γky = ∫−ππ eiλk dGyM(λ),  (2.10)

where GyM(λ) = ANGNN(λ)AN′ + Gy(λ). From (2.9), we have γ0M > 0. (The term γ0y can be zero.)

Finally, we note that, except in special situations such as that of Section 7.2, the disturbances and regressors in (1.2) will be asymptotically correlated, meaning

limT→∞ T−1/2(Σt=k+1T ytMXt−kM′)DM,T = ANΓkNM ≠ 0

for some k, which will usually cause ATM(θ) defined in (1.3) to be biased asymptotically for some θ; see Theorem 4.1.

2.2. Sufficient Conditions for (2.1) and (2.8)

The properties (2.1) and (2.8) hold under reasonably general assumptions on yt and Xt. The verification of (2.8) for a common type of stochastic regression model is discussed in Section 6.1.1. Here we consider the case in which yt is weakly stationary with mean zero and Xt is nonstochastic with Γ0X > 0. Then, for almost sure convergence in (2.1) and (2.8), it suffices to have

yt = Σj=0∞ ψjεt−j, with Σj=0∞ ψj2 < ∞,

for some independent white noise process εt such that supt E|εt|r < ∞, with r > 2 if yt has a bounded spectral density or, if the spectral density of yt is unbounded but square integrable, with r > 4; see Section 3.1 of Findley et al. (2001).

3. THE θ-PARAMETERIZATION OF ARMA MODELS

Three features of our ARMA model situation may be new to readers not familiar with the vein of research literature of which the papers by Pötscher (1987, 1991) are representative: (a) the disturbances ytM, 1 ≤ t ≤ T are not required to have means or covariances but only the asymptotic stationarity property; (b) no ARMA model is assumed to be correct in the sense of being able to exactly model the asymptotic lagged second moment sequence (2.10); (c) the ARMA coefficients of a model envisioned as φ(L)ytM = α(L)at are replaced by the innovations filter θ = (1,θ1,θ2,…) defined by the property that θ(z) = Σj=0∞ θjzj satisfies θ(z) = φ(z)/α(z) for |z| < 1. In this section, we provide some orienting discussion and examples.

We assume that α(z) ≠ 0 for all |z| ≤ 1, i.e., that the model is invertible. When ytM is weakly stationary with mean zero and defined for all t, then there always exists a weakly stationary series at = at(θ) such that the preceding ARMA model formula holds, namely, at(θ) = Σj=0∞ θjyt−jM. When ytM is only A.S. and defined only for t ≥ 1, we define

at(θ) = Σj=0t−1 θjyt−jM, t ≥ 1.

This series is A.S. with asymptotic lag k second moment given by

γka(θ) = ∫−ππ eiλk|θ(eiλ)|2 dGyM(λ),

with GyM(λ) as in (2.10); see (ii) of Proposition D.1 in Appendix D. We would call the θ-model correct if the white noise property, γka(θ) = 0 for k ≠ 0, obtains or, equivalently, if

γka(θ) = σ21{k=0}

for all k for some σ2 > 0. However, our theorems do not require any model for ytM, t ≥ 1 to be correct in this sense.

For subsequent discussions, it will be useful to have in mind the θ's of some simple ARMA models. As was indicated in Section 1, a white noise model has θ = (1,0,0,…). For the invertible ARMA(1,1) model, (1 − φL)ytM = (1 − αL)at, with |α|,|φ| < 1, one has θj = αj−1(α − φ), j ≥ 1. For AR(1) and MA(1) models, we have θ = (1,−φ,0,0,…) and θ = (1,α,α2,…), respectively.
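The ARMA(1,1) filter just stated can be verified by carrying out the power series division directly (a one-line derivation in the notation above):

θ(z) = (1 − φz)/(1 − αz) = (1 − φz)Σj=0∞ αjzj = 1 + Σj=1∞ αj−1(α − φ)zj, |z| < 1,

so θj = αj−1(α − φ), j ≥ 1; setting φ = 0 gives the MA(1) filter (1,α,α2,…), and setting α = 0 gives the AR(1) filter (1,−φ,0,0,…).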

Model parameterization by θ is useful because the θ's that are determined by likelihood-maximizing ARMA coefficients have uniquely defined large-sample limits in situations where the ARMA coefficients themselves do not, because of common zeroes in limiting AR and MA polynomials. For example, when an ARMA(1,1) model is fitted to white noise, the sequence of maximum likelihood pairs (φTT) has multiple limit (or cluster) points, all on the line {(α,α) : |α| ≤ 1}; see Hannan (1982). However, when φ = α for an ARMA(1,1) model, then θ = (1,0,0,…), and so this is the only limit point of the filter sequence θT defined by the maximum likelihood estimates φT, αT. That is, θT → θ a.s. coordinatewise, i.e., θjT → θj a.s., j ≥ 0.

As in the preceding examples, the coordinates of θ are always continuous functions of the ARMA coefficients. The converse holds only if the ARMA model is identifiable, i.e., the AR and MA polynomials have no common zero; see also the Appendix of Pötscher (1991) for additional background on the θ-parameterization. (Pötscher's parameter is the coefficient sequence of θ(z)−1 = α(z)/φ(z). The relationship between θ and this coefficient sequence is continuous and invertible; see Section 3 of Findley, Pötscher, and Wei, 2004.)

To obtain the uniform convergence and continuity properties needed to establish the results indicated in the Introduction, ARMA(p,q) model coefficient estimation is restricted to compact sets of AR and MA coefficient vectors whose polynomials have all zeroes in {|z| ≥ 1 + ε} for some ε > 0. Such sets specify compact sets Θ of the type discussed in Appendix A.

4. UNIFORM CONVERGENCE OF GLS ESTIMATES

We now present a fundamental convergence property of the ATM(θ) defined in (1.3). A generalized inverse is to be used in (1.3) when the inverse matrix fails to exist. This can (with probability one when XtM is stochastic) only happen for a finite number of T values, because of (2.6) and (iv) of Proposition D.1 in Appendix D. For any matrix M, define ∥M∥ = λmax1/2(MM′), with λmax(·) denoting the maximum eigenvalue. If M is a vector with real coordinates m1,…,mn, then ∥M∥ = (m12 + … + mn2)1/2.

Partition

Γ0X(θ) = limT→∞ DX,T(Σt=1T Xt[θ]Xt[θ]′)DX,T, with Xt[θ] = Σj=0t−1 θjXt−j,

analogously to (2.4), i.e.,

Γ0X(θ) = [Γ0MM(θ) Γ0MN(θ); Γ0NM(θ) Γ0NN(θ)],

with

Γ0MM(θ) = limT→∞ DM,T(Σt=1T XtM[θ]XtM[θ]′)DM,T,

etc. For θ from an invertible model, define

CNM(θ) = Γ0NM(θ)Γ0MM(θ)−1.  (4.1)
In Appendix E, we prove the theorem that follows.

THEOREM 4.1. Let Θ be a compact set of models as described in Appendix A. Under the assumptions (2.1), (2.2), and (2.5)–(2.8), we have, uniformly on Θ,

limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 = ANCNM(θ) a.s.  (4.2)

The function CNM(θ) is continuous on Θ and thus bounded there, maxθ∈Θ ∥CNM(θ)∥ < ∞.

For a given θ, limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 = ANCNM(θ) is called the asymptotic bias characteristic of ATM(θ) for AM. It is nonzero for some θ if ΓkNM ≠ 0 for some k, i.e., if the series ANXtN and XtM are asymptotically correlated. When DM,T = T−1/2IM, then ANCNM(θ) is the asymptotic bias of ATM(θ) for AM. Omitted variable bias is a fundamental modeling issue; see, e.g., Stock and Watson (2002, pp. 143–149). Section 7 will show that, when ANCNM(θ) varies with θ, there is usually an optimal value of ANCNM(θ) for one-step-ahead forecasting that is determined by the θT sequence of (1.4).
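For orientation, consider the white noise filter θ* = (1,0,0,…) (the OLS case): the filtered moments in (4.1) then reduce to the unfiltered ones, so CNM((1,0,0,…)) = Γ0NM0MM)−1, the limiting analogue of the textbook omitted-variable-bias formula. A nontrivial θ changes the bias characteristic by weighting frequencies with |θ(eiλ)|2 in the moments Γ0NM(θ) and Γ0MM(θ); cf. (ii) of Proposition D.1.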

If XtM has one or more coordinates that are A.S., then for any ÃM that differs from AM only in these coordinates we have, uniformly on Θ,

limT→∞ (ATM(θ) − ÃM)T−1/2DM,T−1 = ANCNM(θ) + (AM − ÃM) a.s.  (4.3)

This reveals the important fact that the asymptotic bias characteristic associated with an alternative omitted-regressor decomposition,

AXt = ÃMXtM + ÃNX̃tN,

with ÃNX̃tN = ANXtN + (AM − ÃM)XtM, differs from the right-hand side of (4.2) by a term that is independent of θ.

Except in special situations, e.g., when the omitted regressors are precisely known, there is always ambiguity concerning XtN and AM. However, it is useful to note that if a coordinate Xi,tM of XtM is constant with value one, then

μN = limT→∞ T−1 Σt=1T ANXtN

can be assumed to be zero: by defining ÃM to differ from AM in that AiM + μN replaces AiM, and by defining X̃tN = ANXtN − μN with ÃN = 1, one obtains

AXt = ÃMXtM + ÃNX̃tN

with

limT→∞ T−1 Σt=1T ÃNX̃tN = 0.

Then, for ỹtM = ÃNX̃tN + yt, we have limT→∞ T−1 Σt=1T ỹtM = 0 a.s.

5. UNIFORM ASYMPTOTIC STATIONARITY OF FORECAST ERRORS

We consider sample second moments of the errors of the one-step-ahead forecasts Wt|t−1M(θ,θ*,T) from (1.6). For 1 ≤ t ≤ T, the forecast errors Wt − Wt|t−1M(θ,θ*,T) are observable and equal to Wt[θ] − ATM(θ*)XtM[θ], which yields

Wt − Wt|t−1M(θ,θ*,T) = yt[θ] + (AM − ATM(θ*))XtM[θ] + ANXtN[θ].  (5.1)

Thus, setting Ut(T) = [yt T1/2DM,TXtM XtN]′, 1 ≤ t ≤ T, and βT(θ*) = [1 (AM − ATM(θ*))T−1/2DM,T−1 AN], we have

Wt − Wt|t−1M(θ,θ*,T) = βT(θ*)Ut(T)[θ].  (5.2)

Let Θ* be a compact set in the sense of Appendix A. For β(θ*) = [1 −ANCNM(θ*) AN], Theorem 4.1 yields

limT→∞ supθ*∈Θ* ∥βT(θ*) − β(θ*)∥ = 0 a.s.  (5.3)
This fact and the properties of the Ut(T) array described in Appendix C lead to the following theorem, which is proved in Appendix E. Define

BNM(θ*) = [−CNM(θ*) IN]  (5.4)

and

GM,θ*(λ) = Gy(λ) + ANBNM(θ*)GX(λ)BNM(θ*)′AN′.  (5.5)
For any Θ,Θ*, let Θ × Θ* denote the Cartesian product set {(θ,θ*) : θ ∈ Θ,θ* ∈ Θ*} and define convergence (θT,θ*T) → (θ,θ*) in Θ × Θ* to mean θjT → θj and θj*T → θj* for all j ≥ 0.

THEOREM 5.1. Let Θ and Θ* be compact sets of models as described in Appendix A. Under the assumptions (2.1), (2.2), and (2.5)–(2.8), the forecast-error arrays Wt − Wt|t−1M(θ,θ*,T), 1 ≤ t ≤ T are continuous on Θ × Θ* and also jointly uniformly A.S. there. Specifically, for each k = 0,±1,…, as T → ∞, with

ΓkM(θ,θ*) = ∫−ππ eiλk|θ(eiλ)|2 dGM,θ*(λ)  (5.6)

for GM,θ*(λ) as in (5.5), the limits

limT→∞ T−1 Σt=k+1T (Wt − Wt|t−1M(θ,θ*,T))(Wt−k − Wt−k|t−k−1M(θ,θ*,T)) = ΓkM(θ,θ*) a.s.  (5.7)

hold uniformly a.s. on Θ × Θ*. Further, the functions ΓkM(θ,θ*) are continuous and uniformly bounded on Θ × Θ*. Also, from (5.7) and (5.1), for given θ and θ*, the values of ΓkM(θ,θ*) depend only on the values of the series AXt, XtM, and yt = Wt − AXt, not on the specification of the compensating regressor XtN in decompositions AXt = AMXtM + ANXtN (see Sect. 4).

Theorem 5.1 shows that the quantities Γ0M(θ,θ*) are of special interest because they describe limiting average squared one-step-ahead forecast errors. With

γ0y(θ) = ∫−ππ |θ(eiλ)|2 dGy(λ) and Γ0X(θ) = ∫−ππ |θ(eiλ)|2 dGX(λ),  (5.8)

(5.5) yields the decomposition

Γ0M(θ,θ*) = γ0y(θ) + ANBNM(θ*)Γ0X(θ)BNM(θ*)′AN′.  (5.9)

By specializing the argument used to establish Theorem 5.1, γ0y(θ) is seen to be the limiting average squared error of the θ-model's one-step-ahead forecast of Wt when XtM = Xt. Similarly, using (4.2), the final quantity in (5.9) is seen to be the limit of the average of the squares of one-step-ahead forecast errors of the regression-function error array AXt − ATM(θ*)XtM, 1 ≤ t ≤ T,

limT→∞ T−1 Σt=1T ((AXt − ATM(θ*)XtM)[θ])2 = ANBNM(θ*)Γ0X(θ)BNM(θ*)′AN′ a.s.  (5.10)
It follows from the results for k = 0 in Theorem 5.1 by standard arguments (see Pötscher and Prucha, 1997, Ch. 3 and Lem. 4.2) that the conditional maximum likelihood estimators θT of (1.4) converge a.s. to the compact set Θ0 of minimizers of Γ0M(θ,θ) over Θ,

Θ0 = {θ̄ ∈ Θ : Γ0M(θ̄,θ̄) = minθ∈Θ Γ0M(θ,θ)}.  (5.11)

That is, on a set of realizations of the random variables in (1.1) with probability one, the limit point of each (coordinatewise) convergent subsequence of θT, T ≥ 1 belongs to Θ0. (So if there is a unique minimizer θ̄, then θT → θ̄ a.s.) Equivalently, in terms of the l1-norm (see Appendix A), limT→∞ minθ̄∈Θ0 ∥θT − θ̄∥1 = 0 a.s.

Similarly, the conditional maximum likelihood estimators θ*T associated with ATM(θ*) for fixed θ* ∈ Θ converge a.s. to the set of minimizers of Γ0M(θ,θ*), which usually does not include θ*; see Section 7.1.1.

6. EXTENSION TO ARIMA DISTURBANCE MODELS

Now suppose the observed data are Wt*, 1 − d ≤ t ≤ T, from a time series of the form

Wt* = AXt* + yt*

to which a model of the form

Wt* = AMXtM* + ytM*

is being fit. Suppose also that it has been correctly determined that the disturbances ytM* require "differencing" with an operator δ(L) = 1 + δ1L + … + δdLd, whose zeroes are on the unit circle, to obtain residuals for which an ARMA model can be considered. The resulting model is called a regARIMA model for Wt*. Such models are extensively used for seasonal time series in the context of seasonal adjustment (see Findley et al., 1998; Peña, Tiao, and Tsay, 2001), often with δ(L) = (1 − L)(1 − Ls), s = 4,12. We assume that (2.1), (2.2), and (2.5)–(2.8) hold for yt = δ(L)yt*, Xt = δ(L)Xt*, Wt = δ(L)Wt*, and XtM = δ(L)XtM*, and that XtM is a subvector of Xt. For any 1 ≤ t ≤ T, because Wt = Wt* + Σj=1d δjWt−j*, for given θ and θ* a natural one-step-ahead forecast for Wt* is

Wt|t−1M*(θ,θ*,T) = Wt|t−1M(θ,θ*,T) − Σj=1d δjWt−j*,

with Wt|t−1M(θ,θ*,T) defined by (1.6). This leads to

Wt* − Wt|t−1M*(θ,θ*,T) = Wt − Wt|t−1M(θ,θ*,T)

for 1 ≤ t ≤ T and therefore to forecast-error limiting results as in Theorem 5.1 with the same functions ΓkM(θ,θ*).
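For example, with the simplest choice δ(L) = 1 − L (so d = 1 and Wt = Wt* − Wt−1*), the natural forecast of Wt* is Wt−1* + Wt|t−1M(θ,θ*,T), and

Wt* − (Wt−1* + Wt|t−1M(θ,θ*,T)) = Wt − Wt|t−1M(θ,θ*,T),

so the limiting forecast-error moments of Theorem 5.1 carry over to the undifferenced forecasts without change.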

6.1. Forecasting a Stochastic Regressor to Impute Values for Late Survey Responders

We briefly consider an application involving regARIMA models with stochastic regressors. Section 3.3 of Aelen (2004) provides an interesting one-step-ahead forecasting application involving a variety of seasonal time series Wt* whose values come from enterprises that report economic data to Statistics Netherlands a month late. The regressor XtM* includes the sum of the values for month t from all enterprises of the same type that report on time, i.e., in the desired month, and sometimes also lagged values of these sums. Thus XtM* is stochastic. In conjunction with the following discussion of distributed lag models, Theorem 5.1 and Theorem 7.1 in Section 7 provide theoretical support for Aelen's use of the regARIMA model GLS estimation and one-step-ahead forecasting procedures of X-12-ARIMA (Findley et al., 1998) to obtain Statistics Netherlands' imputed value for Wt* in the month in which XtM* becomes available.

6.1.1. A Class of Distributed Lag Models Satisfying the Assumptions of Theorem 5.1.

After differencing, Aelen's model becomes a distributed lag model with regressors and correlated disturbances that are both treated as stationary. We consider a broad class of such models. Suppose that Wt and Zt are jointly covariance stationary variates with zero means and that the spectral density matrix of Zt is Hermitian positive definite at all frequencies. Then, when the autocovariance sequence ΓkV of Vt = [Wt Zt′]′ satisfies Σk=−∞∞ ∥ΓkV∥ < ∞, there exist coefficients Ak satisfying Σk=−∞∞ ∥Ak∥ < ∞ such that

Wt = Σk=−∞∞ AkZt−k + yt

holds, with EytZt−k′ = 0, k = 0,±1,…; see Theorem 8.3.1 of Brillinger (1975). For any m,n ≥ 0, setting XtM = [Zt+n′ … Zt−m′]′, XtN = Σk∉{−n,…,m} AkZt−k, AM = [A−n … Am], and AN = 1 leads to (1.1) and (1.2) having the form of a distributed lag model with stationary disturbances (see, e.g., Stock and Watson, 2002) and to the assumptions of Theorem 5.1 holding under Gaussianity or weaker assumptions on Vt; see Theorem IV.3.6 of Hannan (1970).

7. OPTIMALITY OF GLS

Because of the uniform convergence and continuity results established in Theorem 5.1, for any compact Θ as described in Appendix A, we have

limT→∞ T−1 Σt=1T (Wt − Wt|t−1MT,θT,T))2 = minθ∈Θ Γ0M(θ,θ) a.s.  (7.1)

and, for any fixed θ* ∈ Θ,

limT→∞ T−1 Σt=1T (Wt − Wt|t−1M(θ*T,θ*,T))2 = minθ∈Θ Γ0M(θ,θ*) a.s.  (7.2)
In Appendix E, we establish the theorem that follows.

THEOREM 7.1. Let Θ be a compact set as described in Appendix A and suppose that (2.1), (2.2), and (2.5)–(2.8) hold. Then for any fixed θ* ∈ Θ,

minθ∈Θ Γ0M(θ,θ) ≤ minθ∈Θ Γ0M(θ,θ*),  (7.3)

with equality holding if and only if a minimizer θ̄* of Γ0M(θ,θ*) over Θ is always a minimizer of Γ0M(θ,θ),

Γ0M(θ̄*,θ̄*) = minθ∈Θ Γ0M(θ,θ),  (7.4)

and, simultaneously, the asymptotic bias characteristic of ATM(θ̄*) as an estimator of AM coincides with that of ATM(θ*),

ANCNM(θ̄*) = ANCNM(θ*).  (7.5)

As a consequence, strict inequality obtains in (7.3) if and only if

ANCNM(θ̄) ≠ ANCNM(θ*)  (7.6)

holds for every minimizer θ̄ of Γ0M(θ,θ) over Θ. For the maximum likelihood estimators θT of (1.4), this condition implies

limT→∞ T−1 Σt=1T (Wt − Wt|t−1MT,θT,T))2 < limT→∞ T−1 Σt=1T (Wt − Wt|t−1M(θ*T,θ*,T))2 a.s.  (7.7)

Conversely, if Γ0M(θ,θ) has a unique minimizer θ̄, then (7.7) implies (7.6).

Unless θ* is itself a minimizer of Γ0M(θ,θ), we expect that both minθ∈Θ Γ0M(θ,θ) < Γ0M(θ̄*,θ̄*) and ANCNM(θ̄*) ≠ ANCNM(θ*) will hold except in quite special situations, the only one known to us being when ANCNM(θ*), and therefore also Γ0M(θ,θ*), does not depend on θ*. In Section 7.1, this is shown to occur with AR(1) models for ytM only in a singular situation. Otherwise θ̄* is unique. Whenever θ̄* is unique, failure of (7.5), which implies minθ∈Θ Γ0M(θ,θ) < Γ0M(θ̄*,θ*) and θ̄* ≠ θ*, also yields Γ0M(θ̄*,θ*) < Γ0M(θ*,θ*).

Model sets Θ usually include the white noise model θ* = (1,0,0,…) as a degenerate case. Hence the conclusions of Theorem 7.1 are generally applicable to OLS as an alternative to GLS. They indicate the following optimality property of GLS: in conjunction with maximum likelihood estimation of θ, asymptotically, OLS estimation is never better than GLS estimation for one-step-ahead forecasting. When the regressor is underspecified and ANCNM(θ) is nonconstant, OLS will typically have a greater limiting average squared one-step-ahead forecast error than GLS, for large enough T, because its asymptotic bias characteristic differs from that of GLS.

Thursby (1987) provides comparisons of OLS and GLS biases when yt is known to be independent and identically distributed (i.i.d.) (white noise), dim XtM = 2, dim XtN = 1, the coordinates of Xt are correlated first-order AR processes, and the loss function is the posterior mean squared bias associated with a prior for the parameters that determine the covariance structure between XtN and XtM. With the aid of numerical integrations for the GLS quantities, he establishes that, depending on the choice of the autocovariance structure of XtM, the mean squared asymptotic bias of GLS is sometimes less and sometimes greater than that of OLS. Theorem 7.1 shows that, for either outcome, GLS has an asymptotic advantage over OLS for one-step-ahead forecasting.

7.1. Examples Involving AR(1) Models and dim XtM = dim XtN = 1

The condition (7.5) is the easiest to investigate because, for AR models, θ̄* is the solution of a linear system of equations. For simplicity, we consider only the case in which dim XtM = dim XtN = 1 and a first-order AR model, i.e., θ = θ(φ) = (1,−φ,0,0,…), is used for the disturbance series ytM in (1.2). From (5.8) and (5.9), this leads to

Γ0M(θ(φ),θ*) = γ0y(θ(φ)) + ANBNM(θ*)Γ0X(θ(φ))BNM(θ*)′AN′,  (7.8)

where

γ0y(θ(φ)) = (1 + φ2)γ0y − 2φγ1y

and

Γ0X(θ(φ)) = (1 + φ2)Γ0X − φ(Γ1X + Γ−1X).

Also, with θ* = (1,−φ*,0,…), the CNM(θ*) component of BNM(θ*) is

CNM(θ*) = {(1 + φ*2)Γ0NM − φ*(Γ1NM + Γ−1NM)}{(1 + φ*2)Γ0MM − 2φ*Γ1MM}−1.

When

1MMΓ0NM − Γ0MM1NM + Γ−1NM) ≠ 0,  (7.9)

the derivative of CNM(θ*) is nonzero on −1 < φ* < 1 and CNM(θ*) is strictly monotonic; see Section 6.3 in the paper by Findley (2005), whose derivation also shows that the unique θ̄* = (1,−φ̄*,0,…) minimizing (7.8) is determined by the lag one autocorrelation of GM,θ*(λ) in (5.5),

φ̄* = γ1M,θ*0M,θ*, with γkM,θ* = ∫−ππ eiλk dGM,θ*(λ).  (7.10)

There is no such simple formula for φ̄ minimizing Γ0M(θ,θ) because the critical point equation for φ provides φ as a zero of a polynomial of degree five in general. However, from strict monotonicity of CNM(θ*(φ*)), if φ̄* ≠ φ* then (7.5) fails, and therefore strict inequality holds in (7.3) by Theorem 7.1. For the OLS choice, φ* = 0, for which CNM(θ*) = CNM = Γ0NM0MM, (7.10) shows that φ̄* ≠ 0 (except possibly at a single value of (AN)2) when either γ1y or ΔNM = Γ1NN + (CNM)2Γ1MM − CNM1NM + Γ−1NM) is nonzero, which will usually be the case. A periodic Xt satisfying (7.9) and ΔNM ≠ 0 is given in Section 7.1.2.

When (7.9) fails, CNM(θ*) = CNM = Γ0NM0MM for all θ*, and so equality holds in (7.3).

7.1.1. The Inferiority of White Noise Modeling with OLS when φ̄* ≠ 0.

If Θ is a compact model set containing the AR(1) models θ = θ(φ), then Γ0M(θ̄*,θ*) ≤ Γ0M(θ(φ̄*),θ*). So, under (7.9) and φ̄* ≠ φ*, we have, from (7.3), that minθ∈Θ Γ0M(θ,θ) ≤ Γ0M(θ̄*,θ*) < Γ0M(θ*,θ*). Thus, for θ* = (1,0,0,…), it follows from (7.1) and (7.2) that when φ̄* ≠ 0, using OLS estimation of AM with the white noise model for ytM leads to asymptotically worse one-step-ahead forecasts than GLS with (1.4), for any such model set Θ.

7.1.2. Periodic Xt and an Example of ΔNM.

The trading day and holiday regressors discussed in Findley et al. (1998), Bell and Hillmer (1983), and Findley and Soukup (2000) are effectively periodic functions; i.e., Xt+PM = XtM holds for all t, for rather large periods P (e.g., 12 × 28 = 336 months for trading day regressors, 12 × 19 = 228 months for some lunar holiday regressors, more for other holidays, e.g., Easter). The simplest holiday regressors are one-dimensional and specify that the effect of the holiday is the same for each day in some interval near the holiday, a dubious but simplifying assumption. For such regressors, the compensating XtN can be assumed to be one-dimensional and have the same period.

Every regressor of period P has a Fourier representation Σjjcos(2πjt/P) + βjsin(2πjt/P)} with at most P nonzero coefficients, which are uniquely determined linear functions of P consecutive values of the regressor; see Section 4.2.3 of Anderson (1971). To give a more complete analysis of (7.3) for the function (7.8), we consider a simplified period P = 4 regressor XtM having the representation XtM = a1Mcos(πt/2) + a2M(−1)t, with a1M, a2M ≠ 0, for which XtN = a1Ncos(πt/2) + b1Nsin(πt/2), with a1N, b1N ≠ 0. Thus Xt = [XtM XtN]′ = α1cos(πt/2) + α2(−1)t + β1sin(πt/2), where α1 = [a1M a1N], α2 = [a2M 0], and β1 = [0 b1N]. Consequently,

ΓkX = ½(α1′α1 + β1′β1)cos(πk/2) + α2′α2(−1)k + ½(β1′α1 − α1′β1)sin(πk/2), k = 0,±1,…,

and GX(λ) is piecewise constant with upward jumps at λ = ±π/2, π; see Anderson (1971, p. 581).

For this Xt, the left-hand side of (7.9) has the value −a1Ma1N(a2M)2, and so (7.9) holds. Further, CNM = a1Ma1N{(a1M)2 + 2(a2M)2}−1 and ΔNM = −(a2MCNM)2. Strict inequality holds in (7.3) for OLS estimation except when γ1y > 0 and (AN)2 = γ1y(a2MCNM)−2, in which case φ̄* = 0 = φ*.
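These closed-form values can be checked numerically; the following short Python sketch (the coefficient values are arbitrary choices for illustration) computes the sample lagged second moments directly:

import numpy as np

a1M, a2M, a1N, b1N = 1.0, 0.5, 0.8, 0.6
t = np.arange(1, 40001)                        # many whole periods of length 4
XM = a1M * np.cos(np.pi * t / 2) + a2M * (-1.0) ** t
XN = a1N * np.cos(np.pi * t / 2) + b1N * np.sin(np.pi * t / 2)

def lag_moment(u, v, k):
    # Sample version of Gamma_k: lim T^{-1} sum_t u_{t+k} v_t.
    if k < 0:
        return lag_moment(v, u, -k)
    return np.mean(u[k:] * v[: len(v) - k])

CNM = lag_moment(XN, XM, 0) / lag_moment(XM, XM, 0)
DeltaNM = (lag_moment(XN, XN, 1) + CNM**2 * lag_moment(XM, XM, 1)
           - CNM * (lag_moment(XN, XM, 1) + lag_moment(XN, XM, -1)))

# Agreement holds up to O(1/T) edge effects:
print(CNM, a1M * a1N / (a1M**2 + 2 * a2M**2))  # both ~ 0.5333
print(DeltaNM, -(a2M * CNM) ** 2)              # both ~ -0.0711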

7.2. Regarding Asymptotic Efficiency in the Sense of Grenander (1954)

Here we restrict attention to nonrandom regressors Xt in (1.1) whose components are polynomials, periodic functions, or realizations of stationary processes with continuous spectral densities and with convergent sample second moments. The disturbance process yt is assumed to be a mean zero stationary process with the last-mentioned properties. Grenander (1954) considers the correct regressor case and calls the OLS estimates

AT = (Σt=1T WtXt′)(Σt=1T XtXt′)−1

asymptotically efficient if limT→∞ DT−1E{(AT − A)′(AT − A)}DT−1 is minimal (in the ordering of symmetric matrices) among all linear, unbiased estimates AT of A. For this situation, his result, given on p. 244 of Grenander and Rosenblatt (1984), is that OLS is efficient if and only if the spectral distribution function GX(λ) has at most dim Xt jumps and the sum of the ranks of the jumps GX(λ+) − GX(λ−), 0 ≤ λ ≤ π, is equal to dim Xt. These conditions are not satisfied, and OLS is not efficient, for most of the regressors discussed in Section 7.1.2, including the calendar effect regressors and the period four regressor with b1N ≠ 0; see Chapter 7.7 and case (1) on p. 253 of Grenander and Rosenblatt (1984): usually, the number of terms in the Fourier representation of Xt, and thus also the number of jumps in GX(λ), exceeds dim Xt.

To be able to apply Grenander's result to our underspecified regression situation, assume that XtM and ytM have the properties hypothesized previously for Xt and yt. Thus XtN has a continuous spectral density and so cannot have periodic components. If we consider XtM having only polynomial and periodic components, then XtN and XtM are asymptotically orthogonal; see Section 6.1 of Findley (2005). This implies ANCNM(θ*) = 0 for all θ*, resulting in equality in (7.3) always, because Γ0M(θ,θ*) does not depend on θ*.

On the other hand, with regressors in XtM that are realizations of stationary processes, if ANCNM(θ*) is nonzero, then the analogue for ATM(θ*) of Grenander's efficiency measure fails by being infinite, because some entries of (ATM(θ*) − AM)DM,T−1 will have order T1/2; see (4.2).

Thus this concept of efficiency is not useful in our context.

8. EXTENSIONS AND RELATED RESULTS

From their connection to one-step-ahead forecast error filters, it is not very surprising that GLS estimates of regARMA and regARIMA models have an optimality property for one-step-ahead forecasting. Yet a systematic investigation of the topic has been lacking. A pleasingly simple result, such as Theorem 7.1's connection of optimality with asymptotic bias characteristics, seems possible only for the incorrect regressor case. Indeed, if asymptotic efficiency results are indicative, the correct regressor case will be quite complex. In this case, when the ARMA model for yt is incorrect, GLS can be more or less efficient than OLS; see Koreisha and Fang (2001). Even when the ARMA model is also correct, the analysis and examples of Grenander and Rosenblatt (1984) and of Section 7.2 show, for nonstochastic regressors, that OLS is asymptotically efficient only for a limited range of relatively simple regressors.

For any fixed θ*, in the incorrect nonstochastic regressor case, a referee conjectures that, under additional assumptions and with the aid of a result like Theorem 4.1 of West (1996), it can be shown that the limit as T → ∞ of the variance of

does not depend on θ*.

So far, we have only provided asymptotic results for the most simply defined GLS estimates, which are obtained by truncating the infinite-past forecast error filters and using conditional maximum likelihood estimation of the ARMA model. Section 2.4 of Findley (2005) and (d) of Lemma 10 of Findley (2005) reveal that the same limits are obtained if the errors of the finite-past one-step-ahead forecasts discussed in Newton and Pagano (1983) are used to define GLS estimates in conjunction with unconditional maximum likelihood estimation of the ARMA model. (Analogous GLS estimates from AR models were considered in Amemiya, 1973.) See Section 9 of the technical report Findley (2003) for additional details, including details about how to weaken the assumptions on XtM to include the frequently used intervention variables of Box and Tiao (1975). These decay exponentially to zero and so have weight one in DM,T, causing (2.5) to fail. Also, with the restriction to measurable minimizers θT discussed in Findley et al. (2001, 2004), in the case of nonstochastic Xt, all almost sure convergence results hold with convergence in probability when convergence in (2.1) holds only in this weaker sense.

Findley (2003) also shows how to use the results of Appendix D to generalize Theorem 5.1 to the case of multi-step-ahead forecast errors and to establish the convergence of θ-parameter estimates that minimize average squared multi-step-ahead forecast errors (allowing for ytM the more comprehensive model classes of Findley et al., 2004).

Findley (2005) uses the results of Theorems 4.1 and 7.1 to obtain formulas and GLS optimality results for the limiting average of squared out-of-sample (real time) forecast errors of regARIMA models under assumptions on the regressors Xt that are slightly more restrictive than those of Section 2 but are satisfied by all of the specific regressor types we have mentioned. The limit formulas are the same as those of the present paper when XtM is A.S. Empirical results are available from the author showing that GLS usually leads to better one-step-ahead out-of-sample forecasting performance than OLS for a suite of monthly series that are modeled with trading day and Easter holiday regressors by the U.S. Census Bureau for the purpose of seasonal adjustment.

APPENDIX A. Compact θ-Sets for Estimation

For each ε > 0 and integer pair p,q ≥ 0, we define Θp,q,ε to be the set of all θ = (1,θ12,…) from invertible ARMA(r,s) models with r ≤ p, s ≤ q such that the zeroes of the minimal degree AR and MA polynomials φ(z) and α(z) for which θ(z) = φ(z)/α(z) all belong to {|z| ≥ 1 + ε}. Every sequence θT = (1,θ1T2T,…), T = 1,2,… in Θp,q,ε has a subsequence θS(T) that converges coordinatewise to some θ ∈ Θp,q,ε, i.e., θjS(T) → θj, j ≥ 1. Thus Θp,q,ε is compact for coordinatewise convergence. Further, for 0 ≤ ε0 < ε, the sums Σj=0∞ |θj|(1 + ε0)j converge uniformly on Θp,q,ε; i.e.,

supθ∈Θp,q,ε Σj=0∞ |θj|(1 + ε0)j < ∞

and

limJ→∞ supθ∈Θp,q,ε Σj=J∞ |θj|(1 + ε0)j = 0.

See Lemmas 2 and 10 of Findley (2005) for these and other properties mentioned. Our uniform convergence results that are presented subsequently follow from these facts, as do some other important properties. First, the functions θ(eiλ) = Σj=0∞ θjeiλj are continuous on −π ≤ λ ≤ π and uniformly bounded and bounded away from zero on Θp,q,ε:

0 < minθ∈Θp,q,ε min−π≤λ≤π |θ(eiλ)| ≤ maxθ∈Θp,q,ε max−π≤λ≤π |θ(eiλ)| < ∞.

Second, if a sequence θT, T = 1,2,… in Θp,q,ε converges coordinatewise to some θ, then it also converges in the stronger sense that

limT→∞ Σj=0∞ |θjT − θj|(1 + ε0)j = 0

whenever 0 ≤ ε0 < ε. In particular, the topology of coordinatewise convergence on Θp,q,ε coincides with that of the l1-norm ∥θ∥1 = Σj=0∞ |θj|.

Our theorems apply to any compact Θ for which Θ ⊆ Θp,q holds, for some ε > 0 and p,q ≥ 0. A typical Θ would arise from constraints on the zeroes of the AR and MA polynomials of the kind of ARMA model of interest.

APPENDIX B. Scalable Asymptotic Stationarity

Under the data assumptions made in Section 2, Xt and yt in (1.1) together form a multivariate sequence that is S.A.S., a property we now consider in some detail. Let Ut, t ≥ 1 be a real-valued column vector sequence that is S.A.S. and let IU denote the identity matrix of order dim U, the dimension of Ut. Thus there is a decreasing sequence D1 ≥ D2 ≥ … of positive definite diagonal matrices, for which DT ↓ 0 and

limT→∞ DT+k−1DT = IU, k ≥ 0,  (B.1)

hold, such that, for each k = 0,±1,…, the limits

ΓkU = limT→∞ DT(Σt=k+1T UtUt−k′)DT a.s.  (B.2)

exist (finitely). The properties (B.1) and (B.2) yield limT→∞ DTUT−j = 0 a.s., j ≥ 0. For example, when j = 0, as T → ∞,

DTUTUT′DT = DT(Σt=1T UtUt′)DT − (DTDT−1−1)DT−1(Σt=1T−1 UtUt′)DT−1(DT−1−1DT)

converges a.s. to Γ0U − Γ0U = 0, whence DTUT → 0 a.s. Further, DT ↓ 0 leads to limT→∞ DTU1+j = 0 a.s. for all j ≥ 0.

Without a formal name, this generalization of stationarity was introduced for regressors in Grenander (1954) to encompass a variety of nonstochastic regressors, including polynomials. (Our notation is the inverse of his, using DT where he uses DT−1. He only requires the diagonal elements of Γ0U to be positive, which is the nature of (2.9) for ANXtN. Our requirement (2.6) for XtM is stronger.) Grenander shows that the real matrix sequence ΓkU, k = 0,±1,… has a representation

ΓkU = ∫−ππ eiλk dGU(λ),

in which GU(λ) is a Hermitian-matrix-valued function such that the eigenvalues of increments GU2) − GU1), λ2 ≥ λ1, are nonnegative, or, equivalently, the increments are Hermitian nonnegative; see also Grenander and Rosenblatt (1984), Chapter II of Hannan (1970), and Chapter 10 of Anderson (1971). For example, if Ut = tp, p ≥ 0, then, with DT = T−(p+1/2), one obtains ΓkU = (2p + 1)−1 for each k, and so GU(λ) can be taken to be 0 for λ < 0 and (2p + 1)−1 for λ ≥ 0. Grenander (1954) and Grenander and Rosenblatt (1984, Ch. 7) verify the joint scalable asymptotic stationarity property for regressors whose entries Xi,t are polynomials, linear combinations (perhaps infinite) of sinusoids, i.e., of cos ωjt and/or sin ωjt, for various 0 ≤ ωj ≤ π (scaling sequence T−1/2), and, finally, products of polynomials tp with linear combinations of sinusoids (scaling sequence T−p−1/2). By contrast, exponentially increasing regressors, e.g., Ut = ebt with b > 0, are not S.A.S. because (B.1) fails for DT = (Σt=1T e2bt)−1/2; see Hannan (1970, p. 77).
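A quick numerical illustration of the Ut = tp example (the values p = 2, k = 3 are arbitrary choices):

import numpy as np

p, k, T = 2, 3, 200000
t = np.arange(1, T + 1, dtype=float)
U = t ** p
# With D_T = T^{-(p+1/2)}, the scaled lag-k sample second moment
# D_T (sum_{t=k+1}^T U_t U_{t-k}) D_T should approach (2p+1)^{-1}.
print(T ** (-(2 * p + 1)) * np.sum(U[k:] * U[: T - k]))   # ~ 0.2
print(1 / (2 * p + 1))                                    # 0.2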

APPENDIX C. Vector Array Reformulation of Assumptions

The following reformulation of our assumptions (2.1), (2.2), and (2.5)–(2.9) concerning yt and Xt will enable us to make use of the results of Findley et al. (2001, 2004). The vector array

Ut(T) = [yt T1/2DM,TXtM XtN]′, 1 ≤ t ≤ T,  (C.1)

is A.S. More specifically, for each k = 0,±1,…,

ΓkU = limT→∞ T−1 Σt=k+1T Ut(T)Ut−k(T)′ a.s.,  (C.2)

with Γ0MM > 0 and ANΓ0NNAN′ > 0. Further, from Appendix B,

ΓkU = [γky 0; 0 ΓkX]  (C.3)

and

limT→∞ T−1/2U1+j(T) = limT→∞ T−1/2UT−j(T) = 0 a.s., j ≥ 0.  (C.4)

Because of (C.3), the spectral distribution matrix of the ΓkU sequence has the block diagonal form GU(λ) = blockdiag(Gy(λ),GX(λ)).

APPENDIX D. Uniform Convergence Results for Filtered A.S. Arrays

The proposition and lemma that follow are formulated for proving some of the more general results indicated in Section 8.

PROPOSITION D.1. Let Ut(T), 1 ≤ t ≤ T be an A.S. column vector array satisfying (C.4) and let GU(λ) denote the spectral distribution matrix of the asymptotic lagged second moments matrices ΓkU defined by (C.2). Let H and Z be sets of filters η = (η01,…) and ζ = (ζ01,…) such that Σj=0∞ |ηj| resp. Σj=0∞ |ζj| converges uniformly on H resp. Z. Then the filter output arrays

Ut[η](T) = Σj=0t−1 ηjUt−j(T)

and

Ut[ζ](T) = Σj=0t−1 ζjUt−j(T),

1 ≤ t ≤ T, η ∈ H, ζ ∈ Z, have the following properties:

(i) limT→∞ supη∈H ∥T−1/2U1+j[η](T)∥ = limT→∞ supη∈H ∥T−1/2UT−j[η](T)∥ = 0 a.s. for all j ≥ 0, and analogously for Ut[ζ](T).

(ii) As T → ∞,

supη∈H,ζ∈Z ∥T−1 Σt=k+1T Ut[η](T)Ut−k[ζ](T)′ − ΓkU(η,ζ)∥ → 0 a.s.,

where

ΓkU(η,ζ) = ∫−ππ eiλkη(eiλ)ζ(e−iλ) dGU(λ),

for k = 0,±1,….

(iii) The functions ΓkU(η,ζ) are bounded on H × Z,

supη∈H,ζ∈Z ∥ΓkU(η,ζ)∥ < ∞,

and are jointly continuous in η,ζ in the sense that, if ηT ∈ H, ζT ∈ Z are such that ηT → η and ζT → ζ (coordinatewise convergence) with η ∈ H, ζ ∈ Z, then ΓkUTT) → ΓkU(η,ζ). Also, if Z = H, then infη∈H,−π≤λ≤π |η(eiλ)|2Γ0U ≤ Γ0U(η,η) ≤ supη∈H,−π≤λ≤π |η(eiλ)|2Γ0U.

(iv) Let H be an index set for a family of arrays Ut(η,T), 1 ≤ t ≤ T, η ∈ H such that, as T → ∞,

supη∈H ∥T−1 Σt=1T Ut(η,T)Ut(η,T)′ − Γ0(η)∥ → 0 a.s.,  (D.1)

where the Γ0(η) are positive definite matrices whose minimum eigenvalues are bounded away from zero; i.e.,

infη∈H λmin0(η)) ≥ mH  (D.2)

holds for some mH > 0. Then

limT→∞ supη∈H ∥(T−1 Σt=1T Ut(η,T)Ut(η,T)′)−1 − Γ0(η)−1∥ = 0 a.s.  (D.3)

Proof. Parts (i)–(iii) are straightforward vector extensions of special cases of Theorem 2.1 and Proposition 2.1 of Findley et al. (2001). For (iv), it follows from (D.1) and (D.2) that, given ε > 0, for each realization except those of an event with probability zero, there is a Tε such that for T ≥ Tε the inequalities

supη∈H ∥T−1 Σt=1T Ut(η,T)Ut(η,T)′ − Γ0(η)∥ ≤ ε

and

infη∈H λmin(T−1 Σt=1T Ut(η,T)Ut(η,T)′) ≥ mH/2

hold. Hence for these T and all η ∈ H,

∥(T−1 Σt=1T Ut(η,T)Ut(η,T)′)−1 − Γ0(η)−1∥ ≤ ∥(T−1 Σt=1T Ut(η,T)Ut(η,T)′)−1∥ ∥Γ0(η) − T−1 Σt=1T Ut(η,T)Ut(η,T)′∥ ∥Γ0(η)−1∥ ≤ (2/mH)ε(1/mH),

which establishes (D.3). █

We also need the following lemma, whose proof can be obtained by standard arguments, as in the proof of (5.18) of Findley et al. (2004).

LEMMA D.2. Suppose that, on a set Θ*, the sequence βT(θ*), T = 1,2,… of row vector functions converges uniformly a.s. to a bounded function β(θ*), i.e., (5.3) holds, and similarly for τT(θ*), T ≥ 1 and its limit τ(θ*). Let Ut(η,T), η ∈ H and Wt(ζ,T), ζ ∈ Z, 1 ≤ t ≤ T be families of column vector arrays of the same dimension as β(θ*) and τ(θ*), respectively, such that, for k = 0,±1,…,

limT→∞ T−1 Σt=k+1T Ut(η,T)Wt−k(ζ,T)′ = Γk(η,ζ) uniformly a.s. on H × Z

holds for functions Γk(η,ζ) with supη∈H,ζ∈Z ∥Γ0(η,ζ)∥ < ∞. Then, as T → ∞,

T−1 Σt=k+1TT(θ*)Ut(η,T)}{τT(θ*)Wt−k(ζ,T)}′ → β(θ*)Γk(η,ζ)τ(θ*)′ uniformly a.s. on Θ* × H × Z.

APPENDIX E. Proofs

Proof of Theorem 4.1. We have

(ATM(θ) − AM)T−1/2DM,T−1 = {ANT−1/2(Σt=1T XtN[θ]XtM[θ]′)DM,T + T−1/2(Σt=1T yt[θ]XtM[θ]′)DM,T}{DM,T(Σt=1T XtM[θ]XtM[θ]′)DM,T}−1.

By (ii) and (iii) of Proposition D.1, T−1/2(Σt=1T yt[θ]XtM[θ]′)DM,T converges uniformly a.s. to 0, and T−1/2(Σt=1T XtN[θ]XtM[θ]′)DM,T and DM,T(Σt=1T XtM[θ]XtM[θ]′)DM,T converge uniformly a.s. to the continuous limits Γ0NM(θ) and Γ0MM(θ), respectively, with Γ0MM(θ) bounded below by the positive definite matrix mΘ2Γ0MM, where mΘ = min−π≤λ≤π,θ∈Θ |θ(eiλ)| > 0; see Appendix A. It follows from (iv) of Proposition D.1 that {DM,T(Σt=1T XtM[θ]XtM[θ]′)DM,T}−1 converges uniformly to Γ0MM(θ)−1, which is therefore continuous (and bounded above by mΘ−20MM)−1). Hence (ATM(θ) − AM)T−1/2DM,T−1 converges uniformly a.s. to ANCNM(θ) = ANΓ0NM(θ)Γ0MM(θ)−1, which is continuous on Θ and also bounded. █

Proof of Theorem 5.1. The assertions follow from (5.2) and Lemma D.2 with τT(θ*) = βT(θ*), H = Z = Θ, and Ut(θ,T) = Wt(θ,T) = Ut(T)[θ], for Ut(T) defined by (C.1), because the uniform convergence of

T−1 Σt=k+1T Ut(T)[θ]Ut−k(T)[θ]′

to ΓkU(θ,θ) and the boundedness of ∥Γ0U(θ,θ)∥ on Θ, which are required to apply Lemma D.2, follow from (ii) and (iii), respectively, of Proposition D.1. The uniform convergence of Σj=0∞ |θj| required by the proposition is the special case ε0 = 0 in Appendix A. The fact that GU(λ) = blockdiag(Gy(λ),GX(λ)), because of (C.3), yields the form of GM,θ*(λ) in (5.5). █

Proof of Theorem 7.1. We start by establishing that, for any invertible θ and θ*, we have Γ0M(θ,θ) ≤ Γ0M(θ,θ*) with equality holding if and only if ANCNM(θ*) = ANCNM(θ). Indeed, the component of Γ0M(θ,θ*) that depends on θ* can be reexpressed in terms of the analogues of CNM(θ*) and Γ0X(θ) obtained by replacing XtN with X̃tN = XtN − CNM(θ)XtM. Denoting these analogues by C̃NM(θ*) = CNM(θ*) − CNM(θ) and Γ̃0NN(θ) = Γ0NN(θ) − CNM(θ)Γ0MM(θ)CNM(θ)′, we have

ANBNM(θ*)Γ0X(θ)BNM(θ*)′AN′ = ANΓ̃0NN(θ)AN′ + ANC̃NM(θ*)Γ0MM(θ)C̃NM(θ*)′AN′.

By a standard calculation, for any C with the dimensions of C̃NM(θ*),

ANΓ̃0NN(θ)AN′ + ANCΓ0MM(θ)C′AN′ ≥ ANΓ̃0NN(θ)AN′,  (E.1)

with equality holding in (E.1) if and only if ANC = 0. In particular, taking C = C̃NM(θ*) and recalling (5.9), we obtain Γ0M(θ,θ*) ≥ γ0y(θ) + ANΓ̃0NN(θ)AN′ = Γ0M(θ,θ), with equality if and only if ANCNM(θ*) = ANCNM(θ).

Next, note that because Γ0M(θ,θ) and Γ0M(θ,θ*) are continuous functions of θ on Θ, they have minimizers θ̄, resp. θ̄*, over Θ. From the result just established, we obtain

minθ∈Θ Γ0M(θ,θ) = Γ0M(θ̄,θ̄) ≤ Γ0M(θ̄*,θ̄*) ≤ Γ0M(θ̄*,θ*) = minθ∈Θ Γ0M(θ,θ*).  (E.2)

Thus Γ0M(θ̄,θ̄) = Γ0M(θ̄*,θ*) holds, i.e., equality in (7.3), if and only if (7.4) and Γ0M(θ̄*,θ̄*) = Γ0M(θ̄*,θ*) do, and the latter is equivalent to (7.5), as was just shown.

In particular, equality in (7.3) implies the failure of (7.6) for θ̄ = θ̄* satisfying (7.4). Conversely, failure of (7.6) for some minimizer θ̄, i.e., ANCNM(θ*) = ANCNM(θ̄), implies Γ0M(θ̄*,θ*) ≤ Γ0M(θ̄,θ*) = Γ0M(θ̄,θ̄), which, from (E.2), yields Γ0M(θ̄*,θ*) = Γ0M(θ̄,θ̄) = Γ0M(θ̄*,θ̄*), i.e., equality in (7.3). Therefore (7.6) for all θ̄ minimizing Γ0M(θ,θ) is necessary and sufficient for strict inequality in (7.3).

From Theorem 4.1 and (5.11), it follows that the left-hand side of (7.7) is equal a.s. to the left-hand side of

The assertions concerning (7.7) follow from (E.3) and the fact that, when Θ0 = {θ̄}, equality holds in (E.3) because θT → θ̄ a.s., from (5.11). █

REFERENCES

Aelen, F. (2004) Improving Timeliness of Industrial Short-Term Statistics Using Time Series Analysis. Discussion paper 04005, Statistics Netherlands, Division of Technology and Facilities, Methods and Informatics. http://www.cbs.nl/NR/rdonlyres/EC77EDFA-1579-4AF6-9763-8EFEEB9F4CC5/0/discussionpaper04005.pdf.
Amemiya, T. (1973) Generalized least squares with an estimated autocovariance matrix. Econometrica 41, 723–732.
Anderson, T.W. (1971) The Statistical Analysis of Time Series. Wiley.
Bell, W.R. & S.C. Hillmer (1983) Modelling time series with calendar variation. Journal of the American Statistical Association 78, 526–534.
Box, G.E.P. & G.M. Jenkins (1976) Time Series Analysis: Forecasting and Control, rev. ed. Holden-Day.
Box, G.E.P. & G.C. Tiao (1975) Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 70, 70–79.
Brillinger, D.R. (1975) Time Series. Holt, Rinehart and Winston.
Findley, D.F. (2003) Convergence of Estimates of Misspecified regARIMA Models and Generalizations and the Optimality of Estimated GLS Regression Coefficients for One-Step-Ahead Forecasting. Statistical Research Division Research Report Statistics 2003-03, U.S. Census Bureau. http://www.census.gov/srd/papers/pdf/rrs2003-06.pdf.
Findley, D.F. (2005) Asymptotic stationarity properties of out-of-sample forecast errors of misspecified regARIMA models and the optimality of GLS for one-step-ahead forecasting. Statistica Sinica 15, 447–476.
Findley, D.F., B.C. Monsell, W.R. Bell, M.C. Otto, & B.C. Chen (1998) New capabilities and methods of the X-12-ARIMA seasonal adjustment program. Journal of Business & Economic Statistics 16, 127–177 (with discussion).
Findley, D.F., B.M. Pötscher, & C.Z. Wei (2001) Uniform convergence of sample second moments of time series arrays. Annals of Statistics 29, 815–838.
Findley, D.F., B.M. Pötscher, & C.Z. Wei (2004) Modeling of time series arrays by multistep prediction or likelihood methods. Journal of Econometrics 118, 151–187.
Findley, D.F. & R.J. Soukup (2000) Modeling and model selection for moving holidays. In 2000 Proceedings of the Business and Economic Statistics Section of the American Statistical Association, pp. 102–107. American Statistical Association. Also http://www.census.gov/ts/papers/asa00_eas.pdf.
Grenander, U. (1954) On the estimation of regression coefficients in the case of an autocorrelated disturbance. Annals of Mathematical Statistics 25, 252–272.
Grenander, U. & M. Rosenblatt (1984) Statistical Analysis of Stationary Time Series, 2nd ed. Chelsea.
Hannan, E.J. (1970) Multiple Time Series. Wiley.
Hannan, E.J. (1982) Testing for autocorrelation and Akaike's criterion. In J. Gani & E.J. Hannan (eds.), Essays in Statistical Science: Papers in Honour of P.A.P. Moran, pp. 403–412. Applied Probability Trust.
Koreisha, S.G. & Y. Fang (2001) Generalized least squares with misspecified correlation structure. Journal of the Royal Statistical Society, Series B 63, 515–532.
Newton, H.J. & M. Pagano (1983) The finite memory prediction of covariance stationary time series. SIAM Journal on Scientific and Statistical Computing 4, 330–339.
Peña, D., G.E. Tiao, & R.S. Tsay (2001) A Course in Time Series Analysis. Wiley.
Pierce, D.A. (1971) Least squares estimation in the regression model with autoregressive-moving average errors. Biometrika 58, 299–312.
Pötscher, B.M. (1987) Convergence results for maximum likelihood type estimators in multivariate ARMA models. Journal of Multivariate Analysis 21, 29–52.
Pötscher, B.M. (1991) Noninvertibility and pseudo-maximum likelihood estimation of misspecified ARMA models. Econometric Theory 7, 435–449. Corrections: Econometric Theory 10, 811.
Pötscher, B.M. & I.R. Prucha (1997) Dynamic Nonlinear Econometric Models: Asymptotic Theory. Springer-Verlag.
Stock, J.H. & M.W. Watson (2002) Introduction to Econometrics. Addison-Wesley.
Thursby, J.G. (1987) OLS or GLS in the presence of specification error? Journal of Econometrics 35, 359–374.
West, K.D. (1996) Asymptotic inference about predictive ability. Econometrica 64, 1067–1084.