1. INTRODUCTION
For many economic indicator series, modeling requires specification of both a regression function and an autocovariance structure for the disturbance process. Suppose that, possibly after a variance-stabilizing transformation (e.g., the logarithm), one has data Wt, 1 ≤ t ≤ T of the form
$$W_t = AX_t + y_t, \qquad 1 \le t \le T, \tag{1.1}$$
where the Xt are column vectors and the yt are real variates that are asymptotically orthogonal to the Xt in a sense to be defined, whose lagged sample second moments converge as T → ∞. With monthly or quarterly seasonal economic data, AXt might describe a linear or higher degree trend, stable seasonal effects, moving holiday effects (Bell and Hillmer, 1983), trading day effects (Findley, Monsell, Bell, Otto, and Chen, 1998), or other periodic effects. The term Xt might also include values of related stochastic variables, perhaps at leads or lags. We address the situation in which the modeler considers a model
$$W_t = A^MX^M_t + y^M_t \tag{1.2}$$
whose regressor vector XtM cannot reproduce AXt for all t, because of known or unknown omissions, approximations, simplifications, etc. We assume that the modeler, perhaps starting from the ordinary least squares (OLS) estimate for AM given by (1.5) later in this section, has decided upon an autoregressive moving average (ARMA) model family, not necessarily correct, for the disturbance (or residual) process ytM = Wt − AMXtM. The model (1.2), together with such an ARMA model for ytM, is called a regARMA model.
Generalized least squares (GLS) estimation of AM occurs simultaneously with ARMA estimation. The simplest definition of (feasible) GLS estimates of AM, given by (1.3), makes use of the ARMA model's innovation filter that is defined as follows. With L denoting the lag operator, let φ(L) be the autoregressive polynomial (AR) and α(L) the moving average (MA) polynomial of a (perhaps incorrect) candidate ARMA model for ytM and let θ = (1,θ1,θ2,…) denote the coefficient sequence of the power series expansion
$$\theta(z) = \frac{\varphi(z)}{\alpha(z)} = \sum_{j=0}^{\infty}\theta_j z^j.$$
When yt in (1.1) and the regressors missing from XtM are weakly (i.e., first and second moment) stationary with mean zero, then ytM will be weakly stationary with mean zero. In this case, assuming that values of ytM are available at all past times,
$$y^M_{t|t-1}(\theta) = -\sum_{j=1}^{\infty}\theta_j\,y^M_{t-j}$$
is the model's linear forecast of ytM from ysM, −∞ < s ≤ t − 1; see Section 5.3.3 of Box and Jenkins (1976) or Hannan (1970, p. 147). The forecast errors
$$a_t(\theta) = y^M_t - y^M_{t|t-1}(\theta) = \sum_{j=0}^{\infty}\theta_j\,y^M_{t-j}$$
are called the model's innovations series, and the coefficient sequence θ is its innovation filter. If the ARMA model is correct, then for each t, at(θ) is uncorrelated with ysM, −∞ < s ≤ t − 1, and it follows that yt|t−1M(θ) has minimum mean square error among all such linear forecasts of ytM and that the innovations at(θ) are uncorrelated (white noise). However, we do not assume that a correct ARMA model exists or that ytM is weakly stationary. For example, when a missing regressor is deterministic, e.g., periodic, ytM will not be weakly stationary even when yt is but will instead be asymptotically stationary, meaning that its lagged sample second moments will converge as T increases. Their limits form the autocovariance sequence of a weakly stationary process. In effect, it is this autocovariance sequence for which an ARMA model is sought. All ARMA model–related quantities of interest in this paper depend only on θ and on the Wt and XtM. Thus we can express model dependence in terms of θ, as we do throughout the paper. Further motivation for this “parameterization” is given in Section 3. We refer to each θ as a model.
For given Wt,XtM, 1 ≤ t ≤ T and θ, define
$$W_t[\theta] = \sum_{j=0}^{t-1}\theta_j W_{t-j}$$
and
$$X^M_t[\theta] = \sum_{j=0}^{t-1}\theta_j X^M_{t-j}$$
for 1 ≤ t ≤ T and let ′ denote transpose. Following Pierce (1971), we define the θ-model's GLS estimator of AM to be
$$A^M_T(\theta) = \left(\sum_{t=1}^{T}W_t[\theta]\,X^M_t[\theta]'\right)\left(\sum_{t=1}^{T}X^M_t[\theta]\,X^M_t[\theta]'\right)^{-1}. \tag{1.3}$$
(We discuss another GLS estimator in Section 8.) With these ATM(θ), an estimate of θ (and of the ARMA coefficients determining θ when they are identified) can be obtained by conditional or unconditional maximum likelihood estimation (MLE). (As usual, Gaussian likelihood functions are used without requiring the data to be Gaussian.) For the conditional MLE estimates on which we focus for simplicity (see Box and Jenkins, 1976, Sect. 7.1.2), for each 1 ≤ t ≤ T, one defines the θ-model's forecast of Wt from Ws, 1 ≤ s ≤ t − 1 to be
$$W^M_{t|t-1}(\theta,\theta,T) = W_t - \left(W_t[\theta] - A^M_T(\theta)\,X^M_t[\theta]\right)$$
, with the convention
$$W^M_{1|0}(\theta,\theta,T) = A^M_T(\theta)\,X^M_1.$$
This is the special case Wt|t−1M(θ,θ,T) of the more general forecast function Wt|t−1M(θ,θ*,T) defined in (1.6), which follows. Conditional MLE estimates θT leading to GLS estimates ATM(θT) are the minimizers
$$\theta_T = \operatorname*{arg\,min}_{\theta\in\Theta}\; T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta,\theta,T)\right)^2, \tag{1.4}$$
where Θ is a compact set of θ specified by ARMA(p,q) models whose AR and MA polynomials have all zeroes in {|z| ≥ 1 + ε}, for some ε > 0.
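To make these definitions concrete, here is a minimal numpy sketch (our own illustration; the function names and simulated data are hypothetical, and the disturbance family is restricted to AR(1), so that θ = (1,−φ,0,…)) of the GLS estimate (1.3) computed by truncated filtering, and of profiling the conditional-MLE objective (1.4) over a grid of AR coefficients:

```python
import numpy as np

def theta_filter(x, theta):
    """Truncated innovation filtering: x_t[theta] = sum_{0<=j<=t-1} theta_j x_{t-j}."""
    T = len(x)
    out = np.zeros(T)
    for t in range(T):
        m = min(t + 1, len(theta))
        out[t] = np.dot(theta[:m], x[t - np.arange(m)])
    return out

def gls_estimate(W, X, theta):
    """GLS estimate (1.3): OLS applied to the theta-filtered data."""
    Wf = theta_filter(W, theta)
    Xf = np.column_stack([theta_filter(X[:, i], theta) for i in range(X.shape[1])])
    return np.linalg.solve(Xf.T @ Xf, Xf.T @ Wf)

def objective(W, X, phi):
    """The average squared one-step error in (1.4) for theta = (1, -phi)."""
    theta = np.array([1.0, -phi])
    A = gls_estimate(W, X, theta)
    e = theta_filter(W, theta) - theta_filter(X @ A, theta)  # W_t[th] - A X_t^M[th]
    return np.mean(e ** 2)

# Illustration on simulated data: periodic regressor, MA(1)-like disturbance.
rng = np.random.default_rng(0)
T = 400
X = np.column_stack([np.ones(T), np.cos(np.pi * np.arange(T) / 2)])
eps = rng.standard_normal(T)
y = eps + 0.7 * np.concatenate([[0.0], eps[:-1]])
W = X @ np.array([1.0, 2.0]) + y
phis = np.linspace(-0.9, 0.9, 37)
phi_T = phis[np.argmin([objective(W, X, p) for p in phis])]
print(phi_T, gls_estimate(W, X, np.array([1.0, -phi_T])))
```

The grid minimizer plays the role of θT here; in applications a numerical optimizer over the ARMA coefficients would replace the grid.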
Responding to the extensive literature comparing GLS with OLS, we also consider model estimates and forecasts based on the OLS estimate of AM,
$$A^M_T = \left(\sum_{t=1}^{T}W_tX^{M\prime}_t\right)\left(\sum_{t=1}^{T}X^M_tX^{M\prime}_t\right)^{-1}. \tag{1.5}$$
This is the special case ATM(θ*) of (1.3) with θ* = (1,0,0,…), the white noise model for ytM. The forecast function of Wt associated with ATM is obtained by using this choice of θ* in
$$W^M_{t|t-1}(\theta,\theta^*,T) = W_t - \left(W_t[\theta] - A^M_T(\theta^*)\,X^M_t[\theta]\right). \tag{1.6}$$
With this formula, for any fixed θ*, conditional MLE yields a specification
$$\theta^*_T = \operatorname*{arg\,min}_{\theta\in\Theta}\; T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta,\theta^*,T)\right)^2.$$
In this paper, we obtain formulas for the limiting values of average squared one-step-ahead prediction errors obtained from these two types of MLEs,
$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta_T,\theta_T,T)\right)^2 \tag{1.7}$$
and, for fixed θ*,
$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta^*_T,\theta^*,T)\right)^2. \tag{1.8}$$
With Theorems 5.1 and 7.1, which are given later in the paper, we show, under general assumptions on Xt and XtM given subsequently, that (1.7) is always less than or equal to (1.8), typically less. This is the optimality property of GLS referred to in the title of this paper. (By contrast, in the correct regressor case, when all our assumptions hold except (2.9) requiring asymptotic nonnegligibility of the omitted regressors, the two limits are equal.) Further, using OLS with the white noise model θ* = (1,0,0,…) for ytM, as is often done, usually leads to even worse forecasts, in the sense that
$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\left(W_t - A^M_TX^M_t\right)^2$$
has a larger value than (1.8); see Section 7.1.1.
1.1. Overview of the Paper
The regressor sequence Xt, t ≥ 1 is required to satisfy the conditions of Grenander (1954), which define a property we call scalable asymptotic stationarity; see Section 2 and Appendix B. Grenander introduced this generalization of stationarity to investigate the efficiency of OLS estimates for a large class of nonstochastic regressors in models with a broad range of weakly stationary disturbances. We indicate in Section 7.2 why efficiency in Grenander's sense is rarely applicable in the context of misspecified nonstochastic regressors. For the models we consider, the regressor XtM in (1.2), which can be stochastic, is taken to be a proper subvector of Xt. The remaining entries of Xt can be those of any vector XtN, compatible with our assumptions, whose variables compensate for the inadequacies of XtM in such a way that, for some AM and AN, the regression function in (1.1) can be decomposed as
$$AX_t = A^MX^M_t + A^NX^N_t. \tag{1.9}$$
Then, in (1.2), ytM = ANXtN + yt.
Our requirements for XtM, XtN, and yt are comprehensively stated in Section 2 and verified for some important classes of models in Sections 2.1 and 6.1.1. More information about ARMA model parameterization with innovations coefficient sequences θ = (1,θ1,θ2,…) is provided in Section 3, which includes some elementary examples. For diagonal scaling matrices DM,T such that
$$\lim_{T\to\infty} D_{M,T}\left(\sum_{t=1}^{T}X^M_tX^{M\prime}_t\right)D_{M,T}$$
is nonsingular, Theorem 4.1 gives a formula for limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 and establishes that convergence is uniform on the compact sets Θ defined in Appendix A. For a given θ, this limit is called the asymptotic bias characteristic of ATM(θ) for AM. Section 5 obtains formulas for the limits of the sample second moments of the forecast errors Wt − Wt|t−1M(θ,θ,T) and Wt − Wt|t−1M(θ,θ*,T). The analogous results for regARIMA-type nonstationary models, for situations in which the disturbance process requires a differencing transformation prior to ARMA modeling, are discussed in Section 6. We describe, in Theorem 7.1 in Section 7, how the optimality property of GLS mentioned previously arises: the better performance of GLS relative to OLS occurs when the OLS estimate has an asymptotic bias characteristic different from that of the GLS estimate. These results provide support for an imputation procedure used by Statistics Netherlands (Aelen, 2004), which uses one-step-ahead forecasts from regARIMA models with stochastic distributed lag regressors to impute the net contribution of late-reporting firms to economic time series from certain monthly surveys; see Section 6.1. Section 7.1 provides elementary expressions for some asymptotic quantities associated with GLS and OLS estimation when ytM is modeled as a first-order autoregression. These are used to illustrate the generality of GLS's optimality. Section 8 discusses related results and extensions.
Proofs of the theorems are given in Appendix E. They use the auxiliary results of Appendix D obtained mainly from Findley, Pötscher, and Wei (2001).
2. THE DATA AND REGRESSOR ASSUMPTIONS
In (1.1), we require yt, t ≥ 1 to be asymptotically stationary (A.S.) in the sense of Pötscher (1987), meaning that for each k = 0,±1,…, the lag k sample second moments have asymptotic limits almost surely (i.e., with probability one), denoted a.s. That is, the limits
$$\gamma^y_k = \lim_{T\to\infty} T^{-1}\sum_{t=k+1}^{T}y_t\,y_{t-k} \quad\text{a.s.} \tag{2.1}$$
exist. (By convention, $\sum_{t=b}^{a} = 0$ if a < b.) From a well-known result of Herglotz, the sequence of asymptotic lag k second moments γky has a spectral distribution function Gy(λ) such that
$$\gamma^y_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_y(\lambda)$$
for k = 0,±1,….
We require Xt, t ≥ 1 in (1.1) to be scalably asymptotically stationary (S.A.S.), meaning that the limits
$$\Gamma^X_k = \lim_{T\to\infty} D_{X,T}\left(\sum_{t=k+1}^{T}X_tX'_{t-k}\right)D_{X,T} \quad\text{a.s.} \tag{2.2}$$
exist, where the DX,T are diagonal scaling matrices, DX,T = diag(d1,T,…,ddim X,T), which are positive definite, decrease to zero (DX,T ↘ 0), and satisfy $\lim_{T\to\infty} D_{X,T+k}^{-1}D_{X,T} = I_X$ for each k ≥ 0. Here IX is the identity matrix of order dim X. (Ordinary convergence is meant in (2.2) if no coordinate of Xt is stochastic.) The resulting sequence ΓkX has a spectral distribution matrix function
$$\Gamma^X_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_X(\lambda)$$
for k = 0,±1,…; see Appendix B for further background, including examples.
Partition Xt as
$$X_t = \begin{bmatrix}X^M_t\\ X^N_t\end{bmatrix}, \tag{2.3}$$
where, as in the Introduction, the superscript N designates regressors not in the model (1.2). Let the corresponding partition of A in (1.1) be A = [AM AN] and let those of DX,T, ΓkX, and GX(λ) be, respectively,
$$D_{X,T} = \begin{bmatrix}D_{M,T} & 0\\ 0 & D_{N,T}\end{bmatrix},\qquad \Gamma^X_k = \begin{bmatrix}\Gamma^{MM}_k & \Gamma^{MN}_k\\ \Gamma^{NM}_k & \Gamma^{NN}_k\end{bmatrix},\qquad G_X(\lambda) = \begin{bmatrix}G_{MM}(\lambda) & G_{MN}(\lambda)\\ G_{NM}(\lambda) & G_{NN}(\lambda)\end{bmatrix}. \tag{2.4}$$
From DX,T ↘ 0, we have
$$D_{M,T}\searrow 0 \quad\text{and}\quad D_{N,T}\searrow 0. \tag{2.5}$$
We require Γ0MM to be positive definite,
$$\Gamma^{MM}_0 > 0, \tag{2.6}$$
and restrict XtN to being A.S.,
$$\Gamma^{NN}_k = \lim_{T\to\infty}T^{-1}\sum_{t=k+1}^{T}X^N_tX^{N\prime}_{t-k} \quad\text{a.s.}, \qquad k = 0,\pm1,\ldots. \tag{2.7}$$
Of course, (2.7) is equivalent to DN,T = T−1/2IN, with IN the identity matrix of order dim XN. We exclude omitted regressors of larger order, e.g., tp with p > 0, because they yield unbounded ytM dominated by ANXtN, which would clearly reveal the inadequacy of XtM with large enough T.
Further, the two series yt and Xt must be asymptotically orthogonal, meaning that
$$\lim_{T\to\infty}T^{-1/2}D_{X,T}\sum_{t=k+1}^{T}X_t\,y_{t-k} = 0 \quad\text{a.s.}, \qquad k = 0,\pm1,\ldots. \tag{2.8}$$
Finally, to keep the focus on the incorrect regressor situation, we assume that
$$A^N\Gamma^{NN}_0A^{N\prime} > 0. \tag{2.9}$$
In summary, our assumptions concerning (1.1) are (2.1), (2.2), and (2.5)–(2.9).
2.1. Consequences of (2.1), (2.8), and (2.9) for yt and ytM
First we note that, when Xt contains an entry equal to 1 for all t, then the corresponding scaling factor in DX,T can be taken to be T−1/2, and so (2.8) yields
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}y_t = 0 \quad\text{a.s.}$$
In this sense, yt in (1.1) can be thought of as an asymptotically mean zero process. A similar result holds for the disturbances ytM = ANXtN + yt of the misspecified model (1.2); see Section 4.
Now we establish the asymptotic stationarity of the ytM. From the requirement (2.7) that XtN be A.S. and from (2.1) and (2.8), for each k,
$$\gamma^M_k = \lim_{T\to\infty}T^{-1}\sum_{t=k+1}^{T}y^M_t\,y^M_{t-k}$$
is given by
$$\gamma^M_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_{y^M}(\lambda), \tag{2.10}$$
where GyM(λ) = ANGNN(λ)AN′ + Gy(λ). From (2.9), we have γ0M > 0. (The term γ0y can be zero.)
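Explicitly, taking k = 0 in (2.10) gives

$$\gamma^M_0 = \int_{-\pi}^{\pi}dG_{y^M}(\lambda) = A^N\Gamma^{NN}_0A^{N\prime} + \gamma^y_0,$$

so (2.9) alone makes γ0M positive, whether or not γ0y = 0.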
Finally, we note that, except in special situations such as that of Section 7.2, the disturbances and regressors in (1.2) will be asymptotically correlated, meaning
$$\lim_{T\to\infty}T^{-1/2}D_{M,T}\sum_{t=k+1}^{T}X^M_t\,y^M_{t-k} \ne 0$$
for some k, which will usually cause ATM(θ) defined in (1.3) to be biased asymptotically for some θ; see Theorem 4.1.
2.2. Sufficient Conditions for (2.1) and (2.8)
The properties (2.1) and (2.8) hold under reasonably general assumptions on yt and Xt. The verification of (2.8) for a common type of stochastic regression model is discussed in Section 6.1.1. Here we consider the case in which yt is weakly stationary with mean zero and Xt is nonstochastic with Γ0X > 0. Then, for almost sure convergence in (2.1) and (2.8), it suffices to have
$$y_t = \sum_{j=0}^{\infty}c_j\,\varepsilon_{t-j}, \qquad \sum_{j=0}^{\infty}c_j^2 < \infty,$$
for some independent white noise process εt such that supt E|εt|r < ∞, with r > 2 if yt has a bounded spectral density or, if the spectral density of yt is unbounded but square integrable, with r > 4; see Section 3.1 of Findley et al. (2001).
3. THE θ-PARAMETERIZATION OF ARMA MODELS
Three features of our ARMA model situation may be new to readers not familiar with the vein of research literature of which the papers by Pötscher (1987, 1991) are representative: (a) the disturbances ytM, 1 ≤ t ≤ T are not required to have means or covariances but only the asymptotic stationarity property; (b) no ARMA model is assumed to be correct in the sense of being able to exactly model the asymptotic lagged second moment sequence (2.10); (c) the ARMA coefficients of a model envisioned as φ(L)ytM = α(L)at are replaced by the innovations filter θ = (1,θ1,θ2,…) defined by the property that
$$\theta(z) = \sum_{j=0}^{\infty}\theta_j z^j$$
satisfies θ(z) = φ(z)/α(z) for |z| < 1. In this section, we provide some orienting discussion and examples.
We assume that α(z) ≠ 0 for all |z| ≤ 1, i.e., that the model is invertible. When ytM is weakly stationary with mean zero and defined for all t, then there always exists a weakly stationary series at = at(θ) such that the preceding ARMA model formula holds, namely,
$$a_t(\theta) = \sum_{j=0}^{\infty}\theta_j\,y^M_{t-j}.$$
When ytM is only A.S. and defined only for t ≥ 1, we define
$$a_t(\theta) = \sum_{j=0}^{t-1}\theta_j\,y^M_{t-j}, \qquad t \ge 1.$$
This series is A.S. with asymptotic lag k second moment given by
$$\gamma^a_k(\theta) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_{y^M}(\lambda),$$
with GyM(λ) as in (2.10); see (ii) of Proposition D.1 in Appendix D. We would call the θ-model correct if the white noise property, γka(θ) = 0 for k ≠ 0, obtains or, equivalently, if
$$\gamma^a_k(\theta) = \begin{cases}\sigma^2, & k = 0,\\ 0, & k \ne 0,\end{cases}$$
for some σ2 > 0. However, our theorems do not require any model for ytM, t ≥ 1 to be correct in this sense.
For subsequent discussions, it will be useful to have in mind the θ's of some simple ARMA models. As was indicated in Section 1, a white noise model has θ = (1,0,0,…). For the invertible ARMA(1,1) model, (1 − φL)ytM = (1 − αL)at, with |α|,|φ| < 1, one has $\theta_j = \alpha^{j-1}(\alpha - \varphi)$, j ≥ 1. For AR(1) and MA(1) models, we have θ = (1,−φ,0,0,…) and $\theta = (1,\alpha,\alpha^2,\ldots)$, respectively.
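The following small numpy sketch (ours; the function name is hypothetical) recovers these θ's by long division of φ(z) by α(z) and checks the ARMA(1,1) formula above:

```python
import numpy as np

def innovation_filter(phi, alpha, n):
    """First n+1 coefficients of theta(z) = phi(z)/alpha(z), where
    phi(z) = 1 - phi_1 z - ... and alpha(z) = 1 - alpha_1 z - ... (invertible)."""
    phi_poly = np.concatenate([[1.0], -np.asarray(phi, dtype=float)])
    alpha_poly = np.concatenate([[1.0], -np.asarray(alpha, dtype=float)])
    theta = np.zeros(n + 1)
    for j in range(n + 1):
        num = phi_poly[j] if j < len(phi_poly) else 0.0
        # matching coefficients in theta(z) * alpha(z) = phi(z)
        conv = sum(alpha_poly[i] * theta[j - i]
                   for i in range(1, min(j, len(alpha_poly) - 1) + 1))
        theta[j] = num - conv
    return theta

phi, alpha = 0.5, 0.8
print(np.round(innovation_filter([phi], [alpha], 5), 4))
print(np.round([1.0] + [alpha ** (j - 1) * (alpha - phi) for j in range(1, 6)], 4))
```

Both printed sequences agree, confirming θj = α^{j−1}(α − φ) for j ≥ 1 in the ARMA(1,1) case.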
Model parameterization by θ is useful because the θ's that are determined by likelihood-maximizing ARMA coefficients have uniquely defined large-sample limits in situations where the ARMA coefficients themselves do not, because of common zeroes in limiting AR and MA polynomials. For example, when an ARMA(1,1) model is fitted to white noise, the sequence of maximum likelihood pairs (φT,αT) has multiple limit (or cluster) points, all on the line {(α,α) : |α| ≤ 1}; see Hannan (1982). However, when φ = α for an ARMA(1,1) model, then θ = (1,0,0,…), and so this is the only limit point of the filter sequence θT defined by the maximum likelihood estimates φT, αT. That is, θT → θ a.s. coordinatewise, i.e., θjT → θj a.s., j ≥ 0.
As in the preceding examples, the coordinates of θ are always continuous functions of the ARMA coefficients. The converse holds only if the ARMA model is identifiable, i.e., the AR and MA polynomials have no common zero; also see the Appendix of Pötscher (1991) for additional background on the θ-parameterization. (Pötscher's parameter is the coefficient sequence of the expansion of $\theta(z)^{-1} = \alpha(z)/\varphi(z)$. The relationship between θ and this coefficient sequence is continuous and invertible; see Section 3 of Findley, Pötscher, and Wei, 2004.)
To obtain the uniform convergence and continuity properties needed to establish the results indicated in the Introduction, ARMA(p,q) model coefficient estimation is restricted to compact sets of AR and MA coefficient vectors whose polynomials have all zeroes in {|z| ≥ 1 + ε} for some ε > 0. Such sets specify compact sets Θ of the type discussed in Appendix A.
4. UNIFORM CONVERGENCE OF GLS ESTIMATES
We now present a fundamental convergence property of the ATM(θ) defined in (1.3). A generalized inverse is to be used in (1.3) when the inverse matrix fails to exist. This can (with probability one when XtM is stochastic) only happen for a finite number of T values, because of (2.6) and (iv) of Proposition D.1 in Appendix D. For any matrix M, define $\|M\| = \lambda_{\max}^{1/2}(MM')$, with λmax(·) denoting the maximum eigenvalue. If M is a vector with real coordinates m1,…,mn, then $\|M\| = \left(m_1^2 + \cdots + m_n^2\right)^{1/2}$.
Partition
$$\Gamma^X_k(\theta) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_X(\lambda)$$
analogously to (2.4), i.e.,
$$\Gamma^X_k(\theta) = \begin{bmatrix}\Gamma^{MM}_k(\theta) & \Gamma^{MN}_k(\theta)\\ \Gamma^{NM}_k(\theta) & \Gamma^{NN}_k(\theta)\end{bmatrix},$$
with
$$\Gamma^{MM}_k(\theta) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_{MM}(\lambda),$$
etc. For θ from an invertible model, define
$$C^{NM}(\theta) = \Gamma^{NM}_0(\theta)\left(\Gamma^{MM}_0(\theta)\right)^{-1}. \tag{4.1}$$
In Appendix E, we prove the theorem that follows.
THEOREM 4.1. Let Θ be a compact set of models as described in Appendix A. Under the assumptions (2.1), (2.2), and (2.5)–(2.8), we have, uniformly on Θ,
$$\lim_{T\to\infty}\left(A^M_T(\theta) - A^M\right)T^{-1/2}D^{-1}_{M,T} = A^NC^{NM}(\theta) \quad\text{a.s.} \tag{4.2}$$
The function CNM(θ) is continuous on Θ and thus bounded there, maxθ∈Θ∥CNM(θ)∥ < ∞.
For a given θ, limT→∞ (ATM(θ) − AM)T−1/2DM,T−1 = ANCNM(θ) is called the asymptotic bias characteristic of ATM(θ) for AM. It is nonzero for some θ if ΓkNM ≠ 0 for some k, i.e., if the series ANXtN and XtM are asymptotically correlated. When DM,T = T−1/2IM, then ANCNM(θ) is the asymptotic bias of ATM(θ) for AM. Omitted variable bias is a fundamental modeling issue; see, e.g., Stock and Watson (2002, pp. 143–149). Section 7 will show that, when ANCNM(θ) varies with θ, there is usually an optimal value of ANCNM(θ) for one-step-ahead forecasting that is determined by the θT sequence of (1.4).
If XtM has one or more coordinates that are A.S., then for any $\bar A^M$ that differs from AM only in these coordinates we have, uniformly on Θ,
$$\lim_{T\to\infty}\left(A^M_T(\theta) - \bar A^M\right)T^{-1/2}D^{-1}_{M,T} = \left(A^M - \bar A^M\right) + A^NC^{NM}(\theta) \quad\text{a.s.} \tag{4.3}$$
This reveals the important fact that the asymptotic bias characteristic associated with an alternative omitted-regressor decomposition, $AX_t = \bar A^MX^M_t + \bar A^N\bar X^N_t$, with $\bar y^M_t = \bar A^N\bar X^N_t + y_t$, differs from the right-hand side of (4.2) by a term that is independent of θ.
Except in special situations, e.g., when the omitted regressors are precisely known, there is always ambiguity concerning XtN and AM. However, it is useful to note that if a coordinate Xi,tM of XtM is constant with value one, then the asymptotic mean $\mu^N = \lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}X^N_t$ can be assumed to be zero: by defining $\bar A^M$ to differ from AM in that $A^M_i + A^N\mu^N$ replaces AiM, and by defining $\bar X^N_t = X^N_t - \mu^N$, one obtains $AX_t = \bar A^MX^M_t + A^N\bar X^N_t$ with $\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\bar X^N_t = 0$. Then, for $\bar y^M_t = A^N\bar X^N_t + y_t$ we have $\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\bar y^M_t = 0$ a.s.
5. UNIFORM ASYMPTOTIC STATIONARITY OF FORECAST ERRORS
We consider sample second moments of the errors of the one-step-ahead forecasts Wt|t−1M(θ,θ*,T) from (1.6). For 1 ≤ t ≤ T, the forecast errors Wt − Wt|t−1M(θ,θ*,T) are observable and equal to Wt[θ] − ATM(θ*)XtM[θ], which yields
$$W_t - W^M_{t|t-1}(\theta,\theta^*,T) = y_t[\theta] + \left(A^M - A^M_T(\theta^*)\right)X^M_t[\theta] + A^NX^N_t[\theta]. \tag{5.1}$$
Thus, setting Ut(T) = [yt T1/2DM,TXtM XtN]′, 1 ≤ t ≤ T and βT(θ*) = [1 (AM − ATM(θ*))T−1/2DM,T−1 AN], we have
$$W_t - W^M_{t|t-1}(\theta,\theta^*,T) = \beta_T(\theta^*)\,U_t(T)[\theta]. \tag{5.2}$$
Let Θ* be a compact set in the sense of Appendix A. For β(θ*) = [1 −ANCNM(θ*) AN], Theorem 4.1 yields
$$\lim_{T\to\infty}\max_{\theta^*\in\Theta^*}\left\|\beta_T(\theta^*) - \beta(\theta^*)\right\| = 0 \quad\text{a.s.} \tag{5.3}$$
This fact and the properties of the Ut(T) array described in Appendix C lead to the following theorem, which is proved in Appendix E. Define
$$G_U(\lambda) = \begin{bmatrix}G_y(\lambda) & 0\\ 0 & G_X(\lambda)\end{bmatrix} \tag{5.4}$$
and
$$G_{M,\theta^*}(\lambda) = \beta(\theta^*)\,G_U(\lambda)\,\beta(\theta^*)'. \tag{5.5}$$
For any Θ,Θ*, let Θ × Θ* denote the Cartesian product set {(θ,θ*) : θ ∈ Θ,θ* ∈ Θ*} and define convergence (θT,θ*T) → (θ,θ*) in Θ × Θ* to mean θjT → θj and θj*T → θj* for all j ≥ 0.
THEOREM 5.1. Let Θ and Θ* be compact sets of models as described in Appendix A. Under the assumptions (2.1), (2.2), and (2.5)–(2.8), the forecast-error arrays Wt − Wt|t−1M(θ,θ*,T), 1 ≤ t ≤ T are continuous on Θ × Θ* and also jointly uniformly A.S. there. Specifically, for each k = 0,±1,…, as T → ∞, with
$$\Gamma^M_k(\theta,\theta^*) = \int_{-\pi}^{\pi}e^{ik\lambda}\left|\theta(e^{-i\lambda})\right|^2dG_{M,\theta^*}(\lambda) \tag{5.6}$$
for GM,θ*(λ) as in (5.5), the limits
$$\lim_{T\to\infty}T^{-1}\sum_{t=k+1}^{T}\left(W_t - W^M_{t|t-1}(\theta,\theta^*,T)\right)\left(W_{t-k} - W^M_{t-k|t-k-1}(\theta,\theta^*,T)\right) = \Gamma^M_k(\theta,\theta^*) \tag{5.7}$$
hold uniformly a.s. on Θ × Θ*. Further, the functions ΓkM(θ,θ*) are continuous and uniformly bounded on Θ × Θ*. Also, from (5.7) and (5.1), for given θ and θ*, the values of ΓkM(θ,θ*) depend only on the values of the series AXt, XtM and yt = Wt − AXt, not on the specification of the compensating regressor XtN in decompositions AXt = AMXtM + ANXtN (see Sect. 4).
Theorem 5.1 shows that the quantities Γ0M(θ,θ*) are of special interest because they describe limiting average squared one-step-ahead forecast errors. With
$$\gamma^y_0(\theta) = \int_{-\pi}^{\pi}\left|\theta(e^{-i\lambda})\right|^2dG_y(\lambda) \quad\text{and}\quad B^{NM}(\theta^*) = \left[-C^{NM}(\theta^*)\;\; I_N\right], \tag{5.8}$$
(5.5) yields the decomposition
$$\Gamma^M_0(\theta,\theta^*) = \gamma^y_0(\theta) + \int_{-\pi}^{\pi}\left|\theta(e^{-i\lambda})\right|^2d\left[A^NB^{NM}(\theta^*)\,G_X(\lambda)\,B^{NM}(\theta^*)'A^{N\prime}\right]. \tag{5.9}$$
By specializing the argument used to establish Theorem 5.1, γ0y(θ) is seen to be the limiting average squared error of the θ-model's one-step-ahead forecast of Wt when XtM = Xt. Similarly, using (4.2), the final quantity in (5.9) is seen to be the limit of the average of the squares of one-step-ahead forecast errors of the regression-function error array AXt − ATM(θ*)XtM, 1 ≤ t ≤ T,
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\left(\left(AX_t - A^M_T(\theta^*)X^M_t\right)[\theta]\right)^2 = \int_{-\pi}^{\pi}\left|\theta(e^{-i\lambda})\right|^2d\left[A^NB^{NM}(\theta^*)\,G_X(\lambda)\,B^{NM}(\theta^*)'A^{N\prime}\right]. \tag{5.10}$$
It follows from the results for k = 0 in Theorem 5.1 by standard arguments (see Pötscher and Prucha, 1997, Ch. 3 and Lem. 4.2) that the conditional maximum likelihood estimators θT of (1.4) converge a.s. to the compact set Θ0 of minimizers of Γ0M(θ,θ) over Θ,
$$\Theta_0 = \left\{\theta\in\Theta:\ \Gamma^M_0(\theta,\theta) = \min_{\vartheta\in\Theta}\Gamma^M_0(\vartheta,\vartheta)\right\}. \tag{5.11}$$
That is, on a set of realizations of the random variables in (1.1) with probability one, the limit point of each (coordinatewise) convergent subsequence of θT,T ≥ 1 belongs to Θ0. (So if there is a unique minimizer θ, then θT → θ a.s.) Equivalently, in terms of the l1-norm (see Appendix A), limT→∞ minθ∈Θ0∥θT − θ∥1 = 0 a.s.
Similarly, the conditional maximum likelihood estimators θ*T associated with ATM(θ*) for fixed θ* ∈ Θ converge a.s. to the set of minimizers of Γ0M(θ,θ*), which usually does not include θ*; see Section 7.1.1.
6. EXTENSION TO ARIMA DISTURBANCE MODELS
Now suppose the observed data are $w_t$, 1 − d ≤ t ≤ T, from a time series of the form
$$w_t = Ax_t + u_t$$
to which a model of the form
$$w_t = A^Mx^M_t + u^M_t$$
is being fit. Suppose also that it has been correctly determined that the disturbances $u^M_t$ require "differencing" with an operator
$$\delta(L) = 1 + \delta_1L + \cdots + \delta_dL^d,$$
whose zeroes are on the unit circle, to obtain residuals for which an ARMA model can be considered. The resulting model is called a regARIMA model for $w_t$. Such models are extensively used for seasonal time series in the context of seasonal adjustment (see Findley et al., 1998; Peña, Tiao, and Tsay, 2001), often with $\delta(L) = (1 - L)(1 - L^s)$, s = 4,12. We assume that (2.1), (2.2), and (2.5)–(2.8) hold for $W_t = \delta(L)w_t$, $X_t = \delta(L)x_t$, $X^M_t = \delta(L)x^M_t$, and $y_t = \delta(L)u_t$ and that XtM is a subvector of Xt. For any 1 ≤ t ≤ T, because
$$w_t = W_t - \sum_{j=1}^{d}\delta_jw_{t-j},$$
for given θ and θ* a natural one-step-ahead forecast for $w_t$ is
$$\hat w_{t|t-1}(\theta,\theta^*,T) = W^M_{t|t-1}(\theta,\theta^*,T) - \sum_{j=1}^{d}\delta_jw_{t-j},$$
with Wt|t−1M(θ,θ*,T) defined by (1.6). This leads to
$$w_t - \hat w_{t|t-1}(\theta,\theta^*,T) = W_t - W^M_{t|t-1}(\theta,\theta^*,T)$$
for 1 ≤ t ≤ T and therefore to forecast-error limiting results as in Theorem 5.1 with the same functions ΓkM(θ,θ*).
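Spelled out, in the notation just adopted, the equality of the two forecast errors is immediate because $w_{t-1},\ldots,w_{t-d}$ are already observed when the forecast is made, so the same known sum is subtracted from the target and from the forecast alike:

$$w_t - \hat w_{t|t-1}(\theta,\theta^*,T) = \Big(W_t - \sum_{j=1}^{d}\delta_jw_{t-j}\Big) - \Big(W^M_{t|t-1}(\theta,\theta^*,T) - \sum_{j=1}^{d}\delta_jw_{t-j}\Big) = W_t - W^M_{t|t-1}(\theta,\theta^*,T).$$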
6.1. Forecasting a Stochastic Regressor to Impute Values for Late Survey Responders
We briefly consider an application involving regARIMA models with stochastic regressors. Section 3.3 of Aelen (2004) provides an interesting one-step-ahead forecasting application involving a variety of seasonal time series $w_t$ whose values come from enterprises that report economic data to Statistics Netherlands a month late, and whose regressor vector $x^M_t$ includes the sum of the values for month t from all enterprises of the same type that report on time, i.e., in the desired month, and sometimes also lagged values of these sums. Thus $x^M_t$ is stochastic. In conjunction with the following discussion of distributed lag models, Theorem 5.1 and Theorem 7.1 in Section 7 provide theoretical support for Aelen's use of the regARIMA model GLS estimation and one-step-ahead forecasting procedures of X-12-ARIMA (Findley et al., 1998) to obtain Statistics Netherlands' imputed value for $w_t$ in the month in which $x^M_t$ becomes available.
6.1.1. A Class of Distributed Lag Models Satisfying the Assumptions of Theorem 5.1.
After differencing, Aelen's model becomes a distributed lag model with regressors and correlated disturbances that are both treated as stationary. We consider a broad class of such models. Suppose that Wt and Zt are jointly covariance stationary variates with zero means and that the spectral density matrix of Zt is Hermitian positive definite at all frequencies. Then, when the autocovariance sequence ΓkV of Vt = [Wt Zt′]′ satisfies $\sum_{k=-\infty}^{\infty}\|\Gamma^V_k\| < \infty$, there exist coefficients Ak satisfying $\sum_{k=-\infty}^{\infty}\|A_k\| < \infty$ such that
$$W_t = \sum_{k=-\infty}^{\infty}A_kZ_{t-k} + y_t$$
holds, with EytZt−k′ = 0, k = 0,±1,…; see Theorem 8.3.1 of Brillinger (1975). For any m,n ≥ 0, setting XtM = [Zt+n′ … Zt−m′]′, $X^N_t = \sum_{k\notin\{-n,\ldots,m\}}A_kZ_{t-k}$, AM = [A−n … Am], and AN = 1 leads to (1.1) and (1.2) having the form of a distributed lag model with stationary disturbances (see, e.g., Stock and Watson, 2002) and to the assumptions of Theorem 5.1 holding under Gaussianity or weaker assumptions on Vt; see Theorem IV.3.6 of Hannan (1970).
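For implementation-minded readers, a minimal numpy helper (our own construction, not Aelen's or X-12-ARIMA's code) that assembles the lead–lag regressor XtM from a scalar series Zt might look as follows:

```python
import numpy as np

def lead_lag_design(Z, n, m):
    """Rows are X_t^M = [Z_{t+n}, ..., Z_t, ..., Z_{t-m}] for the t (0-based:
    m <= t <= T-1-n) at which every lead and lag is available."""
    T = len(Z)
    cols = [Z[m + k: T - n + k] for k in range(n, -m - 1, -1)]
    return np.column_stack(cols)

Z = np.arange(10.0)
X = lead_lag_design(Z, n=1, m=2)
print(X[0])  # row for t = 2: [Z_3, Z_2, Z_1, Z_0]
```

Trimming the endpoints this way loses only finitely many observations and so does not affect the limits in Theorem 5.1.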
7. OPTIMALITY OF GLS
Because of the uniform convergence and continuity results established in Theorem 5.1, for any compact Θ as described in Appendix A, we have
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta_T,\theta_T,T)\right)^2 = \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta) \quad\text{a.s.} \tag{7.1}$$
and, for any fixed θ* ∈ Θ,
$$\lim_{T\to\infty}T^{-1}\sum_{t=1}^{T}\left(W_t - W^M_{t|t-1}(\theta^*_T,\theta^*,T)\right)^2 = \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta^*) \quad\text{a.s.} \tag{7.2}$$
In Appendix E, we establish the theorem that follows.
THEOREM 7.1. Let Θ be a compact set as described in Appendix A and suppose that (2.1), (2.2), and (2.5)–(2.8) hold. Then for any fixed θ* ∈ Θ,
$$\min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta) \le \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta^*), \tag{7.3}$$
with equality holding if and only if a minimizer $\bar\theta^*$ of Γ0M(θ,θ*) over Θ is always a minimizer of Γ0M(θ,θ),
$$\Gamma^M_0(\bar\theta^*,\bar\theta^*) = \min_{\theta\in\Theta}\Gamma^M_0(\theta,\theta), \tag{7.4}$$
and, simultaneously, the asymptotic bias characteristic of ATM($\bar\theta^*$) as an estimator of AM coincides with that of ATM(θ*),
$$A^NC^{NM}(\bar\theta^*) = A^NC^{NM}(\theta^*). \tag{7.5}$$
As a consequence, strict inequality obtains in (7.3) if and only if
$$A^NC^{NM}(\bar\theta) \ne A^NC^{NM}(\theta^*) \tag{7.6}$$
holds for every minimizer $\bar\theta$ of Γ0M(θ,θ) over Θ. For the maximum likelihood estimators θT of (1.4), this condition implies
$$\lim_{T\to\infty}\left(A^M_T(\theta_T) - A^M\right)T^{-1/2}D^{-1}_{M,T} \ne A^NC^{NM}(\theta^*) \quad\text{a.s.} \tag{7.7}$$
Conversely, if Γ0M(θ,θ) has a unique minimizer $\bar\theta$, then (7.7) implies (7.6).
Unless θ* is a minimizer of Γ0M(θ,θ), we expect that both minθ∈Θ Γ0M(θ,θ) < Γ0M($\bar\theta^*$,θ*) and ANCNM($\bar\theta^*$) ≠ ANCNM(θ*) will hold except in quite special situations, the only one known to us being when ANCNM(θ*), and therefore also Γ0M(θ,θ*), does not depend on θ*. In Section 7.1, this is shown to occur with AR(1) models for ytM only in a singular situation. Otherwise $\bar\theta^*$ is unique. Whenever $\bar\theta^*$ is unique, failure of (7.5), which implies minθ∈Θ Γ0M(θ,θ) < Γ0M($\bar\theta^*$,θ*) and $\bar\theta^*$ ≠ θ*, also yields Γ0M($\bar\theta^*$,$\bar\theta^*$) < Γ0M($\bar\theta^*$,θ*).
Model sets Θ usually include the white noise model θ* = (1,0,0,…) as a degenerate case. Hence the conclusions of Theorem 7.1 are generally applicable to OLS as an alternative to GLS. They indicate the following optimality property of GLS: in conjunction with maximum likelihood estimation of θ, asymptotically, OLS estimation is never better than GLS estimation for one-step-ahead forecasting. When the regressor is underspecified and ANCNM(θ) is nonconstant, OLS will typically produce a greater limiting average squared one-step-ahead forecast error than GLS, for large enough T, because its asymptotic bias characteristic differs from that of GLS.
Thursby (1987) provides comparisons of OLS and GLS biases when yt is known to be independent and identically distributed (i.i.d.) (white noise), dim XtM = 2, dim XtN = 1, the coordinates of Xt are correlated first-order AR processes, and the loss function is the posterior mean squared bias associated with a prior for the parameters that determine the covariance structure between XtN and XtM. With the aid of numerical integrations for the GLS quantities, he establishes that, depending on the choice of the autocovariance structure of XtM, the mean squared asymptotic bias of GLS is sometimes less and sometimes greater than that of OLS. Theorem 7.1 shows that, for either outcome, GLS has an asymptotic advantage over OLS for one-step-ahead forecasting.
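The following self-contained simulation (a toy design of our own, in the spirit of Section 7.1.2, with yt white noise and a periodic omitted regressor) illustrates the comparison: it profiles the average squared one-step-ahead error over AR(1) filters, once with matching GLS estimates as in (1.7) and once with the fixed white noise model θ* as in (1.8). Theorem 7.1 predicts that the first printed value will not exceed the second for large T:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
t = np.arange(T)
xM = np.cos(np.pi * t / 2) + (-1.0) ** t            # included regressor
xN = np.cos(np.pi * t / 2) - np.sin(np.pi * t / 2)  # omitted regressor
W = 2.0 * xM + 1.0 * xN + rng.standard_normal(T)    # y_t is white noise

def filt(x, phi):
    """Truncated AR(1) innovation filter, theta = (1, -phi)."""
    out = x.astype(float).copy()
    out[1:] -= phi * x[:-1]
    return out

def one_step_mse(phi_fore, phi_gls):
    """Average squared one-step error: forecast filter theta(phi_fore),
    regression coefficient estimated by GLS with theta(phi_gls)."""
    Xf, Wf = filt(xM, phi_gls), filt(W, phi_gls)
    A = (Xf @ Wf) / (Xf @ Xf)            # scalar GLS estimate, cf. (1.3)
    e = filt(W, phi_fore) - A * filt(xM, phi_fore)
    return np.mean(e ** 2)

phis = np.linspace(-0.95, 0.95, 77)
gls = min(one_step_mse(p, p) for p in phis)    # cf. (1.7)
ols = min(one_step_mse(p, 0.0) for p in phis)  # cf. (1.8), theta* = white noise
print(f"GLS limit estimate: {gls:.3f}   OLS limit estimate: {ols:.3f}")
```

Because the omitted periodic regressor is correlated with xM, the two estimation schemes have different asymptotic bias characteristics, which is exactly the situation in which Theorem 7.1 yields a strict GLS advantage.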
7.1. Examples Involving AR(1) Models and dim XtM = dim XtN = 1
The condition (7.5) is the easiest to investigate, because, for AR models, $\bar\theta^*$ is the solution of a linear system of equations. For simplicity, we consider only the case in which dim XtM = dim XtN = 1 and a first-order AR model, i.e., θ = θ(φ) = (1,−φ,0,0,…), is used for the disturbance series ytM in (1.2). From (5.8) and (5.9), this leads to
$$\Gamma^M_0(\theta(\varphi),\theta^*) = (1+\varphi^2)\,c_0(\theta^*) - 2\varphi\,c_1(\theta^*), \tag{7.8}$$
where
$$c_0(\theta^*) = \int_{-\pi}^{\pi}dG_{M,\theta^*}(\lambda)$$
and
$$c_1(\theta^*) = \int_{-\pi}^{\pi}\cos\lambda\;dG_{M,\theta^*}(\lambda).$$
Also, with θ* = (1,−φ*,0,…), the CNM(θ*) component of BNM(θ*) is
$$C^{NM}(\theta^*) = \frac{(1+\varphi^{*2})\,\Gamma^{NM}_0 - \varphi^*\left(\Gamma^{NM}_1 + \Gamma^{NM}_{-1}\right)}{(1+\varphi^{*2})\,\Gamma^{MM}_0 - 2\varphi^*\,\Gamma^{MM}_1}.$$
When
$$2\Gamma^{MM}_1\Gamma^{NM}_0 - \Gamma^{MM}_0\left(\Gamma^{NM}_1 + \Gamma^{NM}_{-1}\right) \ne 0, \tag{7.9}$$
the derivative of CNM(θ*) is nonzero on −1 < φ* < 1 and CNM(θ*) is strictly monotonic; see Section 6.3 in the paper by Findley (2005), whose derivation also shows that the unique minimizer $\bar\theta^*$ = (1,−$\bar\varphi^*$,0,…) of (7.8) has $\bar\varphi^*$ equal to the lag one autocorrelation of GM,θ*(λ) in (5.5),
$$\bar\varphi^* = \frac{c_1(\theta^*)}{c_0(\theta^*)} = \frac{\int_{-\pi}^{\pi}\cos\lambda\;dG_{M,\theta^*}(\lambda)}{\int_{-\pi}^{\pi}dG_{M,\theta^*}(\lambda)}. \tag{7.10}$$
There is no such simple formula for the $\bar\varphi$ minimizing Γ0M(θ(φ),θ(φ)) because the critical point equation provides $\bar\varphi$ as a zero of a polynomial of degree five in general. However, from strict monotonicity of CNM(θ(φ*)), if $\bar\varphi^*$ ≠ φ* then (7.5) fails, and therefore strict inequality holds in (7.3) by Theorem 7.1. For the OLS choice φ* = 0, for which CNM(θ*) = CNM, (7.10) shows that $\bar\varphi^*$ ≠ 0 (except possibly at a single value of (AN)2) when either γ1y or ΔNM = Γ1NN + (CNM)2Γ1MM − CNM(Γ1NM + Γ−1NM) is nonzero, which will usually be the case. A periodic Xt satisfying (7.9) and ΔNM ≠ 0 is given in Section 7.1.2.
When (7.9) fails, CNM(θ*) = CNM = Γ0NM/Γ0MM for all θ*, and so equality holds in (7.3).
7.1.1. The Inferiority of White Noise Modeling with OLS when $\bar\varphi^*$ ≠ 0.
If Θ is a compact model set containing the AR(1) models θ = θ(φ), then Γ0M($\bar\theta^*$,θ*) ≤ Γ0M(θ($\bar\varphi^*$),θ*). So, under (7.9) and $\bar\varphi^*$ ≠ 0, we have, from (7.3), that minθ∈Θ Γ0M(θ,θ) ≤ Γ0M($\bar\theta^*$,θ*) < Γ0M(θ*,θ*). Thus, for θ* = (1,0,0,…), it follows from (7.1) and (7.2) that when $\bar\varphi^*$ ≠ 0, using OLS estimation of AM with the white noise model for ytM leads to asymptotically worse one-step-ahead forecasts than GLS with (1.4), for any such model set Θ.
7.1.2. Periodic Xt and an Example of ΔNM.
The trading day and holiday regressors discussed in Findley et al. (1998), Bell and Hillmer (1983), and Findley and Soukup (2000) are effectively periodic functions; i.e., Xt+PM = XtM holds for all t, for rather large periods P (e.g., 12 × 28 = 336 months for trading day regressors, 12 × 19 = 228 months for some lunar holiday regressors, more for other holidays, e.g., Easter). The simplest holiday regressors are one-dimensional and specify that the effect of the holiday is the same for each day in some interval near the holiday, a dubious but simplifying assumption. For such regressors, the compensating XtN can be assumed to be one-dimensional and have the same period.
Every regressor of period P has a Fourier representation $\sum_j\{\alpha_j\cos(2\pi jt/P) + \beta_j\sin(2\pi jt/P)\}$ with at most P nonzero coefficients, which are uniquely determined linear functions of P consecutive values of the regressor; see Section 4.2.3 of Anderson (1971). To give a more complete analysis of (7.3) for the function (7.8), we consider a simplified period P = 4 regressor XtM having the representation XtM = a1M cos(πt/2) + a2M(−1)^t, with a1M, a2M ≠ 0, for which XtN = a1N cos(πt/2) + b1N sin(πt/2), with a1N, b1N ≠ 0. Thus Xt = [XtM XtN]′ = α1′cos(πt/2) + α2′(−1)^t + β1′sin(πt/2), where α1 = [a1M a1N], α2 = [a2M 0], and β1 = [0 b1N]. Consequently,
$$\Gamma^X_k = \tfrac{1}{2}\left(\alpha_1'\alpha_1 + \beta_1'\beta_1\right)\cos(\pi k/2) + \alpha_2'\alpha_2(-1)^k + \tfrac{1}{2}\left(\beta_1'\alpha_1 - \alpha_1'\beta_1\right)\sin(\pi k/2), \qquad k = 0,\pm1,\ldots,$$
and GX(λ) is piecewise constant with upward jumps at λ = ±π/2, π; see Anderson (1971, p. 581).
For this Xt, the left-hand side of (7.9) has the value −a1Ma1N(a2M)^2, and so (7.9) holds. Further, CNM = a1Ma1N{(a1M)^2 + 2(a2M)^2}^{−1} and ΔNM = −(a2MCNM)^2. Strict inequality holds in (7.3) for OLS estimation except when γ1y > 0 and (AN)^2 = γ1y(a2MCNM)^{−2}, in which case $\bar\varphi^*$ = 0 = φ*.
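These closed forms can be checked numerically by direct averaging of the periodic series; the following sketch (our own check, using CNM = Γ0NM/Γ0MM from Section 7.1 and the definition of ΔNM above) does so:

```python
import numpy as np

a1M, a2M, a1N, b1N = 1.0, 0.5, 0.8, 0.6
t = np.arange(4 * 2500)  # an integral number of periods
xM = a1M * np.cos(np.pi * t / 2) + a2M * (-1.0) ** t
xN = a1N * np.cos(np.pi * t / 2) + b1N * np.sin(np.pi * t / 2)

def gam(u, v, k):
    """Lag-k average second moment, approximating lim T^{-1} sum u_t v_{t-k}."""
    if k < 0:
        return gam(v, u, -k)
    return np.mean(u[k:] * v[:len(v) - k])

CNM = gam(xN, xM, 0) / gam(xM, xM, 0)
DNM = gam(xN, xN, 1) + CNM ** 2 * gam(xM, xM, 1) \
      - CNM * (gam(xN, xM, 1) + gam(xN, xM, -1))
print(CNM, a1M * a1N / (a1M ** 2 + 2 * a2M ** 2))  # agree up to O(1/T)
print(DNM, -(a2M * CNM) ** 2)                      # agree up to O(1/T)
```

Each printed pair matches to within the O(1/T) truncation error of the finite averages.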
7.2. Regarding Asymptotic Efficiency in the Sense of Grenander (1954)
Here we restrict attention to nonrandom regressors Xt in (1.1) whose components are polynomials, periodic functions, or realizations of stationary processes with continuous spectral densities and with convergent sample second moments. The disturbance process yt is assumed to be a mean zero stationary process with the last-mentioned properties. Grenander (1954) considers the correct regressor case and calls the OLS estimates
$$A_T = \left(\sum_{t=1}^{T}W_tX_t'\right)\left(\sum_{t=1}^{T}X_tX_t'\right)^{-1}$$
asymptotically efficient if limT→∞ DT−1E{(AT − A)′(AT − A)}DT−1 is minimal (in the ordering of symmetric matrices) among all linear, unbiased estimates AT of A. For this situation, his result, given on p. 244 of Grenander and Rosenblatt (1984), is that OLS is efficient if and only if the spectral distribution function GX(λ) has at most dim Xt jumps and the sum of the ranks of the jumps GX(λ+) − GX(λ−), 0 ≤ λ ≤ π, is equal to dim Xt. These conditions are not satisfied, and OLS is not efficient, for most of the regressors discussed in Section 7.1.2, including the calendar effect regressors and the period four regressor with b1N ≠ 0; see Chapter 7.7 and case (1) on p. 253 of Grenander and Rosenblatt (1984): usually, the number of terms in the Fourier representation of Xt, and thus also the number of jumps in GX(λ), exceeds dim Xt.
To be able to apply Grenander's result to our underspecified regression situation, assume that XtM and ytM have the properties hypothesized previously for Xt and yt. Thus XtN has a continuous spectral density and so cannot have periodic components. If we consider XtM having only polynomial and periodic components, then XtN and XtM are asymptotically orthogonal; see Section 6.1 of Findley (2005). This implies ANCNM(θ*) = 0 for all θ*, resulting in equality in (7.3) always, because Γ0M(θ,θ*) does not depend on θ*.
On the other hand, with regressors in XtM that are realizations of stationary processes, if ANCNM(θ*) is nonzero, then the analogue for ATM(θ*) of Grenander's efficiency measure fails by being infinite, because some entries of (ATM(θ*) − AM)DM,T−1 will have order T1/2; see (4.2).
Thus this concept of efficiency is not useful in our context.
8. EXTENSIONS AND RELATED RESULTS
From their connection to one-step-ahead forecast error filters, it is not very surprising that GLS estimates of regARMA and regARIMA models have an optimality property for one-step-ahead forecasting. Yet a systematic investigation of the topic has been lacking. A pleasingly simple result, such as Theorem 7.1's connection of optimality with asymptotic bias characteristics, seems possible only for the incorrect regressor case. Indeed, if asymptotic efficiency results are indicative, the correct regressor case will be quite complex. In this case, when the ARMA model for yt is incorrect, GLS can be more or less efficient than OLS; see Koreisha and Fang (2001). Even when the ARMA model is also correct, the analysis and examples of Grenander and Rosenblatt (1984) and of Section 7.2 show, for nonstochastic regressors, that OLS is asymptotically efficient only for a limited range of relatively simple regressors.
For any fixed θ*, in the incorrect nonstochastic regressor case, a referee conjectures that, under additional assumptions and with the aid of a result like Theorem 4.1 of West (1996), it can be shown that the limit as T → ∞ of the variance of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm067.gif?pub-status=live)
does not depend on θ*.
So far, we have only provided asymptotic results for the most simply defined GLS estimates, which are obtained by truncating the infinite-past forecast error filters and using conditional maximum likelihood estimation of the ARMA model. Section 2.4 and Lemma 10(d) of Findley (2005) reveal that the same limits are obtained if the errors of the finite-past one-step-ahead forecasts discussed in Newton and Pagano (1983) are used to define GLS estimates in conjunction with unconditional maximum likelihood estimation of the ARMA model. (Analogous GLS estimates from AR models were considered in Amemiya, 1973.) See Section 9 of the technical report Findley (2003) for additional details, including details about how to weaken the assumptions on XtM to include the frequently used intervention variables of Box and Tiao (1975). These decay exponentially to zero and so have weight one in DM,T, causing (2.5) to fail. Also, with the restriction to measurable minimizers θT discussed in Findley et al. (2001, 2004), in the case of nonstochastic Xt, all almost sure convergence results hold with convergence in probability when convergence in (2.1) holds only in this weaker sense.
Findley (2003) also shows how to use the results of Appendix D to generalize Theorem 5.1 to the case of multi-step-ahead forecast errors and to establish the convergence of θ-parameter estimates that minimize average squared multi-step-ahead forecast errors (allowing for ytM the more comprehensive model classes of Findley et al., 2004).
Findley (2005) uses the results of Theorems 4.1 and 7.1 to obtain formulas and GLS optimality results for the limiting average of squared out-of-sample (real time) forecast errors of regARIMA models under assumptions on the regressors Xt that are slightly more restrictive than those of Section 2 but are satisfied by all of the specific regressor types we have mentioned. The limit formulas are the same as those of the present paper when XtM is A.S. Empirical results are available from the author showing that GLS usually leads to better one-step-ahead out-of-sample forecasting performance than OLS for a suite of monthly series that are modeled with trading day and Easter holiday regressors by the U.S. Census Bureau for the purpose of seasonal adjustment.
APPENDIX A. Compact θ-Sets for Estimation
For each ε > 0 and integer pair p,q ≥ 0, we define Θp,q,ε to be the set of all θ = (1,θ1,θ2,…) from invertible ARMA(r,s) models with r ≤ p, s ≤ q for which the zeroes of the minimal-degree AR and MA polynomials φ(z) and α(z) satisfying θ(z) = φ(z)/α(z) all belong to {|z| ≥ 1 + ε}. Every sequence θT = (1,θ1T,θ2T,…), T = 1,2,… in Θp,q,ε has a subsequence θS(T) that converges coordinatewise to some θ ∈ Θp,q,ε, i.e., θjS(T) → θj, j ≥ 1. Thus Θp,q,ε is compact for coordinatewise convergence. Further, for 0 ≤ ε0 < ε, the sums
$$\sum_{j=0}^{\infty}|\theta_j|\,(1+\varepsilon_0)^j$$
converge uniformly on Θp,q,ε; i.e.,
$$\max_{\theta\in\Theta_{p,q,\varepsilon}}\sum_{j=0}^{\infty}|\theta_j|\,(1+\varepsilon_0)^j < \infty$$
and
$$\lim_{J\to\infty}\max_{\theta\in\Theta_{p,q,\varepsilon}}\sum_{j=J}^{\infty}|\theta_j|\,(1+\varepsilon_0)^j = 0.$$
See Lemmas 2 and 10 of Findley (2005) for these and other properties mentioned. Our uniform convergence results that are presented subsequently follow from these facts as do some other important properties. First, the functions
$\left|\theta(e^{-i\lambda})\right|^2$ are continuous on −π ≤ λ ≤ π and uniformly bounded and bounded away from zero on Θp,q,ε:
$$0 < \min_{\theta\in\Theta_{p,q,\varepsilon}}\;\min_{-\pi\le\lambda\le\pi}\left|\theta(e^{-i\lambda})\right|^2 \le \max_{\theta\in\Theta_{p,q,\varepsilon}}\;\max_{-\pi\le\lambda\le\pi}\left|\theta(e^{-i\lambda})\right|^2 < \infty.$$
Second, if a sequence θT, T = 1,2,… in Θp,q,ε converges coordinatewise to some θ, then it also converges in the stronger sense that
$$\lim_{T\to\infty}\sum_{j=0}^{\infty}\left|\theta^T_j - \theta_j\right|(1+\varepsilon_0)^j = 0$$
whenever 0 ≤ ε0 < ε. In particular, the topology of coordinatewise convergence on Θp,q,ε coincides with that of the l1-norm $\|\theta\|_1 = \sum_{j=0}^{\infty}|\theta_j|$.
Our theorems apply to any compact Θ for which Θ ⊆ Θp,q,ε holds, for some ε > 0 and p,q ≥ 0. A typical Θ would arise from constraints on the zeroes of the AR and MA polynomials of the kind of ARMA model of interest.
APPENDIX B. Scalable Asymptotic Stationarity
Under the data assumptions made in Section 2, Xt and yt in (1.1) together form a multivariate sequence that is S.A.S., a property we now consider in some detail. Let Ut, t ≥ 1 be a real-valued column vector sequence that is S.A.S. and let IU denote the identity matrix of order dim U, the dimension of Ut. Thus there is a decreasing sequence D1 ≥ D2 ≥ … of positive definite diagonal matrices, for which DT ↘ 0 and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm044.gif?pub-status=live)
hold, such that, for each k = 0,±1,…, the limits
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm045.gif?pub-status=live)
exist (finitely). The properties (B.1) and (B.2) yield limT→∞ DTUT−j = 0 a.s., j ≥ 0. For example, when j = 0, as T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm075.gif?pub-status=live)
converges a.s. to Γ0U − Γ0U = 0, whence DTUT → 0 a.s. Further, DT ↘ 0 leads to limT→∞ DTU1+j = 0 a.s. for all j ≥ 0.
This generalization of stationarity was introduced, without a formal name, in Grenander (1954) to encompass a variety of nonstochastic regressors, including polynomials. (Our notation is the inverse of his: we use DT where he uses DT−1. He requires only that the diagonal elements of Γ0U be positive, which is the nature of (2.9) for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm076.gif?pub-status=live)
. Our requirement (2.10) for XtM is stronger.) Grenander shows that the real matrix sequence ΓkU, k = 0,±1,… has a representation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm077.gif?pub-status=live)
in which GU(λ) is a Hermitian-matrix-valued function whose increments GU(λ2) − GU(λ1), λ2 ≥ λ1, have nonnegative eigenvalues, i.e., are Hermitian nonnegative; see also Grenander and Rosenblatt (1984), Chapter II of Hannan (1970), and Chapter 10 of Anderson (1971). For example, if Ut = tp, p ≥ 0, then, with DT = T−(p+1/2), one obtains ΓkU = (2p + 1)−1 for each k, and so GU(λ) can be taken to be 0 for λ < 0 and (2p + 1)−1 for λ ≥ 0 (the sketch at the end of this appendix checks this limit numerically). Grenander (1954) and Grenander and Rosenblatt (1984, Ch. 7) verify the joint scalable asymptotic stationarity property for regressors whose entries Xi,t are polynomials; linear combinations (perhaps infinite) of sinusoids, i.e., of cos ωjt and/or sin ωjt, for various 0 ≤ ωj ≤ π (scaling sequence T−1/2); and products of polynomials tp with linear combinations of sinusoids (scaling sequence T−p−1/2). By contrast, exponentially increasing regressors, e.g., Ut = ebt with b > 0, are not S.A.S. because (B.1) fails for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm078.gif?pub-status=live)
; see Hannan (1970, p. 77).
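The following numerical check (the choices of p, k, and T are illustrative) confirms Grenander's polynomial example: with Ut = tp and DT = T−(p+1/2), the scaled lagged second moments T−(2p+1) Σt UtUt−k approach (2p + 1)−1 for every fixed k.

```python
# A sketch verifying Gamma_k^U = 1/(2p+1) for U_t = t^p, D_T = T^-(p+1/2).
import numpy as np

def scaled_lagged_moment(p, k, T):
    t = np.arange(k + 1, T + 1, dtype=float)
    return np.sum(t**p * (t - k) ** p) / T ** (2 * p + 1)

for T in (100, 1_000, 10_000):
    print(T, scaled_lagged_moment(p=2, k=3, T=T))   # -> 1/5 = 0.2
```

The printed values approach 0.2 = (2·2 + 1)−1; the same scaling gives DTUT = T−1/2 → 0, in line with the property derived above.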
APPENDIX C. Vector Array Reformulation of Assumptions
The following reformulation of our assumptions (2.1), (2.2), and (2.5)–(2.9) concerning yt and Xt will enable us to make use of the results of Findley et al. (2001, 2004). The vector array
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm046.gif?pub-status=live)
is A.S. More specifically, for each k = 0,±1,…,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm047.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm048.gif?pub-status=live)
with Γ0MM > 0 and ANΓ0NNAN′ > 0. Further, from Appendix B,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm049.gif?pub-status=live)
Because of (C.3), the spectral distribution matrix of the ΓkU sequence has the block diagonal form GU(λ) = blockdiag(Gy(λ),GX(λ)).
APPENDIX D. Uniform Convergence Results for Filtered A.S. Arrays
The proposition and lemma that follow are formulated for proving some of the more general results indicated in Section 8.
PROPOSITION D.1. Let Ut(T), 1 ≤ t ≤ T be an A.S. column vector array satisfying (C.4) and let GU(λ) denote the spectral distribution matrix of the asymptotic lagged second moments matrices ΓkU defined by (C.2). Let H and Z be sets of filters η = (η0,η1,…) and ζ = (ζ0,ζ1,…) such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm079.gif?pub-status=live)
resp.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm080.gif?pub-status=live)
converges uniformly on H resp. Z. Then the filter output arrays
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm081.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm082.gif?pub-status=live)
, 1 ≤ t ≤ T, η ∈ H, ζ ∈ Z have the following properties:
(i) limT→∞ supη∈H ∥T−1/2U1+j,T[η]∥ = limT→∞ supη∈H ∥T−1/2UT−j,T[η]∥ = 0 a.s. for all j ≥ 0, and analogously for Ut[ζ](T).
(ii) As T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm083.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm084.gif?pub-status=live)
, for k = 0,±1,….
(iii) The functions ΓkU(η,ζ) are bounded on H × Z,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm085.gif?pub-status=live)
and are jointly continuous in η,ζ in the sense that, if ηT ∈ H, ζT ∈ Z are such that ηT → η and ζT → ζ (coordinatewise convergence) with η ∈ H, ζ ∈ Z, then ΓkU(ηT,ζT) → ΓkU(η,ζ). Also, if Z = H, then infη∈H,−π≤λ≤π|η(eiλ)|2Γ0U ≤ Γ0U(η, η) ≤ supη∈H,−π≤λ≤π|η(eiλ)|2Γ0U.
(iv) Let H be an index set for a family of arrays Ut(η,T), 1 ≤ t ≤ T, η ∈ H such that, as T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm050.gif?pub-status=live)
where the Γ0(η) are positive definite matrices whose minimum eigenvalues are bounded away from zero; i.e.,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm051.gif?pub-status=live)
holds for some mH > 0. Then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm052.gif?pub-status=live)
Proof. Parts (i)–(iii) are straightforward vector extensions of special cases of Theorem 2.1 and Proposition 2.1 of Findley et al. (2001). For (iv), it follows from (D.1) and (D.2) that, given ε > 0, for every realization outside an event of probability zero there is a Tε such that, for T ≥ Tε, the inequalities
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm086.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm087.gif?pub-status=live)
hold. Hence for these T and all η ∈ H,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408041402-92128-mediumThumb-S0266466607070430ffm088.jpg?pub-status=live)
which establishes (D.3).
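The convergence in part (ii) and the continuity in part (iii) can be observed numerically. The following sketch (all choices are ad hoc illustrations, not constructions from the paper) filters the S.A.S. sequence Ut = cos ωt, which has scaling T−1/2, through a small set H of geometric filters ηj = aj and shows that the scaled lagged second moments of the outputs stabilize as T grows:

```python
# A sketch: scaled lagged second moments of filtered cosine data settle
# to limits as T grows, uniformly over a small filter set H = {a = .3,.5,.7}.
import numpy as np

def filtered(u, a):
    """Output of the filter eta = (1, a, a^2, ...) applied to u_1..u_T."""
    out = np.empty_like(u)
    acc = 0.0
    for t, ut in enumerate(u):
        acc = ut + a * acc          # recursive form of sum_j a^j u_{t-j}
        out[t] = acc
    return out

omega, k = 2 * np.pi / 12, 2
for T in (200, 2_000, 20_000):
    u = np.cos(omega * np.arange(1, T + 1))
    vals = []
    for a in (0.3, 0.5, 0.7):       # the filter set H
        v = filtered(u, a)
        vals.append(np.dot(v[k:], v[:-k]) / T)   # T^{-1} sum_t v_t v_{t-k}
    print(T, np.round(vals, 4))
```

The three values settle to limits that vary continuously with a, illustrating the boundedness and joint continuity asserted in (iii).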
We also need the following lemma, whose proof can be obtained by standard arguments, as in the proof of (5.18) of Findley et al. (2004).
LEMMA D.2. Suppose that, on a set Θ*, the sequence βT(θ*), T = 1,2,… of row vector functions converges uniformly a.s. to a bounded function β(θ*), i.e., (5.3) holds, and similarly for τT(θ*), T ≥ 1 and its limit τ(θ*). Let Ut(η,T), η ∈ H and Wt(ζ,T), ζ ∈ Z, 1 ≤ t ≤ T be families of column vector arrays of the same dimension as β(θ*) and τ(θ*), respectively, such that, for k = 0,±1,…,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm089.gif?pub-status=live)
holds for functions Γk(η,ζ) with supη∈H,ζ∈Z∥Γ0(η, ζ)∥ < ∞. Then, as T → ∞,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm090.gif?pub-status=live)
APPENDIX E. Proofs
Proof of Theorem 4.1. We have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170408041402-26927-mediumThumb-S0266466607070430ffm091.jpg?pub-status=live)
By (ii) and (iii) of Proposition D.1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm092.gif?pub-status=live)
converges uniformly a.s. to 0 and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm093.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm094.gif?pub-status=live)
converge uniformly a.s. to the continuous limits Γ0NM(θ) and Γ0MM(θ), respectively, with Γ0MM(θ) bounded below by the positive definite matrix mΘ2Γ0MM, where mΘ = min−π≤λ≤π,θ∈Θ|θ(eiλ)| > 0; see Appendix A. It follows from (iv) of Proposition D.1 that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm095.gif?pub-status=live)
converges uniformly to Γ0MM(θ)−1, which is therefore continuous (and bounded above by mΘ−2(Γ0MM)−1). Hence (ATM(θ) − AM)T−1/2DM,T−1 converges uniformly a.s. to ANCNM(θ), which is continuous on Θ and also bounded. █
Proof of Theorem 5.1. The assertions follow from (5.2) and Lemma D.2 with τT(θ*) = βT(θ*), H = Z = Θ, and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm096.gif?pub-status=live)
, for Ut−j(T) defined by (C.1), because the uniform convergence of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm097.gif?pub-status=live)
to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm098.gif?pub-status=live)
and the boundedness of ∥Γ0U(θ)∥ on Θ, which are required to apply Lemma D.2, follow from (ii) and (iii), respectively, of Proposition D.1. The uniform convergence of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm099.gif?pub-status=live)
required by the proposition is the special case ε0 = 0 of the uniform convergence established in Appendix A. The fact that GU(λ) = blockdiag(Gy(λ),GX(λ)), which holds because of (C.3), yields the form of GM,θ*(λ) in (5.5). █
Proof of Theorem 7.1. We start by establishing that, for any invertible θ and θ*, we have Γ0M(θ,θ) ≤ Γ0M(θ,θ*) with equality holding if and only if ANCNM(θ*) = ANCNM(θ). Indeed, the component of Γ0M(θ,θ*) that depends on θ* can be reexpressed in terms of the analogues of CNM(θ*) and Γ0X(θ) obtained by replacing XtN with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm100.gif?pub-status=live)
. Denoting these analogues by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm101.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm102.gif?pub-status=live)
, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm103.gif?pub-status=live)
By a standard calculation, for any C with the dimensions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm104.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm053.gif?pub-status=live)
with equality holding in (E.1) if and only if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430ffm105.gif?pub-status=live)
.
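Although (E.1) is displayed only in the image above, the "standard calculation" invoked here is presumably the usual completion-of-squares bound; in generic notation (ours, not the paper's), for Hermitian positive definite Γ, a fixed matrix Ĉ, and any conformable C,

$$
C\,\Gamma\,C' - C\,\Gamma\,\hat{C}' - \hat{C}\,\Gamma\,C' + \hat{C}\,\Gamma\,\hat{C}'
= \bigl(C - \hat{C}\bigr)\,\Gamma\,\bigl(C - \hat{C}\bigr)' \;\ge\; 0 ,
$$

so that C ↦ CΓC′ − CΓĈ′ − ĈΓC′ attains its minimum −ĈΓĈ′ at C = Ĉ, with equality if and only if C = Ĉ. Inequality (E.1) and its equality condition are an instance of this template.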
Next, note that because Γ0M(θ,θ) and Γ0M(θ,θ*) are continuous functions of θ on Θ, they have minimizers θ̄, resp. θ̄*, over Θ. From the result just established, we obtain
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm054.gif?pub-status=live)
Thus Γ0M(θ̄,θ̄) = Γ0M(θ̄*,θ*) holds, i.e., equality in (7.3), if and only if (7.4) and Γ0M(θ̄*,θ̄*) = Γ0M(θ̄*,θ*) do, and the latter is equivalent to (7.5), as was just shown.
In particular, equality in (7.3) implies the failure of (7.6) for θ̄ = θ̄* satisfying (7.4). Conversely, failure of (7.6) for some minimizer θ̄, i.e., ANCNM(θ*) = ANCNM(θ̄), implies Γ0M(θ̄*,θ*) ≤ Γ0M(θ̄,θ*) = Γ0M(θ̄,θ̄), which, from (E.2), yields Γ0M(θ̄*,θ̄*) = Γ0M(θ̄,θ̄) = Γ0M(θ̄*,θ*), i.e., equality in (7.3). Therefore (7.6) for all θ̄ minimizing Γ0M(θ,θ) is necessary and sufficient for strict inequality in (7.3).
From Theorem 4.1 and (5.11), it follows that the left-hand side of (7.7) is equal a.s. to the left-hand side of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170215065918315-0585:S0266466607070430:S0266466607070430frm055.gif?pub-status=live)
The assertions concerning (7.7) follow from (E.3) and the fact that, when Θ0 = {θ}, equality holds in (E.3) because θT → θ a.s., from (5.11). █