ADMISSIBLE AND NONADMISSIBLE TESTS IN UNIT-ROOT-LIKE SITUATIONS

Werner Ploberger

doi:10.1017/S0266466608080031

Published online by Cambridge University Press: 06 September 2007

Affiliation: Washington University in St. Louis

Abstract

This paper investigates the asymptotic behavior of tests in situations where the likelihood is locally asymptotically quadratic. Necessary and sufficient conditions are given for a test to be admissible. Even without these restrictive parametric assumptions, it is shown that certain common procedures (such as the augmented Dickey–Fuller test in cases where no deterministic trend is present, or standard tests for restrictions on cointegrating relationships) are asymptotically inadmissible. These results confirm the existence of tests that dominate these classical tests for all parameters.

I express my gratitude to the editors, H. Lütkepohl and especially Peter C.B. Phillips, for their help, which far exceeded the usual amount of support. I also thank the referees for their helpful comments, which greatly improved the paper. All remaining errors are mine.

Type: Research Article
Information: Econometric Theory, Volume 24, Issue 1, February 2008, pp. 15–42

DOI: https://doi.org/10.1017/S0266466608080031
Copyright: © 2008 Cambridge University Press

1. INTRODUCTION

This paper discusses optimality properties of certain statistical tests and makes two contributions. First, we analyze a restricted parametric model. These restrictions enable us to determine the class of asymptotically optimal tests. Moreover, we show that in nonclassical situations many traditional tests of simple hypotheses, such as the usual generalizations of the t-test, are not in this class and are inadmissible. Second, we use these results to show the inadmissibility of some familiar procedures: the Dickey–Fuller and related tests when there is no deterministic trend in the regression, and standard t-type tests for restrictions on cointegrating relations.

We begin by outlining the asymptotic testing problem for which we can describe the class of all admissible tests exactly. We first assume that the parametric family is unidimensional and discuss more general cases later.

Many problems in asymptotic statistics have a specific form, namely, that the logarithm of the likelihood can be approximated by a quadratic function of the (properly normalized) parameter. Classical asymptotic analysis then rests on two crucial assumptions: first, that the linear term of the quadratic function is (asymptotically) normal; and second, that the quadratic term (the Fisher information) is asymptotically constant.

In many situations relevant for econometrics (most of them connected to problems with unit roots), these assumptions are not fulfilled. Although the log-likelihood can asymptotically be approximated by a quadratic function, the linear term in the parameter is not Gaussian, and the matrix describing the quadratic term is not constant. The most typical example arises with the simplest autoregressive (AR) model

$$\Delta y_t = \delta y_{t-1} + u_t,$$

where u_t is normal with zero expectation and known variance. It is then straightforward to show that the log-likelihood is quadratic around δ = 0, but the properly normalized coefficients of this quadratic function are neither normal nor constant. The prominence of this example suggests a substantial number of possible applications; these are discussed in more detail in Section 3.
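A minimal check of this claim, assuming y₀ = 0, u_t i.i.d. N(0,1) (known unit variance), the model Δy_t = δy_{t−1} + u_t, and the scaling D_n = n. Under these assumptions the log-likelihood ratio for δ = h/n against δ = 0 equals hW_n − h²A_n/2 exactly, with W_n = n⁻¹Σ y_{t−1}Δy_t and A_n = n⁻²Σ y²_{t−1}; both are random, and A_n does not settle down to a constant. The helper names below are hypothetical.

```python
import numpy as np

def gaussian_loglik(y, delta):
    """Log-likelihood (up to a constant) of Delta y_t = delta*y_{t-1} + u_t,
    u_t iid N(0, 1), conditional on y_0."""
    resid = np.diff(y) - delta * y[:-1]
    return -0.5 * np.sum(resid**2)

def scores_and_information(y):
    """W_n and A_n for the assumed scaling D_n = n."""
    n = len(y) - 1
    y_lag, dy = y[:-1], np.diff(y)
    W = np.sum(y_lag * dy) / n
    A = np.sum(y_lag**2) / n**2
    return W, A

rng = np.random.default_rng(1)
n = 1000
y = np.concatenate([[0.0], np.cumsum(rng.standard_normal(n))])  # delta = 0
W, A = scores_and_information(y)

for h in (-10.0, -2.0, 3.0):
    exact = gaussian_loglik(y, h / n) - gaussian_loglik(y, 0.0)
    quadratic = h * W - 0.5 * h**2 * A
    print(h, exact, quadratic)  # identical up to rounding error

# The ML (= OLS) estimator satisfies n * delta_hat = W_n / A_n.
delta_hat = np.sum(y[:-1] * np.diff(y)) / np.sum(y[:-1] ** 2)
```

Rerunning with different seeds shows A_n varying from sample to sample, which is exactly the "random information" feature discussed above.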

Let us denote by p_n(θ) the density corresponding to parameter θ. For simplicity, we assume that the parameter θ is unidimensional. For a given true parameter θ₀ we then assume that, asymptotically and with a suitable scaling sequence D_n ↑ ∞,

$$\log \frac{p_n(\theta_0 + h/D_n)}{p_n(\theta_0)} \sim h W_n - \frac{1}{2} h^2 A_n, \qquad (1)$$

where the pair (W_n,A_n) converges in distribution to a nontrivial limit (W,A). We need not discuss here the precise meaning of the approximation or convergence in (1), for which the reader is referred to Le Cam and Yang (1990), Ploberger (2004), and Jeganathan (1991, 1995). Ploberger (2004) discussed specific problems associated with testing

$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta \neq \theta_0. \qquad (2)$$

First of all, let us analyze (1). If the log-likelihood were exactly quadratic (i.e., if ∼ in (1) were replaced by =), the likelihood ratio would depend on the data only through (W_n,A_n), so this pair would be a sufficient statistic. In that case, for every test φ we could find another test ψ depending only on (W_n,A_n) with an identical power function. Hence we would only need to analyze tests depending on (W_n,A_n).

Modern statistical theory allows us to use similar reasoning in the asymptotic case. We can, however, no longer analyze the tests themselves; the theory only gives us convergence results about their power functions. Typically, it allows us to show that the power functions for problems like (2) behave like those of a fixed “asymptotic” testing problem: one can approximate the power functions of tests for (2) by power functions of tests for the “asymptotic” problem to arbitrary accuracy.

We now briefly sketch the reason for this kind of analysis, emphasizing that the arguments in the following paragraph are heuristic. A rigorous treatment of the subject can be found in Strasser (1985).

If a test ψ is admissible for the asymptotic problem, then we can find a sequence φ_n of tests for (2) such that, as the sample size increases, the difference between the power functions of the φ_n and the power function of ψ converges to zero. (Although the tests are defined on different sample spaces, their power functions have the same domain, namely, the parameter space!) The φ_n are then, in a certain sense, asymptotically admissible. Suppose we are given some “competitors” ρ_n. Their power functions can be approximated by power functions of tests ν_n for the asymptotic problem. Compactness results show that the sequence of power functions of the ν_n contains a convergent subsequence, converging to the power function of a test ν. Because ψ is admissible, it cannot be dominated by ν, so it must have better power for some parameters. Hence, asymptotically, the same must be true for the power functions of φ_n and ρ_n. The same kind of argument applies, vice versa, to tests that are not admissible for the asymptotic problem.

This type of testing problem was analyzed for a one-dimensional parameter in Ploberger (2004). One of the main results was that power functions of tests for (2) can either be uniformly approximated or dominated by power functions of tests for the following testing problem. Assume we are given a parametrized family Q_λ, λ ∈ R, on the space

$$\mathbb{R} \times \mathbb{R}_+ = \{(W, A) : A > 0\}$$

so that

$$\frac{dQ_\lambda}{dQ_0}(W, A) = \exp\left(\lambda W - \tfrac{1}{2}\lambda^2 A\right).$$

We can think of this parametrized family as a statistical experiment, namely, a sample of size one of a random two-dimensional vector distributed according to a Q_λ with an unknown λ. Suppose we want to consider tests for the hypothesis λ = 0. In Ploberger (2004) a set of tests was constructed such that for every other test we can find one from our class with “better” power properties. Such a class is called “essentially complete” in Schervish (1995, p. 174) and “complete” in Strasser (1985, p. 41) and Ploberger (2004). Because the present paper heavily relies on Ploberger (2004), we will continue to use the term complete for this kind of set of tests.

Together with the limiting result, this complete class theorem characterizes all possible limits of power functions of tests for problem (2). However, every superset of a complete class is again a complete class. Hence the result from Ploberger (2004) does not allow us to classify tests directly: knowing that a test belongs to our class, we cannot conclude that it is admissible, because the definition allows the “complete class” to contain additional tests.

Moreover, a test outside the complete class might still be admissible: the results of Ploberger (2004) do not preclude that another test within the class has exactly the same power function. In this paper, we show that this is in fact not the case. The main theorem implies that every test within our class is admissible and is characterized by its power function. Hence tests not in our class are asymptotically inadmissible: for every test outside our complete class one can find a test from our class that is uniformly better, i.e., better for all values of the parameter!

Moreover, the same paper establishes that the test based on the t- or F-type statistic, namely, the test rejecting if

$$\frac{W^2}{A} > U \qquad (3)$$

for some critical value U, is not within this class. In Section 3 we discuss important examples where our results show the existence of tests that dominate standard procedures uniformly. It is also important to know how much power one can gain by using admissible tests. For the case of unit root testing, this question was analyzed in Elliott, Rothenberg, and Stock (1996).
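A hedged simulation sketch, assuming that Q₀ is the limit law of (W_n,A_n) in the Gaussian random-walk model (so W = ∫B dB and A = ∫B² for a Brownian motion B) and that dQ_λ/dQ₀ = exp(λW − λ²A/2). It approximates the 5% critical value U of the test (3) and evaluates the power function λ ↦ ∫φ dQ_λ by reweighting null draws with the density ratio. The discretization n = 500, the number of replications, the seed, and all function names are arbitrary illustrative choices.

```python
import numpy as np

def null_draws(n=500, reps=40_000, seed=0):
    """Approximate draws of (W, A) under Q_0 via a discrete Gaussian random walk
    (assumption: Q_0 is the random-walk limit law, scaled with D_n = n)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(reps)
    W = np.zeros(reps)
    A = np.zeros(reps)
    for _ in range(n):
        u = rng.standard_normal(reps)
        W += y * u      # y_{t-1} * Delta y_t, since Delta y_t = u_t under the null
        A += y * y      # y_{t-1}^2
        y += u
    return W / n, A / n**2

W, A = null_draws()
tau = W / np.sqrt(A)                  # t-type statistic (no constant, no trend)
U = np.quantile(W**2 / A, 0.95)       # two-sided 5% critical value for (3)
phi = (W**2 / A > U).astype(float)    # the test (3) evaluated at each draw

def power(lam):
    """int phi dQ_lam, computed as E_0[phi * dQ_lam/dQ_0]."""
    return np.mean(phi * np.exp(lam * W - 0.5 * lam**2 * A))
```

The left 5% quantile of `tau` should be close to the familiar Dickey–Fuller τ value for the no-constant case, and `power(0.0)` reproduces the size by construction.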

This result also has interesting consequences when we test (2) for the family (1). Suppose we have a sequence of tests ψ_n for the finite-sample problem whose power functions converge to the power function of (3). Then the standard convergence results for decision problems (cf. Strasser, 1985) allow us to conclude that there exists a sequence of tests whose power functions dominate those of the ψ_n uniformly!

In Section 3, we then extend these results to show the inadmissibility of certain familiar procedures.

1. The Dickey–Fuller τ-tests when no trend is present: Consider a univariate time series y_t and the models

$$\Delta y_t = \delta y_{t-1} + \sum_{j=1}^{p} \alpha_j \Delta y_{t-j} + u_t, \qquad (4)$$

and suppose we want to test δ = 0 against δ ≠ 0. It then seems natural to compute the t-statistic based on an estimator of δ and to use the critical values established by Dickey and Fuller (1979, 1981).

2. Suppose we have an estimator—say,

—for a scalar parameter ρ and there exists a scaling statistic

. Let us assume that (with some normalizing sequence D_n)

•

converges in distribution to a standard normal distribution.

•

converges in distribution to some random variable σ, which is almost surely positive and not a constant.

• Asymptotically, the distributions of

and

are independent.

Although there are some other applications, the most important case where this kind of phenomenon occurs is the estimation of parameters describing cointegrating relationships. So when testing a restriction—say, ρ = 0—one is tempted to use the analogue to the Wald test. Indeed, one can easily see that in case the null is true,

converges in distribution to a standard normal distribution. Therefore it seems natural to use tests based on (5) and use one- or two-sided critical values. Here I show that these procedures are inadmissible in the two-sided case.

Here we deal only with the simplest cases of unit root testing. Clearly, the results would be more valuable if we allowed for deterministic terms. Typically, one is interested in tests that are invariant with respect to these terms. Invariance would reduce the parameter space, which would suit us well. However, the “reduced” problem no longer satisfies (1), so a more general analysis is beyond the scope of this paper.

2. THE MAIN THEOREMS FOR THE UNIDIMENSIONAL CASE

Tests are functions φ from the set R × R₊ to the interval [0,1] (φ = 0 means that we accept the null; φ = 1 means that we reject the null; and in between we randomize).

The power function of a test φ is the function that attaches to each λ ∈ R the value ∫φdQ_λ. The concept of a power function enables us to compare tests. A test φ is better than a test ψ if

So a better test dominates the worse test for all possible values of the parameter. Ploberger (2004) established that the set C of tests consisting of the test functions defined below is complete (or essentially complete according to Schervish, 1995).

These test functions are described by the following parameters: U,a,b,c and measures μ,ν subject to the following restrictions.

1. ν is a σ-finite measure on [−1,1] − {0} (i.e., it puts no mass outside [−1,1] and none at zero) such that ν(K) < ∞ for all compact sets K ⊂ [−1,1] − {0} and

(in particular ∫_[−1,1] λ² dν(λ) is finite and c ≥ 0).

2. μ is defined on R − [−1,1] (i.e., the real line except the unit interval), and for all M > 1

3. Not all the numbers a,b,c,U and measures μ and ν are trivial.

4. The test has the correct size—say, α < 1—under the null.

Then define the function

Now let the class of tests be characterized by the following properties: (i) the test rejects the null for all (A,W) with A < a; (ii) if A > a, the test rejects if

$$u(A, W) > U$$

and accepts if

$$u(A, W) < U.$$

These parameters do not describe the test completely. In particular, we did not define what happens if A = a or if u(A,W) = U. If these sets are not null sets (with respect to all Q_h), the complete class defined above may contain some additional (nonoptimal) tests. This does not contradict the definition of a complete class: any superset of a complete class is a complete class, too!
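The decision rule (i)–(ii) can be written down directly once u is available. In the sketch below u is passed in as an arbitrary callable, because the function (9) depends on μ, ν, b, c in a way not reproduced here. The PIC-type choice of u, the values of a and U, and the randomization probability on the unspecified boundary sets are illustrative assumptions only.

```python
import math

def class_test(W, A, u, a, U, boundary_value=0.5):
    """Rejection probability phi(W, A) of a test following rules (i)-(ii).

    u: user-supplied callable standing in for u(A, W) of (9) (hypothetical).
    boundary_value: arbitrary randomization on {A = a} and {u(A, W) = U},
    which the class leaves unspecified."""
    if A < a:
        return 1.0             # (i): reject whenever A < a
    if A == a:
        return boundary_value  # not fixed by the class
    value = u(A, W)
    if value > U:
        return 1.0             # (ii): reject if u(A, W) > U
    if value < U:
        return 0.0             # (ii): accept if u(A, W) < U
    return boundary_value      # not fixed by the class

def pic_type(A, W):
    """Illustrative choice of u (an assumption): A^{-1/2} exp(W^2 / (2A))."""
    return math.exp(W * W / (2.0 * A)) / math.sqrt(A)

# Same W^2/A = 1 at both points, but different decisions once A differs.
print(class_test(1.0, 1.0, pic_type, a=0.0, U=2.0))
print(class_test(0.5, 0.25, pic_type, a=0.0, U=2.0))
```

Keeping u abstract makes the point of the remark above concrete: the rule says nothing about the boundary sets, so a caller must pick some value there.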

Remark 1. Because of restrictions 1 and 2, in particular expressions (7) and (8), every function u(A,W) satisfying (9) can, for every β,γ > 0, be written as

where ν′,μ′ are the restrictions of the measure μ + ν on [−β,γ] and R − [−β,γ], respectively.

Remark 2. Because

we can replace the first integral in (9) by

Additionally, we assume that the following three conditions are fulfilled.

Condition C1. For some ε > 0 we have

Condition C2. The support of the measure Q₀ (and hence the support of all the measures Q_λ) is the whole R × R₊ = {(W,A) : A > 0} (i.e., every open set of R × R₊ has positive probability under Q₀).

Condition C3. The distribution function of A is continuous (i.e., Q₀([A = a]) = 0 for all a).

THEOREM 1. Suppose restriction 1 and Conditions C1, C2, and C3 hold and assume that ψ is a test that has the same or a better power function than the test φ, which is from our class and is determined by the parameters a,b,c,U,μ,ν. Then ψ is from our class, too, and determined by the same parameters.

Proof. The proof is provided in the Appendix.

It is, however, a nontrivial task to show that tests are not in our complete class. For a proof of this fact, it is not sufficient to show that the corresponding test statistic cannot be written in the form (9). One has to show that the corresponding test (or the critical set) cannot be generated as a level set of any function (9)!

Suppose now that we are given a family Q_λ, λ ∈ R, and a test φ (a function from R × R₊ into [0,1]) satisfying the following requirements.

1. All the assumptions of the previous theorem are fulfilled.

2. Every neighborhood of the point (0,0) has positive probability with respect to Q₀.

3. There exists a constant K > 1 such that

Then the following theorem holds.

THEOREM 2. The test φ is not in our class and is therefore inadmissible.

Proof. The assumptions of this theorem are quite similar to those of Ploberger (2004); only our second assumption differs. Here we only postulate that all neighborhoods of the origin have positive mass, instead of assuming that every open set of the half plane R × R₊ has positive mass. One can easily see that this requirement is sufficient, because the proof uses only the behavior of the power function for alternatives near the origin. █

If the likelihoods are given by (1), one consequence of the preceding theorems is that the analogues of the “classical” Lagrange multiplier (LM), likelihood ratio (LR), and Wald tests for testing θ = θ₀ are asymptotically inadmissible. The classical tests reject when their test statistics exceed the critical values. Moreover, it is an easy exercise, very similar to the standard proof of asymptotic equivalence in the stationary case, to show that the differences between suitably normed and transformed test statistics and W_n²/A_n converge to zero under the null. Let us denote the asymptotic critical value for the test statistic W_n²/A_n by K and assume that the cumulative distribution function of W²/A is continuous at K. This is not a strong restriction, because a cumulative distribution function has at most countably many discontinuities. Let us define φ_n by

$$\varphi_n = \begin{cases} 1 & \text{if } W_n^2/A_n > K, \\ 0 & \text{otherwise}. \end{cases}$$

Then we can easily see that under the null the probability that the LM, LR, and Wald tests and the tests φ_n give different results converges to zero. (So, with probability converging to one under the null, either all of them accept or all of them reject!) Hence the power functions of φ_n and the classical tests are asymptotically the same under the null.
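For the Gaussian AR example with θ₀ = 0 (i.e., δ = 0), D_n = n, and true error variance one (all assumptions), the claimed asymptotic equivalence can be checked directly. The sketch uses the RSS/n variance convention, under which Wald ≥ LR ≥ LM holds exactly; all three differ from W_n²/A_n only through the estimated variance, which is close to one in large samples. Function names are hypothetical.

```python
import numpy as np

def classical_statistics(y):
    """Wald, LR and LM statistics for delta = 0 in Delta y_t = delta*y_{t-1} + u_t,
    with the error variance estimated by RSS/n (an assumed convention)."""
    y_lag, dy = y[:-1], np.diff(y)
    n = dy.size
    rss0 = np.sum(dy**2)                                 # restricted model (delta = 0)
    delta_hat = np.sum(y_lag * dy) / np.sum(y_lag**2)   # OLS estimate of delta
    rss1 = np.sum((dy - delta_hat * y_lag) ** 2)         # unrestricted model
    wald = n * (rss0 - rss1) / rss1
    lr = n * np.log(rss0 / rss1)
    lm = n * (rss0 - rss1) / rss0
    return wald, lr, lm

def w2_over_a(y, sigma2=1.0):
    """W_n^2 / A_n with D_n = n and the error variance sigma2 treated as known."""
    y_lag, dy = y[:-1], np.diff(y)
    n = dy.size
    W = np.sum(y_lag * dy) / (n * sigma2)
    A = np.sum(y_lag**2) / (n**2 * sigma2)
    return W**2 / A

rng = np.random.default_rng(2)
y = np.concatenate([[0.0], np.cumsum(rng.standard_normal(20_000))])  # null: delta = 0
wald, lr, lm = classical_statistics(y)
ref = w2_over_a(y)
```

With x = (RSS₀ − RSS₁)/RSS₁ the three statistics are nx, n log(1 + x), and nx/(1 + x), which makes the ordering and the vanishing differences easy to see.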

We now have to show that the same holds true for the alternatives. Consider local alternatives of the form

$$\theta_n = \theta_0 + \frac{h}{D_n}.$$

In Ploberger (2004), contiguity of the probability measures corresponding to θ₀ and θ_n was established. This means that for any sequence E_n of events with P_{θ₀}(E_n) → 0 we also have P_{θ_n}(E_n) → 0. (Contiguity is one of the cornerstones of modern asymptotic theory; for a more detailed discussion see Le Cam and Yang, 1990; Strasser, 1985.) Because the probability of a conflict between the LM, LR, and Wald tests and φ_n converges to zero under the null, contiguity allows us to conclude that the probability of a conflict under the local alternatives converges to zero, too. Hence the difference of the rejection probabilities (which is exactly the difference of the power functions) must converge to zero as n → ∞. Therefore, the power functions are asymptotically the same. We assumed that (W_n,A_n) → (W,A) in distribution under the null, so W_n²/A_n → W²/A in distribution, too. Because the cumulative distribution function of the limiting distribution is continuous at K, the power function of the φ_n (and, consequently, the power functions of the LM, LR, and Wald tests) must converge to the power function of our test φ under the null.

Showing that the power functions of the φ_n converge to the power function of φ at the local alternatives requires only a bit more technical effort. Let us fix the sequence of local alternatives and consider the densities

$$\ell_n = \frac{dP_{\theta_n}}{dP_{\theta_0}}.$$

Then we can easily conclude from (1) that, under P_{θ₀},

$$\ell_n \to \ell = \exp\left(hW - \tfrac{1}{2}h^2 A\right)$$

in distribution. We did assume, however, that

$$\int \ell \, dQ_0 = 1,$$

because otherwise Q_h would not be a probability measure. Now let ε > 0 be arbitrary. Then we can find an L so that

$$\int \left(\ell - \min(\ell, L)\right) dQ_0 < \varepsilon.$$

Consequently, as ℓ_n → ℓ in distribution,

and as ∫ℓ_n dP_{θ₀} = 1,

Hence we have

One can easily see that

Further, because of the convergence of (A_n,W_n) and ℓ_n,

We have that (when using (17))

As ε + ∫φ min(ℓ,L) dQ₀ ≥ ∫(ℓ − min(ℓ,L)) dQ₀ + ∫φ min(ℓ,L) dQ₀ ≥ ∫φℓ dQ₀ (the last inequality because 0 ≤ φ ≤ 1), we can conclude that

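Because ε was arbitrary, we have shown that

$$\lim_{n \to \infty} \int \varphi_n \, dP_{\theta_n} = \int \varphi \, \ell \, dQ_0 = \int \varphi \, dQ_h.$$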

This establishes the fact that the power function of the φ_n converges to the power function of φ for all local alternatives. Because we already know that φ is not within our class, we can find a test ψ with (strictly) better power properties. In Theorem 3 in Section 3 we show that in this situation it is possible to construct tests ψ_n with strictly better asymptotic power properties than the φ_n, which establishes our result, namely, that the φ_n are asymptotically inadmissible.
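In the Gaussian AR setting (assumed: y₀ = 0, u_t i.i.d. N(0,1), θ₀ = 0, D_n = n), the log-likelihood ratio is exactly hW_n − h²A_n/2, so the power of φ_n at θ_n = h/n can be computed either by simulating under the alternative or by weighting null draws with ℓ_n. The sketch below compares the two; their agreement up to Monte Carlo error is a finite-sample counterpart of the limit argument above. The sample size, h, seed, and function names are illustrative assumptions.

```python
import numpy as np

def simulate_wa(delta, n, reps, rng):
    """(W_n, A_n) with D_n = n for Delta y_t = delta*y_{t-1} + u_t,
    y_0 = 0, u_t iid N(0, 1) (assumed Gaussian design)."""
    y = np.zeros(reps)
    W = np.zeros(reps)
    A = np.zeros(reps)
    for _ in range(n):
        dy = delta * y + rng.standard_normal(reps)
        W += y * dy
        A += y * y
        y = y + dy
    return W / n, A / n**2

rng = np.random.default_rng(3)
n, reps, h = 200, 40_000, -5.0

W0, A0 = simulate_wa(0.0, n, reps, rng)        # draws under the null
K = np.quantile(W0**2 / A0, 0.95)              # 5% critical value for W_n^2/A_n
phi0 = W0**2 / A0 > K                          # phi_n on the null draws

Wh, Ah = simulate_wa(h / n, n, reps, rng)      # draws under theta_n = h/n
direct = np.mean(Wh**2 / Ah > K)               # rejection rate, simulated directly

weights = np.exp(h * W0 - 0.5 * h**2 * A0)     # ell_n (exact in this Gaussian model)
reweighted = np.mean(phi0 * weights)           # int phi_n ell_n dP_{theta_0}
```

A negative h keeps the weights bounded (W_n cannot be much below −1/2 under the null), which keeps the reweighted estimate stable.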

In view of the preceding results one is tempted to look at other “generalizations” of the classical LR test to situations with stochastic information. One such generalization is the test based on the posterior information criterion (PIC) (cf. Phillips, 1996; Phillips and Ploberger, 1996; Ploberger and Phillips, 2003). Consider tests that reject if

$$A^{-1/2} \exp\left(\frac{W^2}{2A}\right) > K \qquad (18)$$

and accept if

$$A^{-1/2} \exp\left(\frac{W^2}{2A}\right) \le K, \qquad (19)$$

where K is a constant determined by the desired significance level. These tests are all admissible: simply take our measures μ,ν to be Lebesgue measure and choose the other constants appropriately. Then a simple but tedious calculation shows that (12) yields a test based on

$$\int_{-\infty}^{\infty} \exp\left(\lambda W - \tfrac{1}{2}\lambda^2 A\right) d\lambda = \sqrt{\frac{2\pi}{A}} \exp\left(\frac{W^2}{2A}\right),$$

which is immediately seen to be equivalent to the test determined by (18) and (19).

It should be noted that in the “classical” case of A being constant this test is equivalent to a test based on W²/A, too. So we can view the PIC test as a generalization of the classical tests, too.
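A quick numerical check, assuming the integrand exp(λW − λ²A/2) with Lebesgue weight over the whole real line: a Riemann sum matches the Gaussian closed form √(2π/A) exp(W²/(2A)). Two points with the same W²/A but different A receive different PIC-type values, whereas for fixed A the statistic is monotone in W²/A, which is the classical equivalence just noted. Function names and the integration grid are illustrative choices.

```python
import numpy as np

def lebesgue_mixture(W, A, half_width=60.0, n_grid=200_001):
    """Riemann-sum approximation of int exp(lam*W - lam^2*A/2) d lam."""
    lam = np.linspace(-half_width, half_width, n_grid)
    step = lam[1] - lam[0]
    return np.sum(np.exp(lam * W - 0.5 * lam**2 * A)) * step

def pic_closed_form(W, A):
    """Gaussian integral: sqrt(2*pi/A) * exp(W^2 / (2A))."""
    return np.sqrt(2.0 * np.pi / A) * np.exp(W**2 / (2.0 * A))

for W, A in [(1.0, 1.0), (0.5, 0.25)]:
    # both points have W^2/A = 1, yet the mixture statistic differs
    print(W, A, W**2 / A, lebesgue_mixture(W, A), pic_closed_form(W, A))
```

The integrand is a rescaled normal density in λ centered at W/A, so a truncated grid of ±60 is ample for the moderate values used here.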

3. APPLICATIONS

The result of the previous section establishes a minimal complete class theorem for a rather large class of testing problems. Apart from the three technical conditions we only require the likelihood function to be locally asymptotically quadratic. The main point of interest seems to be that this class of admissible tests is quite different from the one for the classical problem where W is Gaussian and A constant. In this kind of situation the set of all admissible tests is well known even for multivariate parameters (Birnbaum's theorem; cf. Strasser, 1985). Each of these tests is characterized by a convex set (in the univariate case an interval) and rejects if W lies outside the interval. Hence Theorem 1 of the previous section seems to be of considerable theoretical interest. It shows that even a small change in our assumptions makes enormous differences for the class of admissible tests.

In the previous section we dealt only with one-parameter families. This is a significant hurdle for a direct application of the result. However, we can apply the result to prove the inadmissibility of tests. We will show that certain tests, e.g., the Dickey–Fuller τ-test and the pseudo-t-test based on the statistic (5), can be interpreted as tests on a unidimensional parameter. Our result then shows that there must exist a test with a uniformly better power function. The drawback is that the result is nonconstructive: the only information we supply about the better test is that it has the form (9). Nevertheless, the result may motivate researchers in the field. Efforts to find better tests are not doomed from the start; we know that a better test exists!

Let us first analyze the Dickey–Fuller test. Unfortunately, the techniques presented here cover only cases that occur relatively rarely in practice: in most cases the data contain a drift or a deterministic trend. The presence of deterministic trends in particular changes the situation completely. I conjecture that in this case the Dickey–Fuller τ-test is admissible. The results of Müller and Elliott (2001) establish an optimality property; for a detailed analysis see also Elliott et al. (1996), Phillips and Xiao (1998), and Stock (1995). Nevertheless, I think that the results of this paper are important to the theory of testing for unit roots. They show, in particular, that the situation is radically different from the classical one.

Let us now assume that a process y_t is defined by (4) and let us furthermore assume that the

with zero expectation and that all of the roots of the polynomial

lie outside the unit circle. Then let us consider local alternatives

Jeganathan (1991, 1995) performed a rigorous asymptotic analysis for these kinds of testing problems. In particular, he established limiting properties of these experiments. Using his results, one can easily show that in many interesting cases the necessary conditions of our theorems are fulfilled. The “elementary” case of the Dickey–Fuller test without a trend is discussed in Ploberger (2004).

In many applications, however, one wants to avoid relatively narrow parametric assumptions such as (20). The standard statistical theory of optimal testing (cf. Strasser, 1985) is based on Neyman–Pearson tests, which depend critically on densities. Without such parametric assumptions we therefore cannot show the admissibility of tests. Under certain circumstances, however, we can show that certain tests are not admissible, in the sense that there exists a test with a uniformly better power function.

Let us now assume that P_θ is a family of probability measures parametrized by θ. We assume that the parameter can be split into two parts,

$$\theta = (\delta, \beta),$$

where δ is one-dimensional, whereas the nuisance parameter β can be of arbitrary dimension. Suppose we want to test the hypothesis δ = 0.

Suppose one has an estimator—say,

—for δ; then one will be tempted to use an analogue to the usual t-test. One will construct some other statistic—say,

—so that

Then it is easy to construct a test. In the “classical” case, where

is asymptotically normal,

will be an appropriately scaled consistent estimator for the asymptotic variance. We are more interested in the nonstandard case, where the limiting distribution of the properly scaled

is nontrivial.

In some cases, we can use our approach to analyze the nonoptimality of such tests. In the previous section, we worked mainly with the scores and the “information” (the second derivative) of the likelihood function. Here the situation is different: we start with estimators for the parameter and their accuracy.

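So let us review the basic assumption of the previous section. To do so, we must find analogues to the W_n and A_n defined earlier. We started with the model (1):

$$\log \frac{p_n(\theta_0 + h/D_n)}{p_n(\theta_0)} \sim h W_n - \frac{1}{2} h^2 A_n.$$

Then it is easily seen that, if the logarithm of the likelihood is twice continuously differentiable, the maximum likelihood estimator θ̂_n can be approximated in the following way (the right-hand side of (1) is maximized at h = W_n/A_n):

$$\hat\theta_n \approx \theta_0 + \frac{W_n}{D_n A_n}.$$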

Moreover, we can easily conclude from our assumptions that

converges in distribution, and so

would be a feasible normalizing estimator.

In our situation, however, we start with the estimators

. So let us assume that there exists an appropriate sequence of scaling factors D_n and define (keeping in mind that we want to test δ = δ₀)

Then we assume that the following four assumptions are fulfilled.

Assumption A1. The nuisance parameter β is fixed.

Assumption A2. (A_n,W_n) converges (for

) in distribution to some random variable (A,W) with distribution Q₀, which fulfills the prerequisites of the previous section.

Assumption A3. For local alternatives

(i.e.,

), the distributions of (A_n,W_n) converge to a measure Q_c with

Assumption A4. The mapping c → Q_c is continuous (with respect to the weak topology of measures). This means that c_n → c implies Q_{c_n} → Q_c weakly.

Essentially, we assume that the statistic (A_n,W_n) behaves “the same way” as in the case of a unidimensional parameter. We now give some examples showing that these assumptions are fulfilled in many cases of considerable interest. In particular, note that we do not require parametric assumptions such as Gaussian distributions of the error terms!

1. The generalized augmented Dickey–Fuller (ADF) test with no deterministic trend. Said and Dickey (1984) investigated the ADF test for unit roots in autoregressive moving average (ARMA) processes with independent and identically distributed (i.i.d.) innovations. More recently, Chang and Park (2002) considered further generalizations of model (4) (e.g., the u_t are martingale differences with a very general structure, and the lag order p may depend on the sample size and increase to infinity). So let us consider model (4) and define

They show that, even under the general circumstances considered in their paper (using our notation), letting

be the OLS estimator for δ and using an estimator

, where

equals the denominator in the t-test (i.e., “variance estimator”), the properly normalized distributions of

and the estimation error jointly converge in distribution to the corresponding distributions for the (simple) Dickey–Fuller test. This convergence holds for δ = 0 and also for the local alternatives (22) (cf. Chang and Park, 2002). So let us define the “scores” by

and denote by Q_c the limiting distribution of

The measures Q_c are defined on the set {(A,W) : A ≥ 0} ⊂ R². As mentioned before, the limiting distributions of our statistics are the same as for the Dickey–Fuller test. Hence we can conclude that the measures Q_c are the same as the asymptotic distributions for the simple random walk model, which was discussed in Ploberger (2004). So we may conclude that

Moreover, it can easily be seen that the other assumptions of our main theorem are fulfilled, too. Elementary calculations show that the two-sided ADF test rejects when

$$\frac{W_n^2}{A_n}$$

exceeds the critical value, say, U. Hence the power of the ADF test against a local alternative (22) converges to

$$Q_c\left(\left\{\frac{W^2}{A} > U\right\}\right).$$

This holds only for those U for which Q_c({W²/A = U}) = 0. Because the Q_c have densities with respect to Q₀, however, Q_c({W²/A = U}) > 0 implies Q₀({W²/A = U}) > 0, and this can be true for at most countably many U. Hence we have convergence for all but at most countably many U.
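A minimal numpy sketch of the two-sided ADF decision, assuming the regression form of (4) with no constant or trend and the usual OLS variance estimator with a degrees-of-freedom correction. The function names are hypothetical, and this is not Chang and Park's procedure for lag orders growing with the sample size; the critical value U is left to the user and would come from the limiting null distribution of W²/A.

```python
import numpy as np

def adf_t_statistic(y, p):
    """OLS t-statistic for delta in the no-constant, no-trend regression
    Delta y_t = delta*y_{t-1} + sum_{j=1}^p alpha_j*Delta y_{t-j} + u_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dy[k] = y[k+1] - y[k]
    rows = dy.size - p                   # usable observations after p lags
    X = np.empty((rows, p + 1))
    X[:, 0] = y[p:-1]                    # y_{t-1}
    for j in range(1, p + 1):
        X[:, j] = dy[p - j:dy.size - j]  # Delta y_{t-j}
    target = dy[p:]                      # Delta y_t
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (rows - (p + 1))
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[0] / np.sqrt(cov[0, 0])

def adf_two_sided_reject(y, p, U):
    """Two-sided rule: reject delta = 0 when the squared t-statistic exceeds U."""
    return adf_t_statistic(y, p) ** 2 > U

rng = np.random.default_rng(4)
y = np.concatenate([[0.0], np.cumsum(rng.standard_normal(300))])  # unit-root sample
t_p0 = adf_t_statistic(y, p=0)
t_p2 = adf_t_statistic(y, p=2)
```

Squaring the t-statistic turns the two-sided rule into a threshold on a single statistic, matching the W_n²/A_n form above.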

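2. Cointegrated system testing. Another possible application is the testing of restrictions on certain parameters of cointegrated systems. Let us first consider the simplest possible case, namely, a bivariate system in Phillips' triangular form:

$$y_{1t} = \gamma y_{2t} + u_t, \qquad \Delta y_{2t} = v_t.$$

For simplicity, let us assume that the (u_t, v_t)′ are i.i.d. normal with zero expectation and identity covariance matrix (so that u_t and v_t are independent). Then we can estimate γ by OLS, and the estimation error is

$$\hat\gamma - \gamma = \frac{\sum_{t=1}^{n} y_{2t} u_t}{\sum_{t=1}^{n} y_{2t}^2}.$$

Let us now suppose that we want to test γ = γ₀, and let A_n = n⁻²Σ y²_{2t} and W_n = n⁻¹Σ y_{2t}(y_{1t} − γ₀ y_{2t}). Then some tedious but elementary calculations show that under the null the distribution of (A_n,W_n) converges to that of

$$\left(\int_0^1 W_2(r)^2 \, dr,\ \int_0^1 W_2(r) \, dW_1(r)\right),$$

where W₁,W₂ are independent Wiener processes. Moreover, we can also consider local alternatives of the form

$$\gamma_n = \gamma_0 + \frac{c}{n}.$$

Then (A_n,W_n) converges in distribution to

$$\left(\int_0^1 W_2(r)^2 \, dr,\ \int_0^1 W_2(r) \, dW_1(r) + c \int_0^1 W_2(r)^2 \, dr\right).$$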
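A hedged Monte Carlo sketch of the mixed normality in this example, under the stated Gaussian assumptions with known unit variance and D_n = n. The t-type ratio W_n/√A_n is exactly N(0,1) conditional on the y_{2t} path, while n(γ̂ − γ) = W_n/A_n is a scale mixture of normals whose variance exceeds 1/E[∫W₂²] = 2 by Jensen's inequality. The sample size, replication count, seed, and function name are illustrative assumptions.

```python
import numpy as np

def triangular_ols_error(n, reps, rng):
    """Simulate y1_t = gamma*y2_t + u_t, Delta y2_t = v_t with (u_t, v_t) iid N(0, I_2)
    and return W_n = n^{-1} sum y2_t u_t and A_n = n^{-2} sum y2_t^2 under the null,
    so that n*(gamma_hat - gamma) = W_n / A_n (the value of gamma drops out)."""
    y2 = np.zeros(reps)
    s_yu = np.zeros(reps)
    s_yy = np.zeros(reps)
    for _ in range(n):
        y2 = y2 + rng.standard_normal(reps)   # random-walk regressor
        u = rng.standard_normal(reps)          # independent of the y2 path
        s_yu += y2 * u
        s_yy += y2 * y2
    return s_yu / n, s_yy / n**2

rng = np.random.default_rng(5)
W, A = triangular_ols_error(n=400, reps=20_000, rng=rng)
scaled_error = W / A         # n*(gamma_hat - gamma): a variance mixture of normals
t_stat = W / np.sqrt(A)      # conditionally N(0, 1) given A, hence N(0, 1)
```

Under a local alternative γ₀ + c/n the same W_n simply shifts by cA_n, which is the conditional mean cA in the limit.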

3. Some generalizations. We can generalize the ideas of the preceding example for use in a more general context. Suppose we want to test whether a parameter

and we have an estimator

, an estimator for the “information”

, and a sequence of scaling factors D_n, so that the following properties hold true.

For local alternatives

the distributions of

, where

, converge in distribution to a (W,A) with the following properties.

The distribution of A is independent of c.
The conditional distribution of W given A is normal with expectation cA and variance A.
The distribution of A satisfies all the assumptions of our main theorem.
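A short derivation, assuming the three properties above hold in the limit and that W given A has a normal density (A > 0). Since the law of A does not depend on c, the likelihood ratio of Q_c to Q₀ is the ratio of the conditional normal densities:

```latex
\frac{dQ_c}{dQ_0}(W,A)
  = \frac{(2\pi A)^{-1/2}\exp\{-(W - cA)^2/(2A)\}}{(2\pi A)^{-1/2}\exp\{-W^2/(2A)\}}
  = \exp\left\{\frac{W^2 - (W^2 - 2cAW + c^2A^2)}{2A}\right\}
  = \exp\left(cW - \tfrac{1}{2}c^2 A\right).
```

So the LAMN limit has exactly the quadratic log-likelihood structure of (1), with W in the role of the score and A in the role of the random information.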

Many popular estimators for the cointegrating relationships have these asymptotically mixed normal properties: it is well established that the standard estimators for the cointegrating relationships are normal only after conditioning; cf. Ahn and Reinsel (1990), Phillips and Hansen (1990), Phillips and Ouliaris (1990), Saikkonen (1991), Johansen (1988, 1991), Stock and Watson (1993), and also Hamilton (1994, Sect. 19.3). Testing linear restrictions on these parameters with Wald-type tests (cf. Davidson, 1998) is therefore covered by our theory, at least for one-dimensional restrictions.

A testing problem satisfying the preceding conditions is called locally asymptotically mixed normal (LAMN) (cf. Le Cam and Yang, 1990; Jeganathan, 1991, 1995). Apart from the economic applications quoted later, this type of model has also received much attention in the statistical literature (cf. the references cited previously).

4. Other cases. Asymptotically mixed normal families do not only occur in connection with unit root tests or cointegration. Other cases of interest in economics are given in Park and Phillips (2001) and Aït-Sahalia (1999).

So let us now assume that we are given a testing problem satisfying the preceding assumptions. We might be tempted to use the generalized t-test φ_n for this problem, which rejects when the square of the t-statistic

W_n/√A_n

exceeds a critical value U (equivalently, when its absolute value exceeds √U). Furthermore, let us assume that this critical value has an absolute value larger than one, i.e.,

Let us additionally assume that the cumulative distribution function of the limiting distribution of our test statistic is continuous at U. This is equivalent to

In other words, the limiting distribution of our test statistic assigns zero probability to the point U.
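The generalized t-test can be illustrated in a simulated limit experiment. Under c = 0 the ratio W/√A is exactly standard normal whatever the law of A, so W²/A is chi-square with one degree of freedom; its distribution function is continuous at every point, and a 5% critical value is the squared 97.5% normal quantile, about 3.84 (larger than one, as required above). The sketch below assumes, for illustration only, A ~ Gamma(2, 0.5) and W | A ~ N(cA, A); `t_test_power` is a hypothetical name, and the numbers it produces are not results from the paper.

```python
import math
import random
from statistics import NormalDist


def t_test_power(c, crit_u, n_draws=100_000, seed=0):
    """Monte Carlo rejection rate of phi = 1{W^2/A > U} under Q_c.

    Assumption (illustration only): A ~ Gamma(shape=2, scale=0.5);
    given A, W ~ N(cA, A).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_draws):
        a = rng.gammavariate(2.0, 0.5)
        w = rng.gauss(c * a, math.sqrt(a))
        if w * w / a > crit_u:
            rejections += 1
    return rejections / n_draws


# Under c = 0, W / sqrt(A) is N(0, 1) whatever the law of A, so the 5%
# critical value for W^2/A is the squared 97.5% standard normal quantile.
U = NormalDist().inv_cdf(0.975) ** 2   # about 3.84

if __name__ == "__main__":
    for c in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"c = {c:+.1f}: rejection rate ~ {t_test_power(c, U):.3f}")
```

Because W/√A = c√A + Z with Z standard normal and independent of A, the estimated power function should be roughly symmetric in c and increasing in |c|, with a rejection rate near 0.05 at c = 0.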

THEOREM 3. Suppose Assumptions A1–A4 and also (25) and (26) are satisfied. Then the tests

φ_n

defined earlier are inadmissible: there exist tests

ψ_n = ψ_n(A_n,W_n)

depending on W_n and A_n alone that have a strictly better asymptotic power function than φ_n. For alternatives (24) we have

where for at least one c ≠ 0 the left-hand side is strictly smaller than the right-hand side. Moreover, we also have

Remark 3. In many cases, it is not that surprising that the tests φ_n are inadmissible. Take, for example, one of the preceding models with non-Gaussian error terms u_t, v_t whose distribution is known up to a finite-dimensional parameter. Estimators based on parametric models for u_t, v_t should then be more accurate, and the corresponding tests should have more power. What is interesting, however, is that there exists a test better than φ_n that depends on (W_n,A_n) alone! Therefore there is some way of computing a test statistic from (W_n,A_n), other than the t-statistic, that defines a test whose asymptotic power function is nowhere worse and somewhere strictly better!
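One concrete example of another statistic computed from (W,A) alone is the log likelihood ratio of Q_c̄ against Q₀, namely c̄W − c̄²A/2, whose exponential is the density exp(cW − c²A/2) used in the proof of Theorem 3. By the Neyman–Pearson lemma, rejecting for large values of this statistic is most powerful against the single alternative c̄. The sketch below (assuming, for illustration only, A ~ Gamma(2, 0.5) and W | A ~ N(cA, A); all names hypothetical) compares it with the t-test at c = c̄ and at c = −c̄. It is not the dominating test whose existence Theorem 3 asserts: the point-optimal test beats the t-test at c̄ but loses badly at −c̄, so neither of these two tests dominates the other.

```python
import math
import random
from statistics import NormalDist


def simulate_pairs(c, n_draws, seed):
    """Draw (W, A) pairs; assumption: A ~ Gamma(2, 0.5) and W | A ~ N(cA, A)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_draws):
        a = rng.gammavariate(2.0, 0.5)
        pairs.append((rng.gauss(c * a, math.sqrt(a)), a))
    return pairs


def log_lr(w, a, c_bar):
    """Log likelihood ratio log(dQ_cbar / dQ_0) = c_bar * W - c_bar**2 * A / 2."""
    return c_bar * w - 0.5 * c_bar * c_bar * a


def rejection_rates(c_bar, c_true, alpha=0.05, n_draws=100_000):
    """Monte Carlo rejection rates at c_true of the point-optimal test and the t-test."""
    # Critical value of the point-optimal test: empirical (1 - alpha) quantile under Q_0.
    null_stats = sorted(log_lr(w, a, c_bar) for w, a in simulate_pairs(0.0, n_draws, seed=1))
    k = null_stats[int((1 - alpha) * n_draws)]
    # Critical value of the t-test for W^2 / A (chi-square with one degree of freedom).
    u = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    alt = simulate_pairs(c_true, n_draws, seed=2)
    rate_lr = sum(log_lr(w, a, c_bar) > k for w, a in alt) / n_draws
    rate_t = sum(w * w / a > u for w, a in alt) / n_draws
    return rate_lr, rate_t


if __name__ == "__main__":
    for c_true in (1.0, -1.0):
        rate_lr, rate_t = rejection_rates(c_bar=1.0, c_true=c_true)
        print(f"c = {c_true:+.1f}: point-optimal {rate_lr:.3f}, t-test {rate_t:.3f}")
```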

Remark 4. Unfortunately, the proof is nonconstructive: it establishes only the existence of such a dominating test and provides no way to construct it.

Remark 5. Here we prove only pointwise convergence in (27) and (28). With some technical refinements, the proof can be modified to show that the convergence in (27) is uniform on bounded sets of values of c.

Proof of Theorem 3. The proof can be found in the Appendix.

APPENDIX

Proof of Theorem 1.

We give an indirect proof. Let us assume that the test φ belongs to our class and that ψ is a test whose power function is everywhere at least as good as that of φ.

Let us first deal with the extreme cases. Suppose that all the parameters describing the test φ except U are zero. Then U must be nonzero (because we assumed that not all of the parameters vanish), and φ must be a trivial test: it accepts for all (A,W) if U is positive and rejects in every case if U is negative. Because the power function of ψ is at least as good as that of φ, ψ must be trivial, too.

PROPOSITION 1. Let us assume that the parameter a for our test φ is nonzero. Then ψ = 1 Q₀-almost surely on the set {(A,W) : A < a}.

For the proof of our proposition the following lemma is helpful.

LEMMA 1. There exist κ₁,C₁ > 0 such that the test φ rejects on the set

{(A,W) : W > κ₁A + C₁},

or there exist κ₂,C₂ < 0 such that the test φ rejects on the set

{(A,W) : W < κ₂A + C₂}.

Let us now prove the lemma. First, consider the trivial cases. If μ + ν = 0, the lemma is trivial. Let us now assume that μ + ν is nonzero. Then at least one of the sets {λ : λ > 0} and {λ : λ < 0} is not a (μ + ν)-null set. Let us assume without loss of generality that

If this is true, then there exist α,β > 0 such that

Consequently, using (13), we may write our criterion function u as

With

So let us define

and we will choose C₁ later. Without loss of generality, we will, however, assume that

We now want to show that our test rejects if

To do so, we have to show that u(A,W) exceeds the critical value U for all values of (W,A) satisfying (A.4). Hence, we have to find lower bounds for all the terms in (A.2). We will show that all terms except the last one are bounded from below by polynomials in W + A, at least for the values of (W,A) that are of interest to us.

Because (A.4) implies that W > 0 and W > A, we have

and, as c′ ≥ 0,

Now observe that, for W − κ₁A > 1 and z < α,

and, because of our definition of κ₁,

To analyze the first integral in (A.2), let us distinguish two cases: λ < 0 and λ ≥ 0. For the first case, observe that for z < 0 and W > 0

Hence, for −α ≤ λ < 0

On the other hand, for W > 0 and λ > 0

because the integrand is nonnegative and all the limits of integration are nonnegative. Because the integrand in the last term on the right-hand side of (A.2) is nonnegative, we have for W > 0

where p(·) is a polynomial (of third order) representing our estimates for the first three terms of (A.2). We will fix the constant C₁ > 0 later. For any nonnegative C₁, if W > κ₁A + C₁, then for λ ∈ (2α,β) we have W − (λ/2)A ≥ (β/2)A, and hence W + A = W − (λ/2)A + (1 + λ/2)A ≤ (W − (λ/2)A)(1 + (2/β)(1 + λ/2)). Hence, for all W > κ₁A + C₁

with γ = 1/(1 + (2/β)(1 + κ₁/2)) and

If W > κ₁A + C₁, then obviously W + A > C₁. Because an exponential function grows faster than any polynomial, we can find a C₁ such that, for all W + A > C₁, the right-hand side of the preceding inequality is strictly bigger than our critical value U. Hence our test φ rejects.
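The elementary bound on W + A used in this step can be written out in full. Assuming, as in the text, that β > 0, λ > 0 and W − (λ/2)A ≥ (β/2)A:

```latex
\begin{aligned}
W - \tfrac{\lambda}{2}A \ge \tfrac{\beta}{2}A
  \quad&\Longrightarrow\quad A \le \tfrac{2}{\beta}\Bigl(W - \tfrac{\lambda}{2}A\Bigr),\\
W + A = \Bigl(W - \tfrac{\lambda}{2}A\Bigr) + \Bigl(1 + \tfrac{\lambda}{2}\Bigr)A
  \quad&\le\quad \Bigl(W - \tfrac{\lambda}{2}A\Bigr)\Bigl(1 + \tfrac{2}{\beta}\Bigl(1 + \tfrac{\lambda}{2}\Bigr)\Bigr).
\end{aligned}
```

The second line multiplies the first bound by 1 + λ/2 > 0. Dividing through gives W − (λ/2)A ≥ (W + A)/(1 + (2/β)(1 + λ/2)), a lower bound that is linear in W + A.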

Now we can proceed to prove Proposition 1. Let us assume that the proposition does not hold and that ψ = 0 on a set B ⊂ {(A,W) : A < a} with Q₀(B) > 0. Because with

B_n ↑ B and consequently Q₀(B_n) ↑ Q₀(B) > 0, we can find an n so that

Now let us consider the power functions of our tests φ and ψ under the probability measures Q_λ, where we choose λ depending on the behavior of the test φ. Lemma 1 guarantees that either φ = 1 on the set {(A,W) : W > κ₁A + C₁} (in this case we let λ → ∞) or φ = 1 on the set {(A,W) : W < κ₂A + C₂} (in this case we let λ → −∞). We discuss only the first case, because the second one is entirely analogous. As the power function of ψ is at least as large as that of φ for all values of the alternative, we have

The fact that φ = 1 on B_n implies that

We can easily see that for λ → ∞

and hence

On the other hand

Now we can easily see that exp(−(λ²/2)(1/2n) + C₁λ + λκ₁ a + (κ₁²/2)) → 0 for λ → ∞. Because exp(−(κ₁²/2) + λ(κ₁(A − a) − (λ²/2)(A − a))) ≤ 1, one can conclude that the preceding estimates guarantee that

Therefore (A.6) implies that

which contradicts (A.5). Hence our assumption that such a set B with Q₀(B) > 0 exists must be false, which proves the proposition.

Now that our main tools are established, we can proceed with the proof of the theorem. We will show that φ = ψ. So let us start with an extreme case, namely, that

Then the proposition we just proved shows that ψ rejects on {A < a}, too. Hence φ ≤ ψ. Moreover, under the null hypothesis the power function of ψ is at least as good as that of φ, i.e., ψ rejects with no larger probability; hence ∫(ψ − φ) dQ₀ ≤ 0. Because ψ − φ ≥ 0, we conclude that ψ = φ Q₀-almost surely.

Next we investigate the general case, namely, that at least one of the parameters b,c,μ,ν is nonzero. For this purpose we introduce the following notation. For each test ρ let

and define the functional L_n by

It is easily seen that our assumptions (14) and (15) guarantee that all power functions of tests are three times differentiable. Hence our functionals are well defined. The first integral on the right-hand side of the preceding definition is finite because its integrand is o(λ²) by definition and because of (7). Moreover, because our functionals depend only on the power functions, we have

Moreover, we can easily see that with

and

we have

and

Hence,

We will now analyze the behavior for n → ∞.

Let us first analyze ∫_[u<U](φ − ψ)(u_n − U) dQ₀ = ∫_[u<U] ψ(U − u_n) dQ₀ (recall that φ = 0 on [u < U]). We can apply the dominated convergence theorem, because U − u_n is monotonically decreasing and takes values between 0 and U on the event [u < U]. We therefore have

For the second term, observe that

with

and c(A,W) does not depend on n. Moreover, assumptions (14) and (15), in conjunction with Remark 2, guarantee that c is absolutely integrable with respect to Q₀. For the second term on the right-hand side of (A.9), we may use the monotone convergence theorem and conclude that

converges to

(or ∞, if the integral is not finite). Hence, we may conclude that

where again the integral on the right-hand side may be infinite (because the integrand is nonnegative, the integral is nevertheless well defined). Because the limit in (A.8) is finite, we may add (A.8) to the preceding equation and conclude that

However, equation (A.7) guarantees that all the terms of the preceding sequence are zero; hence the limit has to be zero, too. Therefore

which implies that ψ = 0 on [u < U] and ψ = 1 on [u > U]. So ψ and φ can differ only on the set [u = U], which is precisely the result we wanted to establish.

Proof of Theorem 3.

Assumption A3 states that the distributions of (W_n,A_n) under P_n(δ_n) converge weakly to Q_c. Hence,

because we assumed that the boundary of the set {(W,A) : W²/A > U} has Q₀-measure 0, and hence also Q_c-measure 0 (Q_c has density exp(cW − c²A/2) with respect to Q₀). Hence the limiting power function of the test φ_n is the power function of the test φ = I{(W,A) : W²/A > U} with respect to the measures Q_c. Ploberger (2004) establishes that this test is not within the complete class described in the previous section, and hence our main theorem implies that it cannot be admissible. Hence there must exist a test ψ = ψ(A,W) such that

and for at least one c the left-hand side of (A.10) is strictly smaller than the right-hand side. To prove our theorem, we have to establish the existence of tests ψ_n = ψ_n(A_n,W_n) such that

for all c. The test ψ is a measurable function from R² to the interval [0,1]. Hence, Lusin's theorem (cf. Rudin, 1974, p. 56) guarantees the existence of continuous functions φ_n such that

Because 0 ≤ ψ ≤ 1, we can assume without loss of generality that 0 ≤ φ_n ≤ 1, too (otherwise replace φ_n by max(0, min(φ_n, 1))). Hence, we can interpret the φ_n(A_n,W_n) as tests. Applying Chebyshev's inequality, we can easily see that for every ε > 0 and all M there exists a K = K(ε,M) such that sup_{|c|≤M} Q_c([|W| > K]) < ε. Because exp(cW − c²A/2) ≤ exp(KM) for |W| ≤ K and |c| ≤ M (recall that A ≥ 0), we have for arbitrary ε > 0 and arbitrary M

and therefore (as |∫ψ dQ_c − ∫φ_n dQ_c| ≤ Q_c([ψ ≠ φ_n]))

On the other hand, the φ_n are continuous functions. So our fourth assumption and Prohorov's theorem guarantee the relative compactness of the set of distributions of (W_n, A_n) under P_n(δ_n), where δ_n = D_n⁻¹c + δ₀ and |c| ≤ M. Moreover, we assumed that the distribution of (W_n, A_n) under P_n(δ_n) converges (for all c) to Q_c; hence it is easy to see that the set containing the

and the Q_c, both for |c| ≤ M, is compact. Therefore, it can easily be seen that

This limiting relation and (A.11) together imply that there exist sequences M_n ↑ ∞, m_n → ∞ such that

which is just the result we wanted to show.
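Two devices in this proof can be checked numerically in a simulated limit experiment, assuming for illustration only that A ~ Gamma(2, 0.5) and W | A ~ N(cA, A) (neither law is from the text): the change of measure ∫g dQ_c = ∫g·exp(cW − c²A/2) dQ₀, which underlies the bound exp(KM), and the clipping max(0, min(φ_n, 1)) that turns a continuous approximating function into a test. The function `smooth_rule` is a hypothetical continuous function of (W,A), not the approximation obtained from Lusin's theorem, and all other names are likewise illustrative.

```python
import math
import random


def clip_to_test(value):
    """Map a real number into [0, 1], as in max(0, min(phi_n, 1))."""
    return max(0.0, min(value, 1.0))


def smooth_rule(w, a):
    """Hypothetical continuous function of (W, A); its values can leave [0, 1]."""
    return (w * w / a - 2.0) / 4.0


def likelihood_ratio(w, a, c):
    """Density of Q_c with respect to Q_0 in the mixed normal experiment."""
    return math.exp(c * w - 0.5 * c * c * a)


def power_direct(c, n_draws=100_000, seed=0):
    """E_{Q_c}[clipped rule], drawing A ~ Gamma(2, 0.5) and W | A ~ N(cA, A) directly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        a = rng.gammavariate(2.0, 0.5)
        w = rng.gauss(c * a, math.sqrt(a))
        total += clip_to_test(smooth_rule(w, a))
    return total / n_draws


def power_reweighted(c, n_draws=100_000, seed=1):
    """Same expectation from Q_0 draws, weighted by exp(cW - c^2 A / 2).

    Under the Gamma(2, 0.5) assumption the weights have finite variance only if c**2 < 2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        a = rng.gammavariate(2.0, 0.5)
        w = rng.gauss(0.0, math.sqrt(a))
        total += clip_to_test(smooth_rule(w, a)) * likelihood_ratio(w, a, c)
    return total / n_draws


if __name__ == "__main__":
    print(f"direct {power_direct(1.0):.3f}, reweighted {power_reweighted(1.0):.3f}")
```

The two estimates target the same expectation, so they should agree up to Monte Carlo error; the reweighted one is noisier because the likelihood-ratio weights are unbounded.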

References


Aït-Sahalia, Y. (1999) Maximum Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach. Discussion paper, Princeton University.

Ahn, S.K. & G.C. Reinsel (1990) Estimation for partially nonstationary multivariate autoregressive models. Journal of the American Statistical Association 85, 813–823.

Chang, Y. & J. Park (2002) On the asymptotics of ADF tests for unit roots. Econometric Reviews 21, 432–477.

Davidson, J. (1998) A Wald test of restrictions on the cointegrating space based on Johansen's estimator. Economics Letters 59, 183–187.

Dickey, D.A. & W.A. Fuller (1979) Distribution of estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–431.

Dickey, D.A. & W.A. Fuller (1981) Likelihood ratio tests for autoregressive series with a unit root. Econometrica 49, 1057–1072.

Elliott, G., T.J. Rothenberg, & J.H. Stock (1996) Efficient tests for an autoregressive unit root. Econometrica 64, 813–836.

Hamilton, J.D. (1994) Time Series Analysis. Princeton University Press.

Jeganathan, P. (1991) Some aspects of asymptotic theory with application to time series models. Econometric Theory 7, 269–306.

Jeganathan, P. (1995) Some aspects of asymptotic theory with applications to time series models. Econometric Theory 11, 818–867.

Johansen, S. (1988) Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231–254.

Johansen, S. (1991) Cointegration and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551–1580.

Le Cam, L. & G.L. Yang (1990) Asymptotics in Statistics: Some Basic Concepts. Springer-Verlag.

Müller, U. & G. Elliott (2001) Tests for Unit Root and the Initial Observation. Discussion paper, University of St. Gallen.

Park, J. & P.C.B. Phillips (2001) Nonlinear regressions with integrated time series. Econometrica 69, 117–161.

Phillips, P.C.B. (1996) Econometric model determination. Econometrica 64, 763–812.

Phillips, P.C.B. & B.E. Hansen (1990) Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies 57, 99–125.

Phillips, P.C.B. & S. Ouliaris (1990) Asymptotic properties of residual based tests for cointegration. Econometrica 58, 165–193.

Phillips, P.C.B. & W. Ploberger (1996) An asymptotic theory of Bayesian inference for time series. Econometrica 64, 381–412.

Phillips, P.C.B. & Z. Xiao (1998) A Primer on Unit Root Testing. Cowles Foundation Discussion paper, Yale University.

Ploberger, W. (2004) A complete class of tests when the likelihood is locally asymptotically quadratic. Journal of Econometrics 118, 67–94.

Ploberger, W. & P.C.B. Phillips (2003) Empirical limits for time series models. Econometrica 71, 627–674.

Rudin, W. (1974) Real and Complex Analysis, 2nd ed. Reprinted by Tata-McGraw-Hill.

Said, S.E. & D.A. Dickey (1984) Testing for unit roots in ARMA models of unknown order. Biometrika 71, 599–607.

Saikkonen, P. (1991) Asymptotically efficient estimation of cointegrating regressions. Econometric Theory 7, 1–21.

Schervish, M.J. (1995) Theory of Statistics. Springer-Verlag.

Stock, J.H. (1995) Unit roots, structural breaks and trends. In R.F. Engle & D. McFadden (eds.), Handbook of Econometrics, vol. 4, pp. 2739–2841. North-Holland.

Stock, J.H. & M. Watson (1993) A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica 61, 783–820.

Strasser, H. (1985) Mathematical Theory of Statistics. De Gruyter.
