1. INTRODUCTION
This paper discusses optimality properties of certain statistical tests and makes two contributions. First, we analyze a restricted parametric model. The restrictions enable us to determine the class of asymptotically optimal tests. Moreover, we show that in nonclassical situations many traditional tests of simple hypotheses, such as the usual generalizations of the t-test, are not in our class and are therefore inadmissible. Second, we use these results to show the inadmissibility of some familiar procedures: the Dickey–Fuller and related tests when there is no deterministic trend in the regression, and standard t-type tests for restrictions on cointegrating relations.
We begin by outlining the asymptotic testing problem for which we can exactly describe the class of all admissible tests. We first assume that our parametric family is unidimensional; more general cases are discussed later.
Many problems in asymptotic statistics have a specific form, namely, that the logarithm of the likelihood can be approximated by a quadratic function of the (properly normalized) parameter. Classical asymptotic analysis then rests on two crucial assumptions: first, that the linear term of the quadratic function is (asymptotically) normal; and second, that the quadratic term (the Fisher information) is asymptotically constant.
In many situations relevant for econometrics—most of them connected to problems with unit roots—these assumptions are not fulfilled. Although the log-likelihood can asymptotically be approximated by a quadratic function, the linear term in the parameter is not Gaussian, and the matrix describing the quadratic term is not a constant. The most typical example arises with the simplest autoregressive (AR) model
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm001.gif?pub-status=live)
where ut is normal with zero expectation and known variance. Then it is straightforward to show that the likelihood is quadratic around δ = 0. The properly normalized coefficients of this quadratic function are neither normal nor constant. The prominence of this example indicates that there should be a significant number of possible applications. These will be discussed in more detail in Section 3.
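The claim about the AR example can be checked numerically. The following is a minimal sketch, assuming the model has the form Δy_t = δ y_{t−1} + u_t with start value y_0 = 0, scaling D_n = n, and known innovation variance; the function names and the normalizations W_n = (nσ²)⁻¹ Σ y_{t−1} u_t and A_n = (n²σ²)⁻¹ Σ y²_{t−1} are illustrative choices, not notation taken from the displayed formula.

```python
import numpy as np


def simulate_path(n, sigma=1.0, rng=None):
    """Draw one random walk under delta = 0; return lagged levels and innovations."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(0.0, sigma, size=n)
    y = np.cumsum(u)
    y_lag = np.concatenate(([0.0], y[:-1]))  # y_0 = 0 is an assumed start value
    return y_lag, u


def score_and_information(y_lag, u, sigma=1.0):
    """Normalized linear and quadratic coefficients (W_n, A_n) of the log-likelihood."""
    n = len(u)
    w_n = np.sum(y_lag * u) / (n * sigma**2)
    a_n = np.sum(y_lag**2) / (n**2 * sigma**2)
    return w_n, a_n


def log_likelihood_ratio(h, y_lag, u, sigma=1.0):
    """Exact Gaussian log-likelihood ratio of delta = h/n against delta = 0."""
    n = len(u)
    dy = u  # under delta = 0 the increments equal the innovations
    resid = dy - (h / n) * y_lag
    return (np.sum(dy**2) - np.sum(resid**2)) / (2.0 * sigma**2)


def replicate(n, reps, seed=0):
    """Monte Carlo draws of (W_n, A_n); column 0 is W_n, column 1 is A_n."""
    rng = np.random.default_rng(seed)
    draws = [score_and_information(*simulate_path(n, rng=rng)) for _ in range(reps)]
    return np.array(draws)
```

Under these assumptions the log-likelihood ratio equals h W_n − h² A_n / 2 exactly, while the simulated A_n keeps a nondegenerate spread and W_n is visibly skewed, in line with the statement that the coefficients are neither constant nor normal.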
So let us denote by pn(θ) the density corresponding to parameter θ. For simplicity, let us assume that the parameter θ is unidimensional. So, for a given true parameter θ0, we assume that asymptotically with a suitable scaling sequence Dn ↑ ∞
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm001.gif?pub-status=live)
where we assume that the pair (Wn,An) converges in distribution to a nontrivial limit (W,A). We need not discuss here the precise meaning of the approximation or convergence in (1), for which the reader is referred to Le Cam and Yang (1990), Ploberger (2004), and Jeganathan (1991, 1995). In Ploberger (2004), specific problems associated with testing
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm002.gif?pub-status=live)
were discussed.
First of all let us analyze (1). If the log-likelihood were exactly quadratic (i.e., if we replaced ∼ with = in (1)), then the likelihood would depend only on (Wn,An). Consequently, this pair would be a sufficient statistic. In that case, for every test φ we could find another test ψ depending only on (Wn,An) such that φ and ψ have identical power. Hence, we would only need to analyze tests depending on (Wn,An).
Modern statistical theory allows us to use similar reasoning in the asymptotic case. We can, however, no longer analyze the tests themselves; the theory only gives us convergence results about the power functions of the tests. Typically, the theory allows us to show that the behavior of power functions of problems like (2) is similar to a fixed “asymptotic” testing problem. The theory shows that one can approximate the power functions of tests for (2) with power functions of tests for the “asymptotic” problem up to an arbitrary accuracy.
We now briefly sketch the reason for this kind of analysis, emphasizing that the arguments in the following paragraph are heuristic. A rigorous treatment of the subject can be found in Strasser (1985).
If a test ψ is admissible for the asymptotic problem, then we know that we can find a sequence φn of tests for (2) such that with increasing sample size the difference between the power functions of the φn and the power function of ψ converges to zero. (Although the tests are defined on different sample spaces, their power functions have the same domain, namely, the parameter space!) Then the φn are in a certain sense asymptotically admissible. Assume we are given some “competitors” ρn. Then the power functions of the ρn can be approximated by power functions of tests νn for the asymptotic problem. There are compactness results available that show that the sequence of power functions of the νn must contain a convergent subsequence, converging to the power function of a test ν. We assumed ψ to be admissible. Hence it cannot be dominated by ν, and so it must have better power properties for some parameters. Hence, asymptotically, the same must be true for the power functions of φn and ρn. The same kind of argument applies, vice versa, for tests that are not admissible for the asymptotic problem.
This type of testing problem was analyzed for a one-dimensional parameter in Ploberger (2004), and one of the main results was that power functions of tests for (2) can either be uniformly approximated or dominated by power functions of tests for the following testing problem. Assume we are given a parametrized family Qλ, λ ∈ R on the space
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm002.gif?pub-status=live)
so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm003.gif?pub-status=live)
We can think of this parametrized family as a statistical experiment, namely, a sample of size one of a random two-dimensional vector distributed according to a Qλ with an unknown λ. Suppose we want to consider tests for the hypothesis λ = 0. In Ploberger (2004) a set of tests was constructed such that for every other test we can find one from our class with “better” power properties. Such a class is called “essentially complete” in Schervish (1995, p. 174) and “complete” in Strasser (1985, p. 41) and Ploberger (2004). Because the present paper heavily relies on Ploberger (2004), we will continue to use the term complete for this kind of set of tests.
Together with the limiting result, this complete class theorem characterizes all possible limits for power functions of tests for the problem (2). However, the problem is that every set covering (in our sense) a complete class is a complete class again. Hence the result from Ploberger (2004) does not allow us to directly classify tests. If we know that a test is within our class we cannot directly conclude that it is admissible, because our definition allows the “complete class” to contain additional tests.
Moreover, a test outside the complete class may be admissible. The results of Ploberger (2004) do not preclude that there exists another test (within the class) with exactly the same power function. In this paper, we show that this is in fact not the case. The main theorem implies that every test within our class is admissible and characterized by the power function. Hence we can conclude that tests not in our class are asymptotically inadmissible. So essentially we show that for every test outside our complete class one can find a test from our class that is better uniformly—i.e., it is better for all values of the parameter!
Moreover, the same paper establishes the fact that the test based on the t- or F-test-type statistic, namely, rejecting if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm003.gif?pub-status=live)
for some critical values U, is not within this class. In Section 3 we discuss important examples where our results show the existence of tests that dominate standard procedures uniformly. Obviously, it is also important to know how much power one can gain by using the admissible tests. For the case of unit root testing, this question was analyzed in Elliott, Rothenberg, and Stock (1996).
This result also has some interesting consequences when we are testing (2) for our family (1). Suppose we have a sequence of tests ψn for the finite-sample problem such that their asymptotic power function converges to the power function of (3). Then the standard convergence results of decision problems (cf. Strasser, 1985) allow us to conclude that there exists a sequence of tests with power functions dominating the power functions of ψn uniformly!
In Section 3, we then extend these results to show the inadmissibility of certain familiar procedures.
1. The Dickey–Fuller τ-tests when no trend is present: Consider a univariate time series yt and consider the models
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm004.gif?pub-status=live)
and suppose we want to test whether δ = 0 against δ ≠ 0. Then it seems natural to compute the t-test statistic based on an estimator for δ and use the critical values established by Dickey and Fuller (1979, 1981).
2. Suppose we have an estimator—say,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm004.gif?pub-status=live)
—for a scalar parameter ρ and there exists a scaling statistic
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm005.gif?pub-status=live)
. Let us assume that (with some normalizing sequence Dn)
•
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm006.gif?pub-status=live)
converges in distribution to a standard normal distribution.
•
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm007.gif?pub-status=live)
converges in distribution to some random variable σ, which is almost surely positive and not a constant.
• Asymptotically, the distributions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm008.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm009.gif?pub-status=live)
are independent.
Although there are some other applications, the most important case where this kind of phenomenon occurs is the estimation of parameters describing cointegrating relationships. So when testing a restriction—say, ρ = 0—one is tempted to use the analogue to the Wald test. Indeed, one can easily see that in case the null is true,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm005.gif?pub-status=live)
converges in distribution to a standard normal distribution. Therefore it seems natural to use tests based on (5) and use one- or two-sided critical values. Here I show that these procedures are inadmissible in the two-sided case.
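The three conditions listed above can be illustrated with a stylized cointegrating regression. This is a hedged sketch, not the paper's setup: it assumes y_t = ρ x_t + e_t with a random-walk regressor x_t that is independent of the errors, a known error variance of one, and D_n = n. All function names are hypothetical.

```python
import numpy as np


def cointegration_draws(n, reps, rho=0.0, seed=0):
    """Monte Carlo draws of the scaled estimation error, the scaling statistic,
    and the pseudo-t statistic in y_t = rho * x_t + e_t (assumed design)."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.normal(size=(reps, n)), axis=1)  # random-walk regressor
    e = rng.normal(size=(reps, n))                     # errors independent of x, variance 1
    y = rho * x + e
    sxx = np.sum(x**2, axis=1)
    rho_hat = np.sum(x * y, axis=1) / sxx              # least squares estimator
    scaled_error = n * (rho_hat - rho)                 # D_n (rho_hat - rho) with D_n = n
    sigma_hat = n / np.sqrt(sxx)                       # conditional std. dev. of scaled_error
    t_stat = scaled_error / sigma_hat                  # pseudo-t statistic
    return scaled_error, sigma_hat, t_stat


def excess_kurtosis(v):
    """Sample excess kurtosis; close to zero for Gaussian data."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0
```

In this design the pseudo-t statistic is standard normal, whereas the scaled estimation error is a scale mixture of normals with heavy tails, because the scaling statistic stays random in the limit.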
Here we only deal with the simplest cases of unit root testing. Clearly, the results would be more valuable if we allowed for deterministic terms. Typically, one is interested in tests invariant with respect to these terms. The advantage of this approach would be a reduction in the parameter space, which would suit us well. However, the “reduced” problem no longer satisfies (1). So a more general analysis of the problem would be beyond the scope of the paper.
2. THE MAIN THEOREMS FOR THE UNIDIMENSIONAL CASE
Tests are functions φ from the set R × R+ to the interval [0,1] (φ = 0 means that we accept the null; φ = 1 means that we reject the null; and in between we randomize).
The power function of a test φ is the function that attaches to each λ ∈ R the value ∫φdQλ. The concept of a power function enables us to compare tests. A test φ is better than a test ψ if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm006.gif?pub-status=live)
So a better test dominates the worse test for all possible values of the parameter. In Ploberger (2004), it was established that the set C of tests consisting of the test functions defined subsequently is complete (or essentially complete according to Schervish, 1995).
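Power functions of this kind can be approximated by reweighting draws from Q0. The sketch below assumes that the density of Qλ with respect to Q0 is exp(λW − λ²A/2), as the quadratic approximation (1) suggests, and it uses a finite random walk as a stand-in for the limit (W,A) of the unit root example; both choices, and all function names, are assumptions made for illustration.

```python
import numpy as np


def draw_null_sample(n, reps, seed=0):
    """Draws of (W, A) under lambda = 0, approximated by a random walk of length n."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(reps, n))
    y = np.cumsum(u, axis=1)
    y_lag = np.hstack([np.zeros((reps, 1)), y[:, :-1]])
    w = np.sum(y_lag * u, axis=1) / n
    a = np.sum(y_lag**2, axis=1) / n**2
    return w, a


def power(test, lam, w, a):
    """Estimate the integral of the test function under Q_lambda by likelihood weighting."""
    weights = np.exp(lam * w - 0.5 * lam**2 * a)
    return float(np.mean(test(w, a) * weights))


def t_type_test(crit):
    """Test function that rejects when W^2 / A exceeds the critical value."""
    return lambda w, a: (w**2 / a > crit).astype(float)


def is_better(test_a, test_b, lams, w, a, tol=0.0):
    """True if the estimated power of test_a is at least that of test_b on the grid lams."""
    return all(power(test_a, lam, w, a) >= power(test_b, lam, w, a) - tol for lam in lams)
```

A typical use is to take `crit = np.quantile(w**2 / a, 0.95)` so that the test has estimated size 0.05 and then to evaluate `power` on a grid of λ. The comparison in `is_better` is only a Monte Carlo check on a finite grid and cannot by itself establish domination for all parameter values.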
These test functions are described by the following parameters: U,a,b,c and measures μ,ν subject to the following restrictions.
1. ν is a σ-finite measure on [−1,1] − {0} (i.e., no mass outside the unit interval with the number zero excluded) such that for all compact sets K ⊂ [−1,1] − {0} ν(K) < ∞ and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm007.gif?pub-status=live)
(in particular ∫[−1,1] λ² dν(λ) is finite and c ≥ 0).
2. μ is defined on R − [−1,1] (i.e., the real line except the unit interval), and for all M > 1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm008.gif?pub-status=live)
3. Not all the numbers a,b,c,U and measures μ and ν are trivial.
4. The test has the correct size—say, α < 1—under the null.
Then define the function
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-30223-mediumThumb-S0266466608080031frm009.jpg?pub-status=live)
Now let the class of tests be characterized by the following properties: (i) the test should reject the null for all (A,W) such that A < a; (ii) if A > a, then the test should reject if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm010.gif?pub-status=live)
and accept if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm011.gif?pub-status=live)
These parameters do not describe the test completely. In particular, we did not define what happens if A = a or if u(A,W) = U. So if these equations do not describe null sets (with respect to all Qh), our complete class defined previously may contain some additional (not optimal) tests. This, however, does not contradict the definition of a complete class: any superset of a complete class is a complete class, too!
Remark 1. Because of our restrictions 1 and 2, in particular expressions (7) and (8), we can easily see that every function u(A,W) satisfying (9) can for every β,γ > 0 be written as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-21132-mediumThumb-S0266466608080031frm012.jpg?pub-status=live)
where ν′,μ′ are the restrictions of the measure μ + ν on [−β,γ] and R − [−β,γ], respectively.
Remark 2. Because
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm010.gif?pub-status=live)
we can replace the first integral in (9) by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-19738-mediumThumb-S0266466608080031frm013.jpg?pub-status=live)
Additionally, we assume that the following three conditions are fulfilled.
Condition C1. For some ε > 0 we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm014.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm015.gif?pub-status=live)
Condition C2. The support of the measure Q0 (and hence the support of all the measures Qλ) is the whole R × R+ = {(W,A) : A > 0} (i.e., every open set of R × R+ has positive probability under Q0).
Condition C3. The distribution function of A is continuous (i.e., for all a, Q0([A = a]) = 0).
THEOREM 1. Suppose restriction 1 and Conditions C1, C2, and C3 hold and assume that ψ is a test that has the same or a better power function than the test φ, which is from our class and is determined by the parameters a,b,c,U,μ,ν. Then ψ is from our class, too, and determined by the same parameters.
Proof. The proof is provided in the Appendix.
It is, however, a nontrivial task to show that tests are not in our complete class. For a proof of this fact, it is not sufficient to show that the corresponding test statistic cannot be written in the form (9). One has to show that the corresponding test (or the critical set) cannot be generated as a level set of any function (9)!
Suppose now that we are given a family Qλ, λ ∈ R and a test φ (which is a function from R × R+ into [0,1]) satisfying the following requirements.
1. All the assumptions of the previous theorem are fulfilled.
2. Every neighborhood of the point (0,0) has positive probability with respect to Q0.
3. There exists a constant K > 1 such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm011.gif?pub-status=live)
Then the following theorem holds.
THEOREM 2. The test φ is not in our class and is therefore inadmissible.
Proof. The assumptions of this theorem are quite similar to those of Ploberger (2004), and only our second assumption here is weaker. Here we only postulate that all neighborhoods of the origin have positive mass, instead of assuming that every open set of the half plane R × R+ has positive mass. One can easily see that this requirement is sufficient, because only the behavior of the power function for alternatives near the origin is used in the proof. █
In case the likelihoods are given by (1), one consequence of the preceding theorems is that the analogues to the “classical” Lagrange multiplier (LM), likelihood ratio (LR), and Wald tests for testing θ = θ0 are asymptotically inadmissible. More precisely, let us denote by φn such a test. The “classical” tests reject when test statistics exceed the critical values. Moreover, it is an easy exercise (very similar to the standard proof of asymptotic equivalence in the stationary case) to show that differences between suitably normed and transformed test statistics and Wn²/An converge to zero under the null. Let us denote the asymptotic critical value for the test statistic Wn²/An by K and assume that K is such that the cumulative distribution function of W²/A is continuous at K. This is not a very strong restriction, because a cumulative distribution function can have at most countably many discontinuities. Let us define φn by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm012.gif?pub-status=live)
Then we can easily see that under the null the probability of the LM, LR, and Wald tests and the tests φn giving different results converges to zero. (So, with probability converging to one under the null, either all of them accept or all of them reject!) Hence the power functions of φn and the classical tests are asymptotically the same under the null.
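The agreement of the classical statistics with Wn²/An can be seen in a special case where it holds exactly rather than only asymptotically. This is a minimal sketch assuming the Gaussian model Δy_t = δ y_{t−1} + u_t with known variance and D_n = n; the function name and the normalizations are illustrative and not taken from the paper.

```python
import numpy as np


def classical_statistics(y, sigma=1.0):
    """Wald, LR, and LM statistics for delta = 0 together with W_n^2 / A_n,
    in the assumed model dy_t = delta * y_{t-1} + u_t with known variance."""
    y_lag, dy = y[:-1], np.diff(y)
    n = len(dy)
    sxy, sxx = np.sum(y_lag * dy), np.sum(y_lag**2)
    w_n = sxy / (n * sigma**2)
    a_n = sxx / (n**2 * sigma**2)
    delta_hat = sxy / sxx                       # maximum likelihood estimator

    def loglik(d):
        return -np.sum((dy - d * y_lag) ** 2) / (2.0 * sigma**2)

    wald = delta_hat**2 * sxx / sigma**2        # squared t-type statistic
    lr = 2.0 * (loglik(delta_hat) - loglik(0.0))
    lm = sxy**2 / (sigma**2 * sxx)              # squared score over information at the null
    return wald, lr, lm, w_n**2 / a_n
```

Because the log-likelihood is exactly quadratic in δ here, all four numbers coincide up to rounding. In the general setting of (1) only the asymptotic statement in the text is available.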
We now have to show that the same holds true for the alternatives. Consider local alternatives of the form
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm013.gif?pub-status=live)
In Ploberger (2004), contiguity of the probability measures corresponding to θ0 and θn was established. This means that for any sequence En of events such that Pθ0(En) → 0, we also have Pθn(En) → 0. (Contiguity is one of the cornerstones of modern asymptotic theory. For a more detailed discussion see Le Cam and Yang, 1990; Strasser, 1985.) Because the probability of conflict between the LM, LR, and Wald tests and φn converges to zero under the null, contiguity allows us to conclude that the probability of a conflict under the local alternatives converges to zero, too. Hence, the difference of the rejection probabilities, which are exactly the values of the power functions, must converge to zero for n → ∞. Therefore, the power functions are the same asymptotically. We assumed that (Wn,An) → (W,A) in distribution under the null, and so Wn²/An → W²/A in distribution, too. Because the cumulative distribution function of the limiting distribution is continuous at K, the power function of the φn (and, consequently, the power functions of the LM, LR, and Wald tests) must converge to the power function of our test φ under the null.
To show the convergence of the power functions of the φn to the power function of φ at the local alternatives requires only a bit more technical effort. Let us fix the sequence of local alternatives and consider the densities
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm014.gif?pub-status=live)
Then we can easily conclude from (1) that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm016.gif?pub-status=live)
in distribution. We did assume, however, that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm015.gif?pub-status=live)
because otherwise QC would not be a probability measure. Now let ε > 0 be arbitrary. Then we can find an L so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm016.gif?pub-status=live)
Consequently, as ℓn → ℓ in distribution,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm017.gif?pub-status=live)
and as ∫ℓn dPθn = 1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm017.gif?pub-status=live)
Hence we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm018.gif?pub-status=live)
One can easily see that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm019.gif?pub-status=live)
Further, because of the convergence of (An,Wn) and ℓn,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm020.gif?pub-status=live)
We have that (when using (17))
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm021.gif?pub-status=live)
As ε + ∫φ min(ℓ,L) dQ0 ≥ ∫(ℓ − min(ℓ,L)) dQ0 + ∫φ min(ℓ,L) dQ0 ≥ ∫φℓ dQ0, we can conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm022.gif?pub-status=live)
Because ε was arbitrary, we have shown that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm023.gif?pub-status=live)
This establishes the fact that the power function of the φn converges to the power function of φ for all local alternatives. Because we already know that φ is not within our class, we can find a test ψ with (strictly) better power properties. In Theorem 3 in Section 3 we show that in this situation it is possible to construct tests ψn with strictly better asymptotic power properties than the φn, which establishes our result, namely, that the φn are asymptotically inadmissible.
In view of the preceding results one is tempted to look at other “generalizations” of the classical LR test to situations with stochastic information. One such generalization is the test based on the posterior information criterion (PIC) (cf. Phillips, 1996; Phillips and Ploberger, 1996; Ploberger and Phillips, 2003). Consider tests that reject if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm018.gif?pub-status=live)
and accept if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm019.gif?pub-status=live)
where K is a constant determined by the desired significance level. These tests are all admissible. Simply consider our measures μ,ν to be the Lebesgue measure and choose the other constants appropriately. Then it is a simple but tedious calculation to show that (12) yields a test based on
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm024.gif?pub-status=live)
which is immediately seen to be equivalent to the test determined by (18) and (19).
It should be noted that in the “classical” case of A being constant this test is also equivalent to a test based on W²/A. So we can view the PIC test as a generalization of the classical tests as well.
3. APPLICATIONS
The result of the previous section establishes a minimal complete class theorem for a rather large class of testing problems. Apart from the three technical conditions we only require the likelihood function to be locally asymptotically quadratic. The main point of interest seems to be that this class of admissible tests is quite different from the one for the classical problem where W is Gaussian and A constant. In this kind of situation the set of all admissible tests is well known even for multivariate parameters (Birnbaum's theorem; cf. Strasser, 1985). Each of these tests is characterized by a convex set (in the univariate case an interval) and rejects if W lies outside the interval. Hence Theorem 1 of the previous section seems to be of considerable theoretical interest. It shows that even a small change in our assumptions makes enormous differences for the class of admissible tests.
In the previous section we only dealt with one-parameter families. This fact represents a significant hurdle for a direct application of the result. However, we can apply the result to prove the inadmissibility of tests. We will show that certain tests (e.g., the Dickey–Fuller τ-test and the pseudo-t-test based on the statistic (5)) can be interpreted as tests on a unidimensional parameter. Our result then shows that there must exist a test with a uniformly better power function. The drawback is that the result is nonconstructive. The only information we supply about the better test is that it has the form (9). Nevertheless, the result may motivate researchers in the field. Efforts to find better tests are now not doomed from the start. We know that a better test exists!
Let us first analyze the Dickey–Fuller test. Unfortunately, the techniques presented here cover only cases that do not occur often in practice. In most cases the data contain a drift or a deterministic trend. It should be noted that especially the presence of deterministic trends changes the situation completely. I conjecture that in this case the Dickey–Fuller τ-test is admissible. The results of Müller and Elliott (2001) establish an optimality property. For a detailed analysis also see Elliott et al. (1996), Phillips and Xiao (1998), and Stock (1995). Nevertheless, I think that the results of this paper are important to the theory of testing for unit roots. They show, in particular, that the situation is radically different from the classical one.
Let us now assume that a process yt is defined by (4) and let us furthermore assume that the
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm020.gif?pub-status=live)
with zero expectation and that all of the roots of the polynomial
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm025.gif?pub-status=live)
lie outside the unit circle. Then let us consider local alternatives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm021.gif?pub-status=live)
Jeganathan (1991, 1995) performed a rigorous asymptotic analysis for these kinds of testing problems. In particular, he established limiting properties of these experiments. Using his results, one can easily show that in many interesting cases the necessary conditions of our theorems are fulfilled. The “elementary” case of the Dickey–Fuller test without a trend is discussed in Ploberger (2004).
In many applications, however, one wants to avoid relatively narrow parametric assumptions such as (20). Standard statistical theory of optimal testing (cf. Strasser, 1985) is based on Neyman–Pearson tests, which critically depend on densities. Hence we cannot show the admissibility of tests. Under certain circumstances, however, we can show that certain tests are not admissible in the sense that there exists a test with a uniformly better power function.
Let us now assume that Pθ is a family of probability measures parametrized by a parameter θ. We assume that the parameter can be split into two parts,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm026.gif?pub-status=live)
where δ should be one-dimensional, whereas the nuisance parameter β can be of arbitrary dimension. Suppose we want to test the hypothesis δ = 0.
Suppose one has an estimator—say,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm027.gif?pub-status=live)
—for δ; then one will be tempted to use an analogue to the usual t-test. One will construct some other statistic—say,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm028.gif?pub-status=live)
—so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm029.gif?pub-status=live)
Then it is easy to construct a test. In the “classical” case, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm030.gif?pub-status=live)
is asymptotically normal,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm031.gif?pub-status=live)
will be an appropriately scaled consistent estimator for the asymptotic variance. We are more interested in the nonstandard case, where the limiting distribution of the properly scaled
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm032.gif?pub-status=live)
is nontrivial.
In some cases, we can use our approach to analyze the nonoptimality of tests. In the previous section, we worked mainly with the scores and the “information” (the second derivative) of the likelihood function. Here the situation is different: we start with estimators for the parameter and measures of their accuracy.
So let us review the basic assumption of the previous section. To do so, we must find analogues to the Wn and An defined earlier. We started with the model (1):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm033.gif?pub-status=live)
Then it is easily seen that—in case the logarithm of the likelihood is twice continuously differentiable—the maximum likelihood estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm034.gif?pub-status=live)
can be approximated in the following way:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm035.gif?pub-status=live)
Moreover, we can easily conclude from our assumptions that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm036.gif?pub-status=live)
converges in distribution, and so
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm037.gif?pub-status=live)
would be a feasible normalizing estimator.
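As a concrete check of this approximation, consider the simplest Gaussian AR model from the Introduction. The following sketch assumes that it takes the form Δy_t = δ y_{t−1} + u_t with unit error variance, y_0 = 0, and scaling Dn = n; these specifics are assumptions made for illustration, and the symbols W_n, A_n are matched to the text only under them. In this case the log-likelihood ratio is exactly quadratic and the normalized maximum likelihood estimator equals W_n / A_n:

```latex
\begin{align*}
\log\frac{dP_{\delta}}{dP_{0}}
  &= \delta \sum_{t=1}^{n} y_{t-1}\,\Delta y_t
     - \frac{\delta^{2}}{2} \sum_{t=1}^{n} y_{t-1}^{2}, \\
W_n &= \frac{1}{n}\sum_{t=1}^{n} y_{t-1}\,\Delta y_t, \qquad
A_n = \frac{1}{n^{2}}\sum_{t=1}^{n} y_{t-1}^{2}, \\
\log\frac{dP_{c/n}}{dP_{0}}
  &= c\,W_n - \frac{c^{2}}{2}\,A_n, \qquad
n\,\hat{\delta}_n = \frac{W_n}{A_n}.
\end{align*}
```

Under these assumptions the squared t-statistic for δ = 0 is W_n²/A_n, so both the estimator and its usual studentization are functions of (W_n, A_n) alone.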
In our situation, however, we start with the estimators
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm038.gif?pub-status=live)
,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm039.gif?pub-status=live)
. So let us assume that there exists an appropriate sequence of scaling factors Dn and define (keeping in mind that we want to test δ = δ0)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm040.gif?pub-status=live)
Then we assume that the following four assumptions are fulfilled.
Assumption A1. The nuisance parameter β is fixed.
Assumption A2. (An,Wn) converges (for
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm041.gif?pub-status=live)
) in distribution to some random variable (A,W) with distribution Q0, which fulfills the prerequisites of the previous section.
Assumption A3. For local alternatives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm022.gif?pub-status=live)
(i.e.,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm042.gif?pub-status=live)
), the distributions of (An,Wn) converge to a measure Qc with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm043.gif?pub-status=live)
Assumption A4. The mapping c → Qc is continuous (with respect to the weak topology of measures). This means that cn → c implies Qcn → Qc weakly.
Essentially, we assume that the statistic (An,Wn) behaves “the same way” as in the case of a unidimensional parameter. We will now construct some examples showing that these assumptions are fulfilled in many cases of considerable interest. In particular, it should be noted that we do not require parametric assumptions such as Gaussian distributions of the error terms!
1. The generalized augmented Dickey–Fuller (ADF) test with no deterministic trend. Said and Dickey (1984) investigated the ADF test for unit roots for autoregressive moving average (ARMA) processes with independent and identically distributed (i.i.d.) innovations. More recently, Chang and Park (2002) considered further generalizations of the model (4) (e.g., the ut are martingale differences with a very general structure, and the index p may depend on the sample size and increase to infinity). So let us consider the model (4) and define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm044.gif?pub-status=live)
They show that, even under the general circumstances considered in their paper (using our notation), letting
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm045.gif?pub-status=live)
be the OLS estimator for δ and using an estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm046.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm047.gif?pub-status=live)
equals the denominator in the t-test (i.e., “variance estimator”), the properly normalized distributions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm048.gif?pub-status=live)
and the estimation error jointly converge in distribution to the corresponding distributions for the (simple) Dickey–Fuller test. This convergence holds for δ = 0 and also for the local alternatives (22) (cf. Chang and Park, 2002). So let us define the “scores” by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm049.gif?pub-status=live)
and denote by Qc the limiting distribution of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm050.gif?pub-status=live)
The measures Qc are defined on the set {(A,W) : A ≥ 0} ⊂ ℝ². As mentioned before, the limiting distributions of our statistics are the same as for the Dickey–Fuller test. Hence we can conclude that the measures Qc are the same as the asymptotic distributions for the simple random walk model, which was discussed in Ploberger (2004). So we may conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm051.gif?pub-status=live)
Moreover, it can easily be seen that other assumptions of our main theorem are fulfilled, too. Elementary calculations show that the two-sided ADF test rejects when
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm052.gif?pub-status=live)
exceeds the critical value—say, U. Hence the power of the ADF test against a local alternative (22) will converge to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm053.gif?pub-status=live)
This only holds for all those U for which Qc({W²/A = U}) = 0. Because, however, the Qc have densities with respect to Q0, Qc({W²/A = U}) > 0 implies that Q0({W²/A = U}) > 0. But this inequality can only be true for countably many U. Hence we have convergence for all but at most countably many U.
2. Cointegrated system testing. Another possible application is the testing of restrictions on certain parameters of cointegrated systems. Let us first consider the simplest possible case, namely, a bivariate system in Phillips' triangular form:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm023.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm111.gif?pub-status=live)
For simplicity, let us assume that (ut,vt) are i.i.d. normal with zero expectation and identity covariance matrix (which implies that ut and vt are uncorrelated). Then we can estimate γ with OLS, and our estimation error will be
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm054.gif?pub-status=live)
Let us now suppose that we want to test γ = γ0. Then some tedious but elementary calculations show that the distribution of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm055.gif?pub-status=live)
converges under the null to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm056.gif?pub-status=live)
where W1,W2 are independent Wiener processes. Moreover, we can also consider local alternatives of the form
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm057.gif?pub-status=live)
Then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm058.gif?pub-status=live)
converges in distribution to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm059.gif?pub-status=live)
3. Some generalizations. We can generalize the ideas of the preceding example for use in a more general context. Suppose we want to test whether a parameter
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm060.gif?pub-status=live)
and we have an estimator
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm061.gif?pub-status=live)
, an estimator for the “information”
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm062.gif?pub-status=live)
, and a sequence of scaling factors Dn, so that the following properties hold true.
For local alternatives
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm024.gif?pub-status=live)
the distributions of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm063.gif?pub-status=live)
, where
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm064.gif?pub-status=live)
, converge in distribution to a (W,A) with the following properties.
- The distribution of A is independent of c.
- The conditional distribution of W given A is normal with expectation cA and variance A.
- The distribution of A satisfies all the assumptions of our main theorem.
Many popular estimators for the cointegrating relationships have these asymptotically mixed normal properties. It is well established that the standard estimators for the cointegrating relationships are normal only after conditioning; cf. Ahn and Reinsel (1990), Phillips and Hansen (1990), Phillips and Ouliaris (1990), Saikkonen (1991), Johansen (1988, 1991), Stock and Watson (1993), and also Hamilton (1994, Sect. 19.3). Testing linear restrictions on these parameters with Wald-type tests (cf. Davidson, 1998) therefore will be covered by our theory, at least for one-dimensional restrictions.
A testing problem satisfying the preceding conditions is called locally asymptotically mixed normal (LAMN) (cf. Le Cam and Yang, 1990; Jeganathan, 1991, 1995). Apart from the economic applications quoted later, this type of model has also received much attention in the statistical literature (cf. the references cited previously).
4. Other cases. Asymptotically mixed normal families do not only occur in connection with unit root tests or cointegration. Other cases of interest in economics are given in Park and Phillips (2001) and Aït-Sahalia (1999).
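The mixed normal structure listed in Example 3 determines the likelihood ratio between Qc and Q0 explicitly. The following short check is a sketch under those three properties only: because the distribution of A does not depend on c, it cancels in the ratio, and only the conditional normal densities of W given A (for A > 0) remain.

```latex
\begin{align*}
\frac{dQ_c}{dQ_0}(A,W)
  &= \frac{(2\pi A)^{-1/2}\exp\bigl\{-(W-cA)^{2}/(2A)\bigr\}}
          {(2\pi A)^{-1/2}\exp\bigl\{-W^{2}/(2A)\bigr\}} \\
  &= \exp\Bigl\{\frac{2cAW - c^{2}A^{2}}{2A}\Bigr\}
   = \exp\Bigl\{cW - \tfrac{1}{2}c^{2}A\Bigr\}.
\end{align*}
```

This is the same exponential factor exp(cW − c²A/2) that is bounded in the proof of Theorem 3 in the Appendix.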
So let us now assume that we are given a testing problem satisfying the preceding assumptions. We might be tempted to use the generalized t-test φT for our testing problem. This means we reject when the absolute value or, equivalently, the square of
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm065.gif?pub-status=live)
is larger than the critical value. Furthermore, let us assume that this critical value has an absolute value larger than one, i.e.,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm025.gif?pub-status=live)
Let us additionally assume that the cumulative distribution function of the limiting distribution of our test statistic is continuous at U. This is equivalent to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm026.gif?pub-status=live)
Hence the cumulative distribution function of the limiting distribution of our test statistic is continuous in U.
THEOREM 3. Suppose Assumptions A1–A4 and also (25) and (26) are satisfied. Then the tests
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm066.gif?pub-status=live)
defined earlier are inadmissible. There exist tests
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm067.gif?pub-status=live)
depending on Wn and An alone that have a strictly better asymptotic power function than φn. For alternatives (24) we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm027.gif?pub-status=live)
where for at least one c ≠ 0 the left-hand side is smaller than (and not equal to) the right-hand side. Moreover, we also have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm028.gif?pub-status=live)
Remark 3. In many cases, it is not that surprising that the tests φn are inadmissible. As an example, take one of the preceding models with non-Gaussian error terms ut, vt whose distribution is known up to a finite-dimensional parameter. Then estimators based on parametric models for ut, vt should be more accurate, and hence the corresponding tests would yield more power. However, it is interesting to observe that there exists a test better than φn depending on (Wn,An) alone! Therefore there exists some way of computing a test statistic with (Wn,An) other than the t-test statistic, which defines a test with a uniformly better asymptotic power function!
Remark 4. Unfortunately, the proof is nonconstructive: we only assert the existence of such a test, and the proof does not provide a way to construct this dominating test.
Remark 5. Here we will prove pointwise convergence in (27) and (28). With some technical refinements, the proof can be modified to show that the convergence in (27) is uniform on bounded sets of c.
Proof of Theorem 3. The proof can be found in the Appendix.
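The limiting power Qc({W²/A > U}) of the generalized t-test can be approximated by simulation. The following is a minimal sketch, not part of the original argument, assuming the simplest setting: a Gaussian random walk with unit error variance, y_0 = 0, null value δ0 = 0, scaling Dn = n, and local alternatives δ = c/n. The function names (`simulate_scores`, `squared_t`, `local_power`) are hypothetical, and the critical value U is taken as an empirical quantile under the null rather than from tables. The sketch only evaluates the t-type test; it does not construct the dominating test, whose existence is established nonconstructively.

```python
import numpy as np


def simulate_scores(n, c, reps, rng):
    """Simulate (A_n, W_n) for dy_t = (c/n) * y_{t-1} + u_t with y_0 = 0."""
    delta = c / n
    y_prev = np.zeros(reps)
    sum_y_dy = np.zeros(reps)  # running sum of y_{t-1} * dy_t
    sum_y_sq = np.zeros(reps)  # running sum of y_{t-1}^2
    for _ in range(n):
        dy = delta * y_prev + rng.standard_normal(reps)
        sum_y_dy += y_prev * dy
        sum_y_sq += y_prev ** 2
        y_prev = y_prev + dy
    A = sum_y_sq / n ** 2  # normalized "information"
    W = sum_y_dy / n       # normalized "score", centered at delta0 = 0
    return A, W


def squared_t(A, W):
    """Squared t-type statistic W^2 / A (assumed form under unit variance)."""
    return W ** 2 / A


def local_power(c_values, n=200, reps=5000, level=0.05, seed=0):
    """Estimate U under c = 0, then rejection frequencies for each c."""
    rng = np.random.default_rng(seed)
    A0, W0 = simulate_scores(n, 0.0, reps, rng)
    U = float(np.quantile(squared_t(A0, W0), 1.0 - level))
    power = {}
    for c in c_values:
        A, W = simulate_scores(n, c, reps, rng)
        power[c] = float(np.mean(squared_t(A, W) > U))
    return U, power


if __name__ == "__main__":
    U, power = local_power([-10.0, -5.0, 0.0, 5.0])
    print("estimated critical value U:", round(U, 3))
    for c, p in power.items():
        print(f"c = {c:+.1f}: rejection frequency {p:.3f}")
```

The rejection frequency at c = 0 should be close to the nominal level, and the frequencies at c ≠ 0 give a Monte Carlo approximation to the power function that, by Theorem 3, some other test based on (Wn,An) dominates.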
APPENDIX
Proof of Theorem 1.
We give an indirect proof. Let us assume that the test φ is from our class and has the same (or worse) power function as ψ.
Let us first deal with the extreme cases. So let us assume that the parameters describing the test φ are zero except for U. Then U must be nonzero (because we did assume that not all of the parameters vanish), and we can conclude that φ must be a trivial test, either accepting for all (A,W) if U is positive or rejecting in every case if U is negative. Hence we can easily see that—as ψ must have the same or better power function as φ—ψ must be trivial, too.
PROPOSITION 1. Let us assume that the parameter a for our test φ is nonzero. Then the test ψ = 1 on the set {(A,W) : A < a} Q0-almost surely.
For the proof of our proposition the following lemma is helpful.
LEMMA 1. There exist κ1,C1 > 0, so that the test φ rejects on the set
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm068.gif?pub-status=live)
or there exist κ2,C2 < 0, so that the test φ rejects on the set
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm069.gif?pub-status=live)
Let us now prove the lemma. First, let us look at the trivial cases. If μ + ν = 0, the lemma is trivial. Let us now assume that μ + ν is nontrivial. Then, at least one of the sets {λ : λ > 0} and {λ : λ < 0} is not a μ + ν null set. Let us assume without loss of generality that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm070.gif?pub-status=live)
If this is true, then there exist α,β > 0 such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm029.gif?pub-status=live)
Consequently using (13), we may write our criterion function u as
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-11217-mediumThumb-S0266466608080031frm030.jpg?pub-status=live)
With
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm031.gif?pub-status=live)
So let us define
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm071.gif?pub-status=live)
and let us define C1 later on. We will, however, without loss of generality, assume that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm072.gif?pub-status=live)
We now want to show that our test rejects if
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm032.gif?pub-status=live)
To do so, we have to show that for all values of (W,A) satisfying (A.4) u(A,W) becomes bigger than the critical value. Hence, we have to find lower bounds for all the terms in (A.2). Subsequently we will show that all terms, except the last one, are bounded from below by polynomials in W + A, at least for the values of (W,A) that are of interest to us.
Because (A.4) implies that W > 0 and W > A we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm073.gif?pub-status=live)
and, as c′ ≥ 0,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm074.gif?pub-status=live)
Now observe that, for W − κ1 A > 1 and z < α,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm075.gif?pub-status=live)
and, because of our definition of κ1,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm076.gif?pub-status=live)
To analyze the first integral in (A.2) let us distinguish two cases: λ < 0 and λ ≥ 0. For the first case, observe that for z < 0, W > 0
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm077.gif?pub-status=live)
Hence, for −α ≤ λ < 0
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-20425-mediumThumb-S0266466608080031ffm078.jpg?pub-status=live)
On the other hand, for W > 0 and λ > 0
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm079.gif?pub-status=live)
because the expression inside the integral is nonnegative and all the bounds of the integrals are nonnegative. Because the integrand in the last term on the right-hand side of (A.2) is nonnegative, we have for W > 0
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm080.gif?pub-status=live)
where p(.) is a polynomial (of third order) representing our estimates for the first three terms of (A.2). Let us fix the constant C1 > 0 later on. For any nonnegative C1, if W > κ1 A + C1, then for λ ∈ (2α,β), W − (λ/2)A ≥ (β/2)A, and hence W + A = W − (λ/2)A + (1 + λ/2)A ≤ (W − (λ/2)A)(1 + (2/β)(1 + λ/2)). Hence, for all W > κ1 A + C1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm081.gif?pub-status=live)
with γ = 1/(1 + (2/β)(1 + κ1 /2)) and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm082.gif?pub-status=live)
If W > κ1 A + C1, then obviously W + A > C1. Because an exponential function grows faster than any polynomial, we can find a C1 such that for all W + A > C1 the right-hand side of the preceding inequality is strictly bigger than U—our critical value. Hence, our test φ will reject.
Now we can proceed to prove Proposition 1. Let us assume that the proposition does not hold true and ψ = 0 on the set B ⊂ {(A,W) : A < a} with Q0(B) > 0. Because with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm083.gif?pub-status=live)
Bn ↑ B and consequently Q0(Bn) ↑ Q0(B) > 0, we can find an n so that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm084.gif?pub-status=live)
Now let us consider the power functions of our tests φ, ψ for probability measures Qλ, where we choose λ depending on the behavior of the test φ. Lemma 1 guarantees that the test φ = 1 on the set {(A,W) : W > κ1 A + C1} (in this case choose λ → ∞) or φ = 1 on the set {(A,W) : W < κ2 A + C2} (in this case choose λ → −∞). We will discuss the first case only, because the second one is perfectly analogous. As the power function of ψ is not worse than the one of φ for all values of the alternative, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm033.gif?pub-status=live)
The fact that φ = 1 on Bn implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm034.gif?pub-status=live)
We can easily see that for λ → ∞
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm085.gif?pub-status=live)
and hence
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm086.gif?pub-status=live)
On the other hand
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-74235-mediumThumb-S0266466608080031ffm087.jpg?pub-status=live)
Now we can easily see that exp(−(λ²/2)(1/2n) + C1λ + λκ1 a + ((κ1)²/2)) → 0 for λ → ∞. Because exp(−((κ1)²/2) + λ(κ1(A − a) − (λ²/2)(A − a))) ≤ 1, one can conclude that the preceding estimates guarantee that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm088.gif?pub-status=live)
Therefore (A.6) implies that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm089.gif?pub-status=live)
which contradicts (A.5). Hence our assumption that there exists a nontrivial set B must be wrong, which is just the proposition we wanted to prove.
Now that our main tools are established, we can proceed with the proof of the theorem. We will show that φ = ψ. So let us start with an extreme case, namely, that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm090.gif?pub-status=live)
Then the proposition we just proved shows that ψ rejects on (A < a), too. Hence, we may conclude that φ ≤ ψ. Moreover, the power function of ψ is at least as good as the power function of φ under the null; hence ∫(ψ − φ) dQ0 ≤ 0. Therefore we can conclude that ψ = φ Q0-almost surely.
Next we investigate the general case, namely, where at least one of the parameters b,c,μ,ν is nontrivial. For this purpose we introduce the following notation. For each test ρ let
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm091.gif?pub-status=live)
and define the functional Ln by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm092.gif?pub-status=live)
It is easily seen that our assumptions (14) and (15) guarantee that all power functions of tests are three times differentiable. Hence our functionals are well defined. The first integral on the right-hand side of the preceding definition is finite because of the definition of the integrand (o(λ²)) and (7). Moreover, because our functionals only depend on the power functions, we have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm093.gif?pub-status=live)
Moreover, we can easily see that with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-74310-mediumThumb-S0266466608080031ffm094.jpg?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170216052738-13684-mediumThumb-S0266466608080031ffm095.jpg?pub-status=live)
then
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm096.gif?pub-status=live)
and
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm097.gif?pub-status=live)
Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm035.gif?pub-status=live)
We will now analyze the behavior for n → ∞.
Let us first analyze ∫[u<U](φ − ψ)(un − U) dQ0 = ∫[u<U] ψ(U − un) dQ0. We can apply the dominated convergence theorem (U − un is monotonically decreasing and, on the event [u < U], lies between 0 and U). We therefore have
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm036.gif?pub-status=live)
For the second term, observe that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm037.gif?pub-status=live)
with
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm098.gif?pub-status=live)
and c(A,W) does not depend on n. Moreover, assumptions (14) and (15) for the integral in conjunction with Remark 2 guarantee the absolute integrability of c with respect to Q0. For the second term on the right-hand side of (A.9) we may use the monotone convergence theorem and conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm099.gif?pub-status=live)
converges to
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm100.gif?pub-status=live)
(or ∞, if the integral is not finite). Hence, we may conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm101.gif?pub-status=live)
where again the integral on the right-hand side may be infinite (as the integrand is nonnegative, this is not a serious problem). Because in (A.8) the limit is finite, we may add (A.8) to the preceding equation and conclude that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm102.gif?pub-status=live)
However equation (A.7) guarantees that all the terms of the preceding sequence are zero; hence the limit has to be zero, too. Therefore
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm103.gif?pub-status=live)
which implies that ψ = 0 on [u < U] and ψ = 1 on [u > U]. So ψ and φ can differ only on the set [u = U], which was just the result we wanted to establish.
Proof of Theorem 3.
Assumption A3 states that the distributions of (WT,AT) under PδT converge weakly to Qc. Hence,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm104.gif?pub-status=live)
because we did assume that the boundary of the set {(W,A) : W²/A > U} has Q0-measure 0 and hence Qc-measure 0. Hence the limiting power function of the test φT is the power function of the test φ = I {(W,A) : W²/A > U} with respect to the measures Qc. Ploberger (2004) establishes that this test is not within the complete class described in the previous section, and hence our main theorem establishes that this test cannot be admissible. Hence there must exist a test ψ = ψ(A,W), such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm038.gif?pub-status=live)
and for at least one c the left-hand side in (A.10) is strictly smaller than the right-hand side. To prove our theorem, we have to establish the existence of tests ψn = ψn(An,Wn), such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm105.gif?pub-status=live)
for all c. The test ψ is a measurable function from ℝ² to the interval [0,1]. Hence, Lusin's theorem (cf. Rudin, 1974, p. 56) guarantees the existence of continuous functions φn, such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm106.gif?pub-status=live)
Because 0 ≤ ψ ≤ 1, without loss of generality we can assume that 0 ≤ φn ≤ 1, too (otherwise replace φn by max(0,min(φn,1))). Hence, we can interpret φn(An,Wn) as tests. By applying Chebyshev's inequality, we can easily see that for every ε > 0 and all M there exists a K = K(ε), such that sup{Qc([|W| > K]) : |c| ≤ M} < ε. Because exp(cW − c²A/2) ≤ exp(KM) for |W| ≤ K and |c| ≤ M, we have for arbitrary ε > 0 and arbitrary M
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm107.gif?pub-status=live)
and therefore (as |∫ψ dQc − ∫φn dQc| ≤ Qc([ψ ≠ φn]))
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031frm039.gif?pub-status=live)
On the other hand, the φn are continuous functions. So our fourth assumption and Prohorov's theorem guarantee the relative compactness of the set of distributions of the (Wn, An) under Pn(δn), where δn = Dn⁻¹c + δ0 and |c| ≤ M. Moreover, we did assume that the distribution of (Wn, An) under Pn(δn) converges (for all c) to Qc; hence it is easy to see that the set containing the
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm108.gif?pub-status=live)
and the Qc, both for |c| ≤ M, is compact. Therefore, it can easily be seen that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm109.gif?pub-status=live)
This limiting relation and (A.11) together imply that there exist sequences Mn ↑ ∞, mn → ∞ such that
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170216044546821-0776:S0266466608080031:S0266466608080031ffm110.gif?pub-status=live)
which is just the result we wanted to show.