ON TESTING FOR SERIAL CORRELATION WITH A WAVELET-BASED SPECTRAL DENSITY ESTIMATOR IN MULTIVARIATE TIME SERIES

Pierre Duchesne

doi:10.1017/S0266466606060312

ON TESTING FOR SERIAL CORRELATION WITH A WAVELET-BASED SPECTRAL DENSITY ESTIMATOR IN MULTIVARIATE TIME SERIES

Published online by Cambridge University Press: 23 May 2006

Pierre Duchesne

Show author details

Pierre Duchesne: Affiliation:
Université de Montréal

Article contents

Abstract
1. INTRODUCTION
2. PRELIMINARIES
3. THE TEST STATISTIC AND ITS ASYMPTOTIC NULL DISTRIBUTION
4. ADAPTIVE CHOICE OF THE FINEST SCALE
5. SIMULATION RESULTS
6. CONCLUSION
APPENDIX
References

Rights & Permissions

Abstract

A new one-sided test for serial correlation in multivariate time series models is proposed. The test is based on a comparison between a multivariate spectral density estimator and the spectral density under the null hypothesis of no serial correlation. Duchesne and Roy (2004, Journal of Multivariate Analysis 89, 148–180) considered a multivariate kernel-based spectral density estimator. However, when the spectral density exhibits irregular features (because of strong autocorrelation or seasonality, among other factors), it is expected that a multivariate wavelet-based spectral density estimator will capture more effectively the local behavior of the spectral density. We consider a test based on a wavelet spectral density estimator, which represents a generalization of a test proposed by Lee and Hong (2001, Econometric Theory 17, 386–423). The asymptotic distribution of the new test is established under the null hypothesis, which is N(0,1). We propose and justify a suitable data-driven method to choose the smoothing parameter of the wavelet estimator (called the finest scale in that context). The new test should be powerful when the spectral density contains peaks or bumps. This is confirmed in a simulation study, where kernel-based and wavelet-based estimators are compared.The author thanks the co-editor Pentti Saikkonen and two referees for their constructive remarks and suggestions. Many thoughtful comments of the referees led to significant improvements of the paper. This work was supported by grants from the National Science and Engineering Research Council of Canada and the Fonds québécois de la recherche sur la nature et les technologies du Québec (Canada).

Type: Research Article
Information: Econometric Theory , Volume 22 , Issue 4 , August 2006 , pp. 633 - 676

DOI: https://doi.org/10.1017/S0266466606060312 [Opens in a new window]
Copyright: © 2006 Cambridge University Press

1. INTRODUCTION

In this paper, we consider tests of serial correlation designed for vector time series models. In univariate time series, many tests for serial correlation have been proposed (e.g., Durbin and Watson, 1950, 1951; Box and Pierce, 1970; Ljung and Box, 1978; Robinson, 1991a; Hong, 1996; Andrews and Ploberger, 1996; Paparoditis, 2000a, 2000b; Lee and Hong, 2001). Testing for serial correlation in multivariate time series appears equally important, because real applications often involve handling several variables, in addition to their potentially complex interactions. Statistical inference and testing procedures in a multivariate setting are discussed in Chitturi (1974, 1976), Hosking (1980, 1981a, 1981b), Li and McLeod (1981), Lütkepohl (1993), Duchesne and Roy (2004), and Paparoditis (2005), among others.

The approach in Duchesne and Roy (2004) relies on a normalized version of a quadratic distance between two multivariate spectral densities. Using a kernel-based spectral density estimator for a d-dimensional process x, the resulting test statistic provides a generalization of the so-called multivariate portmanteau statistic considered in Hosking (1980). See also Chitturi (1974) and Li and McLeod (1981). The test statistic of Duchesne and Roy (2004) is based on a kernel function. The truncated uniform or rectangular kernel delivers a normalized version of Hosking's test. Also, when d = 1, the test reduces to the Hong (1996) test for serial correlation.

The spectral density may exhibit various forms of irregularities for many time series encountered in practice. For example, strong serial dependence or seasonality may produce peaks or bumps in the spectral density at seasonal frequencies. It is well known that, in univariate time series, kernel estimators tend to underestimate the mode of the spectral density (e.g., Priestley, 1981; Lee and Hong, 2001). To circumvent that problem, Lee and Hong (2001) considered wavelet-based spectral density estimation for univariate time series. The first objective of this paper is to extend the Lee and Hong (2001) approach to vector time series models. First, we give the wavelet representation of the multivariate spectral density. Then, we estimate that representation directly. The test is constructed using a normalized metric similar to the one studied in Duchesne and Roy (2004). However, the comparison is made between the multivariate wavelet-based spectral density estimator and the spectral density under the null hypothesis of noncorrelation. With the multivariate wavelet-based estimator of the spectral density, we expect to obtain a more accurate estimation of the spectral density characterized by irregular features, such as peaks, kinks, or spikes, because in such circumstances this wavelet-based estimator offers more powerful procedures than do kernel-based methods. However, if the spectral density displays regular behavior, the kernel-based procedure is expected to perform well. The new wavelet-based test should be an interesting complement to the test procedure of Duchesne and Roy (2004).

A fixed lag order p, say, must be determined a priori to compute the Box–Pierce/Hosking test statistics. This is true for many popular portmanteau procedures. For example, in vector autoregressive (VAR) models, the Hosking (1980) test statistic relies on the first p autocorrelation matrices. In real applications, a frequent procedure with Box–Pierce/Hosking type test statistics consists of performing individual tests at the 5% nominal level with many values of p: for example, computing the test statistic for each p = 6,12,18. Statistically, this procedure is not recommended, because computing several test statistics at the 5% level does not give a statistical procedure with an overall nominal level of 5%. Obviously, Bonferroni's procedure or similar adjustments could be performed, but because the individual tests based on different p's are not usually independent, only an upper bound for the overall level is obtained (the exact level being unknown). The Duchesne and Roy (2004) procedure relies on a kernel-based spectral density estimator, and it permits a data-dependent choice of the smoothing parameter. Consequently, as a second objective of the present paper, we propose and justify a suitable data-driven method for choosing the finest scale of the new wavelet-based estimator; the finest scale represents the smoothing parameter in the wavelet estimation context. As in kernel-based spectral density estimation, it seems highly desirable, in practical applications, to have an objective choice of the finest scale. Lee and Hong (2001) considered an algorithm drawn from Walter (1994) but without justification. In particular, the choice of the finest scale was not taken into account in the asymptotic analysis. In this paper, we state the precise conditions under which the asymptotic distribution of the new multivariate test will remain unaffected by a data-dependent method. Furthermore, we establish the integrated mean squared error (MSE) of the wavelet spectral density estimator, and finally we discuss a particular implementation: a plug-in method of the parametric type. Because our test statistic, say,

, where

is the data-dependent finest scale estimator, possesses a rigorously established asymptotic distribution, this allows us to perform hypothesis testing at a chosen nominal level.

The organization of the paper is as follows. In Section 2, some preliminaries are given. The wavelet-based test statistic is introduced in Section 3. It is shown to admit a standard normal distribution asymptotically, under the null hypothesis of noncorrelation. In Section 4, general results on data-dependent methods are presented. The integrated MSE of the spectral density estimator is established for the case when the process admits a general dependence structure. As an application of this result, we propose and justify a particular data-driven method using the (parametric) plug-in technique. In Section 5, we present the results of a small Monte Carlo experiment conducted to study the exact level and power of the tests for finite samples. We compare the power of the kernel-based and wavelet-based estimators using data-dependent techniques for the choice of the smoothing parameter and the choice of the finest scale. We conclude in Section 6 with some remarks, and the Appendix contains the proofs of our results.

2. PRELIMINARIES

Let

be a multivariate second-order stationary process of dimension d. We assume that x is of mean μ, with a regular covariance matrix E [(x_t − μ)(x_t − μ)^{[top ]}] = Σ.

2.1. Temporal and Spectral Notations

The autocovariance at lag j will be denoted by

where

. If we write Γ_x(j) = [Γ_x,pq(j)]_p,q=1^d, the multivariate spectral density f(ω) of x is defined by

A sufficient condition for f(ω) to be a multivariate spectral density is the absolute summability of the cross- and autocovariances (see, e.g., Fuller, 1996, pp. 169–170):

In general, f(·) is complex valued. We denote (·) the complex conjugate of f(·) and f*(·) the transposed complex conjugate of f(·), that is, f*(·) = ^{[top ]}(·) (for a matrix A, A* will denote the transposed conjugate of A, i.e., A* = ^{[top ]}). It is well known that f is a Hermitian matrix, meaning that f*(ω) = f(ω).

Whenever the existence of fourth-order moments is required, we shall suppose that process x is fourth-order stationary, and the fourth-order moments and cumulants will be denoted, respectively, by

where

. Imposing the existence of the fourth-order moments is equivalent to the existence of the fourth-order cumulants. If process x is Gaussian, it is well known that the fourth-order cumulants will vanish.

2.2. Wavelet Analysis of the Spectral Density

Wavelet analysis is a mathematical technique generalizing the Fourier analysis; it can be used to decompose a function g(·) in the

space. Using the multiresolution analysis first developed by Mallat (1989) and Meyer (1992), any square-integrable function g admits the following decomposition:

where

corresponds to a cutoff point resolution level and the wavelet coefficients are given by

The function ψ(·) is called the mother wavelet, and the function φ(·) represents the father wavelet. Given {φ(·),ψ(·)}, we obtain {φ_jk(·),ψ_jk(·)} using the relations

where

. The system {φ_jk(·),ψ_jk(·)} constitutes a complete orthonormal basis for the

space. The Fourier transforms of the functions ψ(·) and φ(·) are defined by

where i denotes the complex number

(note that following the traditional wavelet literature,

is the Fourier transform of ψ(·), not an estimator; because ψ(·) is not a random quantity, this notation should not be cumbersome). The mother and father wavelets satisfy the following assumption.

Assumption A. The mother wavelet

is orthonormal such that

. The father wavelet

satisfies

; it is chosen such that the modula of its Fourier transform

is continuous on

, or

if |z| > π.

The main motivation for the wavelet appellation is explicated by the property

. The function ψ is well localized, and by appropriate scaling such localization can be made arbitrarily fine. The integer j in φ_jk(·) or ψ_jk(·) is called the scale parameter (or a resolution level), and the integer k represents a translation parameter. Priestley (1996) gives useful comparisons between Fourier and wavelet analysis; he provides a heuristic interpretation of j as a frequency parameter and k as related to time.

By Assumption A, the Fourier transforms of the functions ψ(·) and φ(·) exist. Note that, according to Proposition 2.17 of Hernández and Weiss (1996), Assumption A implies that

We assume the following smoothness condition on

Assumption B. (i) The mother wavelet satisfies

, for some

, and C ∈ (0,∞). (ii)

, where b(·) is a real-valued function.

In Assumption B(i), we require that

possess some regularity at z = 0 and sufficiently fast decay when z → ∞. Examples of wavelets that satisfy Assumption B(ii) are the Franklin and Meyer wavelets. In fact, most commonly used wavelets satisfy this assumption. The Franklin wavelet ψ_F(·) is defined via its Fourier transform and is given by

Franklin wavelets are of unbounded support and correspond to infinite impulse response filters. For fundamental notions of signal processing, see Vidakovic (1999). The Meyer wavelet is another example of a wavelet satisfying Assumption B, denoted as ψ_M(·); it has compact support in the frequency domain and fast decay in the time domain. The Fourier transform of the Meyer wavelet is given by

The function ν(·) satisfies ν(x) = 0, x ≤ 0 and ν(x) = 1, x ≥ 1 with the additional property ν(x) + ν(1 − x) = 1 for x ∈ [0,1]. See Daubechies (1992, pp. 116–117) for additional details on the Meyer (1986) construction. Meyer's ν-polynomials are given in Vidakovic (1999, p. 66), up to order five. Lee and Hong (2001) adopted a second-order ν-polynomial, which is defined by ν(x) = x²(3 − 2x), x ∈ [0,1]. Hernández and Weiss (1996, Sect. 2.2) give additional examples of wavelets satisfying Assumption B(ii). Vidakovic (1999), among others, also gives other examples of wavelet functions.

We shall now discuss the wavelet representation of the multivariate spectral density f given in (1). Because f is a 2π-periodic function, it is not square integrable. However, a 2π-periodic basis {Φ_jk(·),Ψ_jk(·)} can be constructed, using the so-called periodization technique:

which produces an orthonormal basis for the L₂ [−π,π] space. Note that using the definition of the Fourier transform,

and by the periodization technique and a change of variable,

Similar relations hold for

. For more details, see Lee and Hong (2001, p. 392).

Let the cutoff point j₀ in (2) equal zero. Assumption A and relations (3), (4), (7), and (8) imply that Φ₀₁(ω) = (2π)^−1/2 and β₀₁ = (2π)^−1/2Γ_x(0). Consequently, f can be written as

The quantity A_jk is called the wavelet coefficient; it corresponds to a d × d matrix. It is given by

where

is the Fourier transform of

denotes the complex conjugate of

Using classical Fourier analysis, the Fourier coefficients reduce to the autocovariances. It is well known that they depend on the global behavior of the spectral density between [−π,π]. See Brockwell and Davis (1991, Sect. 11.6), among others. However, the wavelet coefficient A_jk depends on the local behavior of the spectral density, because Ψ_jk(ω) reflects the local behavior of f in an interval of width 2π/2^j centered at k/2^j. This explains why wavelet analysis represents an appropriate tool when the spectral density admits irregular features; the wavelet coefficients should describe more adequately these local characteristics. Naturally, the spectral density relies on the theoretical autocovariance function. In the next section, we discuss estimation of representation (9) in finite samples.

2.3. Wavelet-Based Spectral Density Estimator

Given that x₁,…,x_n is a realization of length n of process x, the sample autocovariance at lag j, 0 ≤ | j| ≤ n − 1, is defined by

where

represents the sample average. A natural estimator for the wavelet coefficient A_jk is given by

called the empirical wavelet coefficient. The second equality is obtained using the fact that

We consider the following wavelet-based spectral density estimator:

where J is the finest scale. Estimator (11) has been considered in Hong (2001) in the context of wavelet-based estimation of heteroskedasticity and of autocorrelation consistent covariance matrices; he has shown that

produces a consistent estimator of f(0). The parameter J ≡ J_n is nonstochastic. It plays a role similar to that of the smoothing parameter in kernel-based estimation. Intuitively, if J is large, in view of (9) and (11), the bias should be small. However, because many

will be included in the estimator, the variance should be higher. Inversely, a small J should provide a less variable estimator, at the price of a larger bias.

The advocated spectral density estimator (11) appears to be linear. We do not consider threshold shrinkage as in Neumann (1996), among others. In fact, Neumann (1996) shows that a nonlinear thresholding wavelet estimator can attain a near optimal convergence rate for any spectral density in the balls of the so-called Besov space. Recall that, given an interval I, the Besov space B_p,q^m represents a set of functions f in L_p(I) that have smoothness m. Neumann (1996) shows that linear wavelet estimators can attain the optimal rate of convergence if p ≥ 2. In our typical applications, the spectral density is square integrable on [−π,π] with first-order continuous derivatives; therefore, it corresponds to a continuous function. The kind of irregularities that we may encounter in our applications will be due principally to cycles, bumps, and periodicities in the spectral density. See Wei (1990, pp. 248–249) for visual illustrations. Although from an analytic point of view, this provides a heuristic argument in favor of the linear wavelet estimator, considering estimator (9) allows us to develop an asymptotic theory. In the next section, a test statistic based on a suitable quadratic norm is studied; it offers an elegant formulation in a multivariate framework and a well-established asymptotic distribution under the null hypothesis. Furthermore, in Section 4 we can derive the integrated MSE over the frequencies [−π,π] of the estimator

. From that latter result, a suitable J can be found to balance the squared bias and the variance according to the integrated MSE criterion.

3. THE TEST STATISTIC AND ITS ASYMPTOTIC NULL DISTRIBUTION

We now introduce the new test statistic. The hypothesis of interest states that x corresponds to a white noise against the alternative of serial correlation of unknown form. More precisely, the null and alternative hypotheses can be written as

We can formulate H₀ and H₁ with the spectral density f(ω) of x. The hypothesis H₀ becomes f(ω) = f₀(ω) = (2π)⁻¹Γ_x(0), ∀ω ∈ [−π,π]. Under the null hypothesis, the spectral density reduces to the constant matrix (2π)⁻¹Γ_x(0). Under the alternative hypothesis, f is not identically equal to f₀.

To obtain our test statistic, we first define a distance measure. We consider the following normalized quadratic distance:

The distance measure Q²(f₁;f₂) can be interpreted as an integrated measure over the frequencies [−π,π]. See Duchesne and Roy (2004) for more details. In the following proposition, we compare the true spectral density f of x with the spectral density f₀ of x under the null hypothesis.

PROPOSITION 1. Let Q²(f;f₀) be the distance measure given by (12), where f corresponds to the wavelet representation of the spectral density given by (9) and f₀ = (2π)⁻¹Γ_x(0). Then, we have

Proof. First note that

Define δ_jk = 1 if j = k, and δ_jk = 0 elsewhere. Using the orthonormality relation

we obtain the desired result. █

When the traditional representation of the spectral density given by (1) is used, Duchesne and Roy (2004) have shown that

We can interpret the distance measure given by (14) as based on an infinite number of lags, whereas the distance measure (13) using the wavelet representation of the spectral density depends on an infinite number of resolution levels and, consequently, an infinite degree of approximation.

Our test statistic will be defined as a distance measure between

. By direct calculation, it can be shown that

using

and because C_x(0) − Γ_x(0) = O_P(n^−1/2).

Because the covariance matrix Γ_x(0) is usually unknown, we replace that quantity by the sample covariance matrix C_x(0). It may be shown that

The new test statistic proposed is essentially interpreted as a standardized version of

. It is defined by

The quantities d²(2^J+1 − 1) and 4d²(2^J+1 − 1) denote essentially the mean and variance of

. Using the previous approximations, it is easily shown that

The test W_n is motivated by the distance measure Q²(f;f₀), which satisfies Q²(f;f₀) ≥ 0, with Q²(f;f₀) = 0 if and only if f ≡ f₀. Consequently, (15) rejects the null hypothesis for large values of W_n, and the test statistic is one sided.

To establish the asymptotic distribution of W_n under the null hypothesis, we need the following assumption.

Assumption C. Let

be a strong white noise that is an independent and identically distributed (i.i.d.) time series of dimension d, whose mean is E(x_t) = μ and with a regular covariance matrix Σ. We assume that the fourth moments of x_t are finite.

A random sample x₁,x₂,…,x_n of sample size n is obtained. We now state the main result of this section. The symbol →_d stands for convergence in distribution.

THEOREM 1. Under Assumptions A, B, and C, J → ∞, 2^2J/n → 0, the test statistic W_n defined by (15) follows a normal distribution asymptotically, that is, W_n →_d N(0,1).

Note that to obtain Theorem 1, we do not assume that process x is Gaussian. The test statistic W_n can be used to test if x₁,…,x_n are independent when process x is Gaussian. In general, the test statistic W_n allows us to check for the hypothesis H₀ of no serial correlation with non-Gaussian observations. Because W_n is a one-sided test statistic, it rejects the null hypothesis at level α if W_n > z_1−α, where z_1−α denotes the quantile of order 1 − α of the standard normal distribution.

Our test statistic reduces to the Lee and Hong (2001) test when d = 1, because

become scalar quantities and the tr(·) of a scalar reduces to that scalar. Lee and Hong established the consistency of their test, and an asymptotic power analysis shows that there is no optimal choice of the wavelet function, because the asymptotic power does not depend on ψ(·). This contrasts strongly with the multivariate procedure of Duchesne and Roy (2004), because a local power analysis makes it possible to derive the asymptotic power, which is a function of the kernel; the maximization of that power in a certain class of kernel functions leads to an optimal choice of

: the Daniell kernel.

For the validity of the asymptotic normal distribution in Theorem 1, the finest scale needs to go to infinity at a slower rate than n. In some sense, that parameter plays the same role as does the bandwidth in kernel-based spectral density estimation. However, the interpretation is fundamentally different. When testing for serial correlation using the kernel-based spectral density estimator, if the kernel possesses a compact support, then the bandwidth admits an easy interpretation; it corresponds to a lag order. For example, for the multivariate serial correlation test of Duchesne and Roy (2004) based on the uniform truncated kernel, the bandwidth determines the number of lags to be considered in the test statistic. Here, the interpretation of J corresponds to the finest resolution level, and this parameter determines the degree of approximation of the spectral density to be estimated; this is not a lag order. In view of this, the selection of J represents an important practical consideration (note that for practical reasons, the same can be said for the choice of the bandwidth in kernel-based estimation; see Robinson, 1991b, p. 1332). Lee and Hong (2001) have considered the data-driven method of Walter (1994) for choosing the finest scale. However, it seems unknown whether the asymptotic distribution of their test statistic remains unchanged when specifying J with Walter's procedure, because, to our knowledge, no formal result on the rate of Walter's

has been derived in the literature. In the next section, we discuss the adaptive selection of J using the plug-in method. We show, in particular, that with our data-driven method, the asymptotic distribution of our test statistic, say,

, for a stochastic

obtained with the plug-in method, is still normal under the null hypothesis. This makes our procedure fully operational in practice.

4. ADAPTIVE CHOICE OF THE FINEST SCALE

4.1. A General Result for the Estimation of the Finest Scale

As in a kernel-based spectral density estimation, the wavelet-based spectral density estimator given by (11) relies on a “smoothing parameter,” the finest scale J. The choice of the finest scale may affect the power of the test statistic. However, it seems difficult to select an optimal J to maximize the power. Furthermore, because J does not correspond to a lag order, it may be hard to obtain specific knowledge of the alternative hypothesis. Therefore, it appears highly desirable to choose J via suitable automatic methods. These methods will reveal some valuable information on the unknown alternatives. Let W_n ≡ W_n(J). Theorem 2, which follows, does not concentrate on a particular data-driven method. It gives appropriate conditions on the data-driven J, denoted

, under which

under the null hypothesis. In particular, a standard application of Slutsky's theorem allows us to conclude that

under H₀, for

satisfying certain conditions.

THEOREM 2. Let

be a data-dependent J such that

Under Assumptions A, B, and C, J → ∞, 2^2J/n → 0, the test statistic W_n ≡ W_n(J) defined by (15) is such that

Intuitively, because J → ∞, an interpretation of the condition (17) stipulates that

as n → ∞, with the specified convergence rate. According to Theorem 2, if

converges to J at a suitable rate, then the estimation of J will have no impact on the asymptotic distribution of the test statistic. Condition (17) in Theorem 2 does not impose overly severe restrictions. For example, if 2^J ∝ n^2ν, ν > 0, then the condition becomes

. Theorem 2 will be of interest if we can find a data-driven method satisfying the condition

. Before finding such data-dependent

, we establish, in the next section, the integrated MSE of (11).

4.2. Derivation of the Integrated Mean Squared Error of the Wavelet-Based Spectral Density Estimator

In this section, we derive the asymptotic integrated MSE of

over the entire frequency domain [−π,π]. Although this result is of interest in its own right, it will also suggest, in Section 4.3, a suitable method for using the data to select the finest scale automatically. We consider the following definition for the MSE:

When ω = 0, the MSE defined in (18) is similar to the MSE considered in Andrews (1991) and Hong (2001). The integrated MSE is then obtained as

The matrix

corresponds to a d² × d² nonstochastic positive semidefinite (psd) weight matrix. We need the following assumption.

Assumption D. Assume that the dependence structure of process x is such that [sum ]_j∥Γ_x(j)∥² ≤ C and suppose that the following cumulant condition is satisfied:

where p,q,r,s ∈ {1,…, d}. We suppose that there exists φ ∈ (0,1) such that ∥Γ_x(h)∥ ≤ Cφ^|h| and we assume that [sum ]_j| j|^q∥Γ_x(j)∥ ≤ C, where q ∈ [1,∞) is as in Assumption B.

The cumulant condition of Assumption D allows for autocorrelation. Fourth-order stationary linear processes with absolutely summable coefficients and innovations whose fourth moments exist satisfy the cumulant condition. See Hannan (1970, p. 211). Finally, when process x is Gaussian, the fourth-order cumulants are zero; the cumulant condition is also satisfied in that latter case.

Define the generalized derivative of f(ω),

Note that when q is an even integer, f^(q)(ω) = (−1)^q/2 d^qf(ω)/dω^q. Put

The constant λ_q characterizes the smoothness of

at zero. Because the Fourier transform of the Meyer wavelet is flat in a neighborhood of 0, λ_q ≡ 0, q > 0. For the Franklin wavelet defined in (5), we have q = 2 and λ₂ = π⁴/45. As will be shown subsequently, the integrated MSE contains two conflicting factors, a factor corresponding to the asymptotic variance and another for the squared bias. The asymptotic bias is a function of the generalized derivative and of the factor λ_q. These quantities are related to the smoothness of the spectral density and to the smoothness of

at z = 0. We now state the theorem in which the integrated MSE is derived.

THEOREM 3. Suppose that Assumptions A, B, and D hold, λ_q ∈ (0,∞), J → ∞, 2^J/n → 0 as n → ∞. Then

Theorem 3 gives the integrated MSE over [−π,π] in the vector case of the wavelet spectral density estimator. Other related results include the MSE at a given frequency ω of a kernel-based spectral density estimator, derived in Parzen (1957) in the scalar case and in Hannan (1970, pp. 280, 283) and Andrews (1991, p. 825) in the multivariate case. The integrated MSE of a scalar kernel-based spectral density estimator is given in Robinson (1991b, pp. 1343–1344). Finally, Hong (2001) gives the MSE of the multivariate wavelet spectral density estimator at the frequency ω = 0. Then, Theorem 3 represents an extension of the pointwise result of Hong (2001).

To have power under a large class of alternatives, it seems intuitively preferable to construct test statistics based on a measure relying on all frequencies ω ∈ [−π,π]. Consequently, in testing for serial correlation, the test procedure W_n(J) is based on a distance measure that exploits the wavelet spectral density estimator over the entire frequency domain [−π,π] and not only on a single frequency. Similarly, in Theorem 3, it seems preferable to derive the formula for the asymptotic integrated MSE, because it will reveal valuable information about the global behavior of the spectral density, not only the local behavior at a single frequency such as ω = 0. For example, for seasonal time series, the spectral density is significant at the typical seasonal frequencies, which are nonzero. The integrated MSE in Theorem 3, which relies on all frequencies, should reveal more information on seasonal behavior than the MSE based solely on the single zero frequency.

4.3. On the Estimation of the Finest Scale with a Plug-In Method of the Parametric Type

In this section, we discuss applying Theorem 3 to select the finest scale according to the integrated MSE criterion. The approach advocated here is basically the same as the one Andrews (1991) proposed in a different context. To obtain the optimal J, it suffices to minimize the asymptotic integrated MSE of

. Computing the derivative of the asymptotic integrated MSE with respect to J and setting the derivative equal to zero, this yields 2^J₀+1 = [2qλ_qα₀(q)n]^1/(2q+1), where

Note that we could have α₀(q) = 0 (e.g., under the null hypothesis). However, to satisfy the conditions of Theorem 2, J must be such that J → ∞. This caveat can be easily fixed if we introduce the following slight modification:

where l is an arbitrary fixed (with respect to n) lower bound. The introduction of (21) is similar to the modification of the cross-validation method considered in Hong and Shehadeh (1999, pp. 94–95). Then, we define

This J₀′ is nevertheless unfeasible because α₀^(l)(q) is unknown under the alternative hypothesis. However, we can produce some sort of estimator, say,

for α₀^(l)(q). That is, an estimator is “plugged in” in place of the unknown quantity. This gives a data-driven finest scale

satisfying

, which can be alternatively written as

where c = (2qλ_q l)^1/(2q+1). The following corollary states that the proposed plug-in method satisfies the hypotheses of Theorem 2.

COROLLARY 1. Suppose that Assumptions A, B, C, and D hold,

is given as in (23),

. Then

In Corollary 1, we require

at a rate faster than n^−1/(2q+1). This ensures that the use of

rather than ϑ(q) has no impact on W_n(J) asymptotically. The condition on

seems rather easy to satisfy. For example, we do not require the probability limit p limit

, where α₀^(l)(q) is as in (21). When and only when ϑ(q) = α₀^(l)(q) will

in (23) converge to the J₀′ in (22).

In summary, the proposed method gives a rule to select the resolution J based on the minimization of an integrated MSE criterion. In a hypothesis testing context, a sensible criterion consists of finding J such that it maximizes the power. The optimal J that maximizes power and the J given in (22) can be expected to differ. However, this is a technically complicated issue in the present context, and it seems likely that a higher order asymptotic analysis is needed. However, our Monte Carlo experiments in Section 5 suggest that (23) delivers reasonably large power for the test W_n in finite samples. It thus seems preferable to have an automatic method that tries to find a J suitably adapted to the shape of the estimated spectral density, rather than an ad hoc procedure.

4.4. Particular Implementation of the Proposed Parametric Plug-In Method

We shall now discuss the practical estimation of the quantity

. In general, expression (20) necessitates integration of a function of the matricial spectral density and of its generalized derivative of order q; in practice, multivariate estimators

should be plugged in. A judicious choice of the matrix

may simplify the procedure considerably. As in Andrews (1991, p. 834), we consider a matrix

that gives weight only to the diagonal elements of f. Consequently, α₀(q) becomes

where f_aa^(q)(ω) and f_aa(ω) denote the ath diagonal elements of f^(q) and f, respectively. A common choice of υ_a is 1 for a = 1,…, d.

Plug-in estimators may be parametric or nonparametric. Parametric plug-in estimators yield a less variable finest scale parameter than nonparametric estimators, but at the price of an asymptotic bias in the estimation of the optimal finest scale parameter. This happens because of the approximate nature of the specified parametric model. However, with our conditions, this bias does not affect the consistency or the rate of convergence of the multivariate wavelet-based spectral density estimator. Consequently, we advocate here a parametric AR(p^(a)) model for {x_t(a)}, a = 1,2,…, d:

where we set the initial values x_t(a) ≡ 0 if t ≤ 0 and var{ε_t(a)} = σ_ε²(a). Suppose that

are appropriate estimators in (24). For concreteness, we consider q = 2 here. We have

where

. We also use the fact that the generalized spectral derivative

. The estimator

incorporates the information of f_aa(·) over [−π,π] rather than at 0 only. This is analogous to the work of Robinson (1991b), who considers data-driven choices of the bandwidth in multivariate kernel-based spectral density estimation over the entire frequency domain [−π,π]. We can use one-dimensional numerical integrations to compute

. Interestingly,

satisfies the conditions of Corollary 1 with q = 2 because, for parametric AR(p) approximations,

(Andrews, 1991). Note that in practice, we can employ the Akaike information criterion/Bayesian information criterion (AIC/BIC) methods to determine p^(a) (for a description of these methods, see Priestley, 1981, p. 372).

To summarize, our procedure for testing multivariate serial correlation can be described as follows: (i) Find

by the autoregression in (24). If a wavelet basis that satisfies q = 2 is adopted (such as a Franklin wavelet with a Fourier transform (5)), simplifications occur, and formula (25) can be used. (ii) Compute the test statistic

in (15). (iii) Reject the null hypothesis of no serial correlation if the test statistic

, where z_1−α is the 1 − α quantile of N(0,1).

5. SIMULATION RESULTS

In the previous sections, we have studied a new class of test statistics that have interesting properties when the spectral density possesses irregular features. However, from a practitioner's point of view, it is natural to inquire about the finite-sample properties of the wavelet-based test statistic, in particular its exact level and power. The power comparison between the kernel-based test statistic and the wavelet-based test statistic seems of particular interest. To partially answer these questions, we conduct a small Monte Carlo experiment. For each multivariate i.i.d. process described subsequently, we examine the empirical frequencies of rejection of the null hypothesis when the latter is true by using tests with two different nominal levels (5 and 10%) for each of two series lengths (n = 100, 200). The kernel-based test statistic described in Duchesne and Roy (2004) is given by

where

are given by

where

is a kernel function and p_n is the smoothing parameter. We choose the Daniell kernel given by

, which displays optimality in a given class of sufficiently smooth kernels. Usually, the exact alternative hypothesis and, therefore, the optimal value of p_n are unknown. In that case, considering the automatic choice of p_n seems particularly relevant. The multivariate version of the cross-validation procedure of Robinson (1991b) for determining the bandwidth of a kernel spectrum estimator is employed. We choose that particular nonparametric method because, to our knowledge, in the context of multivariate spectral density estimation, it represents the only data-dependent technique discussed in the statistical literature. For the smoothing parameter p_n, we retain the value of m that minimizes the pseudo-log-likelihood defined by

where P(·) denotes the periodogram,

a leave-two-out type smooth periodogram, and ζ_j = 2πj/n,j = 1,…, n correspond to the Fourier frequencies. The optimization was performed for the integer values m = 2,3,…,20.

The power properties of the tests are also investigated for cases where the process can be formulated as a vectorial autoregressive process of order p, denoted VAR(p) for p = 1,4,8,12. Furthermore, because of its popularity in vector time series modeling, a vectorial autoregressive moving average VARMA(1,1) process is considered. The specific stochastic processes are described in Section 5.1. The power results are size adjusted. Indeed, for a given alternative, the number of rejections of the null hypothesis is calculated using the exact (empirical) critical values obtained from the level study. This practice allows us to compare the wavelet-based and kernel-based tests on an equal basis.

5.1. Description of the Experiment

For our investigations of the exact levels of the test statistics, we consider five multivariate i.i.d. processes of dimension d = 2,3,4:

with

where I_d is the d × d identity matrix and 1_d = (1,…,1)^{[top ]}. We let ρ = 0.1 and σ = 0.5. The multivariate-t distribution with ν degrees of freedom corresponds to the law of the random variable x ≡ (χ_ν²/ν)^−1/2z, where z ∼ N(0, Σ) and χ_ν² is a chi-squared random variable, z and χ_ν² being independent. This distribution is related to the so-called multivariate Pearson type VII distribution. See Johnson (1987, pp. 117–118) for more details. If x ∼ t_ν, E(xx^{[top ]}) = [ν/(ν − 2)]Σ with kurtosis equal to

= 2/(ν − 4). Consequently, we obtain

= 2 and

= ½ for the t₅ and t₈ distributions, respectively. In robustness studies, the contaminated normal distribution appears to be a useful distribution. It allows us to simulate random variables that deviate from the normal distribution as a result of possible outliers. The kurtosis of the distributions CN₁ and CN₂ is

= 0.028 and

= 0.059, respectively. Note that in unreported simulation experiments, we also considered σ = 2.0 (giving

= 0.323 and

= 0.479, respectively) but the empirical results were largely similar to the case σ = 0.5. To study the level of the new wavelet-based test statistic for a chosen distribution, B = 10,000 independent realizations are generated for each value of n, and the computations are made in the following way. Note that all the codes are written in the FORTRAN 77 language with the NAG library.

Step 1. The Gaussian, chi-squared, and uniform random variables are generated using the subroutines G05EAF and G05EZF, the function G05DHF, and the function G05CAF, respectively; these subroutines/functions are taken from the NAG library.

Step 2. For each realization, the test statistic T_n is computed using the Daniell kernel. The data-driven method of Robinson (1991b) has been implemented to determine with the data the smoothing parameter p_n. The test statistic W_n is computed using the Franklin and Meyer wavelets given by (5) and (6) for a nonstochastic finest scale J, such that 2^J+1 = 3n^1/5. For the nonstochastic finest scale, denoted J₁, we obtain J₁ ≡ 2 when n = 100 and J₁ ≡ 3 when n = 200. Because λ₂ = π⁴/45 for the Franklin wavelet and λ_q ≡ 0 for the Meyer wavelet, the parametric plug-in method as described in Section 4.3 is considered for W_n based on ψ_F only. We use the NAG subroutine D01GAF for the numerical integration. We fit parametric AR(p^(a)) models, for p^(a) = 1,2,…,20, a = 1,2, using the NAG subroutine G02DAF for least-squares estimation. The autoregressive order is chosen using the AIC criteria. We then modify the AIC method because we need a data-dependent

such that

as n → ∞. The data-driven proposal (23) with c = 3 is considered. We denote by

the resulting data-driven proposal. More details on the automatic methods

are provided in Section 5.4.

Step 3. Finally, for each series of length n, for each test statistic we obtain from the B realizations the empirical frequencies of rejection of the null hypothesis of noncorrelation.

To study the power, we consider five alternatives:

where {a_t} is i.i.d. The definitions of the parameters Φ_d, Θ_d, A_d, B_d, and C_d, d = 2,3,4 are given in Table 1. As in the level study, we consider the normal (N), the multivariate-t (t₅ and t₈), and the contaminated normal distribution (CN₁ and CN₂) for the distribution of a_t.

Model coefficients of the data generating processes in the power study of the serial correlation tests

The alternatives are chosen according to the shapes of their spectral and cross-spectral densities. First, the VARMA(1,1) alternative admits spatially homogeneous spectral and cross-spectral densities: one spectral density is dominated by low frequencies and the other by high frequencies. The VAR(1) alternative displays spatially homogeneous spectral and cross-spectral densities; the spectral densities are dominated by low frequencies. The third alternative is a VAR(4), which could arise from quarterly data: here, the spectral and cross-spectral densities possess local features. Finally, the VAR(8) and VAR(12) alternatives exhibit spectral densities and cross-spectral densities with many spikes and distinctive local features. The VAR(12) alternative could arise from monthly economic data. The power analysis is conducted in a way similar to the level study. The difference relies on the first step, which is replaced by the following step.

Step 1′. Set the initial values x_t = 0 for t ≤ 0. The white noise is generated independently using the same subroutines/functions from the NAG library. However, to reduce the effect of the initial values, 2n + 1 observations are simulated for each alternative to obtain the realization {x_t}_t=1²ⁿ⁺¹. We keep the last n observations.

In step 3, we generate B = 10,000 independent realizations in the level study and only B = 1,000 independent realizations in the power study. For the nominal level α and B independent realizations, the standard error of the empirical levels is given by

, which reduces to 0.218% for the nominal level 5% and 0.300% for 10%. The results are given in Tables 2, 3, 4, 5, 6, & 7.

5.2. Discussion of the Level Study

In Table 2, the results for the level study are presented for the normal, multivariate-t, and contaminated normal distributions. Overall, the empirical results for normal and nonnormal processes are very similar in our experiments. As the dimension d increases, the empirical levels improve slightly for the kernel and wavelet test statistics. If we look more specifically at the wavelet test W_n, it appears that the tests

admit reasonable levels at the 10% level. The levels of the wavelet test statistics exhibit some overrejection at the 5% level but are rather comparable to those of the kernel-based test statistic. For n = 200, the wavelet test statistic presents empirical levels closer to the nominal level 5%. The levels of the kernel-based test statistic

are satisfying and in accordance with the simulation results of Duchesne and Roy (2004).

Empirical levels at the 5% and 10% nominal levels for tests for serial correlation using the kernel method (Daniell kernel) and the wavelet method (Franklin and Meyer wavelets)

We now compute descriptive statistics for

. Let

. For the bivariate normal distribution we obtain

for the two sample sizes under investigation. Similar results are obtained for dimensions d = 3,4. It can be shown empirically that W_n(J) exhibits overrejection of the null hypothesis when J is fixed and very small (e.g., W_n(J) with J ≡ 0), suggesting the importance of the requirement J → ∞ in our asymptotic theory. More precisely, large overrejections occurred for the wavelet-based tests when J ≡ 0, and underrejections occurred when J was fixed and such that J > 4. This differs from the test statistic T_n(p_n), which delivers reasonable levels when the smoothing parameter p_n is fixed and small (see, e.g., the simulation experiments described in Duchesne and Roy, 2004). Note that to establish the asymptotic distribution of the kernel-based test statistic T_n(p_n), p_n must satisfy p_n → ∞. The descriptive results for

for nonnormal processes are largely similar to the results obtained with the normal process.

5.3. Discussion of the Power Study

Tables 3 and 4 report the power against VAR(1) and VARMA(1,1) alternatives, respectively. The results for normal and nonnormal processes suggest that the kernel-based test statistic is more powerful in these situations. The differences between the kernel and the wavelet approaches seem to increase as d increases in our experiment. Because the spectral and cross-spectral densities of these alternatives possess spatially regular features, the kernel-based test is expected to perform reasonably well in such situations. The power of the wavelet-based test (Franklin wavelet) based on

is slightly inferior to the power of the kernel-based test with a cross-validated choice of p_n. The differences seem fairly small when n = 200, particularly when d = 2,3. The test statistics W_n(J₁,ψ_F) and W_n(J₁,ψ_M) behave very similarly. Under VARMA(1,1) and VAR(1) alternatives, for the chosen dimensions, we obtain for the processes under consideration

when n = 100 and

when n = 200.

Size-adjusted power against VAR(1) at 10% and 5% levels for tests of serial correlation

Size-adjusted power against VARMA(1,1) at 10% and 5% levels for tests of serial correlation

Table 5 reports the power against the VAR(4) alternative. The results suggest that the choice of the underlying distribution for the innovation has a negligible impact on the empirical powers. The wavelet tests W_n(J₁,ψ_F), W_n(J₁,ψ_M), and

dominate the kernel-based test statistic T_n(p_n) when the alternative is a VAR(4) process. This illustrates that the wavelet test is more powerful than the kernel-based method when the spectral and cross-spectral densities have local distinctive features, due, for example, to a seasonal economic phenomenon.

Size-adjusted power against VAR(4) at 10% and 5% levels for tests of serial correlation

Finally, Tables 6 and 7 report the power against VAR(8) and VAR(12) alternatives. Under these two alternatives, the wavelet test clearly outperforms the kernel-based test. This may be related to the many local and irregular features in the spectral density of the VAR(8) and VAR(12) processes. For the particular VAR(8) alternative, the differences become even larger when the dimension d increases. For both sample sizes, the wavelet tests

display a better power than T_n with the cross-validated choice of p_n. Comparing the wavelet-based tests in Table 6, W_n(J₁,ψ_F) appears more powerful than W_n(J₁,ψ_M) when n = 100, but their power is comparable when n = 200. The VAR(12) alternative highlights the need for a data-driven finest scale, because the power of

appears much higher than that of W_n(J₁,ψ_F) and W_n(J₁,ψ_M), based on the deterministic rule 2^J+1 ∝ n^1/5. In this situation, the deterministic values J₁ ≡ 2 (n = 100) and J₁ ≡ 3 (n = 200) are not appropriate. Much larger values of J are typically necessary, and the proposed method

adapts very well, at least in our simulations.

Size-adjusted power against VAR(8) at 10% and 5% levels for tests of serial correlation

Size-adjusted power against VAR(12) at 10% and 5% levels for tests of serial correlation

From our study, it seems that the cross-validated method does not easily find an appropriate estimator of p_n for the VAR(p) alternatives, p ∈ {4,8,12}. This may be due to the slow convergence rate of that nonparametric technique. The problem is more pronounced in large dimensions in several situations, likely due, in part, to the curse of dimensionality. We also compute some descriptive statistics of the proposed data-dependent method. Under Gaussian innovations and d = 2, we obtain

for the VAR(4) alternative when n = 100,200, whereas, for the VAR(12) alternative, the data-dependent method

delivers the values

. Results are similar for d = 3,4. Again, this illustrates that the plug-in method reveals some information on the true alternative under consideration. Intuitively, we see that a larger resolution level (or a higher level of approximation) is needed when the spectral density possesses more complicated local features.

5.4. Discussion of the Implemented Plug-In Method

To implement the parametric plug-in method

described in Section 4.4, a lower bound l in formula (23) must be specified. The reason is that our asymptotic analysis requires J → ∞ but in some situations we could have α₀(q) = 0: an important example of this is under the null hypothesis. This lower bound ensures that J tends to infinity asymptotically, even under the null hypothesis. The plug-in method based on autoregressive approximating models, as presented in Section 4.4, provides an estimator of α₀(q) that tends to 0 in probability under the null hypothesis, that is,

. Given that the proposed method satisfies this property, it is expected that any reasonable lower bound l will be largely selected when the stochastic process is uncorrelated. On the other hand, more wavelet coefficients are typically needed under strong dependence. Consequently, a larger J should be necessary in this kind of situation, and the lower bound should be less systematically retained. Interestingly, because J is an integer, the proposed method does not rely too strongly on the choice of c or l. It seems that the minimal requirement is to choose a reasonable bound ensuring that J is such that J increases with the sample size n. Easy calculations show that any c in the interval (2^k/n^1/(2q+1),2^k+1/n^1/(2q+1)) leads to the same lower bound, that is,

, where we used the relation J = [lceil ]log₂(cn^1/(2q+1)) − 1[rceil ], [lceil ]a[rceil ] denoting the smallest integer larger than a.

In unreported results, the wavelet-based test statistic (15) based on a fixed J displayed reasonable empirical levels when J ∈ {1,2,3} and J ∈ {2,3,4} for the sample sizes n = 100 and n = 200, respectively. This means that, if the lower bound is not too small or too large,

should lead to similar and reasonable results. However, from an econometric point of view, to consider fixed values of J does not represent a satisfactory solution: (1) from our asymptotic theory, J is such that J → ∞ and (2) from the discussion given in Section 5.2, empirical experiments suggest that of J that are too low tend to display overrejections of the null hypothesis. These considerations indicate that a value of c or l that is too low (large) is expected to deliver overrejection (underrejection) of a specified nominal level.

To assert empirically the impact of the lower bound l in our simulation experiments, we report in Table 8 the number of times (in percentages) that

exceeds the lower bound l (and such that

is strictly larger than J₁ after rounding) for the parametric plug-in method described in Section 4.4, where

represents an estimator of (25), satisfying (23), based on the Franklin wavelet (5). Recall that for the Franklin wavelet, q = 2, λ_q ≡ λ₂ = π⁴/45, and l and c are related by the relation c = (2qλ_q l)^1/(2q+1). We have chosen l and c such that 2^J+1 = cn^1/5 ≡ 3n^1/5; this arbitrary choice of c gives

when n = 100 and

when n = 200. Note that the nonstochastic finest scale J₁ described in step 2 of Section 5.1 delivers J₁ = 2 when n = 100 and J₁ = 3 when n = 200. From Table 8, we observe that under the null hypothesis and for the VAR(1) and VARMA(1,1) alternatives, the lower bound is more often retained. This is particularly true under the null hypothesis of no serial correlation where the lower bound is largely selected, as expected. In fact, from our empirical experiments, the fixed values J = 2 and J = 3 should deliver reasonable levels for sample sizes n = 100 and n = 200, respectively. Note, in passing, that as a function of the dimension d, the lower bound is slightly more often exceeded for a larger dimension, at least in our simulations. The VARMA(1,1) and VAR(1) alternatives offer spatially homogeneous spectral and cross-spectral densities. In these cases, the dependence is explained by few wavelet coefficients and a low value of the finest scale J. Consequently, the lower bounds are often retained. Note that in unreported simulation results the fixed values J = 0,1 displayed higher power than the nonstochastic finest scale J₁ or than the proposed plug-in method

for these particular alternatives. A smaller lower bound would increase the power of the wavelet-based test, but the empirical levels were not well controlled for these fixed values of J. However, under seasonal alternatives, for example, for VAR(q) alternatives with q = 4,8,12, the lower bound is more often exceeded. Under the VAR(4) alternative, a J larger than the lower bound appears necessary to explain the dependence, particularly for a smaller sample size. For n = 200, the method selects the bound J = 3 most of the time. From the discussion in Section 5.3, for both sample sizes,

displayed high power for these choices of the finest scale. For the VAR(8) and more particularly for the VAR(12) alternatives, the lower bound is selected less systematically. Given that the spectral densities and cross-spectral densities exhibit many spikes and distinctive local features, more wavelet coefficients are necessary and a larger J is selected. Interestingly, for the VAR(12) alternative, a larger power is offered by the plug-in method, and

exceeds the bound J = 3 more than 99% of the time when n = 200, suggesting the limited impact of the lower bound, at least for this alternative.

Number of times (in percentages) that exceeds l for the parametric plug-in method described in Section 4.4, where is an estimator of (25), satisfying (23), based on the Franklin wavelet (5); l and c are related by c = (2qλq l)1/(2q+1); q = 2; c = 3; λ2 = π4/45

6. CONCLUSION

In this paper, new tests of serial correlation are proposed for cases where there is no information on the true alternative hypothesis. Because wavelet analysis represents an effective tool when the spectral and cross-spectral densities exhibit peaks and other local features, we have used a suitable distance measure to develop a test statistic based on a comparison between a multivariate spectral density estimator calculated with the wavelet method and the true spectral density under the null hypothesis of absence of correlation in the process. This new one-sided test admits an asymptotic normal distribution under the null hypothesis. We have proposed and justified a data-driven method for the choice of the finest scale, and we have given some practical implementations. In particular, we have described a parametric plug-in method.

In the simulation experiment, the level and power of the new test were investigated in finite samples for vector time series of dimension d = 2,3,4. We have compared the wavelet test with the kernel-based test statistic of Duchesne and Roy (2004), which relies on a distance between a multivariate spectral density estimator calculated with the kernel method and the true spectral density under the null hypothesis of absence of correlation in the process. For the kernel-based test, we applied the cross-validation technique described in Robinson (1991b) for choosing p_n. For the wavelet test, we considered our parametric plug-in method. We have discussed in detail the particular implementation of the proposed method, including the practical choice of a lower bound satisfying the requirements of our asymptotic theory. The level of the wavelet test appeared reasonably well controlled at the nominal levels 5% and 10% with series of 100 and 200 observations using the proposed data-driven method, even if it did tend to overreject slightly at the 5% nominal level.

In conclusion, when the multivariate spectral density exhibits regular features, the kernel-based test of Duchesne and Roy (2004) may be recommended. However, it is very common that the local properties of the spectral density will be unknown to the experimenter. So, as the multivariate spectral density is expected to present irregular features due to quarterly, monthly, or other kinds of seasonal data, the new wavelet test would seem to be a useful complement for the practitioner.

Some avenues of further research would include constructing a test statistic based on a nonlinear wavelet spectral density estimator. It seems likely that the methods used in this paper can be extended to derive the asymptotic distribution of the test statistic deduced from that alternative approach, but the technicalities involved would be far from trivial. In finite samples, it would be particularly interesting to compare the advantages of the nonlinear wavelet estimator over those of the linear wavelet estimator proposed in this paper. These challenging considerations are left for future studies.

APPENDIX

Proof of Theorem 1. We adapt the proof of Lee and Hong (2001), which is valid for a univariate process. We will only give the delicate arguments of the generalization to the multivariate case. For simplicity, we assume that E(x_t) = 0, and we consider

. In fact, there is no loss of generality to assume that the process has a zero mean. It can be shown that if a process is such that E(x_t) = μ, C_x(j) given by (10), then the asymptotic distribution of W_n(J) is unchanged.

We consider the standardized process y_t = Γ_x^−1/2(0)x_t with zero mean and such that var(y_t) = I_d. Then C_y(j) = Γ_x^−1/2(0)C_x(j)Γ_x^−1/2(0), and we let

Our approach differs slightly from that of Lee and Hong (2001) because we show in a first step the asymptotic normality of the test statistic where C_x(0) is replaced by Γ_x(0). In a final step, we show that replacing Γ_x(0) by C_x(0) has no impact asymptotically on the limiting distribution. We can write

where b_J(h,m) = a_J(h,m) + a_J(−h,−m) + a_J(h,−m) + a_J(−h,m) and

where

Compare Priestley (1981, eqn. (6.19), p. 392). Lee and Hong (2001, Lem. A.1) give several useful properties of the quantities b_J(h,m). Noting that we can write

where 〈x,y〉 = x^{[top ]}y, we decompose (A.1) as

where

The quantity q is such that q → ∞, q/2^J → ∞, and q²/n → 0. Note that E(∥y_t∥²) = d and E(〈y_t−m,y_t−h〉) = δ_hm d. Using Lemma A.1(v) of Lee and Hong (2001), and following an argumentation similar to their Propositions 1 and 2, it is possible to show that

because

where we used Assumption B(ii) (see also Lee and Hong, 2001, App. B). Note that the constant τ is as in Assumption B. More precisely, to study

, we proceed as in Lee and Hong (2001, pp. 406–411), and we make intensive use of inequalities involving vector random variables (see also Duchesne and Roy, 2004). First, we consider

. Note that

Using the inequality |tr(AB^{[top ]})| ≤ [tr(AA^{[top ]})]^1/2[tr(BB^{[top ]})]^1/2 and using the usual Cauchy–Schwarz inequality, it follows that

Because

we deduce that

. Using the same argument as Lee and Hong (2001, p. 408), we find that

and consequently

. Similarly,

. We now consider

. Similar calculations give

because according to Lemma A.1(ii) of Lee and Hong (2001),

Setting ν = 0, this shows that

, because q²/n → 0 and 2^2J/n → 0. To study

, we introduce the following matrix:

Straightforward calculations give

An argument similar to that of Lee and Hong (2001, p. 409) allows us to deduce that

. Finally, we study

. Let

It is easy to show that

. We introduce the decomposition z_t = z_1t + z_2t + z_3t, where

where I(A) denotes the indicator function of the set A. Using the inequality tr[(A + B + C)(A + B + C)^{[top ]}] ≤ 4[tr(AA^{[top ]}) + tr(BB^{[top ]}) + tr(CC^{[top ]})], it follows that

where C_int = E [tr(z_itz_it^{[top ]})], i = 1,2,3. Proceeding as in Proposition 2 of Lee and Hong (2001), it follows that

Collecting the results, it follows that

and consequently,

. We have shown that

. The asymptotic normality comes from the statistic

. Let

and write

, where

where S_t−q−1(m) is given by (A.5). It is possible to show that 4d²(2^J+1 − 1)/v_n² → 1. To establish the asymptotic normality of

, we use the martingale central limit theorem of Brown (1971) as described in Taniguchi and Kakizawa (2000). The result follows if we can show these two conditions:

It is possible to show that E(U_nt⁴) = O(t²2^2J), which is sufficient for (i). For (ii), it is enough to establish

where

. It is possible to write

where

If we proceed as in Lemma A.2 of Lee and Hong (2001), it follows that

This shows (ii), and because

this establishes the asymptotic normality of W_n(J). █

Proof of Theorem 2. Let

First, we show that

. We can write

We have to establish two results.

To show (i) is immediate because

. We now show (ii). Let

We decompose

where

We now study

. To show that

, we show, using the definition of convergence in probability, that

. Using the identity A = (A ∩ B) ∪ (A ∩ B^c), where A and B are two sets and B^c denotes the complement of B, and because in general Pr(A ∪ B) ≤ Pr(A) + Pr(B), it follows that

where M > 0 and ε > 0 are arbitrarily large and small numbers, respectively. Because by hypothesis the relation

holds, we deduce that

. Also, given formula (A.3),

, we have that

Furthermore,

using the Cauchy–Schwarz inequality. Using the fact that

and also that

(cf. Lee and Hong, 2001, App. B), we conclude that

. Using Markov's inequality, this shows that

. Similarly, using the fact that

. This shows that

. An argument similar to the one leading to relation (A.6) gives that

. Because

, this completes the proof of the theorem. █

Proof of Theorem 3. First we note that

The first term is associated with the variance, whereas the last term is the squared bias term.

We now study the bias term. We have that

Direct algebra gives

Using the orthonormality of the wavelet basis, the first term is equal to

We have that

Replacing (A.8) in (A.7), using the psd property of

and noting that |b_J(h,m)| ≤ C(J + 1), we deduce that

Because [sum ]_h|h|∥Γ_x(h)∥ ≤ C, this shows that

For the second term, we obtain that

We now evaluate separately the terms corresponding to r = 0 and r ≠ 0. In the former case,

In the case r ≠ 0, because

by Assumption D, it follows that

We now calculate the variance term. Straightforward calculations give

The first term is easily shown to be O(n⁻¹). From Hannan (1970, p. 313), the second term written in matrix notation is equal to

where

where K_dd is the commutation matrix, Cum is a certain matrix containing the fourth-order cumulants, and the function w_n(u,m,h) is defined in Hannan (1970). By Assumption D, |A_3n| ≤ C(J + 1)/n. With a change of variables and proceeding as in Parzen (1957), we obtain |A_2n| ≤ C(J + 1)/n. We now study A_1n. We have that

A truncation argument on j allows us to show that

Multiplying by

and taking the tr(·) gives

Collecting the results, we obtain Theorem 3. █

Proof of Corollary 1. The result follows immediately from Theorem 2 because the assumption of the corollary implies

, where the nonstochastic finest scale J is given by

The latter satisfies the conditions of Theorem 2. █

References

REFERENCES

Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.Google Scholar

Andrews, D.W.K. & W. Ploberger (1996) Testing for serial correlation against an ARMA(1,1) process. Journal of the American Statistical Association 91, 1331–1342.Google Scholar

Box, G.E.P. & D.A. Pierce (1970) Distribution of residual autocorrelations in autoregressive integrated moving average time series models. Journal of the American Statistical Association 65, 1509–1526.Google Scholar

Brockwell, P.J. & R.A. Davis (1991) Time Series: Theory and Methods, 2nd ed. Springer-Verlag.

Brown, B.M. (1971) Martingale central limit theorems. Annals of Mathematical Statistics 42, 59–66.Google Scholar

Chitturi, R.V. (1974) Distribution of residual autocorrelations in multiple autoregressive schemes. Journal of the American Statistical Association 69, 928–934.Google Scholar

Chitturi, R.V. (1976) Distribution of multivariate white noise autocorrelations. Journal of the American Statistical Association 71, 223–226.Google Scholar

Daubechies, I. (1992) Ten Lectures on Wavelets. SIAM.

Duchesne, P. & R. Roy (2004) On consistent testing for serial correlation of unknown form in vector time series models. Journal of Multivariate Analysis 89, 148–180.Google Scholar

Durbin, J. & G.S. Watson (1950) Testing for serial correlation in least squares regression, part I. Biometrika 37, 409–428.Google Scholar

Durbin, J. & G.S. Watson (1951) Testing for serial correlation in least squares regression, part II. Biometrika 38, 159–178.Google Scholar

Fuller, W.A. (1996) Introduction to Statistical Time Series, 2nd ed. Wiley.

Hannan, E. (1970) Multiple Time Series. Wiley.

Hernández, E. & G. Weiss (1996) A First Course on Wavelets. CRC Press.

Hong, Y. (1996) Consistent testing for serial correlation of unknown form. Econometrica 64, 837–864.Google Scholar

Hong, Y. (2001) Wavelet-Based Estimation for Heteroskedasticity and Autocorrelation Consistent Variance-Covariance Matrices. Working paper, Department of Economics and Department of Statistical Science, Cornell University.

Hong, Y. & R.D. Shehadeh (1999) A new test for ARCH effects and its finite-sample performance. Journal of Business & Economic Statistics 17, 91–108.Google Scholar

Hosking, J. (1980) The multivariate portmanteau statistic. Journal of the American Statistical Association 75, 602–608.Google Scholar

Hosking, J. (1981a) Equivalent forms of the multivariate portmanteau statistic. Journal of the Royal Statistical Society, Series B 43, 261–262 (Correction published in 1989, vol. 51, p. 303).Google Scholar

Hosking, J. (1981b) Lagrange-multiplier tests of multivariate time series models. Journal of the Royal Statistical Society, Series B 43, 219–230.Google Scholar

Johnson, M.E. (1987) Multivariate Statistical Simulation. Wiley.

Lee, J. & Y. Hong (2001) Testing for serial correlation of unknown form using wavelet methods. Econometric Theory 17, 386–423.Google Scholar

Li, W.K. & A.I. McLeod (1981) Distribution of the residual autocorrelations in multivariate ARMA time series models. Journal of the Royal Statistical Society, Series B 43, 231–239.Google Scholar

Ljung, G.M. & G.E.P. Box (1978) On a measure of lack of fit in time series models. Biometrika 65, 297–303.Google Scholar

Lütkepohl, H. (1993) Introduction to Multiple Time Series Analysis, 2nd ed. Springer-Verlag.

Mallat, S.G. (1989) A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693.Google Scholar

Meyer, Y. (1986) Ondelettes, fonctions splines et analyse graduées. Lectures given at the University of Torino, Italy.

Meyer, Y. (1992) Wavelets and Operators. Cambridge Studies in Advanced Mathematics 37. Cambridge University Press.

Neumann, M.H. (1996) Spectral density estimation via nonlinear wavelet methods for stationary non-Gaussian time series. Journal of Time Series Analysis 17, 601–633.Google Scholar

Paparoditis, E. (2000a) Spectral density based goodness-of-fit tests for time series analysis. Scandinavian Journal of Statistics 27, 143–176.Google Scholar

Paparoditis, E. (2000b) On some power properties of goodness-of-fit tests for time series models. In C.A. Charalambides, M.V. Koutras, & N. Balakrishnan (eds.), Probability and Statistical Models with Applications, 333–348. Chapman and Hall.

Paparoditis, E. (2005) Testing the fit of a vector autoregressive moving average model. Journal of Time Series Analysis 26, 543–568.Google Scholar

Parzen, E. (1957) On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics 28, 329–348.Google Scholar

Priestley, M.B. (1981) Spectral Analysis and Time Series, vol. 1: Univariate Series. Academic Press.

Priestley, M.B. (1996) Wavelets and time-dependent spectral analysis. Journal of Time Series Analysis 17, 85–103.Google Scholar

Robinson, P.M. (1991a) Testing strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. Journal of Econometrics 47, 67–84.Google Scholar

Robinson, P.M. (1991b) Automatic frequency domain inference on semiparametric and non-parametric models. Econometrica 59, 1329–1363.Google Scholar

Taniguchi, M. & Y. Kakizawa (2000) Asymptotic Theory of Statistical Inference for Time Series. Springer-Verlag.

Vidakovic, B. (1999) Statistical Modeling by Wavelets. Wiley.

Walter, G. (1994) Wavelets and Other Orthogonal Systems with Applications. CRC Press.

Wei, W.W.-S. (1990) Time Series Analysis, Univariate and Multivariate Methods. Addison-Wesley.

Model coefficients of the data generating processes in the power study of the serial correlation tests

Empirical levels at the 5% and 10% nominal levels for tests for serial correlation using the kernel method (Daniell kernel) and the wavelet method (Franklin and Meyer wavelets)

Size-adjusted power against VAR(1) at 10% and 5% levels for tests of serial correlation

Size-adjusted power against VARMA(1,1) at 10% and 5% levels for tests of serial correlation

Size-adjusted power against VAR(4) at 10% and 5% levels for tests of serial correlation

Size-adjusted power against VAR(8) at 10% and 5% levels for tests of serial correlation

Size-adjusted power against VAR(12) at 10% and 5% levels for tests of serial correlation

Article contents

ON TESTING FOR SERIAL CORRELATION WITH A WAVELET-BASED SPECTRAL DENSITY ESTIMATOR IN MULTIVARIATE TIME SERIES

Abstract

1. INTRODUCTION

2. PRELIMINARIES

2.1. Temporal and Spectral Notations

2.2. Wavelet Analysis of the Spectral Density

2.3. Wavelet-Based Spectral Density Estimator

3. THE TEST STATISTIC AND ITS ASYMPTOTIC NULL DISTRIBUTION

4. ADAPTIVE CHOICE OF THE FINEST SCALE

4.1. A General Result for the Estimation of the Finest Scale

4.2. Derivation of the Integrated Mean Squared Error of the Wavelet-Based Spectral Density Estimator

4.3. On the Estimation of the Finest Scale with a Plug-In Method of the Parametric Type

4.4. Particular Implementation of the Proposed Parametric Plug-In Method

5. SIMULATION RESULTS

5.1. Description of the Experiment

5.2. Discussion of the Level Study

5.3. Discussion of the Power Study

5.4. Discussion of the Implemented Plug-In Method

6. CONCLUSION

APPENDIX

References

REFERENCES

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests