Published online by Cambridge University Press: 23 September 2005
The sampling window method of Hall, Jing, and Lahiri (1998, Statistica Sinica 8, 1189–1204) is known to consistently estimate the distribution of the sample mean for a class of long-range dependent processes, generated by transformations of Gaussian time series. This paper shows that the same nonparametric subsampling method is also valid for an entirely different category of long-range dependent series that are linear with possibly non-Gaussian innovations. For these strongly dependent time processes, subsampling confidence intervals allow inference on the process mean without knowledge of the underlying innovation distribution or the long-memory parameter. The finite-sample coverage accuracy of the subsampling method is examined through a numerical study.

The authors thank two referees for comments and suggestions that greatly improved an earlier draft of the paper. This research was partially supported by U.S. National Science Foundation grants DMS 00-72571 and DMS 03-06574 and by the Deutsche Forschungsgemeinschaft (SFB 475).
This paper considers nonparametric distribution estimation for a class of random processes that exhibit strong or long-range dependence. Here we classify a real-valued stationary process {Yt} as long-range dependent (LRD) if its autocovariance function r(k) = Cov(Yt,Yt+k) can be represented as

r(k) = k−αL1(k), k ≥ 1, (1)

for some 0 < α < 1 and a function L1 : (0,∞) → (0,∞) that is slowly varying at infinity, that is, limx→∞ L1(λx)/L1(x) = 1 for all λ > 0. Time series satisfying (1) often find application in astronomy, hydrology, and economics (Beran, 1994; Montanari, 2003; Henry and Zaffaroni, 2003).
For comparison, we note that weakly dependent processes are usually characterized by rapidly decaying, summable covariances (Doukhan, 1994). In contrast, (1) implies that the sum of covariances Σk=0∞ r(k) diverges under long-range dependence. This feature of strongly dependent data often complicates standard statistical inference based on the sample mean Yn of a stretch of observations Y1,…,Yn. For one, the variance of the size n sample mean Yn decays to zero at a rate that is both slower than O(n−1) and unknown in practice (Beran, 1994). The usual scaling factor n1/2 used with independent or weakly dependent data then fails to produce a limit distribution for Yn under long-range dependence. Even when properly standardized, the sample mean can have either normal or nonnormal limit laws across various types of strongly dependent processes (Davydov, 1970; Taqqu, 1975). Consequently, statistical approximations of the unknown sampling distribution of Yn are needed under long-range dependence that avoid stringent assumptions on the underlying process or on the dependence parameters α, L1 in (1).
For weakly dependent data, the moving block bootstrap of Künsch (1989) and Liu and Singh (1992) provides accurate nonparametric estimates of the sample mean's distribution. However, the block bootstrap has been shown to break down for a class of LRD processes where the asymptotic distribution of Yn can be nonnormal (cf. Lahiri, 1993). These processes are obtained through transformations of certain Gaussian series (Taqqu, 1975, 1979; Dobrushin and Major, 1979). Although the bootstrap rendition of Yn fails for transformed-Gaussian LRD processes, Hall, Jing, and Lahiri (1998) have shown that their so-called sampling window procedure can consistently approximate the distribution of the normalized sample mean for these same time series. This procedure is a subsampling method that modifies data-blocking techniques developed for inference with weakly dependent (mixing) data (Politis and Romano, 1994; Hall and Jing, 1996). With the aid of subsampling variance estimators, Hall et al. (1998) also developed a Studentized version of the sample mean along with a consistent, subsample-based estimator of its distribution.
In this paper, we establish the validity of the sampling window method of Hall et al. (1998) for a different category of LRD processes: linear LRD processes with an unknown innovation distribution. The subsampling method is shown to correctly estimate the distribution of normalized and Studentized versions of the sample mean under this form of long-range dependence, without knowledge of the exact dependence strength α or innovation structure. The results illustrate that subsampling can be applied to calibrate nonparametric confidence intervals for the mean E(Yt) = μ of either a transformed-Gaussian LRD process (Hall et al., 1998) or a linear LRD series. That is, the same subsampling procedure allows nonparametric interval estimation when applied to two major examples of strongly dependent processes considered in the literature (Beran, 1994, Ch. 3).
The rest of the paper is organized as follows. In Section 2, we frame the process assumptions and some distributional properties of Yn. Main results are given in Section 3, where we establish the consistency of subsampling distribution estimation for the sample mean under linear long-range dependence. In Section 4, we report a simulation study on the coverage accuracy of a subsampling confidence interval procedure for the LRD process mean μ. A second numerical study also considers subsampling estimators for the distribution of the Studentized sample mean. In Section 5, we discuss the validity of the subsampling method for weakly dependent linear processes. Proofs of the main results are provided in Section 6.
We suppose that the observed data Y1,…,Yn represent a realization from a stationary, real-valued LRD process {Yt}t∈Z that satisfies the following assumption.

Assumption L. For independent identically distributed (i.i.d.) innovations {εt}t∈Z with mean E(εt) = 0 and 0 < E(εt2) < ∞, it holds that

Yt = μ + Σj=0∞ bjεt−j, t ∈ Z,

where μ = E(Yt) and the real sequence {bj}j≥0 is square summable (Σj=0∞ bj2 < ∞), such that the autocovariance function r(k) = Cov(Yt,Yt+k) admits a representation as in (1).
Assumption L encompasses two popular models for strong dependence: the fractional Gaussian processes of Mandelbrot and van Ness (1968) and the fractional autoregressive integrated moving average (FARIMA) models of Adenstedt (1974), Granger and Joyeux (1980), and Hosking (1981). For FARIMA processes in particular, we permit greater distributional flexibility through possibly non-Gaussian innovations. Note that a LRD FARIMA(0,d,0) series, d ∈ (0,½), admits a causal moving average representation

Yt = μ + Σj=0∞ {Γ(j + d)/(Γ(d)Γ(j + 1))}εt−j, (2)

involving the gamma function Γ(·). More general FARIMA series, for which (1) holds with α = 1 − 2d and constant L1(·) = C1 > 0, follow from applying an autoregressive moving average (ARMA) filter to a process from (2) (cf. Beran, 1994).
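The moving average coefficients Γ(j + d)/{Γ(d)Γ(j + 1)} in (2) can be generated without evaluating large gamma functions, by updating each coefficient from its predecessor. A minimal Python sketch (the function name is ours, not from the paper):

```python
import math

def farima_ma_coefficients(d, n_terms):
    """First n_terms coefficients b_j = Gamma(j+d)/{Gamma(d)Gamma(j+1)}
    of the FARIMA(0,d,0) moving average, via the overflow-free recursion
    b_0 = 1, b_j = b_{j-1}*(j-1+d)/j."""
    b = [1.0]
    for j in range(1, n_terms):
        b.append(b[-1] * (j - 1 + d) / j)
    return b

# The recursion agrees with the direct gamma-function formula:
coeffs = farima_ma_coefficients(0.3, 6)
for j, bj in enumerate(coeffs):
    direct = math.gamma(j + 0.3) / (math.gamma(0.3) * math.gamma(j + 1))
    assert abs(bj - direct) < 1e-12
```

The recursive form is preferable for the long filters needed in simulation, since Γ(j + d) overflows for moderate j while the ratio b_j decays slowly like j^(d−1).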
We remark that the results of this paper also hold if the long-range dependence is stipulated through regularity conditions, as in Theorem 2.2 of Hall et al. (1998), on the spectral density f of the process {Yt}, for which limx→0 f(x)/{|x|α−1L1(1/|x|)} exists as a finite, positive limit. Under certain conditions, this behavior of f at the origin is equivalent to (1) and serves as an alternative description of long-range dependence (Bingham, Goldie, and Teugels, 1987). However, our assumptions here on the LRD linear process are fairly mild, requiring only that the i.i.d. innovations have a finite second moment.
In the following discussion, for any two nonzero real sequences {sn} and {tn}, we write sn ∼ tn if limn→∞ sn/tn = 1. With the proper scaling dn, the asymptotic distribution of the normalized sample mean is known to be normal for Assumption L processes (cf. Davydov, 1970): as n → ∞,

n(Yn − μ)/dn →d Z, (3)

where Z represents a standard normal variable and →d denotes convergence in distribution. However, setting confidence intervals for μ, based on Yn and its large-sample normal distribution, is complicated for linear LRD processes. The covariance decay rate in (1) implies that the variance of Yn converges to 0 as follows:

Var(Yn) ∼ n−αL(n) as n → ∞, (4)

for L(·) = 2{(2 − α)(1 − α)}−1L1(·), which is slower than the usual O(n−1) rate associated with weakly dependent data. Consequently, the correct scaling dn = {n2−αL(n)}1/2 ∼ {Var(nYn)}1/2 for n(Yn − μ)/dn to have a normal limit depends on the unknown quantities α, L(·) from (4).
With additional assumptions on the linear process, unknown quantities in dn could in principle be estimated directly for interval estimation of μ based on the normal approximation (3). For example, by assuming a constant function L1(·) = C1 in (1) (along with additional regularity conditions on f), estimates of α, C1 can be obtained through various periodogram-based techniques (Bardet, Lang, Oppenheim, Philippe, Stoev, and Taqqu, 2003). However, after substituting such estimates directly into dn from (3), the resulting Studentized mean Gn may fail to adequately follow a normal distribution. To illustrate, we conducted a small numerical study of the coverage accuracy of confidence bounds for the mean μ of several LRD FARIMA processes, set with a normal approximation for a Studentized mean Gn. For these processes, (1) holds with a constant function L1(·) = C1. We obtained a version of Gn by estimating α and C1 in dn through the popular log-periodogram regression (Geweke and Porter-Hudak, 1983) applied to an initial set of Fourier frequencies (Hurvich, Deo, and Brodsky, 1998). The coverage probabilities in Table 1 suggest that the normal distribution may not always appropriately describe the behavior of a Studentized sample mean obtained through such plug-in estimates in (3). (The LRD processes in Table 1 involve filtered FARIMA(0,d,0) series, but other simulation results indicate that a plug-in version Gn may produce better confidence intervals with unfiltered FARIMA(0,d,0) series.) We remark that, with Gaussian LRD processes, Beran (1989) developed a modified normal distribution for Yn after Studentization with a periodogram-based estimate of α. However, that approach is globally parametric in requiring the form of the spectral density f to be known on the entire interval [0,π], a strong assumption.
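For readers unfamiliar with the plug-in approach, the log-periodogram regression idea referenced above can be sketched as follows. This is an illustrative implementation, not the paper's exact one; the function name and the bandwidth argument m are our assumptions:

```python
import numpy as np

def gph_estimate(y, m):
    """Log-periodogram (GPH) regression sketch: regress log I(lambda_j) on
    -log{4 sin^2(lambda_j/2)} over the first m Fourier frequencies; the
    slope estimates the memory parameter d, and alpha-hat = 1 - 2*d-hat."""
    n = len(y)
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n                        # Fourier frequencies
    dft = np.fft.fft(y - np.mean(y))[1:m + 1]
    log_periodogram = np.log(np.abs(dft) ** 2 / (2.0 * np.pi * n))
    x = -np.log(4.0 * np.sin(lam / 2.0) ** 2)        # GPH regressor
    slope, _ = np.polyfit(x, log_periodogram, 1)
    return slope, 1.0 - 2.0 * slope                  # (d-hat, alpha-hat)
```

For white noise (d = 0) the slope estimate should hover near zero; for a LRD series it should track d = (1 − α)/2, though, as Table 1 illustrates, plugging such estimates into dn does not guarantee well-calibrated normal-theory intervals.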
In Section 3, we show that the sampling window method produces consistent, nonparametric estimates of the finite-sample distribution of the sample mean from strongly dependent linear processes. Subsampling distribution estimators for Yn can then be used to calibrate nonparametric confidence intervals for the process mean μ. Under linear long-range dependence, an advantage of this approach over traditional large-sample theory is that the subsampling confidence intervals may be constructed without making restrictive assumptions on the behavior of f near zero and without estimating the covariance parameter α. Another benefit of the subsampling method is its applicability to other formulations of long-range dependence involving nonlinear processes. That is, the subsampling technique has established validity with transformed-Gaussian LRD series as treated in Hall et al. (1998). For these series, n(Yn − μ)/dn may have a nonnormal limit distribution, and a normal approximation for the sample mean might break down.
We briefly present the subsampling estimator of the sampling distribution of the normalized sample mean Tn = n(Yn − μ)/dn, as prescribed in Hall et al. (1998). Denote the distribution function of Tn as Fn(x) = P(Tn ≤ x), x ∈ R. To capture the underlying dependence structure, the subsampling method creates several small-scale replicates of Y1,…,Yn through data blocks or subsamples. Let 1 ≤ ℓ ≤ n be the block length and denote Bi = (Yi,…,Yi+ℓ−1) as the ith overlapping data block, 1 ≤ i ≤ N = n − ℓ + 1. Treating each block as a scaled-down copy of the original time series, define the analog of Tn on each block Bi as Tℓi = (Sℓi − ℓYn)/dℓ, 1 ≤ i ≤ N, where Sℓi = Yi + ··· + Yi+ℓ−1 represents a block sum.

The sampling window estimator F̂ℓ of the distribution Fn is given by

F̂ℓ(x) = N−1 Σi=1N I{Tℓi ≤ x}, x ∈ R,

where I{·} denotes the indicator function. The subsampling estimator F̂ℓ is simply the empirical distribution of the subsample versions Tℓi of Tn. Hall et al. (1998) establish the consistency of F̂ℓ in estimating Fn with transformed-Gaussian LRD series. The following result extends the consistency of the subsampling estimator F̂ℓ to include a large class of LRD linear processes. Let →p denote convergence in probability.
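The sampling window estimator is cheap to compute, since all overlapping block sums come from one pass of cumulative sums. A hedged Python sketch (names are ours; the scaling function d must be supplied by the user, e.g., d(ℓ) = ℓ^((2−α)/2) when L is constant, or √ℓ in the weakly dependent case):

```python
import numpy as np

def sampling_window_cdf(y, ell, d, x):
    """Sampling window estimate of Fn(x) = P(Tn <= x): the empirical CDF,
    over all N = n - ell + 1 overlapping blocks, of the subsample analogs
    T_ell,i = (S_ell,i - ell * ybar_n) / d(ell)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    block_sums = csum[ell:] - csum[:n - ell + 1]   # S_ell,i for i = 1..N
    t = (block_sums - ell * y.mean()) / d(ell)
    return float(np.mean(t <= x))                  # empirical CDF at x
```

For i.i.d. standard normal data with d(ℓ) = √ℓ, the estimate at x = 0 should be close to Φ(0) = 0.5.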
THEOREM 1. If Assumption L holds and ℓ−1 + n−(1−δ)ℓ = o(1) for some δ ∈ (0,1), then supx∈R |F̂ℓ(x) − Fn(x)| →p 0 as n → ∞.
Note that the correct scalings dn, dℓ for the centered sums n(Yn − μ), (Sℓi − ℓμ) to have a normal limit in (3) depend on the unknown quantities α, L(·). In practice, these scaling factors need to be consistently estimated, and F̂ℓ must be appropriately modified, to set confidence intervals for μ. We next give a modified subsampling approach for accomplishing this.
Following the setup in Hall et al. (1998), we first replace dn in Tn with a data-based construct involving two subsampling variance estimates. To describe the estimate of dn, let m1n, m2n ∈ [1,n] denote integers such that, for some θ ∈ (0,1),

m1n ∼ n(1+θ)/2 and m2n ∼ nθ (5)

as n → ∞, implying further that m1n2/m2n ∼ n holds. For m ∈ [1,n], define

σ̂m2 = Nm−1 Σi=1Nm (Smi − mYn)2,

where Nm = n − m + 1 and Smi = Yi + ··· + Yi+m−1. Here σ̂m2 represents a subsampling variance estimator of Var(Sm1). Next define d̂n2 = (σ̂m1n2)2/σ̂m2n2 with the smoothing parameters m1n, m2n. We use d̂n2 as an estimator of dn2 to obtain a Studentized version of the sample mean as T1n = n(Yn − μ)/d̂n.
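The two subsampling variance estimates can likewise be computed from cumulative sums. In this sketch (our naming), we form σ̂m2 for a given m and combine two such estimates in the ratio form (σ̂m1n2)2/σ̂m2n2, which is our reading of the construction and exploits m1n2/m2n ∼ n; for i.i.d. data (where effectively α = 1 and L(·) = Var(Y1)), the combination should approach n·Var(Y1):

```python
import numpy as np

def subsample_var(y, m):
    """sigma-hat_m^2: average of (S_m,i - m*ybar_n)^2 over the
    N_m = n - m + 1 overlapping length-m blocks."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    s = csum[m:] - csum[:n - m + 1]                # S_m,i for i = 1..N_m
    return float(np.mean((s - m * y.mean()) ** 2))

def dn2_hat(y, m1, m2):
    """Combined estimator (sigma-hat_{m1}^2)^2 / sigma-hat_{m2}^2 of dn^2;
    an assumption of our reading, exploiting m1^2/m2 ~ n."""
    return subsample_var(y, m1) ** 2 / subsample_var(y, m2)
```

Because σ̂m2 grows like m2−αL(m), squaring the m1n-based estimate and dividing by the m2n-based one cancels the unknown α and L(·) up to the slowly varying factor controlled by condition (6) below.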
To calibrate confidence intervals for μ based on T1n, a subsampling estimator F̂1ℓ of the distribution function F1n of T1n can be constructed as follows. For each length ℓ block Bi, let d̂ℓ,i2 denote the subsample version of d̂n2 found by replacing (Y1,…,Yn) and n with Bi and ℓ in the definition of d̂n2. Analogous to the values m1n, m2n used in d̂n2, each version d̂ℓ,i2 requires subsample smoothing parameters m1ℓ, m2ℓ that satisfy (5) with ℓ rather than n. Let T1ℓ,i = (Sℓi − ℓYn)/d̂ℓ,i, 1 ≤ i ≤ N, denote the subsample replicates of T1n. The subsampling estimator of F1n is then given by

F̂1ℓ(x) = N−1 Σi=1N I{T1ℓ,i ≤ x}, x ∈ R.
We show that the preceding subsampling estimator successfully approximates the distribution F1n of the Studentized sample mean for long-memory linear processes. Hall et al. (1998) give an analogous result for transformed-Gaussian LRD series.
THEOREM 2. In addition to the conditions of Theorem 1, assume m1n, m2n satisfy (5) and

limn→∞ L(m1n)2/{L(m2n)L(n)} = 1. (6)

Then, as n → ∞: (a) d̂n2/dn2 →p 1; (b) T1n →d Z, a standard normal variable; (c) supx∈R |F̂1ℓ(x) − F1n(x)| →p 0.
Condition (6) represents a weakened version of a similar assumption used by Hall et al. (1998, Thm. 2.5) and implies that the combination of subsampling variance estimators in d̂n2 can consistently estimate dn2 under long-range dependence. For LRD fractional Gaussian and FARIMA processes, the function L(·) is constant in (4) and so easily satisfies (6) (Beran, 1994). Examples of other slowly varying functions that fulfill (6) include those of the form L(x) = L̃(log x) based on an arbitrary slowly varying L̃(·), such as L(x) = log log(x). However, condition (6) is still restrictive in not permitting general slowly varying functions such as L(x) = log(x).
In the next section, we outline a procedure for constructing confidence intervals for the mean μ based on the subsampling result in Theorem 2.
Let ⌊·⌋ denote the integer part function. For β ∈ (0,1), let t1ℓ,(⌊Nβ⌋) denote the ⌊Nβ⌋th order statistic of the N possible subsample versions T1ℓ,i, 1 ≤ i ≤ N, of T1n. Here t1ℓ,(⌊Nβ⌋) represents the β-percentile of the subsampling estimator F̂1ℓ, taken as an estimate of the same percentile of F1n. Using these percentiles, we set approximate one-sided lower and upper 100(1 − β)% confidence bounds for μ as

L1−β,n = Yn − t1ℓ,(⌊N(1−β)⌋)d̂n/n and U1−β,n = Yn − t1ℓ,(⌊Nβ⌋)d̂n/n,

respectively. These subsampling bounds have asymptotically correct coverage under Theorem 2, namely, limn→∞ P(μ > L1−β,n) = limn→∞ P(μ < U1−β,n) = 1 − β. An approximate two-sided 100(1 − β)% subsampling confidence interval for μ is then (L1−β/2,n, U1−β/2,n).
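In code, the bound construction reduces to picking order statistics of the subsample replicates. A sketch under our reading of the construction (the argument names are ours; t_reps holds the N replicate values of the Studentized mean and dn_hat the estimate of dn):

```python
import numpy as np

def subsampling_bounds(ybar, n, t_reps, dn_hat, beta):
    """One-sided 100(1-beta)% lower/upper bounds for mu: with q(p) the
    floor(N*p)-th order statistic of the replicates, take
    L = ybar - q(1-beta)*dn_hat/n and U = ybar - q(beta)*dn_hat/n."""
    t_sorted = np.sort(np.asarray(t_reps, dtype=float))
    N = len(t_sorted)

    def q(p):  # floor(N*p)-th order statistic (1-indexed), clipped at 1
        return t_sorted[max(int(np.floor(N * p)) - 1, 0)]

    lower = ybar - q(1.0 - beta) * dn_hat / n
    upper = ybar - q(beta) * dn_hat / n
    return lower, upper
```

Inverting P{n(Yn − μ)/d̂n ≤ q} in μ is what produces the sign flip: the upper percentile of the replicates yields the lower bound, and vice versa.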
The subsampling confidence intervals for μ require the selection of subsample lengths ℓ, mkn, and mkℓ, k = 1,2. These are important for the finite-sample performance of the subsampling method. Although best block sizes are unknown, we can modify some proposals made in Hall et al. (1998). In subsampling from transformed-Gaussian LRD series, Hall et al. (1998) proposed block lengths ℓ = Cn1/2, C = 1,3,6,9. This size n1/2 block choice is based on the intuition that subsamples from LRD series should generally be longer than blocks for weakly dependent data, for which ℓ = O(n1/3) is usually optimal (Künsch, 1989; Hall, Horowitz, and Jing, 1995; Hall and Jing, 1996). That is, a jump in the order of appropriate blocks seems reasonable under long-range dependence, analogous to the sharp increase from length ℓ = 1 blocks (no blocking) for i.i.d. data to length ℓ = O(n1/3) blocks for weakly dependent data. Plausible smoothing parameters satisfying (5) are m1n = ⌊n(1+θ)/2⌋, m2n = ⌊nθ⌋ for θ ∈ (0,1), and subsample versions mkℓ, k = 1,2, can be analogously defined with ℓ. Hall et al. (1998) recommend a value of θ near 1 to achieve a smaller bias for the two subsample variance estimators σ̂m1n2, σ̂m2n2 combined in d̂n2.
We performed a simulation study of the subsampling confidence intervals under linear long-range dependence, investigating various block lengths [ell ]. We describe the simulation setup and results in Section 4.
Sections 4.1 and 4.2, respectively, describe the design and results of a simulation study to examine the performance of subsampling confidence intervals with LRD linear processes. In Section 4.3, we present two examples of subsampling distribution estimation, in addition to confidence intervals, for a linear and a nonlinear LRD time series.
Let {Zt} represent a FARIMA(0,d,0) series from (2) based on d = (1 − α)/2 ∈ (0,½) and i.i.d. innovations {εt}.
To study the coverage accuracy of the subsampling method, we considered FARIMA processes {Yt} constructed by combining one of the following ARMA filters (specified by φ, ϑ coefficients), α values, and innovation distributions:
where χ12 and t3 represent chi-square and t distributions with 1 and 3 degrees of freedom, respectively. The preceding framework allows for LRD linear processes {Yt} exhibiting various decay rates α in (1) with Gaussian or non-Gaussian innovations. The non-Gaussian innovations may exhibit skewness (e.g., χ12 − 1) or heavier tails (e.g., t3). From each LRD FARIMA model, we generated size n time stretches Y1,…,Yn as follows.
A sample Z1,…,Zn from a non-Gaussian FARIMA(0,d,0) process was generated by truncating the moving average expression in (2) after the first M = 1,000 terms and then using n + M innovations εt to build an approximate truncated series (for details, see Bardet, Lang, Oppenheim, Philippe, and Taqqu, 2003, p. 590). Samples Z1,…,Zn from a Gaussian series were simulated by the circulant embedding method of Wood and Chan (1994) with FARIMA(0,d,0) covariances (Beran, 1994). Under Filter 3, the desired FARIMA realization is given by Yt = Zt. For FARIMA series involving Filters 1 and 2, generating {Zt} as before and applying the appropriate ARMA filter yielded Y1,…,Yn. We considered sample sizes n = 100, 400, 900.
We report here the coverage accuracy of subsampling confidence intervals for the LRD process mean μ = 0 based on data sets Y1,…,Yn generated as in Section 4.1. In the subsampling procedure of Section 3.3, we used block sizes ℓ = Cn1/2, C ∈ {0.5,1,2}, and θ = 0.8. These ℓ lengths are smaller overall than those considered in Hall et al. (1998), where subsampling intervals performed poorly in numerical studies with overly large C values (e.g., 6, 9). Tables 2 and 3 provide coverage probabilities of lower and upper approximate 90% one-sided confidence intervals appearing, respectively, in parenthetical pairs (·,·). Table 2 corresponds to FARIMA series with normal and chi-square innovations; Table 3 provides results for t-innovations, which have unbounded (third and higher) moments. All coverage probabilities were approximated by averaging over 1,000 simulation runs for each LRD process considered.
To summarize our numerical findings:
(1) Subsampling coverage accuracy generally improves with increasing sample size and weaker dependence (increasing α).
(2) Overall, the subsampling method seemed to perform similarly across the innovation processes considered.
(3) Coverage inaccuracy is most apparent under the strongest dependence α = 0.1, in the form of undercoverage. Processes under Filter 1 (large, positive autoregressive parameter) also produced instances of overcoverage, most apparent with the smallest block ℓ = 0.5n1/2. To a larger extent, this latter behavior in coverage probabilities also appeared in Table 1 with the plug-in approach involving direct estimation of α.
(4) The subsampling method performed reasonably well across the block sizes ℓ considered. However, optimal block lengths may depend on the strength of the underlying long-range dependence; C = 1, 2 appeared best when α = 0.5, 0.9, whereas C = 0.5, 1 seemed better for α = 0.1. These findings appear consistent with the simulation results in Hall et al. (1998) for subsampling other LRD processes.
(5) Other simulation studies showed that intervals using a normal approximation for T1n, based on Theorem 2(b), exhibit extreme undercoverage and perform worse than intervals based on the subsampling distribution estimator for T1n. This is because the finite-sample distribution of T1n can exhibit heavy tails and may converge slowly to its asymptotic normal distribution; see also Figure 1.
For subsampling techniques, theoretical investigations of block choice have received much attention with weakly dependent data, and clearly more research is needed to determine theoretically optimal block lengths ℓ under long-range dependence. The block sizes in the simulation study appear to be effective for the considered LRD processes, and similar lengths appear to be appropriate for other types of LRD processes considered in Hall et al. (1998). Results from other simulation studies indicate that blocks of smaller order (e.g., n1/3) generally result in overcoverage under long-range dependence, whereas blocks that are excessively long (e.g., ℓ = 9n1/2) produce undercoverage. Compared to ℓ, the choice of θ appears less critical; repeating the study with θ = 0.9 as in Hall et al. (1998) or with θ = 0.5, 0.7 led to only slight changes overall.
In theory, the nonparametric subsampling estimators can be applied for inference on the sample mean of different LRD processes, including the linear series considered here and the transformed-Gaussian processes in Hall et al. (1998). Subsampling confidence intervals for μ require a good subsample-based approximation of the distribution of the Studentized sample mean from Section 3.2. However, the type of long-range dependence can greatly influence the distribution of the sample mean, leading to both normal (e.g., linear series) and nonnormal limit laws. Because the type of LRD time series could be unknown in practice, we conducted a further numerical study of subsampling distribution estimators of the Studentized sample mean T1n in situations where T1n has a normal and a nonnormal limit, respectively.
We applied the subsampling method to two LRD series: a mean zero (linear) Gaussian process Zt with Var(Zt) = 1 and spectral density f (x), 0 < |x| ≤ π, given by
and a nonlinear, transformed-Gaussian series Yt = G(Zt), using the third Hermite polynomial G(x) = x3 − 3x. The covariances r(k) = Cov(Zt,Zt+k) satisfy (1) with α = 0.1 and nonconstant (up to a scalar multiple) L(x) = log log(x); these covariances can be written as a sum of FARIMA(0,d = 0.45,0) covariances (i.e., satisfying (1) with α = 1 − 2d = 0.1 and a constant L1) plus an additional regularly varying component. The process Yt also exhibits slowly decaying covariances because G(·) has Hermite rank 3 and here 0 < 3α < 1 (Taqqu, 1975; Beran, 1994). Because of the limit law of the sample mean, the asymptotic distribution of the Studentized sample mean T1n is normal under the Zt process (e.g., Theorem 2) and nonnormal for the nonlinear series Yt (Taqqu, 1975, 1979; Hall et al., 1998).
For the preceding two series, we can compare the exact distribution F1n(x) of the subsample-Studentized sample mean T1n and its subsampling estimator F̂1ℓ. For each series type, Figure 1 provides the exact distribution F1n of T1n at sample sizes n = 100, 400, 900 with θ = 0.8. In each case, the distribution F1n was calculated through simulation (using 15,000 runs) and appears as a thick line in Figure 1. Using a block length ℓ = n1/2, five subsampling estimates F̂1ℓ of each distribution F1n were computed from five independent size n samples from {Zt} or {Yt}; these estimates appear as dotted lines in Figure 1.
In each instance in Figure 1, the finite-sample distribution of T1n exhibits heavy tails. This indicates that confidence intervals for the process mean E(Zt) = 0 = E(Yt) set with T1n and a normal approximation to its distribution would be inappropriate. (As stated previously, a normal approximation of T1n is expected to break down for the LRD series Yt.) However, the subsampling estimates appear to adequately approximate the exact distribution of the Studentized sample mean T1n, particularly for larger n. The coverage probabilities listed in Figure 1 additionally suggest that the subsampling method, based on F̂1ℓ, leads to reasonable confidence intervals for the means of both the linear and nonlinear LRD processes.
We comment briefly on the subsampling method applied to linear processes under weak or short-range dependence. A stationary time series {Yt}, t ∈ Z, can generally be called short-range dependent (SRD) if the process autocovariances decay fast enough to be absolutely summable, Σk=0∞ |r(k)| < ∞. Such covariance summability does not hold for LRD processes satisfying (1).
For weakly dependent time series fulfilling a mixing condition, subsampling techniques have been developed for inference on the distribution of a variety of statistics, including the sample mean (Carlstein, 1986; Künsch, 1989; Politis and Romano, 1994; Hall and Jing, 1996). However, the sampling window method of this paper applies to linear time processes that may exhibit either short-range dependence or long-range dependence. In particular, we require no mixing assumptions on the process {Yt} under weak dependence.
THEOREM 3. Suppose m1n, m2n satisfy (5), ℓ−1 + n−1ℓ = o(1), and Assumption L holds after replacing condition (1) with a condition of weak dependence: Σk=0∞ |r(k)| < ∞ with σ∞2 ≡ Σk∈Z r(k) ∈ (0,∞). Defining dn2 = nσ∞2, the convergence results of both Theorem 1 and Theorem 2 remain valid.
With the convention that we define α = 1 and L(·) = σ∞2 under short-range dependence, both (4) and the scaling dn2 = n2−αL(n) in (3) remain correct for short-range dependence; that is, Var(Yn) ∼ n−1σ∞2. The same subsampling method can be applied to distribution estimation of the sample mean, in addition to interval estimation, under both SRD and LRD classifications of a linear time series.
In the following discussion, let σn2 = n2 Var(Yn). Denote the supremum norm ∥g∥∞ = supx∈R |g(x)| for a function g : R → R, and let Φ denote the standard normal distribution function. Unless otherwise specified, limits in order symbols are taken letting n → ∞.
We first state a useful result concerning moments of the sample mean Yn from a LRD linear process. Lemma 1(a) follows from the proof of Theorem 18.6.5 in Ibragimov and Linnik (1971) and bounds sums of consecutive filter coefficients in terms of the standard deviation of nYn; part (b) of Lemma 1 corresponds to Lemma 4 of Davydov (1970).
LEMMA 1. Suppose Assumption L holds. For all
,
(a) and for all
,
(b) E{[n(Yn − μ)]2k} ≤ Ak(σn2)k for some Ak > 0, if E(|ε0|2k) < ∞ for a given integer k ≥ 1.
For the proof of Theorem 1, we define the version F̃ℓ(x) = N−1 Σi=1N I{(Sℓi − ℓμ)/dℓ ≤ x}, x ∈ R. Note that F̃ℓ differs from the sampling window estimator F̂ℓ from Section 3.1 by centering subsample sums with ℓμ rather than ℓYn. We also require the following result for LRD linear processes with bounded innovations. For these series, Lemma 2(a) shows that standardized subsample sums based on well-separated blocks are asymptotically uncorrelated, whereas Lemma 2(b) establishes the convergence of F̃ℓ. We defer the proof of Lemma 2 to Section 6.2.
LEMMA 2. Suppose the conditions of Theorem 1 hold with bounded innovations, that is, P(|εt| ≤ B) = 1 for some B > 0. Then, as n → ∞,
(a) for any nonnegative integers a,b and 0 < ε < 1,
where
is a standard normal variable. For a > 0, E(Za) = (a − 1)(a − 3)…(1) for even a; 0 otherwise.
(b) ∥F̃ℓ − Φ∥∞ →p 0.
Proof of Theorem 1. We note that (4) and the assumption ℓ−1 + n−1+δℓ = o(1) imply
because L is positive and xγL(x) → ∞, x−γL(x) → 0 as x → ∞ for any γ > 0 (Ibragimov and Linnik, 1971, App. 1). We can bound
and
for each ε > 0. From (7), we find P(ℓ|Yn − μ|/dℓ > ε) = o(1) by Chebychev's inequality using σn2 ∼ dn2 from (4). From the continuity of Φ, it follows that ∥Fn − Φ∥∞ = o(1) by (3) and also that
as ε → 0. Hence, it suffices to show
or, equivalently as a result of Φ's continuity,
for each
. We will prove
for
.
Let E(εt2) = τ2 > 0. For each
, define variables
, where τb2 = E(ε0,b2). (We may assume τb2 > 0 w.l.o.g. in the following discussion.) For each
, write
to denote the analogs of
with respect to Y1,b,…,Yn,b. Note that both series {Yt} and {Yt,b} involve the same linear filter with i.i.d. innovations of mean 0 and variance τ2 and hence have the same covariances; in particular, we may set dn,b2 = dn2 for
. For any
,
Letting Dℓ,b = (Sℓ1,b − Sℓ1)/dℓ,
where P(|Dℓ,b| > ε) ≤ Var(Sℓ1 − Sℓ1,b)/(ε2dℓ2), and we deduce
by the i.i.d. property of innovations. Hence, for any
,
using (4), A2n,b(x) = o(1) as n → ∞ by Lemma 2(b), and |Fℓ(x) − Φ(x)|, ∥Fℓ,b − Φ∥∞ = o(1) as n → ∞ by (3). Because
as ε → 0, the proof of Theorem 1 is finished. █
Proof of Theorem 2. Let m denote mkn, k ∈ {1,2}, and define
for Nm = n − m + 1. Using Hölder's inequality, (4), and (7), we can show that
With the truncated variables from the proof of Theorem 1, let
with
; again the processes {Yt} and {Yt,b} have the same covariances for all
. In a fashion similar to (8), we find by Hölder's inequality,
using σm2 = Var(Sm1) = Var(Sm1,b) and Var(Sm1 − Sm1,b) = 2σm2{1 − τb−1τ−1E(ε0ε0,b)} from before.
Applying Lemma 2(a) and the bound on E{(Sm1,b − mμ)4} from Lemma 1(b) with (4),
holds for each
. Then
follows from using (8)–(10) to deduce
and then applying limb→∞ τb−1E(ε0ε0,b) = τ. Because
and (5) and (6) imply
the convergence
now follows. From this and Theorem 1, we find the convergence of
in probability as in Hall et al. (1998). █
Proof of Theorem 3. From Corollary 6.1.1.2 of Fuller (1996), we have
. Lemmas 1 and 2 and the same proofs of Theorems 1 and 2 (including that of Lemma 2) apply with the convention that α = 1,
under short-range dependence. The one modification is that (7) still holds if ℓ−1 + ℓ/n = o(1). █
Proof of Lemma 2(b). The result follows from Theorem 2.4 of Hall et al. (1998) and its proof after verifying that the conditions required are met: the process {Yt} has all moments finite by assumption; n(Yn − μ)/σn converges to a (normal) continuous distribution by (3) which is uniquely determined by its moments; (ℓ2σn2)/(n2σℓ2) = o(1) by (4) and (7); and Lemma 2(a) holds. █
Proof of Lemma 2(a). It suffices to consider only positive
, because Lemma 1(b) with (3) implies that E[(Sℓ1*)a] → E(Za) as n → ∞ for any nonnegative a.
We establish some additional notation. For
, write the standardized subsample sum
using
and set E(εt2) = 1 throughout the proof because of standardization; we suppress here the dependence of dk(i) on [ell ] in our notation. Write the nonnegative integers as
. For
, denote integer vectors
; write a set
and define with
,
where indices in the sum
extend over integer m-tuples
with distinct components kj ≠ kj′ for 1 ≤ j ≠ j′ ≤ m. We will later use that
holds for
so that certain sums Δi,a,b(s(m),t(m)) are finitely defined. We omit the proof of (11) here (in light of showing (14) to follow, which uses similar arguments).
Because the innovations are i.i.d. with E(εt) = 0, it holds that
for
and integers
unless, for each 1 ≤ j ≤ a + b, there exists some j′ such that kj = kj′, implying less than [lfloor ](a + b)/2[rfloor ] distinct integer values among (k1,…,ka+b). Using this and
for any
, we rewrite (12) as a sum over collections of integer indices (k1,…,ka+b) with 1 ≤ m ≤ [lfloor ](a + b)/2[rfloor ] distinct values:
, we rewrite (12) as a sum over collections of integer indices (k1,…,ka+b) with 1 ≤ m ≤ ⌊(a + b)/2⌋ distinct values:
where the sum Σ(W1,…,Wm) is taken over all size m partitions (W1,…,Wm) of {1,…,a + b}. The indicator function I{·} in the bracketed sum in the preceding expression signifies that, for a given partition (W1,…,Wm), we sum terms in (12) over integer indices
satisfying kj = kj′ if and only if j,j′ ∈ Wh, h = 1,…,m. We can more concisely write (12) as
in terms of vectors
as a function of a partition (W_1,…,W_m). By the nature of the partitions (W_1,…,W_m), the vectors (s^(m),t^(m)) in (13) are elements of B_{a,b,m} for some 1 ≤ m ≤ ⌊(a + b)/2⌋ (any set W_h in a partition (W_1,…,W_m) has at least two elements by the restriction m ≤ ⌊(a + b)/2⌋, so that min_{1≤j≤m}(s_j + t_j) ≥ 2 follows).
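The moment-vanishing property used above, that E(ε_{k_1}⋯ε_{k_m}) = 0 for i.i.d. mean-zero innovations unless every index is repeated, can be checked exactly for a toy innovation law. The sketch below assumes Rademacher innovations ε_t = ±1 (a hypothetical choice with E(ε_t) = 0 and E(ε_t^2) = 1, not the paper's general innovation distribution) and enumerates all sign outcomes:

```python
from itertools import product

def product_moment(indices):
    """Exact E[eps_{k_1} * ... * eps_{k_m}] for i.i.d. Rademacher eps = +/-1,
    computed by enumerating all sign assignments to the distinct indices."""
    distinct = sorted(set(indices))
    total = 0
    for signs in product((-1, 1), repeat=len(distinct)):
        eps = dict(zip(distinct, signs))
        term = 1
        for k in indices:
            term *= eps[k]
        total += term
    return total / 2 ** len(distinct)

assert product_moment([1, 2, 3]) == 0.0      # all indices distinct -> 0
assert product_moment([1, 1, 2]) == 0.0      # index 2 unrepeated -> 0
assert product_moment([1, 1, 2, 2]) == 1.0   # every index paired -> E(eps^2)^2 = 1
```

For a general mean-zero law the moment need not be 0 or 1, but independence still forces it to vanish whenever some index appears exactly once.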
To help identify the most important terms in the summand (13), we define a count
as a function of
and also a special indicator function
. Now to show Lemma 2(a), it suffices to establish for any
that
If either a or b is odd, so that E(Z^a)E(Z^b) = 0, Lemma 2(a) follows immediately from (13) and (14). In the case that a,b are even, we find that the dominant component in (13) involves the sum over partitions (W_1,…,W_m) with m = (a + b)/2 and corresponding (s^(m),t^(m)) that satisfy
or equivalently s_j,t_j ∈ {0,2}, s_j + t_j = 2 for 1 ≤ j ≤ m = (a + b)/2; in this instance, Ψ_{a,b}(s^(m),t^(m)) = 1 (by E(ε_0^2) = 1), and a partition (W_1,…,W_{(a+b)/2}) of {1,…,a + b} of size m = (a + b)/2 is formed by a size-a/2 partition of {1,…,a} and a size-b/2 partition of {a + 1,…,a + b}; there are exactly (a − 1)(a − 3)⋯(1) × (b − 1)(b − 3)⋯(1) such partitions possible. So for even a,b with m = (a + b)/2, it holds that
which with (13) and (14) implies that Lemma 2(a) follows for a,b even.
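The partition count above can be verified by brute force: the number of ways to split a set of a elements into a/2 unordered pairs is the double factorial (a − 1)(a − 3)⋯(1), which for even a also equals the standard normal moment E(Z^a). A small enumeration sketch (assuming nothing beyond the combinatorics just stated):

```python
def pairings(elements):
    """Count partitions of an even-sized list of distinct items into pairs:
    pair the first element with each possible partner and recurse."""
    if not elements:
        return 1
    rest = elements[1:]
    total = 0
    for partner in rest:
        remaining = [e for e in rest if e != partner]
        total += pairings(remaining)
    return total

def double_factorial(k):
    """k(k-2)(k-4)...(1); equals 1 for k <= 0."""
    return 1 if k <= 0 else k * double_factorial(k - 2)

# Pairings of a elements = (a-1)!!, the count appearing in the text.
for a in (2, 4, 6):
    assert pairings(list(range(a))) == double_factorial(a - 1)

# For even a, (a-1)!! is also the standard normal moment E(Z^a): 1, 3, 15.
assert [double_factorial(a - 1) for a in (2, 4, 6)] == [1, 3, 15]
```

Multiplying the counts for the two index blocks gives (a − 1)!!(b − 1)!! = E(Z^a)E(Z^b), matching the limit in Lemma 2(a).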
We now focus on proving (14) by treating two cases:
. For
, define subsample sum covariances
. Although we cannot assume that
under long-range dependence, it holds that
by applying Lemma 1(a). Similarly, if
with min_{1≤j≤m} max{s_j,t_j} ≥ 2, then
follows from (15), where ω_ℓ = o(1).
Case 1.
. We show that (14) holds with an induction argument on m. Consider first the possibility m = 1, for which (s^(m),t^(m)) = (s_1,t_1) = (a,b). If
, then a = b = 1; in this case,
. For 0 < ε < 1 and large n such that nε/2 > ℓ, the growth rate in (1) with (4) gives
defining M_{n,ε} = sup{L(tn) : ε/2 < t < 2} and using (7) with M_{n,ε}/L(n) → 1 by Taqqu (1977, Lem. A1). If
, then either a = s_1 > 1 or b = t_1 > 1 for
, and (14) follows from (16) and O(ω_ℓ^{a+b−2m}) = o(1).
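The slow-variation step M_{n,ε}/L(n) → 1 used in this base case can be illustrated numerically for a concrete slowly varying function, say L(x) = log x (a hypothetical choice for illustration; since log is increasing, the supremum over ε/2 < t < 2 is attained at the right endpoint t = 2):

```python
import math

def sup_ratio(n):
    """M_{n,eps}/L(n) for the slowly varying L(x) = log x, where
    M_{n,eps} = sup{log(t*n) : eps/2 < t < 2} = log(2n) for increasing log."""
    return math.log(2 * n) / math.log(n)

vals = [sup_ratio(n) for n in (10**2, 10**4, 10**8)]
# The ratio exceeds 1 but converges to 1 as n grows, as slow variation requires.
assert vals[0] > vals[1] > vals[2] > 1
assert abs(vals[-1] - 1) < 0.05
```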
Now assume that, for some integer T ≥ 1, (14) holds whenever
with
and m ≤ min{T, ⌊(a + b)/2⌋}. We let
with
in the following discussion and show that (14) must hold by the induction assumption.
If
, then min_{1≤j≤m} max{s_j,t_j} ≥ 2 holds and (14) follows from (16) and O(ω_ℓ^{a+b−2m}) = o(1) because a + b > 2m, as we now verify. If a,b are even, then m < (a + b)/2 must hold by
; if a or b is odd, then a + b > 2m follows because (s^(m),t^(m)) ∈ B_{a,b,m} and max_{1≤j≤m}{s_j + t_j} > 2 must hold (the alternative by (s^(m),t^(m)) ∈ B_{a,b,m} with
is that s_j,t_j ∈ {0,2} for 1 ≤ j ≤ m, so that a = ∑_{j=1}^m s_j and b = ∑_{j=1}^m t_j are even, a contradiction).
Consider now the possibility that
. Because m = T + 1 > 1 with a,b > 1 necessarily, say (w.l.o.g.) that components s_m = t_m = 1 in s^(m) = (s_1,…,s_m), t^(m) = (t_1,…,t_m). Using
, we can algebraically rewrite the sum
where (s_0^(m−1),t_0^(m−1)) ∈ B_{a−1,b−1,m−1} for s_0^(m−1) = (s_1,…,s_{m−1}), t_0^(m−1) = (t_1,…,t_{m−1}), and (s_j^(m−1),t_j^(m−1)) ∈ B_{a,b,m−1} for s_j^(m−1) = s_0^(m−1) + e_j^(m−1), t_j^(m−1) = t_0^(m−1) + e_j^(m−1), writing the jth coordinate vector
. Note that
because m − 1 < (a + b)/2. By the induction assumption on terms Δ_{i,a,b}(s_j^(m−1),t_j^(m−1)), j > 0, in (18) along with (11) and (17), we find that (14) holds whenever
with m = T + 1. This completes the induction proof and the treatment of Case 1.
Case 2.
. Here a,b are even, and the components of (s^(m),t^(m)) satisfy s_j ∈ {0,2}, t_j = 2 − s_j for 1 ≤ j ≤ m = (a + b)/2. By (15),
follows for any
. Using this and algebra similar to (18), we can iteratively write Δ_{i,a,b}(s^(m),t^(m)) as a sum by parts equal to
where 1 ≤ h ≤ m − 1 = (a + b − 2)/2. With an argument as in (16), we find a uniform bound |R_{h,i,a,b}(s^(m),t^(m))| ≤ (a + b)ω_ℓ^2/2 = o(1) for each
, 1 ≤ h ≤ m − 1. Hence, (14) follows for Case 2. The proof of Lemma 2(a) is now complete. █