1 Introduction
Conditionally heteroscedastic processes are frequently used to model the evolution of stock prices, exchange rates and interest rates. Starting with the seminal papers by Engle [Reference Engle15] on autoregressive conditional heteroscedastic models (ARCH) and Bollerslev [Reference Bollerslev3] on generalized ARCH, numerous variants of these models have been proposed for modelling financial time series; see, for example, Francq and Zakoïan [Reference Francq and Zakoïan20] for a detailed overview. More recently, integer-valued GARCH models (INGARCH) which mirror the structure of GARCH models have been proposed for modelling time series of counts; see, for example, Fokianos [Reference Fokianos, Subba Rao, Subba Rao and Rao16].
In this paper, we prove existence and uniqueness of a stationary distribution under a time-homogeneous dynamic. As our main result, we show absolute regularity of the observable process under the semi-contractive condition (1.5) rather than a more common fully contractive condition on the volatility function. In conjunction with standard conditions (A1) and (A3) given in Section 2, this results in an atypical decay rate for the coefficients of absolute regularity,
\begin{equation} \beta_n \leq C \rho^{\sqrt n} \tag{1.1} \end{equation}
for some C < ∞ and ρ < 1.
Our technique allows us to obtain this strong result even for nonstationary models with a nonhomogeneous dynamic, under uniform (in t) versions of our regularity conditions. This opens a wide range of applications for modelling real data sets.
The results hold for general GARCH processes obeying the model equations
Here (εt)t is a sequence of independent and identically distributed (i.i.d.) random variables, where εt is independent of all lagged random variables and \[\mathbb{E}\varepsilon _t^2 = 1\]. A general INGARCH process is characterized by the model equations
where $${{\cal F}_s} = \sigma (({Y_s},{\lambda _s}),({Y_{s - 1}},{\lambda _{s - 1}}), \ldots )$$ and, analogously to the GARCH case,
Here {Q(λ) : λ ≥ 0} is a family of distributions on the nonnegative integers. An important aspect is that such models allow for a feedback mechanism in the hidden process which often makes a parsimonious parametrization possible. Absolute regularity (β-mixing) with a geometric decay rate of the coefficients of standard (linear) GARCH(p, q) processes was shown in the doctoral thesis of Boussama [Reference Boussama4]. Geometric β-mixing for nonlinear GARCH(1,1) specifications can be found in [Reference Carrasco and Chen6, Proposition 5] and [Reference Francq and Zakoïan19, Theorem 3]. Properties of INGARCH processes have already been studied under a fully contractive condition,
where
and a1, …, ap and b1, …, bq are nonnegative constants such that $$\sum\nolimits_{i = 1}^p {a_i} + \sum\nolimits_{j = 1}^q {b_j} < 1$$. Neumann [Reference Neumann27] showed, in the case of p = q = 1, that condition (1.4) implies that the bivariate process ((λt, Yt))t has a unique stationary distribution and that a stationary version of the count process (Yt)t is absolutely regular with mixing coefficients $$\beta_n = O(\rho^n)$$ for some ρ < 1. It was also shown that the intensity process (λt)t is ergodic but not strongly mixing in general (see Remark 3 in that paper for a simple counterexample). Franke [Reference Franke21] showed in the case of p, q ≥ 1 that there exists a stationary distribution. Moreover, he proved τ-weak dependence as defined in [Reference Dedecker, Doukhan, Lang, León, Louhichi and Prieur8], again with an exponential decay of the coefficients of weak dependence. Also under a fully contractive condition, Fokianos et al. [Reference Fokianos, Rahbek and Tjøstheim18] analysed linear and nonlinear versions of INGARCH(1,1) processes. Since the verification of geometric ergodicity turned out to be unclear with conventional Markov chain theory, these authors proved ergodicity for a perturbed version of the original process. As the perturbations can be chosen arbitrarily small, this result could be used to derive the asymptotic distribution of parameter estimates.
We will cover both GARCH and INGARCH models, and we want to stress that we impose a contractive condition considerably weaker than (1.4),
where c 1, …, cq are nonnegative constants with c 1 + · · · + cq < 1. This allows us to consider, for example, threshold models where the function f is specified as
Such a specification was proposed in the framework of integer-valued time series by Woodard et al. [Reference Woodard, Matteson and Henderson32]. Furthermore, our semi-contractive condition also allows us to consider functions f with
and with only Lip(h) < 1. Note that well-established threshold models in financial mathematics, such as those proposed for example by Glosten, Jagannathan and Runkle [Reference Glosten, Jagannathan and Runkle22],
or by Francq and Zakoïan [Reference Francq and Zakoïan20, p. 250],
even fulfil the fully contractive condition (1.4).
To unify our notation, we use the expression (λt)t for the hidden process in what follows, that is, $$\sigma _t^2$$ will be replaced by λt in the case of a GARCH process. It is worth noting at this point that, although the bivariate process ((Yt, λt))t is a Markov chain of order p ∨ q, the process (Yt)t does not share this property, except for the case when q = 0, which is not of primary interest here.
We show as our main result that the coefficients of absolute regularity of the observable process (Yt)t satisfy (1.1). Recall that $$\beta _n = \mathop {\sup }\nolimits_k \beta ({\cal F}_{ - \infty }^k, {\cal F}_{k + n}^\infty )$$ with $${\cal F}_k^l = \sigma (Y_s :k \le s \le l)$$, where, for any couple of σ-fields $${\cal A}$$ and $${\cal B}$$,
where the supremum is taken over partitions of Ω, (Ai)1≤i≤ℓ, and (Bj)1≤j≤m subject to $$A_i \in {\cal A}$$ for 1 ≤ i ≤ ℓ, and $$B_j \in {\cal B}$$ for 1 ≤ j ≤ m. This subexponential rate is quite unusual and it is a consequence of the fact that we only impose a semi-contractive rather than a fully contractive condition.
To prove this result, we construct a coupling of two versions of the bivariate process ((Yt, λt))t, both started independently at time 0 with the stationary distribution. These two versions, $$((\widetilde Y_t, \widetilde\lambda_t))_t$$ and $$((\widetilde Y'_t, \widetilde\lambda'_t))_t$$, are defined on a sufficiently rich probability space \[(\widetilde\Omega, \widetilde{\cal F}, \widetilde{\mathbb P})\]. In the context of Markov chains, such a coupling typically leads to a coalescence of the two versions at some random time τ and \[\widetilde{\mathbb P}(\tau > n)\] then serves as an estimate of βn. In our case, since (Yt)t is not a Markov chain, it can well happen that $$\widetilde Y_\tau = \widetilde Y'_\tau$$ at some time τ, but that afterwards these two processes diverge again. This follows from the fact that the accompanying hidden processes $$(\widetilde\lambda_t)_t$$ and $$(\widetilde\lambda'_t)_t$$ can still attain different values at time τ, which means that the observable processes may diverge again with positive probability. In view of this, we have to use \[\mathbb{P}(\widetilde Y_m \ne \widetilde Y'_m \text{ for any } m \geqslant n)\] as an upper estimate for βn. When the two processes reach a state with
then we have p subsequent hits and the contractive condition begins to take effect which eventually leads to the result that both processes coalesce with a (conditional) probability exceeding $$1 - O(\rho ^{\sqrt n } )$$. To reach such a state with the crucial property (1.7), the two processes need several trials, beginning at certain stopping times τ1, τ2, … . Because of the condition of
in (1.7), each of these trials covers of the order of $$\sqrt n$$ time points. This means that, up to time n, there can be at most of the order of $$\sqrt n$$ such trials. Such a number of successive trials ensures that a state with (1.7) is reached before time n with a probability exceeding $$1 - O(\rho ^{\sqrt n } )$$. This might give some insight as to why we obtain the unusual rate of $$\rho ^{\sqrt n }$$ for the coefficients of absolute regularity. The desired uniqueness of the stationary law follows as a by-product of the successful coupling. The result on absolute regularity can be extended to nonstationary GARCH-type processes; a uniform (in t) version of our semi-contractive condition will ensure this.
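The balancing behind the exponent 1/2 can be written out as a short worked computation. This is only a sketch of the heuristic from Subsection 2.2, with trial length $$d_n = [n^\alpha]$$ as used there; the constants $$\rho_1, \rho_2 < 1$$ are generic placeholders introduced here for illustration and are not taken from the paper.

```latex
% Trial length d_n = [n^alpha], hence at most O(n^{1-alpha}) trials before time n.
\begin{align*}
\text{(failure to coalesce after a trial with $d_n$ hits)} &= O\bigl(\rho_1^{\,n^{\alpha}}\bigr),\\
\text{(no successful trial before time $n$)} &= O\bigl(\rho_2^{\,n^{1-\alpha}}\bigr),\\
\beta_n &= O\bigl(\rho^{\,n^{\min\{\alpha,\,1-\alpha\}}}\bigr),
  \qquad \rho = \max\{\rho_1,\rho_2\} < 1 .
\end{align*}
% The exponent min{alpha, 1-alpha} is largest for alpha = 1/2,
% which yields the rate rho^{sqrt n}.
```

A longer trial improves the first bound and worsens the second, so neither term can be made exponentially small in n without spoiling the other.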
The paper is organized as follows. In the next section we fix and discuss our assumptions. Our main results are based on a coupling technique which is introduced in Subsection 2.1. To make the main ideas of our proofs easily accessible, we present the consequences of this coupling for a simple special case in Subsection 2.2. The main results are formulated in Subsection 2.3. A few applications in statistics are mentioned in Subsection 2.4. All proofs are deferred to a final Section 3.
2 Assumptions and main results
We assume that the process (Yt)t, which is defined on some probability space $$(\Omega, {\cal F},P)$$, obeys the model equations
where $${\cal F}_s = \sigma ((Y_s, \lambda _s ),(Y_{s - 1}, \lambda _{s - 1} ), \ldots )$$ and {Q(λ) : λ ∈ [0,∞)} is some family of univariate distributions. Note that assumption (2.1a) is correctly formulated since it follows from (2.1b) that λt is $${\cal F}_{t - 1}$$-measurable.
The canonical domain of the function f is different in the two cases of GARCH and INGARCH models. To unify notation, we define f in both cases on ℝp × [0,∞)q, e.g. by a linear interpolation in the INGARCH case. Recall that (λt)t denotes the volatility process in the case of GARCH(p, q) models ((1.2a)–(1.2b))and the intensity process in the INGARCH(p, q) case ((1.3a)–(1.3b)). Here, the distribution of an observable random variable Yt conditioned on the past is Q(λt), where the parameter λt itself is random, depending on lagged variables Yt −1, …, Yt −p and previous values λt −1, …, λt −q of the (typically hidden) accompanying process (λt)t.
Possible examples we have in mind are linear or nonlinear GARCH(p,q) processes, with λt being the conditional variance of the observable variable Yt, or integer-valued GARCH processes, where Q(λ) is often chosen to be a Poisson distribution with intensity parameter λ. Existence of a one-sided version of these processes, i.e. t ∈ ℕ, is guaranteed since we can construct such processes iteratively. We will show that there exists a stationary distribution which implies by Kolmogorov’s extension theorem (see, e.g. [Reference Durrett14]) that a stationary two-sided version, i.e. t ∈ ℤ, also exists. In the proof of our main result, we also use some Markov chain techniques. The process (Zt)t is a Markov chain, where $$Z_t = (Y_t^2, \ldots, Y_{t - p + 1}^2, \sigma _t^2, \ldots, \sigma _{t - q + 1}^2 )$$ for a GARCH(p,q) model obeying (1.2a) and (1.2b), and Zt = (Yt, …, Yt − p + 1, λt, …, λt −q+1) in the INGARCH(p,q) case according to (1.3a) and (1.3b). In the following it turns out to be convenient to drop the first component of the random vector Zt and we also define $$X_t = (Y_{t - 1}^2, \ldots, Y_{t - p + 1}^2, \sigma _t^2, \ldots, \sigma _{t - q + 1}^2 )$$ as well as Xt =(Yt −1, …, Yt −p+1, λt, …, λt −q+1), respectively.
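The iterative construction of a one-sided version can be sketched in a few lines for a Poisson INGARCH(1,1) process. The function `f_threshold`, its coefficients, the starting values and the helper names are hypothetical choices made for this illustration only; they are not the specification used in the paper.

```python
import math
import random

def draw_poisson(lam, rng):
    """Sample from Poisson(lam) by inverting the distribution function.

    Adequate for moderate lam; exp(-lam) underflows for very large lam.
    """
    u = rng.random()
    k, p = 0, math.exp(-lam)      # p = P(Y = k), starting at k = 0
    cdf = p
    while u > cdf and p > 0.0:    # p > 0 guards against rounding in the far tail
        k += 1
        p *= lam / k
        cdf += p
    return k

def f_threshold(y, lam):
    """Hypothetical threshold intensity function (an assumption, not from the paper)."""
    if y <= 5:
        return 0.5 + 0.3 * y + 0.4 * lam
    return 1.0 + 0.1 * y + 0.6 * lam

def simulate_ingarch(n, lam0=1.0, y0=0, seed=0):
    """Iterate lambda_t = f(Y_{t-1}, lambda_{t-1}) and Y_t | past ~ Poisson(lambda_t)."""
    rng = random.Random(seed)
    ys, lams = [], []
    y, lam = y0, lam0
    for _ in range(n):
        lam = f_threshold(y, lam)     # hidden intensity, known from the past
        y = draw_poisson(lam, rng)    # observable count
        lams.append(lam)
        ys.append(y)
    return ys, lams
```

Calling `simulate_ingarch(500, seed=1)` returns the observed counts and the accompanying intensities; only the counts would be available to a statistician.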
We impose the following conditions.
(A1) (Geometric drift condition.) There exist positive constants $$a_1, \ldots, a_{p-1}, b_0, \ldots, b_{q-1}$$, κ < 1, and $$a_0 < \infty$$ such that, for
$$V((y_1,\ldots,y_{p-1};\,\lambda_0,\ldots,\lambda_{q-1})) = \sum_{i=1}^{p-1} a_i y_i + \sum_{j=0}^{q-1} b_j\lambda_j,$$
the condition
\begin{equation} \mathbb E( V(X_t) \mid X_{t-1} ) \leq \kappa V(X_{t-1}) + a_0 \end{equation}
is fulfilled with probability 1.
(A2) (Semi-contractive condition.) The function f is measurable and there exist nonnegative constants $$c_1, \ldots, c_q$$ with $$c_1 + \cdots + c_q < 1$$ such that
\begin{equation} |f(y_1,\ldots,y_p;\, \lambda_1,\ldots,\lambda_q) - f(y_1,\ldots,y_p;\, \lambda_1',\ldots,\lambda_q')| \leq \sum_{i=1}^q c_i |\lambda_i-\lambda_i'| \end{equation}
for all $$y_1, \ldots, y_p \in \mathbb{R}$$, $$\lambda_1, \ldots, \lambda_q, \lambda_1', \ldots, \lambda_q' \ge 0$$.
(A3) (Similarity condition.) There exists some constant δ ∈ (0,∞) such that
$$TV(Q(\lambda), Q(\lambda')) \le 1 - e^{-\delta |\lambda - \lambda'|} \quad \text{for all } \lambda, \lambda' \ge 0,$$
where $$TV(Q_1, Q_2) = \mathop{\sup}\nolimits_{A \in {\cal B}} |Q_1(A) - Q_2(A)|$$ denotes the total variation distance between probability measures $$Q_1$$ and $$Q_2$$.
Remark 2.1. In the case in which p = q = 1, Xt reduces to λt. Condition (A1) follows from the following drift condition which is frequently used in the context of linear and nonlinear GARCH-type models; see, e.g. [Reference Lindner and Mikosch26] and [Reference Franke21].
(A1′) There exist constants $$\bar a_0 \in [0,\infty)$$ and $$\bar a_1, \ldots, \bar a_p, \bar b_1, \ldots, \bar b_q \in [0,1)$$ with $$\sum\nolimits_{i = 1}^p {\bar a_i} + \sum\nolimits_{j = 1}^q {\bar b_j} < 1$$ such that
• in the GARCH(p,q) case,
\begin{equation} \sigma_t^2 \leq \bar{a}_0 + \bar{a}_1 Y_{t-1}^2 + \cdots + \bar{a}_p Y_{t-p}^2 + \bar{b}_1 \sigma_{t-1}^2 + \cdots + \bar{b}_q \sigma_{t-q}^2, \end{equation}
• in the INGARCH(p,q) case,
\begin{equation} \lambda_t \leq \bar{a}_0 + \bar{a}_1 Y_{t-1} + \cdots + \bar{a}_p Y_{t-p} + \bar{b}_1 \lambda_{t-1} + \cdots + \bar{b}_q \lambda_{t-q}. \end{equation}
Remark 2.2. Condition (A2) is the essential difference to the fully contractive condition imposed in, e.g. [Reference Neumann27] and [Reference Truquet29]. Here, we only assume Lipschitz continuity of f with respect to lagged values λt −1, …, λt −q. This includes the case of threshold models where the thresholds are set on the lagged variables of the observable process, $$Y_{t - 1}^2, \ldots, Y_{t - p}^2$$ or $$Y_{t - 1}, \ldots, Y_{t - p}$$, respectively.
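The difference between the two types of condition can be checked numerically on a toy example. The threshold function below is a hypothetical INGARCH(1,1) specification chosen for this sketch (it is not the function from the paper): it is Lipschitz in λ with constant 0.6, but it jumps in y at the threshold, so no bound of the fully contractive type with a coefficient below 1 on |y − y′| can hold.

```python
def f_threshold(y, lam):
    """Hypothetical threshold intensity function (an assumption for illustration)."""
    if y <= 5:
        return 0.5 + 0.3 * y + 0.4 * lam
    return 1.0 + 0.1 * y + 0.6 * lam

def semi_contractive_on_grid(c=0.6, tol=1e-12):
    """Check |f(y, l1) - f(y, l2)| <= c * |l1 - l2| on a finite grid of arguments."""
    ys = range(0, 21)
    lams = [0.5 * k for k in range(41)]
    return all(
        abs(f_threshold(y, l1) - f_threshold(y, l2)) <= c * abs(l1 - l2) + tol
        for y in ys for l1 in lams for l2 in lams
    )

def jump_across_threshold(lam):
    """Size of the jump of f between y = 5 and y = 6 for a fixed lam."""
    return abs(f_threshold(6, lam) - f_threshold(5, lam))
```

The grid check only illustrates (A2) and proves nothing beyond the grid; for this piecewise linear f the Lipschitz constant in λ is max{0.4, 0.6} by inspection. The jump `jump_across_threshold(lam)` grows linearly in `lam`, whereas |y − y′| = 1 stays fixed.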
Remark 2.3. With the standard specification for GARCH models, we have
that is, λt takes the role of the conditional volatility $$\sigma _t^2$$. Let pλ be the density of an $${\cal N}(0,\lambda )$$ distribution. If the volatilities satisfy λt ≥ ω then we obtain, for 0 < ω ≤ λ ≤ λ ',
that is, the similarity condition (A3) is fulfilled with δ = 1/ω. (In order to prove the third inequality in the above display, note that $$1 + u \le e^u$$ for all u ≥ 0, which implies that $$\lambda'/\lambda = 1 + (\lambda' - \lambda)/\lambda \le e^{|\lambda' - \lambda|/\lambda}$$.)
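The bound of this remark can be sanity-checked numerically, since the total variation distance between two centred normal distributions has a closed form in terms of the error function. The sketch below is an illustration added here, not part of the original argument; the function name and the value of ω used in any check are arbitrary choices.

```python
import math

def tv_centered_normals(lam1, lam2):
    """Exact total variation distance between N(0, lam1) and N(0, lam2)."""
    lo, hi = min(lam1, lam2), max(lam1, lam2)
    if lo == hi:
        return 0.0
    # The two densities cross at |x| = x0 with x0^2 = lo*hi*ln(hi/lo)/(hi - lo).
    x0 = math.sqrt(lo * hi * math.log(hi / lo) / (hi - lo))
    # On [-x0, x0] the density with the smaller variance dominates, so the
    # supremum over sets is attained there; P(|X| <= x0) = erf(x0 / sqrt(2 v)).
    return math.erf(x0 / math.sqrt(2.0 * lo)) - math.erf(x0 / math.sqrt(2.0 * hi))

def similarity_bound(lam1, lam2, omega):
    """Right-hand side of (A3) with delta = 1/omega."""
    return 1.0 - math.exp(-abs(lam1 - lam2) / omega)
```

For variances bounded below by ω, comparing `tv_centered_normals(a, b)` with `similarity_bound(a, b, omega)` over a grid gives a quick plausibility check of (A3) with δ = 1/ω.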
While a normal distribution seems to be the dominating choice for the distribution of the innovations in GARCH models, there exist quite a few proposals for their integer-valued counterparts, the INGARCH models. For the sake of an easy description, let \[(\mathcal{P}_t (\lambda ))_{\lambda \geqslant 0}, {\kern 1pt} t \in \mathbb{Z}\], be a sequence of independent standard Poisson processes.
1. Poisson seed. If Q(λ)=Poisson(λ) then Yt can be expressed as $$Y_t = {\cal P}_t (\lambda _t )$$.
2. Mixed Poisson seed. Here we have the specification $$Y_t = {\cal P}_t (\lambda _t Z_t )$$, where Zt is a nonnegative random variable. The special case of a Bernoulli-distributed random variable Zt leads to the so-called zero-inflated Poisson model in [Reference Lambert23]; it takes into account additional unobserved data.
3. Compound Poisson seed. Let (Zt,i)t,i≥0 be a double sequence of i.i.d. nonnegative random variables. In this case, Yt is given by $$Y_t = \sum\nolimits_{i = 1}^{{\cal P}_t (\lambda _t )} Z_{t,i}$$. This process is integer valued if \[\mathbb{P}(Z_{t,i} \in \mathbb{N}_0 ) = 1\].
In cases 1 and 3, the similarity assumption (A3) is fulfilled with δ = 1; see [Reference Adell and Jodrá1]. Regarding case 2, let QMP(λ) denote the mixed Poisson distribution with intensity parameter λ. Then,
where \[\delta = \mathbb{E}Z_t\].
Remark 2.4. For two probability measures Q 1 and Q 2 on $${\cal B}$$, let d 1 = dQ 1/d(Q 1 +Q 2) and d 2 = dQ 2/d(Q 1 +Q 2) be the respective densities with respect to the dominating measure Q 1 +Q 2. Then
Furthermore, using the method of maximal coupling as described, for example, in [Reference Den Hollander9, p. 15], we can construct, with the aid of an additional randomization, random variables X 1 and X 2 such that
• X 1 ~ Q 1, X 2 ~ Q 2,
• P(X 1 = X 2) = Δ.
Indeed, let U be a random variable with a uniform distribution on [0,1]. If U ≤ Δ then we choose
where $$F(x) = \int_{( - \infty, x]} d_1 \wedge d_2 (Q_1 + Q_2 )$$. Here and below, H −1 denotes the generalized inverse of a generic distribution function H, that is, H −1(t) = inf{x : H(x) ≥ t}. (This function is sometimes denoted by H←.) This definition makes sense no matter if the distribution H is a continuous or discrete distribution. If U > Δ then we set
where $$G_i (x) = \int_{( - \infty, x]} (d_i - d_1 \wedge d_2 )(Q_1 + Q_2 )$$ for i = 1, 2.
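For distributions on a finite set {0, …, m − 1}, the construction of Remark 2.4 can be sketched as follows. The sketch rests on two assumptions that go beyond the text shown here: Δ is taken to be the overlap mass Σ min(q1, q2), and for U > Δ the two variables are obtained as the generalized inverses of G1 and G2 evaluated at U − Δ. The function names are hypothetical.

```python
def generalized_inverse(weights, t):
    """Smallest index x whose cumulative weight H(x) is at least t."""
    cum = 0.0
    for x, w in enumerate(weights):
        cum += w
        if cum >= t:
            return x
    return len(weights) - 1   # fallback against floating-point rounding

def maximal_coupling(q1, q2, u):
    """Return (X1, X2) with X1 ~ q1, X2 ~ q2 when u is uniform on [0, 1].

    q1 and q2 are probability vectors on {0, ..., m-1}.
    """
    overlap = [min(a, b) for a, b in zip(q1, q2)]    # density of d1 ^ d2
    delta = sum(overlap)                             # P(X1 = X2)
    if u <= delta:
        x = generalized_inverse(overlap, u)          # common value F^{-1}(u)
        return x, x
    r1 = [a - m for a, m in zip(q1, overlap)]        # residual mass of q1
    r2 = [b - m for b, m in zip(q2, overlap)]        # residual mass of q2
    return (generalized_inverse(r1, u - delta),
            generalized_inverse(r2, u - delta))
```

The residual weights `r1` and `r2` have disjoint supports, so the two variables differ whenever `u` exceeds `delta`; this is why the coupling attains P(X1 = X2) = Δ and cannot be improved.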
2.1 Definition of the coupling
We use a coupling approach to prove stationarity and absolute regularity of the GARCH-type process. In the case of a stationary Markov chain \[(Z_t )_{t \in \mathbb{N}_0 }\] defined on some probability space \[(\Omega, \mathcal{F},\mathbb{P})\], one usually constructs, on an appropriate probability space \[(\widetilde\Omega, \widetilde{\mathcal{F}},\widetilde{\mathbb{P}})\], two versions \[(\widetilde Z_t )_{t \in \mathbb{N}_0 }\] and \[(\widetilde Z'_t )_{t \in \mathbb{N}_0 }\] of this chain which are started at t = 0 independently, both with their stationary distribution. If one succeeds in constructing a coupling such that \[\widetilde{\mathbb{P}}(\widetilde Z_m \ne \widetilde Z'_m \text{ for any } m \geqslant n)\] tends to 0 as n → ∞, then the inequality
\begin{equation} \beta_n \leq \widetilde{\mathbb{P}}(\widetilde Z_m \ne \widetilde Z'_m \text{ for any } m \geq n) \tag{2.2} \end{equation}
provides an upper bound for the mixing coefficient. However, since a Markov process in discrete time is always strongly Markovian, it actually suffices to derive an upper estimate for \[\widetilde{\mathbb{P}}(\widetilde Z_n \ne \widetilde Z'_n )\] and we can conclude that the original process \[(Z_t )_{t \in \mathbb{N}_0 }\] on \[(\Omega, \mathcal{F},\mathbb{P})\] is absolutely regular with coefficients satisfying \[\beta _n \leqslant \widetilde{\mathbb{P}}(\widetilde Z_n \ne \widetilde Z'_n )\]. In our case, the process (Yt)t is not a Markov chain. Once we have constructed a coupling of $$((\widetilde Y_t, \widetilde\lambda_t))_t$$ and $$((\widetilde Y'_t, \widetilde\lambda'_t))_t$$, we have to stick to the estimate (2.2). (Even if $$\widetilde Y_n = \widetilde Y'_n$$, it could well happen that $$\widetilde\lambda_n \ne \widetilde\lambda'_n$$, which means that we cannot achieve $$\widetilde Y_{n+1} = \widetilde Y'_{n+1}$$ with a conditional probability of 1.) This means that we are required to find a construction where the two versions hit at some time and stay together afterwards (they coalesce).
Suppose that pre-sample values $$\widetilde Y_0, \ldots, \widetilde Y_{1-p},\, \widetilde\lambda_0, \ldots, \widetilde\lambda_{1-q}$$ and $$\widetilde Y'_0, \ldots, \widetilde Y'_{1-p},\, \widetilde\lambda'_0, \ldots, \widetilde\lambda'_{1-q}$$ are given. The values of $$\widetilde\lambda_1$$ and $$\widetilde\lambda'_1$$ arise as a result of the model equation (2.1b),
Note that the conditional distribution of $$\widetilde Y_1$$ given the past has to be $$Q(\widetilde\lambda_1)$$ and that of $$\widetilde Y'_1$$ has to be $$Q(\widetilde\lambda'_1)$$. We couple the two Markov chains in such a way that $$\widetilde Y_t = \widetilde Y'_t$$ with a maximum conditional probability. According to Remark 2.4, we utilize a sequence \[(U_t )_{t \in \mathbb{N}}\] of i.i.d. random variables with a uniform distribution on the interval [0,1], also independent of $$(\widetilde Y_0, \widetilde Y'_0, \widetilde\lambda_0, \widetilde\lambda'_0), (\widetilde Y_{-1}, \widetilde Y'_{-1}, \widetilde\lambda_{-1}, \widetilde\lambda'_{-1})$$, etc.
Let
If $${U_1} \le {\bar q_1}$$ then we define
where
If $${U_1} > {\bar q_1}$$ then we set
where
We iterate this process in the same way.
Let
Furthermore, denote by $$F_t$$, $$G_t$$, and $$G'_t$$ the distribution functions of the densities $$(q_t \wedge q'_t)$$, $$(q_t - (q_t \wedge q'_t))$$, and $$(q'_t - (q_t \wedge q'_t))$$, respectively. On the basis of given values
we set
as well as
and
2.2 A first glimpse at the consequences of the coupling
To communicate the main ideas involved in the proofs in a transparent way, we first consider the special case of an INGARCH(1,1) process and present a sketch of the major steps in the proofs of the results. For definiteness, we assume that $$Y_t |{\cal F}_{t - 1} \sim {\kern 1pt} Poisson{\kern 1pt} (\lambda _t )$$.
Note that $$TV(\mathrm{Poisson}(\lambda), \mathrm{Poisson}(\lambda')) \le 1 - e^{-|\lambda - \lambda'|}$$. To see this, assume without loss of generality that λ ≤ λ′. If Y ∼ Poisson(λ) and W ∼ Poisson(λ′ − λ) are independent, then Y′ = Y + W ∼ Poisson(λ′). It follows that $$P(Y \ne Y') = P(W \ne 0) = 1 - e^{-|\lambda - \lambda'|}$$, which implies that the similarity condition (A3) is satisfied with δ = 1.
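The Poisson bound can also be checked numerically by computing the total variation distance from the probability mass functions. The following is a minimal sketch; the truncation point `kmax` and the function names are choices made for this illustration, and the truncation is only adequate for moderate intensities.

```python
import math

def poisson_pmf(lam, kmax):
    """Probabilities P(Y = k) for k = 0, ..., kmax when Y ~ Poisson(lam)."""
    p = math.exp(-lam)
    out = [p]
    for k in range(1, kmax + 1):
        p *= lam / k
        out.append(p)
    return out

def tv_poisson(lam1, lam2, kmax=200):
    """Total variation distance, computed as half the l1 distance of the pmfs."""
    p = poisson_pmf(lam1, kmax)
    q = poisson_pmf(lam2, kmax)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def poisson_similarity_bound(lam1, lam2):
    """Right-hand side of (A3) with delta = 1."""
    return 1.0 - math.exp(-abs(lam1 - lam2))
```

The bound is attained when one of the intensities is 0, since then the two variables differ exactly when the nondegenerate one is nonzero.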
Let $${\cal G}_t = \sigma((\widetilde Y_t, \widetilde Y'_t, \widetilde\lambda_t, \widetilde\lambda'_t), (\widetilde Y_{t-1}, \widetilde Y'_{t-1}, \widetilde\lambda_{t-1}, \widetilde\lambda'_{t-1}), \ldots)$$ denote the σ-field of the t-past of both versions of the processes. Suppose that τ is some stopping time and that, for some reason, $$|\widetilde\lambda_{\tau+1} - \widetilde\lambda'_{\tau+1}| \le K$$. Note that $$\widetilde\lambda_{\tau+1}$$ and $$\widetilde\lambda'_{\tau+1}$$ are both $${\cal G}_\tau$$-measurable, where
Then, according to the maximal coupling explained above,
If, in addition, $$\widetilde Y_{\tau+1} = \widetilde Y'_{\tau+1}$$, then the contractive condition (A2) implies that
Therefore, for the next step, we obtain
and, if additionally $$\widetilde Y_{\tau+2} = \widetilde Y'_{\tau+2}$$,
Proceeding in the same way we obtain
which leads to
In what follows we sketch how (2.5) can be used to prove absolute regularity. Let \[\widetilde{\mathbb{P}}_\pi\] denote the probability measure under which $$(\widetilde Y_0, \widetilde\lambda_0)$$ and $$(\widetilde Y'_0, \widetilde\lambda'_0)$$ are independent and distributed with their common stationary law π. (Its existence and uniqueness is proved in Corollary 2.1 below.) We define the stopping time
for some C < ∞ and some α > 0 whose optimal choice is explained below. We obtain, from (2.5),
It remains to derive an upper estimate for the second term on the right-hand side of (2.6). To this end, we consider subsequent trials to achieve a state with $$|\widetilde\lambda_t - \widetilde\lambda'_t| \le C_1$$ for some $$C_1 \in (0,\infty)$$, followed by subsequent hits $$\widetilde Y_t = \widetilde Y'_t, \ldots, \widetilde Y_{t+d_n-1} = \widetilde Y'_{t+d_n-1}$$, where $$d_n = [n^\alpha]$$. We define a first stopping time as
(If $$\widetilde\lambda_0 + \widetilde\lambda'_0 \le C_1$$ then $$\tau_1 = 0$$. Otherwise, $$\tau_1$$ is the first arrival time of the process $$((\widetilde\lambda_t, \widetilde\lambda'_t))_t$$ at $$A := \{(u_1, u_2) : u_1 + u_2 \le C_1\}$$.) At time $$\tau_1$$ we have $$|\widetilde\lambda_{\tau_1} - \widetilde\lambda'_{\tau_1}| \le C_1$$. According to (2.4), there exists some constant $$C_2 > 0$$ such that
After such a successful trial with dn hits, we obtain, from the contractive property (A2),
This yields
which brings us closer to the desired result. This means that a trial which actually leads to a favourable state with (2.7) covers dn time points. Accordingly, for i > 1, we consider the following retarded return times as starting points for the next trials:
Now we are in a position to derive an upper bound for \[\widetilde{\mathbb{P}}_\pi (\tau ^{(n)} \geqslant n)\]. We define events
Since each trial covers $$d_n$$ time points we cannot get more than $$O(n^{1-\alpha})$$ different stopping times $$\tau_i$$ before time n. Let $$K_n = C_3 n^{1-\alpha}$$ for some $$C_3 > 0$$. It follows from Lemma 3.1 that
for some η > 1 and ρ < 1, if C 3 is small enough. Therefore, and since
we obtain
for some ρ < 1. The first term on the right-hand side of (2.6) and the second term on the right-hand side of (2.8) are of the same order for the choice of $$\alpha = {\textstyle{1 \over 2}}$$, which gives the estimate
2.3 Main results
To prove our main results, we use the coupling method described in Subsection 2.1. Recall that $$((\widetilde Y_t, \widetilde\lambda_t))_t$$ and $$((\widetilde Y'_t, \widetilde\lambda'_t))_t$$ denote the two versions of the process which are coupled on a suitable probability space \[(\widetilde\Omega, \widetilde{\mathcal{F}},\widetilde{\mathbb{P}})\] according to (2.3a) and (2.3b). Moreover, we remind the reader that $${\cal G}_t = \sigma((\widetilde Y_t, \widetilde Y'_t, \widetilde\lambda_t, \widetilde\lambda'_t), (\widetilde Y_{t-1}, \widetilde Y'_{t-1}, \widetilde\lambda_{t-1}, \widetilde\lambda'_{t-1}), \ldots)$$. The following lemma describes the core of our coupling method.
Lemma 2.1. Suppose that (A1)–(A3) are fulfilled, and let τ be any stopping time such that $$\widetilde Y_\tau = \widetilde Y'_\tau, \ldots, \widetilde Y_{\tau-p+2} = \widetilde Y'_{\tau-p+2}$$. Then
where c = c 1 + · · · + cq.
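The form of the coalescence probability exp{−δK/(1 − c)} can be traced to a geometric series. The following sketch spells this out only for the special case p = q = 1 of Subsection 2.2, where K bounds $$|\widetilde\lambda_{\tau+1} - \widetilde\lambda'_{\tau+1}|$$ and c is the Lipschitz constant from (A2); the general case requires the more careful argument of the proof.

```latex
\begin{align*}
|\widetilde\lambda_{\tau+k+1} - \widetilde\lambda'_{\tau+k+1}| &\le c^{k} K
  && \text{after $k$ consecutive hits, by (A2)},\\
\widetilde{\mathbb P}\bigl(\widetilde Y_{\tau+k+1} = \widetilde Y'_{\tau+k+1}
  \,\big|\, \mathcal G_{\tau+k}\bigr) &\ge e^{-\delta c^{k} K}
  && \text{by (A3) and the maximal coupling},\\
\prod_{k=0}^{\infty} e^{-\delta c^{k} K}
  &= \exp\Bigl\{-\delta K \sum_{k=0}^{\infty} c^{k}\Bigr\}
   = \exp\Bigl\{-\frac{\delta K}{1-c}\Bigr\}.
\end{align*}
```

Because the conditional hit probabilities tend to 1 geometrically fast, their infinite product stays bounded away from 0, which is what allows the two versions to stay together forever with positive probability.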
This lemma tells us that the two processes $$(\widetilde Y_t)_t$$ and $$(\widetilde Y'_t)_t$$ coalesce with a conditional probability greater than or equal to exp{−δK/(1 − c)}, where
Therefore, in order to prove the desired decay rate for the coefficients of absolute regularity, we show that there exists a stopping time τ(n) such that
and that \[\widetilde{\mathbb{P}}(\tau ^{(n)} < n) = 1 - O(\rho ^{\sqrt n } )\] for some ρ < 1. The following main result summarizes the result of our coupling method.
Proposition 2.1. Suppose that (A1)–(A3) are fulfilled. If
then
The following two results are immediate consequences of the main Proposition 2.1.
Corollary 2.1. Suppose that (A1)–(A3) are fulfilled. Then the Markov process (Zt)t has a unique stationary distribution π.
Remark 2.5. Woodard et al. [Reference Woodard, Matteson and Henderson32] and Douc et al. [Reference Douc, Doukhan and Moulines10] also derived properties of nonlinear INGARCH(1,1) processes which are, as in our case here, Markov chains that are not necessarily irreducible. Woodard et al. [Reference Woodard, Matteson and Henderson32] used the fact that a drift condition in conjunction with the weak Feller property of the Markov kernel ensures the existence of a stationary distribution while its uniqueness follows from a so-called asymptotic strong Feller property. These properties were, for example, verified for a Poisson threshold model with an intensity function as in (1.6). Douc et al. [Reference Douc, Doukhan and Moulines10] extended these results to more general intensity functions, including among other examples the log-linear Poisson autoregression model introduced by Fokianos and Tjøstheim [Reference Fokianos and Tjøstheim17]. They also focused on the intensity process and imposed the weak Feller condition directly on it. Under an additional high-level condition on two appropriately coupled versions of the Markov chain (see their condition (A3)), they showed that the intensity process (λt)t and, as a consequence, the bivariate process ((Yt, λt))t as well possess unique stationary distributions, and that stationary versions of the processes are ergodic. In the case of a Poisson threshold model (1.6) they also imposed the condition that max{c, c ′} < 1 in order to ensure semi-contractivity.
Under the semi-contractivity condition imposed here, we cannot derive the abovementioned Feller properties in general. On the other hand, the coupling result stated in Proposition 2.1 compensates for this failure. A metric d which resembles the coupling result is given by
It follows for arbitrary $$z \in [0,\infty)^{p+q}$$ that $$P^{Z_1 |Z_0 = z'} \Rightarrow P^{Z_1 |Z_0 = z}$$ as d(z ′, z) → 0, where ‘⇒’ denotes weak convergence. In other words, the weak Feller property holds with respect to the metric d rather than the more usual Euclidean norm. As can be seen in the proof of Corollary 2.1, we also obtain
which means that the asymptotic Feller property is also fulfilled.
The following theorem is our main result.
Theorem 2.1. Suppose that (A1)–(A3) are fulfilled. A stationary version of the process (Yt)t is absolutely regular (β-mixing) with coefficients satisfying
for some C < ∞ and ρ < 1.
At this point we would like to recall that the accompanying process (λt)t is not mixing in general. The following counterexample was already given in [Reference Neumann27, Remark 3]. In the case of an INGARCH(1,1) process, consider the specification f (y;λ) = y/2+g(λ), where g is strictly monotone and satisfies 0 < κ1 ≤ g(λ) < 0.5 as well as |g(λ)−g(λ′)| ≤ κ2|λ−λ′| for all λ, λ′ ≥ 0 and some κ2 < 0.5. Then our regularity conditions (A1)–(A3) are fulfilled. Using the fact that g(λ) ∈ [κ1, 0.5), it follows from 2λt = Yt −1 + 2g(λt −1) that Yt −1 = [2λt] and, therefore, 2g(λt −1) = 2λt −[2λt]. This means that we can perfectly recover λt −1 once we know the value of λt. Iterating this argument we see that we can recover from λt the complete past of the hidden process (λt)t. Taking into account that the above choice of f excludes the case that this process is purely nonrandom, we conclude that a stationary version of (λt)t cannot be strongly mixing, and therefore cannot be absolutely regular either. However, exploiting once more our coupling idea we can show that λt can be expressed as
for some measurable function g. This yields ergodicity of the process (λt)t ∈ ℤ and also of the bivariate process ((Yt, λt))t ∈ ℤ as stated in the following theorem.
Theorem 2.2. Suppose that (A1)–(A3) are fulfilled. Then a stationary version of the process ((Yt, λt))t ∈ ℤ is ergodic.
Compared to absolute regularity of the process (Yt)t, the ergodicity result for the accompanying process (λt)t seems to be a bit poor. However, combined with additional structural assumptions even the property of ergodicity might prove to be sufficient for deriving asymptotic properties of statistical procedures; see, e.g. [Reference Neumann27, Section 4], [Reference Leucht and Neumann24], and [Reference Leucht, Kreiss and Neumann25].
Remark 2.6. It is possible to extend our result on absolute regularity to the case of a time-varying transition mechanism, where the function f additionally depends on time. In this case, (2.1b) has to be replaced by
and assumption (A2) by
(A2′) (Uniform semi-contractive condition.) There exist nonnegative constants c 1, …, cq with c 1 + · · · + cq < 1 such that
for all t ≥ 0, $$y_1, \ldots, y_p \in \mathbb{R}$$, $$\lambda_1, \ldots, \lambda_q, \lambda'_1, \ldots, \lambda'_q \ge 0$$.
We are convinced that results similar to those in our paper can be proved under these conditions and we hope to be able to report on this elsewhere.
2.4 Some applications in statistics
In what follows we discuss a couple of instances where absolute regularity yields powerful uniform limit theorems, which also indicates the relevance of the present results. Assume that a real-valued process (Yt)t ∈ ℤ is strictly stationary and strongly mixing with coefficients satisfying $$\alpha _n \le C\rho ^{\sqrt n }$$ for some C < ∞. If, in addition, \[\mathbb{E}g(Y_0 ) = 0\] and \[\mathbb{E}g^2 (Y_0 )\mathop {\ln }\nolimits^2 (|g(Y_0 )| \vee 1) < \infty\], then Doukhan et al. [Reference Doukhan, Massart and Rio12] proved the following central limit theorem in the Skorokhod space D [0,1]:
Here W is a Brownian motion and the series \[\sigma ^2 (g) = \sum\nolimits_{j = - \infty }^\infty \mathbb{E}g(Y_0 )g(Y_j )\] is assumed to converge. For the detection of changes in the mean, we refer the reader to Theorems 4.1.2 and 4.1.5 of [Reference Csorgö and Horvath7]. The same volume deals in Section 4.4 with the detection of change points for other parameters involving functional central limit theorems; Doukhan et al. [Reference Doukhan, Massart and Rio13] proved a corresponding result under β-mixing.
In the nonparametric estimation frame, the specific structure of β-mixing is also fruitful. Viennet’s [Reference Viennet31] covariance inequality gives relevant bounds for the centred moments of kernel-type estimators (and more general nonparametric estimators) without imposing the existence of uniformly bounded joint densities as this is usually done under weaker strong mixing assumptions. This inequality is written as
for projection-type estimators on the vector space spanned by {e 1, …, en}, which is an orthonormal system of \[\mathbb{L}^2 (\mathbb{R}^d, w(x)\,dx)\]. The standard bound of such quadratic loss has order m/n under weak β-mixing assumptions. This fact was also decisive in using model selection procedures under dependence. Baraud et al. [Reference Baraud, Comte and Viennet2] proposed adaptive estimation and a selection procedure for regression models (including autoregression) under this β-mixing condition. Beyond the abovementioned covariance inequality from [Reference Viennet31], they used the Berbee coupling for β-mixing sequences.
3 Proofs
Proof of Remark 2.1. For nonnegative y 1, …, yp −1, λ 0, …, λq −1 and positive a 1, …, ap −1, b 0, …, bq −1, let
We consider, without loss of generality, only the case of an INGARCH(p,q) process since the proof in the GARCH(p,q) case is analogous. Recall that Xt = (Yt −1, …, Yt −p+1, λt, …, λt −q+1). Then
We are going to find positive constants a 1, …, ap −1, b 0, …, bq −1, κ < 1, and a 0 < ∞ such that the right-hand side of (3.1) is smaller than or equal to
We set, without loss of generality, b 0 = 1 and, accordingly, a 0 = ā 0. Condition (A1) will be fulfilled for all possible values of the involved random variables if
where the possible choice of κ becomes apparent at the end of the proof.
Let $$\bar a = \sum\nolimits_{i = 1}^p \bar a_i$$ and $$\bar b = \sum\nolimits_{j = 1}^q \bar b_j$$. We choose ε > 0 such that $$\bar a + \bar b + 2\varepsilon < 1$$ and we define
Then (3.2a) is fulfilled. Furthermore, we define recursively, for any δ ∈ (0, ε/( q − 2)),
which implies that (3.2b) holds. Then
which means that (3.2c) is satisfied. Moreover, we set, for γ ∈ (0, ε/(p − 2)),
Then (3.2d) is fulfilled. Finally,
which shows that (3.2e) is also satisfied.
Since all inequalities (3.2a)–(3.2e) are fulfilled in the strict sense, we can include a factor κ < 1 which is sufficiently close to 1 on the right-hand sides, which leaves the strict inequalities intact. This completes the proof.
Proof of Lemma 2.1. Recall that $$\widetilde{\lambda}_{\tau + 1}$$ and $$\widetilde{\lambda}_{\tau + 1}'$$ are $${\cal G}_\tau$$-measurable. Therefore, it follows from the similarity condition (A3) and the maximal coupling scheme that
Now if, in addition, $$\widetilde{Y}_{\tau + 1} = \widetilde{Y}_{\tau + 1}'$$ then we obtain p consecutive hits ($$\widetilde{Y}_\tau = \widetilde{Y}_\tau', \ldots, \widetilde{Y}_{\tau - p + 2} = \widetilde{Y}_{\tau - p + 2}'$$ was assumed) and the contractive property begins to take effect, which implies that
Again, by (A3),
and if, additionally, $$\widetilde{Y}_{\tau + 2} = \widetilde{Y}_{\tau + 2}'$$ then
Iterating these calculations we obtain, for all k ∈ ℕ, the following general formulae. If $$\widetilde{Y}_{\tau - p + 2} = \widetilde{Y}_{\tau - p + 2}', \ldots, \widetilde{Y}_{\tau + k - 1} = \widetilde{Y}_{\tau + k - 1}'$$ then
where d 1,1 = 1, d 1,i = 0 if i ≥ 2, and, for k ≥ 2,
Therefore,
This leads to
where
Since $$\widetilde{Y}_{\tau - p + 2} = \widetilde{Y}_{\tau - p + 2}', \ldots, \widetilde{Y}_{\tau + m} = \widetilde{Y}_{\tau + m}'$$ means that the contractive property takes effect at all time points from τ +1 to τ + m, in this case we obtain
With m → ∞ we conclude that
which proves the assertion.
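The maximal coupling step used in this proof can be illustrated numerically. The following is a minimal sketch, assuming that Q(λ) is the Poisson distribution with mean λ (the paper allows a general family); the function names `poisson_pmf`, `sample_poisson`, `maximal_coupling` and `mismatch_frequency` are hypothetical helpers introduced only for this illustration. It draws a pair with marginals Poisson(λ) and Poisson(λ′) such that the two components coincide with the largest possible probability.

```python
import math
import random


def poisson_pmf(k, lam):
    """Probability mass of Poisson(lam) at the integer k >= 0."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by inversion (adequate for moderate lam)."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cdf = p
    # The guard p > 0 prevents an endless loop caused by rounding.
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k


def maximal_coupling(lam, lam_prime, rng):
    """Return (y, y_prime) with y ~ Poisson(lam), y_prime ~ Poisson(lam_prime)
    and P(y != y_prime) equal to the total variation distance."""
    y = sample_poisson(lam, rng)
    # Accept a "hit" with probability min(1, q(y) / p(y)).
    if rng.random() * poisson_pmf(y, lam) <= poisson_pmf(y, lam_prime):
        return y, y
    # Otherwise draw y_prime from the normalised excess (q - p)^+ by rejection.
    while True:
        y_prime = sample_poisson(lam_prime, rng)
        if rng.random() * poisson_pmf(y_prime, lam_prime) > poisson_pmf(y_prime, lam):
            return y, y_prime


def mismatch_frequency(lam, lam_prime, n, seed=0):
    """Empirical frequency of the event y != y_prime over n coupled draws."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n):
        y, y_prime = maximal_coupling(lam, lam_prime, rng)
        misses += y != y_prime
    return misses / n
```

For the Poisson family, the probability of a miss is bounded by 1 − exp(−|λ − λ′|), so a call such as `mismatch_frequency(2.0, 2.5, 20000)` should return a value below 1 − exp(−0.5). This bound is a property of the Poisson family used here as an example in the spirit of the similarity condition (A3), not a restatement of that condition.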
Proof of Proposition 2.1. In view of the result of Lemma 2.1, we define a stopping time as
for some ρ ∈ (0, 1). Recall that
It follows from Lemma 2.1 that
Hence, it remains to estimate \[\widetilde{\mathbb{P}}(\tau ^{(n)} \geqslant n)\]. To this end, we define stopping times τ 1, τ 2, …, which serve as starting points of subsequent trials to reach a state with
Recall that
in the case of a GARCH(p,q) model. Furthermore, in the INGARCH(p,q) case we define these quantities as
Let $$W_t = (V(\widetilde{X}_t) + V(\widetilde{X}_t'))/2$$ and
where $$C_1^{(0)} \in (0,\infty )$$ is defined in the course of the proof of Lemma 3.1 below. Then there exists some $$C_2^{(0)} > 0$$ such that
Furthermore, it follows from (A1) that there exist constants $$C_1^{(1)} < \infty$$ and $$C_3^{(1)} > 0$$ such that
It follows, in turn, that there exist constants $$C_2^{(1)}, C_3^{(2)} > 0$$ and $$C_1^{(2)} < \infty$$ such that
and
Proceeding in the same way we obtain
This leads to
that is, with a probability not smaller than C 4 > 0 we reach after p steps a state with
Now the contractive condition begins to take effect and it follows from Lemma 2.1 that after $$D_n - p + 1 := [C_5 \sqrt{n}]$$ additional hits we arrive at a state with (3.6), if C 5 is large enough. This actually happens with a probability bounded away from 0. Hence, we obtain
for some C 6 > 0. This means that a trial to reach a favourable state with (3.6) covers Dn time points. Accordingly, for i > 1, we consider the following retarded return times:
Now we are in a position to derive an upper bound for \[\widetilde{\mathbb{P}}(\tau ^{(n)} \geqslant n)\].
We define events
Let Kn = C 7Dn. It follows from Lemma 3.1 that
which yields
This implies that
if C 7 < 1 is sufficiently small. Therefore, and since \[\widetilde{\mathbb{P}}(A_1^c \cap \cdots \cap A_{K_n }^c ) \leqslant (1 - C_6 )^{K_n }\], we obtain
Proof of Corollary 2.1. In order to prove existence of a stationary version of (Zt)t, it would suffice to derive this property for (Xt)t, where Xt = (Yt −1, …, Yt −p+1, λt, …, λt −q+1). It follows from the drift condition (A1) that conditions (F1) and (F3), and therefore (F2) as well, in [Reference Tweedie30] are fulfilled. If the Markov chain were weak Feller, i.e. if, for any bounded and continuous function φ : ℝ p+q−1 → ℝ, the map $$x \mapsto \int \varphi(y)\, P^{X_1 | X_0 = x}(dy)$$ were continuous, then we could conclude from Theorem 2 of [Reference Tweedie30] that (Xt)t has a stationary distribution. This fact has been used in, e.g. [Reference Douc, Doukhan and Moulines10], where the weak Feller property was explicitly imposed. The Feller property can be easily shown in the case of a continuous volatility/intensity function f; however, it might fail for a discontinuous function such as those appearing in certain threshold models. We show below that the missing Feller property is compensated by the coupling result in Proposition 2.1.
First we convert the coupling result into a convergence result for the conditional distributions $$P^{Z_n | X_0 = x}$$, where x is an arbitrarily chosen point in the range of X 0. Using maximal coupling as in the proof of Proposition 2.1, we construct two versions of the process, $$(\widetilde{Z}_t)_{t \in \mathbb{N}_0}$$ and \[(\widetilde{Z}_t')_{t \in \mathbb{N}_0 }\], where $$\widetilde{X}_0 = x$$ and $$\widetilde{X}_0' \sim P^{X_1 | X_0 = x}$$. We obtain
Now we can construct, on a suitable probability space $$(\widetilde{\widetilde{\Omega}}, \widetilde{\widetilde{\cal F}}, \widetilde{\widetilde{\mathbb{P}}})$$, a sequence of random vectors (ζn)n ∈ ℕ such that
and
(Given ζ1, …, ζn, the vector ζn + 1 has to be defined according to the conditional distribution of $$\widetilde{Z}_n'$$ given $$\widetilde{Z}_n$$.) Since
we obtain, from (3.7),
for some K < ∞. It follows that
which means that all ζm,Y are equal for m ≥ n(ω), and therefore they are eventually equal to some random vector ζY. Furthermore, since
we obtain
Hence, it follows from (3.8) that
which implies that ζN,λ converges to some random vector ζλ with probability 1. Let $$\zeta = (\zeta _Y^T, \zeta _\lambda ^T )^T$$ and denote by $$\pi = \widetilde{\widetilde{\mathbb{P}}}^\zeta$$ the distribution of ζ. Let φ : ℝp+q → ℝ be a bounded and uniformly continuous function. Next we show that π is a stationary distribution of the Markov chain (Zt)t. Since the map $$y \mapsto \int \varphi(z)\, P^{Z_1 | Z_0 = y}(dz)$$ is continuous in the last q arguments yp+1, …, yp+q, we obtain
which yields
Hence, π is a stationary distribution of (Zt)t.
To show uniqueness, suppose that π 1 and π 2 are two arbitrary stationary distributions. We start the processes to be coupled such that $$\widetilde{Z}_0 \sim \pi_1$$ and $$\widetilde{Z}_0' \sim \pi_2$$. (Here, it does not matter whether or not $$\widetilde{Z}_0$$ and $$\widetilde{Z}_0'$$ are independent.) Since both π 1 and π 2 are stationary laws, we have
Furthermore, it follows from the geometric drift condition (A1) that \[\widetilde{\mathbb{E}}(V(\widetilde{X}_1) + V(\widetilde{X}_1')) < \infty\], which implies by Proposition 2.1 that
as n → ∞. This and (3.9) imply that π 1 = π 2.
Proof of Theorem 2.1. Let π denote the stationary distribution of (Zt)t and let, for s ≤ t ≤ ∞, $${\cal F}_{s,t}^Y = \sigma (Y_s, \ldots, Y_t )$$. We start both versions of the process at time 0 independently, with $$\widetilde{Z}_0 \sim \pi$$ and $$\widetilde{Z}_0' \sim \pi$$. We denote by \[\widetilde{\mathbb{P}}_\pi\] and \[\widetilde{\mathbb{E}}_\pi\] the corresponding distribution and expectation, respectively. Since, by (3.10) below, λt = g(Yt −1, Yt −2, …), we have in particular $${\cal F}_{ - \infty, 0}^Y = \sigma (Z_0, Z_{ - 1}, \ldots )$$. We obtain
Here $${\cal C}$$ denotes the σ-field generated by the cylinder sets. Proposition 2.1 yields $$\beta _n = O(\rho ^{\sqrt n } )$$, as required.
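The square-root exponent in this rate can be traced back to the trial scheme in the proof of Proposition 2.1. The following worked bound is a heuristic sketch only: it ignores integer rounding of Kn, assumes C 6 ∈ (0, 1), and introduces the symbol ρ0 purely for illustration. Since each trial covers Dn time points with Dn of order √n, only of order √n trials fit into n time points, and the probability that all of them fail decays like a power of √n:

$$D_n = [C_5 \sqrt{n}] + p - 1 \ge C_5 \sqrt{n} - 1, \qquad K_n = C_7 D_n \ge C_7 (C_5 \sqrt{n} - 1),$$

$$(1 - C_6)^{K_n} \le (1 - C_6)^{-C_7}\, \rho_0^{\sqrt{n}}, \qquad \rho_0 := (1 - C_6)^{C_5 C_7} \in (0, 1).$$

A fully contractive condition would not require a long run of consecutive hits before each trial succeeds, which is why the semi-contractive setting leads to this subgeometric rate rather than a geometric one.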
Proof of Theorem 2.2. Let ((Yt, λt))t ∈ ℤ be a stationary version of the process. We will show that there exists a measurable function \[g:\mathbb{N}_0^\infty \to [0,\infty )\] such that λt = g(Yt −1, Yt −2, …). To this end, we consider the same ‘forward iterations’ as in the proof of Lemma 2.1. We use the true values Y 0, …, Y 1−p, λ0, …, λ1−q as well as Y 0, …, Y 1−p, $$\lambda_0', \ldots, \lambda_{1 - q}'$$ with $$\lambda_0' = \cdots = \lambda_{1 - q}' = 0$$ as starting values. Then we define, according to the model equation (2.1b),
Iterating this scheme we obtain
Note that in all steps matching values of the process (Yt)t are used for computing λk and $$\lambda_k'$$, which means that the contractive property takes effect at each step. Therefore we obtain, analogously to (3.3) in the proof of Lemma 2.1,
where it follows from (3.4) that dk +1 → 0 as k → ∞. By stationarity we conclude, for fixed t ∈ ℤ, that
that is, as k → ∞, g [k](Yt −1, …, Yt −p−k+1) converges in L 1 to λt. By taking an appropriate subsequence we also get almost-sure convergence. This means that there exists some measurable function \[g:\mathbb{N}_0^\infty \to [0,\infty )\] such that
Since absolute regularity of the process (Yt)t ∈ ℤ implies strong mixing (see, e.g. [Reference Doukhan11, p. 20]), we conclude from Remark 2.6 on page 50 in combination with Proposition 2.8 of [Reference Bradley5, p. 51] that any stationary version of this process is also ergodic.
Finally, we conclude from representation (3.10) by Proposition 2.10(ii) of [Reference Bradley5, p. 54] that the bivariate process ((Yt, λt))t ∈ ℤ is also ergodic.
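The forward iterations used in this proof can be illustrated with a small simulation. The following is a minimal sketch, assuming a linear Poisson INGARCH(1,1) specification λt = ω + a Yt −1 + b λt −1 (a special case chosen for illustration, with hypothetical parameter names `omega`, `a`, `b`); the helpers `sample_poisson` and `forward_iteration_gap` are likewise introduced only here. Both recursions are fed the same observations, one started at the true intensity and one started at 0, so that the gap between them shrinks by the factor b at every step.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by inversion (adequate for moderate lam)."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cdf = p
    # The guard p > 0 prevents an endless loop caused by rounding.
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k


def forward_iteration_gap(omega, a, b, lam0, n_steps, seed=0):
    """Return the gaps |lambda_k - lambda_k'| for k = 0, ..., n_steps, where
    lambda' is started at 0 and both recursions use the same counts."""
    rng = random.Random(seed)
    lam, lam_prime = lam0, 0.0
    gaps = [abs(lam - lam_prime)]
    for _ in range(n_steps):
        y = sample_poisson(lam, rng)             # count driven by the true intensity
        lam = omega + a * y + b * lam            # true recursion
        lam_prime = omega + a * y + b * lam_prime  # recursion with wrong start value
        gaps.append(abs(lam - lam_prime))
    return gaps
```

In this linear special case the gap after k steps equals b^k times the initial intensity, regardless of the simulated counts, which mirrors the statement that dk +1 → 0 and that λt is a measurable function of the past observations alone.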
Lemma 3.1. Suppose that (A1) is fulfilled. Then
(i) \[\widetilde{\mathbb{E}}(\eta ^{\tau _1 } |\mathcal{G}_{ - 1} ) \leqslant (V(\widetilde{X}_0) + V(\widetilde{X}_0'))/2\] if $$(V(\widetilde{X}_0) + V(\widetilde{X}_0'))/2 > C_1^{(0)}$$, where η = 2/(1 + κ) and $$C_1^{(0)} = (2a_0 + 2)/(1 - \kappa )$$.
(ii) \[\widetilde{\mathbb{E}}(\eta ^{\tau _{m + 1} - \tau _m } |\mathcal{G}_{\tau _m - 1} ) \leqslant \rho ^{D_n } (1 + (a_0 + \kappa C_1^{(0)} )/(1 - \kappa ))\].
Proof of Lemma 3.1. We have already defined
in the case of a GARCH(p,q) model. Furthermore, in the INGARCH(p, q) case we set analogously
Let $$W_t = (V(\widetilde{X}_t) + V(\widetilde{X}_t'))/2$$.
Since $$\widetilde{Y}_{t - 1} | {\cal G}_{t - 1} = Q(\widetilde{\lambda}_{t - 1})$$, we see that
Therefore, we obtain \[\widetilde{\mathbb{E}}(V(\widetilde{X}_t)|\mathcal{G}_{t - 1} ) = \widetilde{\mathbb{E}}(V(\widetilde{X}_t)|\widetilde{X}_{t - 1} )\] and, analogously, \[\widetilde{\mathbb{E}}(V(\widetilde{X}_t')|\mathcal{G}_{t - 1} ) = \widetilde{\mathbb{E}}(V(\widetilde{X}_t')|\widetilde{X}_{t - 1}')\]. Hence, from the geometric drift condition (A1), we obtain
This implies that
and
In what follows we adapt the line of arguments from Nummelin and Tuominen [Reference Nummelin and Tuominen28], who derived similar bounds for stopping times in the context of a Markov chain.
Proof of (i). Let $$W_0 = x > C_1^{(0)}$$. We denote by \[\widetilde{\mathbb{P}}_x\] and \[\widetilde{\mathbb{E}}_x\] the conditional distribution and expectation, respectively, given W 0 = x. It follows from (3.12) that
which implies that
Analogously, we conclude from (3.12) that
which yields
Multiplying both sides by η and taking the expectation over W 1 under the condition that W0 = x, we obtain
Proceeding in the same way we conclude that
Adding both sides of (3.13) to (3.14) we obtain
as required.
Proof of (ii). Here we have to take into account that τm+1 is not a usual but a retarded return time. Recall that Xτ m is $${\cal G}_{\tau _m - 1}$$-measurable. Since $$X_{\tau _m } \le C_1^{(0)}$$, from (i) we obtain
Furthermore, since $$W_{\tau _m } \le C_1^{(0)}$$, from (3.11) we obtain
and, eventually,
Acknowledgements
This work has been developed within the MME-DII centre of excellence (ANR-11-LABEX-0023-01) and with the help of PAI-CONICYT MEC no. 80170072. The first author wishes to thank the University of Jena and Universidad de Valparaiso for their hospitality. The research of the second author was supported by a guest professorship of IEA at the University of Cergy-Pontoise. We thank two anonymous referees for their comments which helped improve the presentation of our results.