1. Introduction
A classical moderate deviation theorem reads as follows (see [29, page 228]). Let $X_i$ , $1\le i\le n$ , be independent and identically distributed (i.i.d.) random variables with $\mathbb{E}(X_1)=0$ and $\text{var}(X_1)=1$ . If, for some $t_0>0$ ,
\begin{equation*}c_0\coloneqq \mathbb{E}\,{\text{e}}^{t_0|X_1|}<\infty,\tag{1.1}\end{equation*}
then there exist positive constants $c_1$ and $c_2$ depending on $c_0$ and $t_0$ such that
\begin{equation*}\frac{\mathbb{P}\bigl(\sum_{i=1}^nX_i\ge x\sqrt{n}\bigr)}{1-\Phi(x)}=1+\text{O}(1)\frac{1+x^3}{\sqrt{n}}\quad\text{for } 0\le x\le c_1n^{1/6},\tag{1.2}\end{equation*}
where $\Phi(z)$ is the distribution function of the standard normal and $|\text{O}(1)|\le c_2$ . However, since the pioneering work [Reference Chen15], it has been shown [Reference Barbour, Holst and Janson9] that, for counts of rare events, the Poisson distribution provides a better approximation. For example, the distribution of the number of records [Reference Dwass23, Reference Rényi30] in Example 1.1 below is better approximated by the Poisson distribution with the same mean than by a normal distribution [Reference Deheuvels and Pfeifer22]. Moreover, a suitable refinement of the Poisson distribution can further improve the performance of the approximation [Reference Borovkov10, Reference Borovkov and Pfeifer11].
The right tail probabilities of counts of rare events are often needed in statistical inference, but these probabilities are so small that the error estimates in approximations of the distributions of the counts are usually of no use: the bounds are often larger than the probabilities of interest. Hence it is of practical interest to approximate these probabilities via moderate deviations in Poisson approximation, in a similar fashion to (1.2). However, there has been little progress in the general framework beyond the special cases in [Reference Chen and Choi16], [Reference Barbour, Chen and Choi8], [Reference Chen, Fang and Shao18], [Reference Tan, Lu and Xia34], and [Reference Čekanavičius and Vellaisamy13]. This is partly because the tail behaviour of a Poisson distribution differs significantly from that of a normal distribution, a fact observed by Gnedenko [Reference Gnedenko25] in the context of extreme value theory. In particular, Gnedenko [Reference Gnedenko25] concluded that the Poisson distribution does not belong to any domain of attraction, while the normal distribution belongs to the domain of attraction of the Gumbel distribution.
Example 1.1. We use the distribution of the number of records to illustrate the difference between moderate deviations in Poisson and in normal approximations. More precisely, let $\{\eta_i\colon 1\le i\le n\}$ be i.i.d. random variables with a continuous cumulative distribution function. Since the value of $\eta_1$ is always a record, for $2\le i\le n$ we say that $\eta_i$ is a record if $\eta_i>\max_{1\le j\le i-1}\eta_j$ . We define the indicator random variable
that is, $I_i=1$ if a new record occurs at time i and $I_i=0$ otherwise. Our interest is in the distribution of $S_n\coloneqq \sum_{i=2}^nI_i$ , denoted by $\mathscr{L}(S_n)$ . Dwass [Reference Dwass23] and Rényi [Reference Rényi30] showed that $\mathbb{E} I_i=1/i$ and that $\{I_i\colon 2\le i\le n\}$ are independent, so
\begin{equation*}\lambda_n\coloneqq \mathbb{E} S_n=\sum_{i=2}^n\frac{1}{i},\qquad \sigma_n^2\coloneqq \text{Var}(S_n)=\sum_{i=2}^n\frac{1}{i}\biggl(1-\frac{1}{i}\biggr).\end{equation*}
We use $\text{Pn}(\lambda)$ to stand for the Poisson distribution with mean $\lambda$ , $\text{Pn}(\lambda)(A)\coloneqq \mathbb{P}(Y\in A)$ for $Y\sim\text{Pn}(\lambda)$ , and $N(\mu,\sigma^2)$ to stand for the normal distribution with mean $\mu$ and variance $\sigma^2$ .
Let $v_n\coloneqq \lambda_n+x\sigma_n$ , and consider approximations of $\mathbb{P}(S_n\ge v_n)$ by moderate deviations based on $\text{Pn}(\lambda_n)$ [Reference Barbour, Chen and Choi8, Reference Chen, Fang and Shao18] and on $N_n\sim N(\lambda_n,\sigma_n^2)$ . For $x=3$ , Figures 1, 2, and 4, respectively, plot the ratios $\mathbb{P}(S_n\ge v_n)/\text{Pn}(\lambda_n)([v_n,\infty))$ , $\mathbb{P}(S_n\ge v_n)/\mathbb{P}(N_n\ge v_n)$ , and $\mathbb{P}(S_n\ge v_n)/\text{Pn}(\sigma_n^2)([v_n,\infty))$ for $n\in[3,10^5]$ . As observed in [Reference Borovkov and Pfeifer11], Poisson and normal approximations to $\mathscr{L}(S_n)$ are of order $\text{O}((\ln n)^{-1})$ and $\text{O}((\ln n)^{-1/2})$ respectively, and the numerical studies confirm that approximation by the Poisson distribution is better than that by the normal distribution. In fact, it appears that the speed of convergence of $\mathbb{P}(S_n\ge v_n)/\mathbb{P}(N_n\ge v_n)$ to 1 as $n\to\infty$ is too slow to be of practical use. In the context of normal approximation to the distribution of integer-valued random variables, a common practice is to introduce a 0.5 correction, giving the ratios $\mathbb{P}(S_n\ge v_n)/\mathbb{P}(N_n\ge\lceil v_n\rceil-0.5)$ , where $\lceil x\rceil$ is the smallest integer that is not less than x. Figure 3 plots these ratios, and we can see that they are still far from the limit 1. Finally, the difference between Figure 1 and Figure 4 shows that a minor change in the mean of the approximating Poisson distribution can change the quality of the moderate deviation approximation significantly, further highlighting the difficulty of obtaining sharp bounds in theoretical studies in this area.
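To make the comparison concrete, the following Python sketch (our illustration, not the code used to produce the figures; the values of n are arbitrary) computes the exact distribution of $S_n$ by convolving the independent Bernoulli $(1/i)$ indicators and evaluates the ratio $\mathbb{P}(S_n\ge v_n)/\text{Pn}(\lambda_n)([v_n,\infty))$ plotted in Figure 1.

```python
# Exact tail ratio for the number of records (our sketch).
# S_n is a sum of independent Bernoulli(1/i), i = 2..n.
import numpy as np
from scipy.stats import poisson

def record_pmf(n):
    """Exact pmf of S_n = sum_{i=2}^n I_i with P(I_i = 1) = 1/i."""
    pmf = np.array([1.0])               # distribution of the empty sum
    for i in range(2, n + 1):
        p = 1.0 / i
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1 - p)       # I_i = 0
        new[1:] += pmf * p              # I_i = 1
        pmf = new
    return pmf

x = 3.0
for n in [10, 100, 1000, 10000]:
    lam = sum(1.0 / i for i in range(2, n + 1))
    sig2 = sum((1.0 / i) * (1.0 - 1.0 / i) for i in range(2, n + 1))
    k = int(np.ceil(lam + x * np.sqrt(sig2)))   # smallest integer >= v_n
    exact = record_pmf(n)[k:].sum()             # P(S_n >= v_n)
    pois = poisson.sf(k - 1, lam)               # Pn(lambda_n)([v_n, oo))
    print(n, exact / pois)
```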
Example 1.1 shows that the distribution of counts of rare events often has a heavier right tail than the corresponding normal distribution, so approximations by moderate deviations in the normal distribution are generally inferior to those by moderate deviations in the Poisson distribution. Example 1.2 shows that the parameter of the approximating Poisson distribution suggested in [Reference Chen and Choi16], [Reference Barbour, Chen and Choi8], and [Reference Chen, Fang and Shao18] is not optimal, and that a suitable adjustment can significantly improve the quality of approximations by moderate deviations in the Poisson distribution.
Example 1.2. With $0<p<1$ , let $W_n\sim\text{Bi}(n,p)$ , $Y_n\sim\text{Pn}(np)$ , and $Z\sim N(0,1)$ . Then, for a fixed $x>0$ ,
which systematically deviates from 1 as x moves away from 0. The systematic bias can be removed by introducing an adjustment into the approximate models: for a fixed $x>0$ ,
or equivalently, with $Y_n'\sim \text{Pn}(np(1-p))$ ,
Example 1.2 suggests that it is more suitable to approximate the right tail probabilities by looking at the number of standard deviations away from the mean, which is essentially the original idea of the translated (shifted) Poisson approximation [Reference Barbour and Xia7, Reference Röllin31, Reference Röllin32]. In this paper we show that it is indeed better to approximate the right tail probabilities via moderate deviations in the translated Poisson distribution.
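As a numerical illustration of this point (our sketch, with arbitrarily chosen n, p, and x), one can compare the exact binomial tail with the plain Poisson tail and with the translated Poisson tail obtained by taking $a=\lceil\mu-\sigma^2\rceil$ and $\lambda=\mu-a$ , in the spirit of Section 2.

```python
# Binomial right tail versus plain and translated Poisson tails (our sketch).
import math
from scipy.stats import binom, poisson

n, p, x = 2000, 0.05, 3.0                 # arbitrary illustration values
mu = n * p
sigma2 = n * p * (1 - p)
k = math.ceil(mu + x * math.sqrt(sigma2)) # tail cut-off mu + x*sigma

exact = binom.sf(k - 1, n, p)             # P(W_n >= k)
plain = poisson.sf(k - 1, mu)             # Pn(np)([k, oo))
a = math.ceil(mu - sigma2)                # integer shift, here ceil(np^2)
shifted = poisson.sf(k - a - 1, mu - a)   # Pn(mu - a)([k - a, oo))

print("exact/plain   =", exact / plain)   # drifts away from 1
print("exact/shifted =", exact / shifted) # considerably closer to 1
```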
Our approach does not rely on the boundedness of the Radon–Nikodým derivative as in [Reference Chen and Choi16] and [Reference Barbour, Chen and Choi8] or the tacit assumption of well-behaved tail probabilities as in [Reference Chen, Fang and Shao18]; see Remark 2.3 for more details. For the case of Poisson-binomial, we show in Proposition 3.2 that our approach works for the case when the maximum of the success probabilities of the Bernoulli random variables is not small, such as the distribution of the number of records.
The paper is organised as follows. In Section 2 we state the main results in the context of local dependence, size-biased distribution, and discrete zero-biased distribution. In Section 3 we illustrate the accuracy of our bounds with six examples. The proofs of the main results are postponed to Section 4, where we also establish Stein’s factors for Poisson moderate deviations in Lemma 4.1.
2. The main results
In this section we state three theorems on moderate deviations in Poisson approximation: the first is under a local dependent structure, the second is with respect to the size-biased distribution, and the last is in terms of the discrete zero-biased distribution.
We first consider a class of non-negative integer-valued random variables $\{X_i\colon i\in\mathcal{I}\}$ satisfying the local dependence structure (LD2) of [Reference Chen and Shao17] (see also [Reference Arratia, Goldstein and Gordon2] for its origin). For ease of reading, we quote the definition of (LD2) below.
(LD2) For each $i\in\mathcal{I}$ , there exists an $A_i\subset B_i\subset \mathcal{I}$ such that $X_i$ is independent of $\{X_j\colon j\in A_i^c\}$ and $\{X_i\colon i\in A_i\}$ is independent of $\{X_j\colon j\in B_i^c\}$ .
We set
\begin{equation*}W=\sum_{i\in\mathcal{I}}X_i,\qquad Z_i=\sum_{j\in A_i}X_j,\qquad Z_i^{\prime}=\sum_{j\in B_i}X_j.\end{equation*}
We write $W_i^{\prime}=W-Z_i^{\prime}$ , $\mu_i=\mathbb{E} X_i$ , $\mu=\mathbb{E} W$ , and $\sigma^2=\text{Var}(W)$ .
As suggested in Example 1.2, we consider approximation of $W-a$ by $Y\sim\text{Pn}(\lambda)$ with $|\lambda-\sigma^2|$ not too large and $a=\mu-\lambda$ an integer, so that k in $\mathbb{P}(W-a\ge k)$ and $\mathbb{P}(Y\ge k)$ is measured in terms of the number of standard deviations of W. In principle, the constant a is chosen to minimise the error of the approximation. However, our theory is formulated in such a flexible way that other choices of $\lambda$ and a are also acceptable. The three most useful choices of a are $a=0$ , $a = \lfloor{\mu-\sigma^2}\rfloor$ , and $a = \lceil{\mu-\sigma^2}\rceil$ , where $\lfloor{\cdot}\rfloor$ stands for the largest integer in $(-\infty,\cdot]$ ; a small helper computing these choices is sketched below.
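The following minimal helper (our illustration; the values of $\mu$ and $\sigma^2$ are arbitrary) lists the three candidate shifts together with the corresponding $\lambda=\mu-a$ .

```python
# The three candidate shifts a and the corresponding lambda = mu - a
# (our helper, not from the paper).
import math

def shift_choices(mu, sigma2):
    candidates = {0, math.floor(mu - sigma2), math.ceil(mu - sigma2)}
    return sorted((a, mu - a) for a in candidates if a < mu)

print(shift_choices(25.0, 23.4))   # [(0, 25.0), (1, 24.0), (2, 23.0)]
```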
Theorem 2.1. With the set-up in the preceding paragraph, assume that $\{X_i\colon i\in\mathcal{I}\}$ satisfies (LD2) and, for each i, there exists a $\sigma$ -algebra $\mathcal{F}_i$ such that $\{X_j\colon j\in B_i\}$ is $\mathcal{F}_i$ -measurable. Define
where $\text{ess\,sup}\, V$ is the essential supremum of the random variable V. Then, for integer $a<\mu$ , $\lambda=\mu-a$ , and positive integer $k>\lambda$ , we have
where, with $F(j)=\mathbb{P}(Y\le j)$ and $\overline{F}(j)=\mathbb{P}(Y\ge j)$ ,
Remark 2.1. Both ${C}_1$ and ${C}_2$ can be numerically computed in applications and they cannot generally be improved (see the proofs below). They are better than the ‘naive’ counterparts $(1-{\text{e}}^{-\lambda})/(\lambda \mathbb{P}(Y\ge k))$ derived through the total variation bounds in [Reference Barbour and Eagleson6] and [Reference Barbour, Holst and Janson9]. Figure 5 provides details of
for $\lambda=10$ and k ranging from 10 to 43; a numerical sketch reproducing these quantities is given after this remark. We would like to mention that, for large k and/or large $\lambda$ , the tail probabilities are so small that calculations in MATLAB produce unstable results, since accumulated computation errors often exceed the tail probabilities. Hence more powerful computational tools are needed to achieve the required accuracy, or one has to resort to known approximations to the Poisson right tails and point probabilities.
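For instance, the quantities behind Figure 5 can be reproduced along the following lines (our sketch; we compute ${C}_0$ , ${C}_{1-}$ , and ${C}_{1+}$ directly from the formulas of Lemma 4.1, from which ${C}_1$ and ${C}_2$ are assembled via (2.2) and (2.3), and rely on scipy's tail routines to keep the small probabilities stable).

```python
# Stein factors of Lemma 4.1 versus the 'naive' counterpart
# (1 - e^{-lambda}) / (lambda * P(Y >= k)), for lambda = 10, k = 10..43.
import math
from scipy.stats import poisson

lam = 10.0
for k in range(10, 44):
    pk = poisson.pmf(k, lam)
    c0 = poisson.cdf(k - 1, lam) / (k * pk)                  # C_0
    c1m = c0 * (1 - poisson.cdf(k - 2, lam)
                / poisson.cdf(k - 1, lam) * lam / (k - 1))   # C_{1-}
    c1p = c0 * (1 - poisson.sf(k, lam)
                / poisson.sf(k - 1, lam) * k / lam)          # C_{1+}
    naive = (1 - math.exp(-lam)) / (lam * poisson.sf(k - 1, lam))
    print(k, c1m, c1p, naive)
```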
Remark 2.2. Due to the discrete nature of the Poisson distribution, it seems impossible to analytically simplify ${C}_1$ and ${C}_2$ at negligible cost for the diverse range of $k>\lambda$ .
Remark 2.3. If $\lambda$ is chosen reasonably close to $\sigma^2$ so that $\lambda-\sigma^2$ is bounded, then $\theta_i$ in the bound (2.1) converges to 0 when $\sigma^2$ converges to $\infty$ . Our bound does not rely on the Radon–Nikodým derivative of $\mathscr{L}(W)$ with respect to $\text{Pn}(\lambda)$ , which is the crucial ingredient in [Reference Chen and Choi16] and [Reference Barbour, Chen and Choi8]. On the other hand, the tacit assumption of [Reference Chen, Fang and Shao18] is that
in Theorem 2.1 is well-behaved, and this assumption is hard to verify. The bound (2.1), although relatively crude, does not rely on this assumption and covers more general cases.
Corollary 2.1. For the sum of independent non-negative integer-valued random variables $W=\sum_{i\in\mathcal{I}}X_i$ , let $\theta_i=\max_{j}\mathbb{P}(W-X_i=j)$ , $\mu_i=\mathbb{E} X_i$ , $\mu=\sum_{i\in\mathcal{I}} \mu_i$ , $\sigma^2=\text{Var}(W)$ . For any integer $a<\mu$ , let $\lambda=\mu-a$ , $Y\sim\text{Pn}(\lambda)$ . Then, for $k> \lambda$ ,
Remark 2.4. We leave $\mathbb{P}(W-a<-1)$ in the upper bound (2.1) because the current approach cannot remove it from the bound. Nevertheless, it is at most 1 and converges to zero exponentially fast with a suitable choice of a. For the sum of independent non-negative integer-valued random variables in Corollary 2.1, if a is smaller than $\mu$ by at least a few multiples of $\sigma$ , we can use [20, Theorem 2.7] to obtain
For any non-negative random variable W with mean $\mu\in (0,\infty)$ and distribution $ {\text{d}} F(w)$ , the W-size-biased distribution [Reference Arratia and Goldstein1, Reference Cochran21] is given by
\begin{equation*}{\text{d}} F^s(w)=\frac{w\,{\text{d}} F(w)}{\mu},\end{equation*}
or equivalently by the characterising equation
\begin{equation*}\mathbb{E}[Wg(W)]=\mu\mathbb{E} g(W^s)\quad\text{for all } g \text{ for which the expectations exist.}\end{equation*}
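For a non-negative integer-valued W the transform is elementary: $\mathbb{P}(W^s=k)=k\,\mathbb{P}(W=k)/\mu$ . The following sketch (ours, with an arbitrary pmf and test function) computes it and spot-checks the characterising equation.

```python
# Size-bias transform of a pmf on {0, 1, 2, ...} with a numerical check of
# E[W g(W)] = mu * E[g(W^s)] (our sketch).
import numpy as np

def size_bias_pmf(pmf):
    pmf = np.asarray(pmf, dtype=float)
    k = np.arange(len(pmf))
    mu = (k * pmf).sum()
    return k * pmf / mu, mu

pmf = np.array([0.1, 0.3, 0.4, 0.2])     # some W on {0, 1, 2, 3}
sb, mu = size_bias_pmf(pmf)
g = np.array([1.0, -2.0, 0.5, 3.0])      # arbitrary test function
k = np.arange(len(pmf))
assert np.isclose((k * pmf * g).sum(), mu * (sb * g).sum())
```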
Theorem 2.2. Let W be a non-negative integer-valued random variable with mean $\mu$ and variance $\sigma^2$ , and let $a<\mu$ be an integer, $\lambda=\mu-a$ . Then, for integer $k>\lambda$ , we have
where $Y\sim\text{Pn}(\lambda)$ .
Remark 2.5. Theorem 2.2 improves [18, Theorem 3] in a number of ways, with less restrictive conditions and no unspecified constants.
The next theorem is based on the discrete zero-biased distribution defined in [Reference Goldstein and Xia26], and the approach is very similar to that in [Reference Chen, Fang and Shao19]. For an integer-valued random variable V with mean $\mu$ and finite variance $\sigma^2$ , we say that $V^\star$ has the discrete V-zero-biased distribution [26, Definition 2.1] if, for all bounded functions $g\colon \mathbb{Z}\coloneqq \{0,\pm1,\pm2,\dots\} \rightarrow \mathbb{R}$ with $\mathbb{E} |Vg(V)|<\infty$ ,
\begin{equation*}\mathbb{E}[(V-\mu)g(V)]=\sigma^2\mathbb{E}\Delta g(V^\star),\end{equation*}
where $\Delta g(i)\coloneqq g(i+1)-g(i)$ .
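Taking $g=\mathbf{1}_{[m+1,\infty)}$ in the characterising equation gives the explicit pmf $\mathbb{P}(V^\star=m)=\mathbb{E}[(V-\mu)\mathbf{1}_{\{V\ge m+1\}}]/\sigma^2$ , which the following sketch (ours, with an arbitrary pmf) implements for a distribution on $\{0,\dots,N\}$ .

```python
# Discrete zero-bias transform of a pmf on {0, ..., N} (our sketch).
import numpy as np

def discrete_zero_bias_pmf(pmf):
    pmf = np.asarray(pmf, dtype=float)
    k = np.arange(len(pmf))
    mu = (k * pmf).sum()
    var = ((k - mu) ** 2 * pmf).sum()
    centred = (k - mu) * pmf
    tails = np.cumsum(centred[::-1])[::-1]     # tails[m] = E[(V-mu)1{V>=m}]
    return tails[1:] / var                     # P(V* = m), m = 0..N-1

pmf = np.array([0.2, 0.5, 0.3])                # some V on {0, 1, 2}
q = discrete_zero_bias_pmf(pmf)
print(q, q.sum())                              # non-negative, sums to 1
```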
Theorem 2.3. Let W be a non-negative integer-valued random variable with mean $\mu$ and variance $\sigma^2$ , let $a<\mu$ be an integer, and let $W^\star$ have the discrete W-zero-biased distribution and be defined on the same probability space as W. Set $R=W^\star-W$ and define
Then, for integer $k>\lambda$ , with $\lambda=\mu-a>0$ , we have
where $Y\sim\text{Pn}(\lambda)$ .
3. Examples
As many applications of Poisson approximation rely on size-biased distributions, we begin with a review of some facts about size biasing.
Size biasing has been of considerable interest for many decades (see [Reference Barbour, Holst and Janson9], [Reference Ross33], [Reference Arratia, Goldstein and Kochman3], and references therein). In the context of the sum of Bernoulli random variables, its size biasing is particularly simple. More precisely, if $\{X_i\colon i\in\mathcal{I}\}$ is a family of Bernoulli random variables with $\mathbb{P}(X_i\,{=}\,1)=p_i$ , then the size-biased distribution of $W=\sum_{i\in\mathcal{I}}X_i$ is
where
and I is a random element independent of $\{\{X_j^{(i)}\colon j\in\mathcal{I}\}\colon i\in\mathcal{I}\}$ having distribution $\mathbb{P}(I=i)=p_i/\mathbb{E} W$ , $i\in\mathcal{I}$ . Moreover, $\{X_i\colon i\in\mathcal{I}\}$ are said to be negatively related (resp. positively related) [Reference Barbour, Holst and Janson9, page 24] if one can construct $\{\{X_j^{(i)}\colon j\in\mathcal{I}\}\colon i\in\mathcal{I}\}$ such that $X_j^{(i)}\le$ (resp. $\ge$ ) $X_j$ for all $j\ne i$ . When $\{X_i\colon i\in\mathcal{I}\}$ are negatively related, we have
where $\mu=\mathbb{E} W$ and $\sigma^2=\text{var}(W)$ . On the other hand, if $\{X_i\colon i\in\mathcal{I}\}$ are positively related, then
3.1. Poisson-binomial trials
Let $\{X_i\colon 1\le i\le n\}$ be independent Bernoulli random variables with $\mathbb{P}(X_i=1)=p_i\in (0,1)$ , $W=\sum_{i=1}^nX_i$ , $\mu=\mathbb{E} W$ , and $\mu_2=\sum_{i=1}^np_i^2$ . When $\tilde p\coloneqq \max_{1\le i\le n}p_i\to 0$ , the large deviations of W were investigated in [Reference Chen and Choi16] and [Reference Barbour, Chen and Choi8] with precise asymptotic order. We give two results for this particular case without assuming that $\tilde p$ is small: the first is a direct consequence of the general results in Section 2, and the second is based on our approach using a more fine-tuned analysis and well-studied properties of the tail behaviour of W.
Proposition 3.1. Recalling ${C}_1$ and ${C}_2$ in (2.2) and (2.3), for any integer $k>\mu$ we have
and, with $a=\lfloor{\mu_2}\rfloor$ and $\lambda\coloneqq \mu-a$ ,
Proof. The claim (3.4) is a consequence of Theorem 2.2 with $a=0$ and $\mu\mathbb{E} |W+1-W^s|=\sum_{i=1}^np_i^2$ , as shown in (3.2).
The bound (3.5) is a special case of Corollary 2.1. Since $\mathscr{L}(W_i)$ , with $W_i\coloneqq W-X_i$ , is unimodal, Corollary 1.6 of [Reference Mattner and Roos28] says that
On the other hand $\mathbb{E}(X_i^2)=p_i$ , and hence the upper bound (3.5) is an immediate consequence of Corollary 2.1 and (2.4).
One can also use Theorem 2.3 to obtain the same bound. More precisely, according to the construction of the discrete zero-biased distribution suggested in [Reference Goldstein and Xia26], let I be a random variable independent of $\{X_i, 1\le i\le n\}$ with distribution $\mathbb{P}(I=i)=p_i(1-p_i)/\sigma^2$ for $1\le i\le n$ . Then we can write $W^\star=W-X_I$ , giving $R=-X_I$ . We then apply (3.6) to bound $\theta_R$ as
and a routine calculation gives
Hence (3.5) follows from (2.6) and (2.4).
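The quantities $\theta_i=\max_j\mathbb{P}(W-X_i=j)$ entering these bounds can be computed exactly for moderate n by convolution; the sketch below (ours, illustrated with the record probabilities $p_i=1/i$ and $n=100$ , an arbitrary choice) does exactly this.

```python
# Exact theta_i = max_j P(W - X_i = j) for a Poisson-binomial W (our sketch),
# illustrated with the record probabilities p_i = 1/i, i = 2..100.
import numpy as np

def bernoulli_sum_pmf(ps):
    pmf = np.array([1.0])
    for p in ps:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1 - p)
        new[1:] += pmf * p
        pmf = new
    return pmf

ps = [1.0 / i for i in range(2, 101)]
thetas = [bernoulli_sum_pmf(ps[:i] + ps[i + 1:]).max() for i in range(len(ps))]
print(max(thetas))
```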
Proposition 3.2. Define
Then, for any integer k with $x\coloneqq (k-\mu)/\sqrt{\mu}\ge 1$ , we have
The proof relies on more information about the solutions of Stein's equation and is postponed to the end of Section 4. The bound (3.7) improves [18, (3.1)] in two respects: it contains no unspecified constants and it does not require $\tilde p$ to be small. For the distribution of the number $S_n$ of records, the large deviation results in [Reference Barbour, Chen and Choi8] do not apply. However, we apply Proposition 3.2 with the harmonic sum bound $\lambda_n=\sum_{i=2}^n(1/i)\ge \ln n+\gamma-1$ and the Riemann zeta function
to get the following estimate.
Corollary 3.1. For any integer k with $x\coloneqq (k-\lambda_n)/\sqrt{\lambda_n}\ge 1$ , we have
where $\gamma$ is Euler’s constant.
Remark 3.1. We conjecture that, with $a=\lfloor{\mu_2}\rfloor$ and $\lambda\coloneqq \mu-a$ , the bound in (3.5) can be significantly improved, and the better estimate likely depends on the Radon–Nikodým derivative bound
3.2. Matching problem
For a fixed n, let $\pi$ be a uniform random permutation of $\{1,\dots,n\}$ , and let
be the number of fixed points in the permutation.
Proposition 3.3. For the random variable W defined above and any integer $k\ge 2$ , we have
Proof. In this case, the size-biased distribution $\mathscr{L}(W^s)$ can be coupled with W as follows [Reference Chatterjee, Diaconis and Meckes14]. Let I be uniformly distributed on $\{1,2,\dots,n\}$ , independent of $\pi$ , and define
Set $W^s=\sum_{j=1}^n\mathbf{1}_{\{j=\pi^s(j)\}}$ . One can easily verify that $W^s$ has the size-biased distribution of W. Also, we can check that $\mathbb{E} W=\text{Var}(W)=1$ , giving $\mathbb{E} W^s=2$ . Let $\Delta=W+1-W^s$ . Using the above construction of $W^s$ , we can conclude that $\Delta$ takes values in $\{-1,0,1\}$ and $\mathbb{P}(\Delta=1\mid W)=W/n$ . Since $\mathbb{E}\Delta=0$ , we have $\mathbb{P}(\Delta=1)=\mathbb{P}(\Delta=-1)$ , and $\mathbb{E}\vert\Delta\vert=2/n$ . On the other hand, $\lambda=\mu$ allows us to get rid of the second term in (2.5). By Theorem 2.2 with $a=0$ , $\lambda=\mu=1$ , the claim follows.
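The coupling can be checked by simulation. In the sketch below (ours), forcing a uniformly chosen index to be a fixed point of $\pi$ by a value swap realises $\pi^s$ , and the sample mean of $|\Delta|$ is compared with $2/n$ .

```python
# Monte Carlo check of the fixed-point coupling (our sketch): E|Delta| = 2/n.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 200000
abs_delta = 0.0
for _ in range(reps):
    pi = rng.permutation(n)
    w = int(np.sum(pi == np.arange(n)))
    i = int(rng.integers(n))
    j = int(np.where(pi == i)[0][0])  # position j with pi[j] = i
    pi_s = pi.copy()
    pi_s[i], pi_s[j] = pi[j], pi[i]   # swap values so that pi_s[i] = i
    ws = int(np.sum(pi_s == np.arange(n)))
    abs_delta += abs(w + 1 - ws)
print(abs_delta / reps, 2.0 / n)      # close to each other
```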
3.3. Occupancy problem
The occupancy problem has a long history dating back to the early development of probability theory. General references on this subject can be found in classical treatments, e.g. [24, Vol. 1, Chapter 2] and [Reference Barbour, Holst and Janson9, Chapter 6].
The occupancy problem can be formulated as follows. Let l balls be thrown independently of each other into n boxes uniformly. Let $X_i$ be the indicator variable of the event that the ith box is empty, so the number of empty boxes can be written as $W=\sum_{i=1}^nX_i$ . Noting that $p\coloneqq \mathbb{E} X_i=(1-1/n)^l$ , direct computation gives
Proposition 3.4. For the random variable W defined above and any integer $k>\mu$ , we have
where $Y\sim \text{Pn}(\mu)$ .
Proof. For the sake of completeness, we provide the following proof, which is essentially a repeat of [Reference Barbour, Holst and Janson9, page 23]. From the construction of the W-size-biased distribution in (3.1), we can construct a coupling as follows. Let I be uniform on $\{1,\dots,n\}$ ; that is, we randomly pick one box with equal probability. If the selected box is not empty, we redistribute all balls in the box randomly into the other $n-1$ boxes with equal probability $1/(n-1)$ . Define $X_{j}^{(i)}$ as the indicator of the event that box i is selected and, after the redistribution, box j is empty. With this coupling in mind, one can verify that $\{X_i\}$ is negatively related, so it follows from (3.2) that
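The set-up is easy to simulate. The following Monte Carlo sketch (ours, with arbitrary n and l) checks the mean formula and shows the sample variance falling below the mean, consistent with the indicators being negatively related.

```python
# Occupancy simulation (our sketch): mean number of empty boxes versus
# n(1 - 1/n)^l, and sample variance below the mean (negative relation).
import numpy as np

rng = np.random.default_rng(1)
n, l, reps = 50, 100, 100000
empty = np.empty(reps)
for r in range(reps):
    boxes = rng.integers(n, size=l)          # box of each ball
    empty[r] = n - len(np.unique(boxes))     # number of empty boxes
print(empty.mean(), n * (1 - 1 / n) ** l)    # close to each other
print(empty.var(), "<", empty.mean())
```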
3.4. Birthday problem
The classical birthday problem is essentially a variant of the occupancy problem. Accordingly, we throw l balls independently and uniformly into n boxes and let $X_{ij}$ be the indicator random variable of the event that ball i and ball j fall into the same box. The number of pairs of balls going into the same boxes (i.e. the number of pairs of people sharing the same birthday) can be written as $W=\sum_{i<j}X_{ij}$ . Define $p=\mathbb{E} X_{ij}=1/n$ , so $\mu=\mathbb{E} W=\binom{l}{2}p$ . Chatterjee, Diaconis, and Meckes [Reference Chatterjee, Diaconis and Meckes14] gave the following construction of $W^s$ : label the balls from 1 to l, randomly choose two balls $J_1$ and $J_2$ , and move ball $J_1$ into the box that $J_2$ is in. Then W is the number of pairs of balls before the move, while $W^s$ is the number of pairs of balls after the move. Let E be the event that $J_1$ and $J_2$ are from the same box. When E occurs, $W^s=W$ , so $\vert W+1-W^s\vert=1$ ; otherwise $J_1$ and $J_2$ are from different boxes with $B_1$ and $B_2$ balls respectively, giving
Hence
This, together with Theorem 2.2 and $a=0$ , gives the following proposition.
Proposition 3.5. For the random variable W defined above and any integer $k>\mu$ , we have
where $Y\sim \text{Pn}(\mu)$ .
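The construction is straightforward to simulate. The sketch below (ours, with arbitrary n, l) checks the size-bias property of the move through the characterising equation with $g(w)=w$ , i.e. $\mathbb{E} W^s=\mathbb{E} W^2/\mathbb{E} W$ .

```python
# Birthday-problem coupling of [14] (our simulation sketch):
# moving ball J1 into J2's box produces a size-biased W^s.
import numpy as np

def pairs(assign, n):
    counts = np.bincount(assign, minlength=n)
    return int((counts * (counts - 1) // 2).sum())

rng = np.random.default_rng(2)
n, l, reps = 365, 60, 100000
w = np.empty(reps)
ws = np.empty(reps)
for r in range(reps):
    assign = rng.integers(n, size=l)             # box of each ball
    w[r] = pairs(assign, n)
    j1, j2 = rng.choice(l, size=2, replace=False)
    moved = assign.copy()
    moved[j1] = assign[j2]                       # move ball J1 to J2's box
    ws[r] = pairs(moved, n)
print(ws.mean(), (w ** 2).mean() / w.mean())     # should nearly agree
```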
3.5. Triangles in the Erdős–Rényi random graph
Let $G=G(n,p)$ be an Erdős–Rényi random graph on n vertices with edge probability p. Let $K_n$ be the complete graph on n vertices, and let $\Gamma$ be the set of all triangles in $K_n$ . For $\alpha\in\Gamma$ , let $X_\alpha$ be the indicator that there is a triangle in G at $\alpha$ , that is,
Therefore the number of triangles in G can be represented as $W=\sum_{\alpha\in\Gamma}X_\alpha$ . Clearly, $X_\alpha$ is independent of $X_\beta$ if $\alpha$ and $\beta$ do not share a common edge. By analysing the numbers of shared edges, we obtain (see [33, page 255])
Proposition 3.6. For the random variable W defined above and any integer $k>\mu$ , we have
where $Y\sim \text{Pn}(\mu)$ .
Proof. The following proof is a special version of the general argument in [Reference Barbour, Holst and Janson9, page 89]. Since $X_\alpha$ and $X_\beta$ are independent if $\alpha$ and $\beta$ have no common edges, a size-biased distribution of W can be constructed as follows. Let
Then
Here the union of graphs is in the sense of set operations on their vertices and edges. Let I be a random element taking values in $\Gamma$ with equal probability and independent of $\mathscr{L}(\{X_\beta^{(\alpha)},\alpha,\beta\})$ . Then we can write
Because $X_\beta^{(\alpha)}\ge X_\beta$ for all $\beta\in\Gamma$ , (3.3) implies
The claim follows from Theorem 2.2 with $a=0$ .
Remark 3.3. Since $\mu=\binom{n}{3}p^3$ , if $p = \text{O}(1/n)$ , then the error bound (3.10) is of the same order $\text{O}(1/n)$ .
3.6. 2-runs
Let $\{\xi_1,\dots,\xi_n\}$ be i.i.d. Bernoulli(p) random variables with $n\ge 9$ and $p<2/3$ . For each $1\le i\le n$ , define $X_i=\xi_i\xi_{i+1}$ and, to avoid edge effects, set $\xi_{j+n}=\xi_j$ for $-3\le j\le n$ . The number of 2-runs in the Bernoulli sequence is defined as $W=\sum_{i=1}^nX_i$ . Then $\mu\coloneqq \mathbb{E} W=np^2$ and $\sigma^2\coloneqq \text{Var}(W)=np^2(1-p)(3p+1)$ .
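Before stating the result, a quick simulation (ours, with arbitrary n and p) confirms the two moment formulas for the circular 2-runs count.

```python
# Circular 2-runs (our sketch): checking mu = n p^2 and
# sigma^2 = n p^2 (1 - p)(3p + 1) by simulation.
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 100, 0.3, 200000
xi = rng.random((reps, n)) < p                   # Bernoulli(p) sequences
w = (xi & np.roll(xi, -1, axis=1)).sum(axis=1)   # X_i = xi_i * xi_{i+1}
print(w.mean(), n * p ** 2)
print(w.var(), n * p ** 2 * (1 - p) * (3 * p + 1))
```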
Proposition 3.7. For any integer $k>\mu$ ,
If $a\coloneqq \lfloor{np^3(3p-2)}\rfloor$ , $\lambda=\mu-a$ , then for any integer $k>\lambda$ ,
Proof. For (3.11), we apply Theorem 2.2 with $a=0$ ,
I a uniform random variable on $\{1,\dots,n\}$ independent of $\{X_j^{(i)}\}$ , and
giving
À propos of (3.12), we make use of Theorem 2.1. To this end, let $A_i=\{i-1,i,i+1\}$ , $B_i=\{i-2,i-1,i,i+1,i+2\}$ , and $\mathcal{F}_i=\sigma\{\xi_j\colon i-2\le j\le i+3\}$ . Then Lemma 5.1 of [Reference Barbour and Xia7] with $\alpha_j=0$ or 1 for $j=i-2,\ldots,i+5$ gives
On the other hand, $\mathbb{E}(Z_i^{\prime})=5p^2$ , $|\mathbb{E}((X_i-\mu_i)Z_i)|\le \mathbb{E}(Z_i)=3p^2$ ,
and $|\lambda-\sigma^2|\lambda^{-1}\le 1\wedge \lambda^{-1}$ , $a=\lfloor{np^3(3p-2)}\rfloor\le0$ , $\lambda\ge \sigma^2$ , and hence $\mathbb{P}(W-a<-1)=0$ and (3.12) follows from Theorem 2.1 by collecting these terms.
4. The proofs of the main results
The celebrated Stein–Chen method [Reference Chen15] is based on the observation that a non-negative random variable $Y\sim\text{Pn}(\lambda)$ if and only if $\mathbb{E}[\lambda f(Y+1)-Yf(Y)]=0$ for all bounded functions $f\colon \mathbb{Z}_+\coloneqq \{0,1,2,\dots\}\rightarrow\mathbb{R}$ , leading to a Stein identity for Poisson approximation as
where $\text{Pn}(\lambda)\{h\}\coloneqq \mathbb{E} h(Y)$ . Since f(0) plays no role in Stein's equation, we set $f(0)=f(1)$ and $f(j)=0$ for $j<0$ . The following lemma plays a key role in the proofs of the main results, and it enables us to circumvent checking the moment condition (1.1), which seems to be inevitable in the existing procedure for proving moderate deviation theorems.
Lemma 4.1. For fixed $k\in\mathbb{Z}_+$ , let $h=\mathbf{1}_{[k,\infty)}$ . With $\pi_\cdot=\text{Pn}(\lambda)(\{\cdot\})$ , $\Delta f(i)=f(i+1)-f(i)$ , and $\Delta^2f=\Delta(\Delta f)$ , the solution $f\coloneqq f_h$ of the Stein equation (4.1) has the following properties:

(i) $\|f\|\coloneqq \sup_{i\in\mathbb{Z}_+}\vert f(i)\vert={C}_{0}(\lambda,k)\text{Pn}(\lambda)\{h\}$ , where
\begin{equation*} {C}_{0}(\lambda,k)\coloneqq \frac{F(k-1)}{k\pi_k}{;} \end{equation*}

(ii) $\Delta f(i)$ is negative and decreasing in $i\le k-1$ , and positive and decreasing in $i\ge k$ ;

(iii) $\|\Delta f\|_{k-}\coloneqq \sup_{i\le k-1}\vert\Delta f(i)\vert ={C}_{1-}(\lambda,k)\text{Pn}(\lambda)\{h\}$ and $\|\Delta f\|_{k+}\coloneqq \sup_{i\ge k}\vert\Delta f(i)\vert= {C}_{1+}(\lambda,k)\text{Pn}(\lambda)\{h\}$ , where
\begin{equation*}{C}_{1-}(\lambda,k)\coloneqq \frac{F(k-1)}{k\pi_k}\biggl(1-\frac{F(k-2)}{F(k-1)}\cdot\frac{\lambda}{k-1}\biggr)\end{equation*}
and
\begin{equation*}{C}_{1+}(\lambda,k)\coloneqq \frac{F(k-1)}{k\pi_k}\biggl(1-\frac{\overline{F}(k+1)}{\overline{F}(k)}\cdot\frac{k}{\lambda}\biggr){;}\end{equation*}

(iv) $\|\Delta f\|\coloneqq \sup_{i\in\mathbb{Z}_+}\vert\Delta f(i)\vert={C}_1(\lambda,k)\text{Pn}(\lambda)\{h\}$ and $\|\Delta^2 f\|\coloneqq \sup_{i\in\mathbb{Z}_+}\vert\Delta^2 f(i)\vert={C}_2(\lambda,k) \text{Pn}(\lambda)\{h\}$ .
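The lemma can be verified numerically. For $h=\mathbf{1}_{[k,\infty)}$ the solution of (4.1) with the convention $f(0)=f(1)$ admits the closed form (standard in the Stein–Chen literature) $f(j+1)=-F(j)\overline{F}(k)/(\lambda\pi_j)$ for $j\le k-1$ and $f(j+1)=-\overline{F}(j+1)F(k-1)/(\lambda\pi_j)$ for $j\ge k$ ; the sketch below (ours) confirms that this solves (4.1) and exhibits properties (i) and (ii) for one arbitrary pair $(\lambda,k)$ .

```python
# Numerical check of Lemma 4.1 (i)-(ii) for lambda = 10, k = 15 (our sketch).
import numpy as np
from scipy.stats import poisson

lam, k, m = 10.0, 15, 60
tail = poisson.sf(k - 1, lam)                  # P(Y >= k) = Pn(lambda){h}
f = np.zeros(m + 1)
for j in range(m):
    pij = poisson.pmf(j, lam)
    if j <= k - 1:
        f[j + 1] = -poisson.cdf(j, lam) * tail / (lam * pij)
    else:
        f[j + 1] = -poisson.sf(j, lam) * poisson.cdf(k - 1, lam) / (lam * pij)
f[0] = f[1]                                    # convention f(0) = f(1)

# The closed form solves the Stein equation (4.1):
for j in range(m):
    h = 1.0 if j >= k else 0.0
    assert np.isclose(lam * f[j + 1] - j * f[j], h - tail)

# (i): ||f|| = C_0(lambda, k) * P(Y >= k), attained at i = k.
c0 = poisson.cdf(k - 1, lam) / (k * poisson.pmf(k, lam))
assert np.isclose(np.abs(f).max(), c0 * tail) and np.argmax(np.abs(f)) == k

# (ii): Delta f negative, decreasing on i <= k-1; positive, decreasing on i >= k.
df = np.diff(f)                                # df[i] = Delta f(i)
assert (df[1:k] < 0).all() and (np.diff(df[1:k]) <= 0).all()
assert (df[k:] > 0).all() and (np.diff(df[k:]) <= 0).all()
print("Lemma 4.1 (i)-(ii) verified for lambda = 10, k = 15")
```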
For $k>\lambda$ , the death rates are larger than the birth rate, so it seems intuitively obvious that $\tau_{k}^-$ is stochastically less than or equal to $\tau_{k-2}^+$ for such k. In view of the representation (4.11) and $f(k)<0$ as shown in (4.6), this is equivalent to ${C}_{1-}(\lambda,k)> {C}_{1+}(\lambda,k)$ , leading to the following conjecture.
Conjecture 4.1. We conjecture that ${C}_{1-}(\lambda,k)> {C}_{1+}(\lambda,k)$ for all $k>\lambda$ , and the gap increases exponentially as a function of $k-\lambda$ .
Proof of Lemma 4.1. We build our argument on the birth–death process representation of the solution
where $Z_n(t)$ is a birth–death process with birth rate $\lambda$ , unit per capita death rate, and initial state $Z_n(0)=n$ [Reference Barbour4, Reference Barbour and Brown5, Reference Brown and Xia12]. For convenience we adopt the notation in [Reference Brown and Xia12]: for $i,j\in\mathbb{Z}_+$ , define
and
Applying Lemmas 2.1 and 2.2 of [Reference Brown and Xia12] with birth rate $\lambda$ , death rate $\beta_i=i$ , $A\coloneqq [k,\infty)$ , and $\pi(\!\cdot\!)=\sum_{l\in \cdot}\pi_l$ , we have
and for $j\in\mathbb{Z}_+$ ,
where, as in Theorem 2.1,
One can easily simplify (4.3) to get
which, together with (4.4) and the balance equations
implies
It follows from [Reference Brown and Xia12, Lemma 2.4] that for $i\ge 1$ ,
which, together with (4.7), ensures
Hence $f(k)\le f(i)\le 0$ , and combining (4.4), (4.5), and (4.6) gives
as claimed in (i).
À propos of (ii), because of (4.9) and (4.10), it remains to show that $\Delta f$ is decreasing in the two ranges. To this end, we will mainly rely on the properties of the solution (4.2). Let T be an exponential random variable with mean 1, independent of the birth–death process $Z_{i-1}$ . Then $Z_i$ can be represented as
and hence we obtain from (4.2) and the strong Markov property in the second-to-last equality that
This enables us to give another representation of (4.8) as
and so
For $i\ge k$ , using the strong Markov property again in the equalities below, we have
where the inequality follows from
Similarly, for $i\le k-2$ , $\tau_{i-1,i}$ is stochastically less than or equal to $\tau_{i,i+1}$ , so
Hence $\Delta^2 f(i)\le 0$ for $i\ge k$ and $i\le k-2$ , which concludes the proof of (ii).
In terms of (iii), we use (ii) to obtain
Likewise,
Since (iv) is clearly an immediate consequence of (iii), (2.2), and (2.3), the proof of Lemma 4.1 is complete.
Proof of Theorem 2.1. As in the proof of Lemma 4.1, we set $A=[k,\infty)$ and $h=\textbf{ 1}_A$ . Then
Define
Then it follows from (4.1) that
For the estimate of $e_1$ , from $f(0)=f(1)$ , we know that $\lambda f(0)=-\text{Pn}(\lambda)\{h\}$ , and thus
which gives
For the estimate of $e_2$ , denoting ${\tilde f}(j)\coloneqq f(j-a)$ , we have
Using Lemma 4.1(ii), we have that $\Delta^2{\tilde f}(m)$ is negative for all m except $m=a+k-1$ , which implies
and
and hence
By taking
we have from (4.14) that
where the third-to-last equality is because $\sum_{i\in\mathcal{I}}\mathbb{E}[(X_i-\mu_i)Z_i]=\sigma^2$ and $(X_i,Z_i)$ is independent of $W_i^{\prime}$ , and the last equality is due to the assumption that $\{X_j\colon j\in B_i\}$ is $\mathcal{F}_i$ -measurable. Using (4.15) in (4.16), we obtain
Now, combining Lemma 4.1(iii, iv), (4.12), (4.13), and (4.17) gives (2.1).
Proof of Corollary 2.1. Under the setting of the local dependence, the claim follows from Theorem 2.1 by taking $Z_i=Z_i^{\prime}=X_i$ .
Proof of Theorem 2.2. Recall the Stein representation (4.12) and the estimate (4.13). It remains to tackle (4.14). However,
and thus
Hence, combining (4.12), (4.13), and (4.18) completes the proof.
Proof of Theorem 2.3. Again we make use of the Stein representation (4.12) and the estimate (4.13) so that it suffices to deal with (4.14). To this end, we have
However, with $R=W^\star-W$ ,
and a similar argument for (4.15) ensures
and hence
The claim follows from combining (4.12), (4.13), and (4.19) and using Lemma 4.1(iii, iv).
Proof of Proposition 3.2. The first inequality of (3.7) is a direct consequence of [Reference Hoeffding27]. For the second inequality, let $h=\mathbf{1}_{[k,\infty)}$ and let f be the solution of the Stein identity (4.1) with $\lambda=\mu$ . Setting $W_i=W-X_i$ and $Y\sim\text{Pn}(\mu)$ , the following argument is standard (see [Reference Barbour, Holst and Janson9, page 6]) and we repeat it for ease of reading:
For any non-negative integer-valued random variable U such that the following expectations exist, summation by parts gives
On the other hand, Proposition 2.1 of [Reference Barbour, Chen and Choi8] ensures that
so using Lemma 4.1(ii), we have
However, by (4.2), since $\text{Pn}(\mu)$ is the stationary distribution of the birth–death process, $Z_{Y}(t)\sim \text{Pn}(\mu)$ for all $t\ge 0$ , leading to
where $T_1,T_2$ are i.i.d. $\exp(1)$ random variables independent of Y. Combining (4.20), (4.21), and (4.22), we have
For the first term of (4.23), using [Reference Barbour, Holst and Janson9, Proposition A.2.1 (ii)], we obtain
For the second term of (4.23), we use the crude estimate of
(see [Reference Barbour, Holst and Janson9, Lemma 1.1.1] or Remark 2.1), so applying [Reference Barbour, Holst and Janson9, Proposition A.2.1 (ii)] again,
The bound (3.7) follows by collecting (4.23), (4.24), and (4.25).
Acknowledgements
We thank the anonymous referees for suggesting the ‘naive bound’ in Remark 2.1 and for comments leading to the improved version of the paper. We also thank Serguei Novak for email discussions about the quality of the bounds presented in the paper versus the ‘naive bound’. This work was supported in part by the Chinese Scholarship Council and in part by Australian Research Council grant DP190100613.