
On moderate deviations in Poisson approximation

Published online by Cambridge University Press:  04 September 2020

Qingwei Liu*
Affiliation:
University of Melbourne
Aihua Xia*
Affiliation:
University of Melbourne
*Postal address: School of Mathematics and Statistics, University of Melbourne, VIC 3010, Australia.

Abstract

In this paper we first use the distribution of the number of records to demonstrate that the right tail probabilities of counts of rare events are generally better approximated by the right tail probabilities of a Poisson distribution than those of the normal distribution. We then show that the moderate deviations in Poisson approximation generally require an adjustment and, with suitable adjustment, we establish better error estimates of the moderate deviations in Poisson approximation than those in [18]. Our estimates contain no unspecified constants and are easy to apply. We illustrate the use of the theorems via six applications: Poisson-binomial distribution, the matching problem, the occupancy problem, the birthday problem, random graphs, and 2-runs. The paper complements the works [16], [8], and [18].

Type
Research Papers
Copyright
© Applied Probability Trust 2020

1. Introduction

An exemplary moderate deviation theorem is as follows (see [29, page 228]). Let $X_i$ , $1\le i\le n$ , be independent and identically distributed (i.i.d.) random variables with $\mathbb{E}(X_1)=0$ and $\text{var}(X_1)=1$ . If, for some $t_0>0$ ,

(1.1) \begin{equation}\mathbb{E} {\text{e}}^{t_0|X_1|}\le c_0<\infty,\end{equation}

then there exist positive constants $c_1$ and $c_2$ depending on $c_0$ and $t_0$ such that

(1.2) \begin{equation}\frac{\mathbb{P}({{n}^{-1/2}}\sum_{i=1}^nX_i\ge z)}{1-\Phi(z)}=1 + \text{O}(1)\frac{1+z^3}{\sqrt{n}},\quad 0\le z\le c_1n^{1/6},\end{equation}

where $\Phi(z)$ is the distribution function of the standard normal and $|\text{O}(1)|\le c_2$. However, since the pioneering work [15], it has been shown [9] that, for the counts of rare events, the Poisson distribution provides a better approximation. For example, the distribution of the number of records [23, 30] in Example 1.1 below can be better approximated by the Poisson distribution having the same mean than by a normal distribution [22]. Moreover, a suitable refinement of the Poisson distribution can further improve the performance of the approximation [10, 11].

The right tail probabilities of counts of rare events are often needed in statistical inference, but these probabilities are so small that the error estimates in approximations of distributions of the counts are usually of no use because the bounds are often larger than the probabilities of interest. Hence it is of practical interest to consider their approximations via moderate deviations in Poisson approximation in a similar fashion to (1.2). However, not much progress has been made in the general framework beyond the special cases in [16], [8], [18], [34], and [13]. This is partly due to the fact that the tail behaviour of a Poisson distribution is significantly different from that of a normal distribution, a fact observed by Gnedenko [25] in the context of extreme value theory. In particular, Gnedenko [25] concluded that the Poisson distribution does not belong to any domain of attraction, while the normal distribution belongs to the domain of attraction of the Gumbel distribution.

Figure 1. $\text{Pn}(\lambda_n)$.

Example 1.1. We use the distribution of the number of records to illustrate the difference between moderate deviations in Poisson and normal approximations. More precisely, let $\{\eta_i\colon 1\le i\le n\}$ be i.i.d. random variables with a continuous cumulative distribution function. As the value of $\eta_1$ is always a record, for $2\le i\le n$ we say that $\eta_i$ is a record if $\eta_i>\max_{1\le j\le i-1}\eta_j$. We define the indicator random variable

\begin{equation*}I_i\coloneqq \textbf{ 1}\Bigl[\eta_i>\max_{1\le j\le i-1}\eta_j\Bigr],\end{equation*}

that is, $I_i=1$ if a new record occurs at time i and $I_i=0$ otherwise. Our interest is in the distribution of $S_n\coloneqq \sum_{i=2}^nI_i$, denoted by $\mathscr{L}(S_n)$. Dwass [23] and Rényi [30] showed that $\mathbb{E} I_i=1/i$ and that $\{I_i\colon 2\le i\le n\}$ are independent, so

\begin{equation*}\lambda_n\coloneqq \mathbb{E} S_n=\sum_{i=2}^n\frac1i, \quad \sigma_n^2\coloneqq \text{var}(S_n)=\sum_{i=2}^n\frac1i\biggl(1-\frac1i\biggr).\end{equation*}

We use $\text{Pn}(\lambda)$ to stand for the Poisson distribution with mean $\lambda$ , $\text{Pn}(\lambda)(A)\coloneqq \mathbb{P}(Y\in A)$ for $Y\sim\text{Pn}(\lambda)$ , and $N(\mu,\sigma^2)$ to stand for the normal distribution with mean $\mu$ and variance $\sigma^2$ .

Let $v_n\coloneqq \lambda_n+x\cdot\sigma_n$, and we consider approximations of $\mathbb{P}(S_n\ge v_n)$ by moderate deviations based on $\text{Pn}(\lambda_n)$ [8, 18] and $N_n\sim N(\lambda_n,\sigma_n^2)$. For $x=3$, Figures 1, 2, and 4, respectively, are the plots of the ratios $\mathbb{P}(S_n\ge v_n)/\text{Pn}(\lambda_n)([v_n,\infty))$, $\mathbb{P}(S_n\ge v_n)/\mathbb{P}(N_n\ge v_n)$, and $\mathbb{P}(S_n\ge v_n)/\text{Pn}(\sigma_n^2)([v_n,\infty))$ for the range $n\in[3,10^5]$. As observed in [11], the errors of Poisson and normal approximations to $\mathscr{L}(S_n)$ are of order $\text{O}((\ln n)^{-1})$ and $\text{O}((\ln n)^{-1/2})$ respectively, and the numerical studies confirm that approximation by the Poisson distribution is better than that by the normal distribution. In fact, it appears that the speed of convergence of $\mathbb{P}(S_n\ge v_n)/\mathbb{P}(N_n\ge v_n)$ to 1 as $n\to\infty$ is too slow to be of practical use. In the context of normal approximation to the distribution of integer-valued random variables, a common practice is to introduce a 0.5 correction, giving the ratios $\mathbb{P}(S_n\ge v_n)/\mathbb{P}(N_n\ge\lceil v_n\rceil-0.5)$, where $\lceil x\rceil$ is the smallest integer that is not less than x. Figure 3 is the plot of these ratios, and we can see that they are still far from the limit of 1. Finally, the difference between Figure 1 and Figure 4 shows that a minor change of the mean of the approximating Poisson distribution can change the quality of the moderate deviation approximation significantly, further highlighting the difficulty of obtaining sharp bounds in theoretical studies in this area.
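
The ratios above can be reproduced directly, since the $I_i$ are independent Bernoulli$(1/i)$ variables. The following Python sketch (illustrative only, not part of the paper; numpy and scipy are assumed, and a single n with $x=3$ is used instead of the whole range plotted in the figures) computes the exact law of $S_n$ by convolution and evaluates the three ratios of Figures 1, 2, and 4.

```python
# Illustrative sketch (not from the paper): the exact law of S_n is the
# convolution of independent Bernoulli(1/i) indicators, i = 2, ..., n, and the
# three tail ratios of Figures 1, 2 and 4 are evaluated at one n with x = 3.
import numpy as np
from scipy.stats import norm, poisson

def record_count_pmf(n):
    """Exact pmf of S_n = sum_{i=2}^n I_i with independent I_i ~ Bernoulli(1/i)."""
    pmf = np.array([1.0])
    for i in range(2, n + 1):
        p = 1.0 / i
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1 - p)   # I_i = 0
        new[1:] += pmf * p          # I_i = 1
        pmf = new
    return pmf

n, x = 1000, 3.0
lam = sum(1.0 / i for i in range(2, n + 1))
sig2 = sum((1.0 / i) * (1 - 1.0 / i) for i in range(2, n + 1))
v = lam + x * np.sqrt(sig2)
k = int(np.ceil(v))                          # S_n is integer valued

tail_exact = record_count_pmf(n)[k:].sum()   # P(S_n >= v_n)
print("vs Pn(lambda_n)  :", tail_exact / poisson.sf(k - 1, lam))
print("vs N(lambda_n,s2):", tail_exact / norm.sf(v, loc=lam, scale=np.sqrt(sig2)))
print("vs Pn(sigma_n^2) :", tail_exact / poisson.sf(k - 1, sig2))
```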

Figure 2. $N(\lambda_n,\sigma_n^2)$ without correction.

Figure 3. $N(\lambda_n,\sigma_n^2)$ with correction.

Example 1.1 shows that the distribution of the counts of rare events often has a heavier right tail than the corresponding normal distribution, and approximations by moderate deviations in the normal distribution are generally inferior to those by moderate deviations in the Poisson distribution. Example 1.2 shows that the parameter of the approximating Poisson distribution suggested in [16], [8], and [18] is not optimal, and that a suitable adjustment can significantly improve the quality of approximations by moderate deviations in the Poisson distribution.

Figure 4. $\text{Pn}(\sigma_n^2)$ .

Example 1.2. With $0<p<1$ , let $W_n\sim\text{Bi}(n,p)$ , $Y_n\sim\text{Pn}(np)$ , and $Z\sim N(0,1)$ . Then, for a fixed $x>0$ ,

\begin{equation*}\lim_{n\to\infty}\frac{\mathbb{P}(W_n\ge np+x\sqrt{np(1-p)})}{\mathbb{P}(Y_n\ge np+x\sqrt{np(1-p)})}= \frac{\mathbb{P} (Z\ge x )}{\mathbb{P}(Z\ge x\sqrt{1-p})},\end{equation*}

which systematically deviates from 1 as x moves away from 0. The systematic bias can be removed by introducing an adjustment into the approximate models: for a fixed $x>0$ ,

\begin{equation*}\lim_{n\to\infty}\frac{\mathbb{P}(W_n\ge np+x\sqrt{np(1-p)})}{\mathbb{P}(Y_n\ge np+x\sqrt{np})}{=}1\end{equation*}

or equivalently, with $Y_n'\sim \text{Pn}(np(1-p))$ ,

\begin{equation*}\lim_{n\to\infty}\frac{\mathbb{P}(W_n\ge np+x\sqrt{np(1-p)})}{\mathbb{P}(Y_n^{\prime}\ge np(1-p)+x\sqrt{np(1-p)})}{=}1.\end{equation*}
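
The two limits in Example 1.2 can be checked numerically; the short sketch below (not from the paper; n, p, and x are chosen purely for illustration and scipy is assumed) evaluates the unadjusted ratio, its predicted limit, and the adjusted ratio from the last display.

```python
# Illustrative check of Example 1.2 (not from the paper): evaluate the
# unadjusted ratio, its predicted limit, and the adjusted ratio from the last
# display, for one choice of n, p and x.
import numpy as np
from scipy.stats import binom, norm, poisson

n, p, x = 20000, 0.3, 3.0
sd = np.sqrt(n * p * (1 - p))
kt = int(np.ceil(n * p + x * sd))            # threshold np + x*sqrt(np(1-p))

num = binom.sf(kt - 1, n, p)                 # P(W_n >= np + x*sqrt(np(1-p)))
print("unadjusted ratio:", num / poisson.sf(kt - 1, n * p))
print("predicted limit :", norm.sf(x) / norm.sf(x * np.sqrt(1 - p)))

ka = int(np.ceil(n * p * (1 - p) + x * sd))  # same number of sd's from np(1-p)
print("adjusted ratio  :", num / poisson.sf(ka - 1, n * p * (1 - p)))
```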

Example 1.2 suggests that it is more suitable to approximate the right tail probabilities by looking at the number of standard deviations away from the mean, which is essentially the original idea of the translated (shifted) Poisson approximation [7, 31, 32]. In this paper we show that it is indeed better to approximate the right tail probabilities via the moderate deviations in the translated Poisson distribution.

Our approach does not rely on the boundedness of the Radon–Nikodým derivative as in [16] and [8], or on the tacit assumption of well-behaved tail probabilities as in [18]; see Remark 2.3 for more details. For Poisson-binomial trials, we show in Proposition 3.2 that our approach works even when the maximum of the success probabilities of the Bernoulli random variables is not small, as is the case for the distribution of the number of records.

The paper is organised as follows. In Section 2 we state the main results in the context of local dependence, size-biased distribution, and discrete zero-biased distribution. In Section 3 we illustrate the accuracy of our bounds with six examples. The proofs of the main results are postponed to Section 4, where we also establish Stein’s factors for Poisson moderate deviations in Lemma 4.1.

2. The main results

In this section we state three theorems on moderate deviations in Poisson approximation: the first is under a local dependence structure, the second is with respect to the size-biased distribution, and the last is in terms of the discrete zero-biased distribution.

We first consider a class of non-negative integer-valued random variables $\{X_i\colon i\in\mathcal{I}\}$ satisfying the local dependence structure (LD2) in [17] (see also [2] for its origin). For ease of reading, we quote the definition of (LD2) below.

(LD2) For each $i\in\mathcal{I}$, there exist $A_i\subset B_i\subset \mathcal{I}$ such that $X_i$ is independent of $\{X_j\colon j\in A_i^c\}$ and $\{X_j\colon j\in A_i\}$ is independent of $\{X_j\colon j\in B_i^c\}$.

We set

\begin{equation*}W=\sum_{i\in\mathcal{I}} X_i,\quad Z_i=\sum_{j\in A_i} X_j,\quad Z_i^{\prime}=\sum_{j\in B_i}X_j,\quad W_i=W-Z_i{,}\quad W_i^{\prime}=W-Z_i^{\prime}.\end{equation*}

We write

\begin{equation*} \mu_i=\mathbb{E} (X_i),\quad \mu=\mathbb{E} (W),\quad \sigma^2=\text{Var}{(W)}.\end{equation*}

As suggested in Example 1.2, we consider $Y\sim\text{Pn}(\lambda)$ approximation to $W-a$ with $|\lambda-\sigma^2|$ being not too large and $a=\mu-\lambda$ being an integer, so that k in $\mathbb{P}(W-a\ge k)$ and $\mathbb{P}(Y\ge k)$ is in terms of the number of standard deviations of W. In principle, the constant a is chosen to minimise the error of approximation. However, our theory is formulated in such a flexible way that other choices of $\lambda$ and a are also acceptable. The three most useful choices of a are $a=0$ , $a = \lfloor{\mu-\sigma^2}\rfloor$ , and $a = \lceil{\mu-\sigma^2}\rceil$ , where $\lfloor{\cdot}\rfloor$ stands for the largest integer in $(-\infty,\cdot]$ .

Theorem 2.1. With the set-up in the preceding paragraph, assume that $\{X_i\colon i\in\mathcal{I}\}$ satisfies (LD2) and, for each i, there exists a $\sigma$ -algebra $\mathcal{F}_i$ such that $\{X_j\colon j\in B_i\}$ is $\mathcal{F}_i$ -measurable. Define

\begin{equation*}\theta_i\coloneqq \text{ess\,sup}\max_j\mathbb{P}(W=j\mid \mathcal{F}_i),\end{equation*}

where $\text{ess\,sup}\, V$ is the essential supremum of the random variable V. Then, for integer $a<\mu$ , $\lambda=\mu-a$ , and positive integer $k>\lambda$ , we have

(2.1) \begin{align} \biggl\vert\frac{\mathbb{P}(W-a\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert&\le{C}_2(\lambda,k)\sum_{i\in\mathcal{I}}\theta_i \{{|\mathbb{E}(X_i-\mu_i)Z_i|}\mathbb{E} (Z_i^{\prime})\notag\\ &\quad\, +\mathbb{E} [|X_i-\mu_i|Z_i(Z_i^{\prime}-Z_i/2-1/2)]\}\notag\\ &\quad\, +{C}_1(\lambda,k)|\lambda-\sigma^2|+\mathbb{P}(W-a<-1),\end{align}

where, with $F(\,j)=\mathbb{P}(Y\le j)$ , $\overline{F}(\,j)=\mathbb{P}(Y\ge j)$ ,

(2.2) \begin{align}C_1(\lambda,k)&\coloneqq \frac{F(k-1)}{k\mathbb{P}(Y=k)}\biggl\{1-\min\biggl(\frac{F(k-2)}{F(k-1)}\cdot\frac{\lambda}{k-1},\frac{\overline{F}(k+1)}{\overline{F}(k)}\cdot\frac{k}{\lambda}\biggr)\biggr\},\end{align}
(2.3) \begin{align}{C}_2(\lambda,k)&\coloneqq \frac{F(k-1)}{k\mathbb{P}(Y=k)}\biggl(2-\frac{F(k-2)}{F(k-1)}\cdot\frac{\lambda}{k-1}-\frac{\overline{F}(k+1)}{\overline{F}(k)}\cdot\frac{k}{\lambda}\biggr).\end{align}

Remark 2.1. Both ${C}_1$ and ${C}_2$ can be numerically computed in applications and they cannot generally be improved (see the proofs below). They are better than the ‘naive’ counterparts $(1-{\text{e}}^{-\lambda})/(\lambda \mathbb{P}(Y\ge k))$ derived through the total variation bounds in [6] and [9]. Figure 5 provides details of

\begin{equation*}\mbox{ratio }i\coloneqq {C}_i(\lambda,k)/[(1-{\text{e}}^{-\lambda})/(\lambda \mathbb{P}(Y\ge k))], \quad i=1,2,\end{equation*}

for $\lambda=10$ and k from 10 to 43. We note that, for large k and/or large $\lambda$, the tail probabilities are so small that the calculation using MATLAB produces unstable results, since accumulated computation errors often exceed the tail probabilities themselves; hence higher-precision computational tools are needed to achieve the required accuracy, or one has to resort to known approximations to the Poisson right tails and point probabilities.
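
As one illustration of such a higher-precision computation (not from the paper; the mpmath library and the chosen working precision are assumptions made for this sketch), the quantities $C_0$, $C_1$, $C_2$ of (2.2)–(2.3) and the ‘naive’ ratio of Remark 2.1 can be evaluated with arbitrary-precision arithmetic:

```python
# Illustrative sketch (not from the paper): C_0, C_1, C_2 of (2.2)-(2.3) and
# the 'naive' ratio of Remark 2.1, evaluated with arbitrary-precision
# arithmetic (mpmath) to avoid the instability mentioned above.
from mpmath import mp, mpf, exp, factorial

mp.dps = 50                                   # 50 significant digits

def pois_pmf(j, lam):
    return exp(-lam) * lam**j / factorial(j)

def F(j, lam):                                # P(Y <= j)
    return sum(pois_pmf(i, lam) for i in range(j + 1))

def Fbar(j, lam):                             # P(Y >= j)
    return 1 - F(j - 1, lam)

def C0(lam, k):
    return F(k - 1, lam) / (k * pois_pmf(k, lam))

def C1(lam, k):
    r = min(F(k - 2, lam) / F(k - 1, lam) * lam / (k - 1),
            Fbar(k + 1, lam) / Fbar(k, lam) * k / lam)
    return C0(lam, k) * (1 - r)

def C2(lam, k):
    return C0(lam, k) * (2 - F(k - 2, lam) / F(k - 1, lam) * lam / (k - 1)
                           - Fbar(k + 1, lam) / Fbar(k, lam) * k / lam)

lam = mpf(10)
for k in (15, 25, 43):
    naive = (1 - exp(-lam)) / (lam * Fbar(k, lam))
    print(k, float(C1(lam, k) / naive), float(C2(lam, k) / naive))
```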

Figure 5. Performance of the bound.

Remark 2.2. Due to the discrete nature of the Poisson distribution, it seems impossible to simplify ${C}_1$ and ${C}_2$ analytically at negligible cost for the diverse range of $k>\lambda$.

Remark 2.3. If $\lambda$ is chosen reasonably close to $\sigma^2$ so that $\lambda-\sigma^2$ is bounded, then $\theta_i$ in the bound (2.1) converges to 0 when $\sigma^2$ converges to $\infty$. Our bound does not rely on the Radon–Nikodým derivative of $\mathscr{L}(W)$ with respect to $\text{Pn}(\lambda)$, which is the crucial ingredient in [16] and [8]. On the other hand, the tacit assumption of [18] is that

\begin{equation*} \sup_{\lambda\le r\le k}\frac{\mathbb{P}(W\ge r)}{\mathbb{P}(Y\ge r)}\quad\text{for \textit{W} and \textit{Y}} \end{equation*}

in Theorem 2.1 is well-behaved and this assumption is hard to verify. The bound (2.1), although relatively crude, does not rely on this assumption and covers more general cases.

Corollary 2.1. For the sum of independent non-negative integer-valued random variables $W=\sum_{i\in\mathcal{I}}X_i$ , let $\theta_i=\max_{j}\mathbb{P}(W-X_i=j)$ , $\mu_i=\mathbb{E} X_i$ , $\mu=\sum_{i\in\mathcal{I}} \mu_i$ , $\sigma^2=\text{Var}(W)$ . For any integer $a<\mu$ , let $\lambda=\mu-a$ , $Y\sim\text{Pn}(\lambda)$ . Then, for $k> \lambda$ ,

\begin{align*} \biggl\vert\frac{\mathbb{P}(W-a\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert&\le {C}_2(\lambda,k)\sum_{i\in\mathcal{I}}\theta_i\biggl\{\mu_i{|\mathbb{E}[X_i(X_i-\mu_i)]|}+\frac{1}{2}\mathbb{E} [\vert X_i-\mu_i\vert X_i(X_i-1)]\biggr\}\\ &\quad\, +{C}_1(\lambda,k)|\lambda-\sigma^2|+\mathbb{P}(W-a<-1).\end{align*}

Remark 2.4. We leave $\mathbb{P}(W-a<-1)$ in the upper bound (2.1) because the current approach cannot remove it from the bound. Nevertheless, it is no more than 1 and converges to zero exponentially fast with a suitable choice of a. For the sum of independent non-negative integer-valued random variables in Corollary 2.1, if a is less than $\mu$ by at least a few multiples of $\sigma$, we can use [20, Theorem 2.7] to obtain

(2.4) \begin{equation} \mathbb{P}(W-a<-1)\le {\exp \biggl({-\frac{(\mu-a+2)^2}{2\sum_{i\in\mathcal{I}}\mathbb{E}(X_i^2)}}\biggr)}.\end{equation}

For any non-negative random variable W with mean $\mu\in (0,\infty)$ and distribution ${\text{d}} F(w)$, the W-size-biased distribution [1, 21] is given by

\begin{equation*} {\text{d}} F^s(w) = \frac{w {\text{d}} F(w)}{\mu}, \quad\text{$w \ge 0$,}\end{equation*}

or equivalently by the characterising equation

\begin{equation*}\mathbb{E} [Wg(W)]=\mu \mathbb{E} g(W^s) \quad \text{for all \textit{g} with\ $\mathbb{E}|Wg(W)|<\infty$.}\end{equation*}

Theorem 2.2. Let W be a non-negative integer-valued random variable with mean $\mu$ and variance $\sigma^2$ , and let $a<\mu$ be an integer, $\lambda=\mu-a$ . Then, for integer $k>\lambda$ , we have

(2.5) \begin{equation}\biggl\vert\frac{\mathbb{P}(W-a\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert \le{C}_1(\lambda,k) \{\mu\mathbb{E}\vert W+1-W^s\vert+\vert\mu-\lambda\vert \}+\mathbb{P}(W-a<-1), \end{equation}

where $Y\sim\text{Pn}(\lambda)$ .

Remark 2.5. Theorem 2.2 improves [18, Theorem 3] in a number of ways, with less restrictive conditions and no unspecified constants.

The next theorem is based on the discrete zero-biased distribution defined in [26] and the approach is very similar to that in [19]. For an integer-valued random variable V with mean $\mu$ and finite variance $\sigma^2$, we say that $V^\star$ has the discrete V-zero-biased distribution [26, Definition 2.1] if, for all bounded functions $g\colon \mathbb{Z}\coloneqq \{0,\pm1,\pm2,\dots\} \rightarrow \mathbb{R}$ with $\mathbb{E} |Vg(V)|<\infty$,

\begin{equation*}\mathbb{E} [(V-\mu)g(V)] =\sigma^2\mathbb{E}\Delta g(V^\star),\end{equation*}

where $\Delta f(i)\coloneqq f(i+1)-f(i)$ .

Theorem 2.3. Let W be a non-negative integer-valued random variable with mean $\mu$ , variance $\sigma^2$ , let $a<\mu$ be an integer, and let $W^\star$ have the discrete W-zero-biased distribution and be defined on the same probability space as W. Set $R=W^\star-W$ and define

\begin{equation*}\theta_R=\max_{j}\mathbb{P}(W=j\mid R).\end{equation*}

Then, for integer $k>\lambda$ , with $\lambda=\mu-a>0$ , we have

(2.6) \begin{equation}\biggl\vert\frac{\mathbb{P}(W-a\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert \le{C}_2(\lambda,k)\sigma^2\mathbb{E}[|R|\theta_R]+{C}_1(\lambda,k)|\lambda-\sigma^2|\lambda^{-1}+\mathbb{P}(W-a<-1),\end{equation}

where $Y\sim\text{Pn}(\lambda)$ .

3. Examples

As many applications of Poisson approximation rely on size-biased distributions, we begin with a review of some facts about size biasing.

Size biasing has been of considerable interest for many decades (see [9], [33], [3], and references therein). In the context of the sum of Bernoulli random variables, its size biasing is particularly simple. More precisely, if $\{X_i\colon i\in\mathcal{I}\}$ is a family of Bernoulli random variables with $\mathbb{P}(X_i\,{=}\,1)=p_i$, then the size-biased distribution of $W=\sum_{i\in\mathcal{I}}X_i$ is

(3.1) \begin{equation} W^s=\sum_{j\ne I}X_j^{(I)}+1,\end{equation}

where

\begin{equation*}\mathscr{L}(\{X_j^{(i)}\colon j\in\mathcal{I}\})=\mathscr{L}(\{X_j\colon j\in\mathcal{I}\}\mid X_i=1),\end{equation*}

and I is a random element independent of $\{\{X_j^{(i)}\colon j\in\mathcal{I}\}\colon i\in\mathcal{I}\}$ having distribution $\mathbb{P}(I=i)=p_i/\mathbb{E} W$, $i\in\mathcal{I}$. Moreover, $\{X_i\colon i\in\mathcal{I}\}$ are said to be negatively related (resp. positively related) [9, page 24] if one can construct $\{\{X_j^{(i)}\colon j\in\mathcal{I}\}\colon i\in\mathcal{I}\}$ such that $X_j^{(i)}\le$ (resp. $\ge$ ) $X_j$ for all $j\ne i$. When $\{X_i\colon i\in\mathcal{I}\}$ are negatively related, we have

(3.2) \begin{equation} \mathbb{E}\vert W+1-W^s\vert =\mathbb{E}(W+1-W^s)=\mu^{-1}(\mu-\sigma^2),\end{equation}

where $\mu=\mathbb{E} W$ and $\sigma^2=\text{var}(W)$ . On the other hand, if $\{X_i\colon i\in\mathcal{I}\}$ are positively related, then

(3.3) \begin{align} \mathbb{E}\vert W+1-W^s\vert&=\mathbb{E} \biggl\vert\sum_{j\ne I}(X_j^{(I)}-X_j)-X_I\biggr\vert\notag\\[4pt] &\le\mathbb{E}\biggl\{\sum_{j\ne I}(X_j^{(I)}-X_j)+X_I\biggr\}\notag\\[4pt] &=\mathbb{E}(W^s-W-1)+2\mu^{-1}\sum_{i\in\mathcal{I}}p^2_i\notag\\[4pt] &=\mu^{-1}(\sigma^2-\mu)+2\mu^{-1}\sum_{i\in\mathcal{I}}p^2_i.\end{align}
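
For independent Bernoulli summands, conditioning on $X_i=1$ leaves the other summands unchanged, so the construction (3.1) amounts to forcing the selected summand to equal 1. The Monte Carlo sketch below (not from the paper; the probabilities p, the test function g, and the sample size are illustrative assumptions) checks the characterising equation $\mathbb{E}[Wg(W)]=\mu\mathbb{E} g(W^s)$ for this coupling.

```python
# Illustrative Monte Carlo check (not from the paper) of the size-bias
# construction (3.1) for independent Bernoulli summands: W^s is obtained by
# forcing X_I = 1, and E[W g(W)] = mu E[g(W^s)] is checked for g(w) = w^2.
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2, 0.4, 0.1])       # illustrative success probabilities
mu, reps = p.sum(), 200_000

X = rng.random((reps, len(p))) < p            # Bernoulli(p_i) samples
W = X.sum(axis=1)
I = rng.choice(len(p), size=reps, p=p / mu)   # P(I = i) = p_i / mu
Xs = X.copy()
Xs[np.arange(reps), I] = True                 # force X_I = 1
Ws = Xs.sum(axis=1)                           # size-biased sample W^s

g = lambda w: w.astype(float) ** 2
print("E[W g(W)]    :", (W * g(W)).mean())
print("mu E[g(W^s)] :", mu * g(Ws).mean())
```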

3.1. Poisson-binomial trials

Let $\{X_i, 1\le i\le n\}$ be independent Bernoulli random variables with $\mathbb{P}(X_i=1)=p_i\in (0,1)$, $W=\sum_{i=1}^nX_i$, $\mu=\mathbb{E} W$, and $\mu_2=\sum_{i=1}^np_i^2$. When $\tilde p\coloneqq \max_{1\le i\le n}p_i\to 0$, the large deviations of W are investigated in [16] and [8] with precise asymptotic order. We give two results for this particular case without the assumption that $\tilde p$ is small: the first is a direct consequence of the general results in Section 2, and the second is based on our approach using a more fine-tuned analysis and well-studied properties of the tail behaviour of W.

Proposition 3.1. Recalling ${C}_1$ and ${C}_2$ in (2.2) and (2.3), for any integer $k>\mu$ we have

(3.4) \begin{equation} \biggl\vert\frac{\mathbb{P}(W\ge k)}{\text{Pn}(\mu)([k,\infty))}-1\biggr\vert\le{C}_1(\mu,k)\mu_2\end{equation}

and, with $a=\lfloor{\mu_2}\rfloor$ and $\lambda\coloneqq \mu-a$ ,

(3.5) \begin{align}& \biggl\vert\frac{\mathbb{P}(W-a\ge k)}{\text{Pn}(\lambda)([k,\infty))}-1\biggr\vert \notag \\[5pt] &\quad \le\frac{{C}_2(\lambda,k)\sum_{i=1}^np_i^2(1-p_i)}{1\vee \sqrt{(\!\sum_{i=1}^np_i\wedge(1-p_i)-1/4)\pi/2}} +{C}_1(\lambda,k)|\lambda-\sigma^2|+{{\text{e}}^{-(\lambda+2)^2/(2\mu)}}.\end{align}

Proof. The claim (3.4) is a consequence of Theorem 2.2 with $a=0$ and $\mu\mathbb{E} |W+1-W^s|=\sum_{i=1}^np_i^2$ , as shown in (3.2).

The bound (3.5) is a special case of Corollary 2.1. Since $\mathscr{L}(W_i)$ is unimodal, Corollary 1.6 of [28] says that

(3.6) \begin{align} \theta_i & = {d_{\text{TV}}}(W_i,W_i+1)\notag\\[4pt] &\le 1\wedge\biggl\{\sqrt{\frac{2}{\pi}}\biggl(\frac{1}{4}+\sum_{j\ne i}p_j\wedge(1-p_j)\biggr)^{-1/2}\biggr\}\notag\\[4pt] &\le 1\wedge \biggl\{\sqrt{\frac{2}{\pi}}\biggl(\sum_{i=1}^np_i\wedge(1-p_i)-1/4\biggr)^{-1/2}\biggr\}. \end{align}

On the other hand $\mathbb{E}(X_i^2)=p_i$ , and hence the upper bound (3.5) is an immediate consequence of Corollary 2.1 and (2.4).

One can also use Theorem 2.3 to obtain the same bound. More precisely, according to the construction of the discrete zero-biased distribution suggested in [26], let I be a random variable independent of $\{X_i, 1\le i\le n\}$ with distribution $\mathbb{P}(I=i)=p_i(1-p_i)/\sigma^2$ for $1\le i\le n$. Then we can write $W^\star=W-X_I$, giving $R=-X_I$. We then apply (3.6) to bound $\theta_R$ as

\begin{equation*}\theta_R=\max_{j}\mathbb{P}(W=j\mid R)\le \sqrt{\frac{2}{\pi}}\biggl(\sum_{i=1}^np_i\wedge(1-p_i)-1/4\biggr)^{-1/2},\end{equation*}

and a routine calculation gives

\begin{equation*}\mathbb{E}|R|=\sum_{i=1}^np_i^2(1-p_i)/\sigma^2.\end{equation*}

Hence (3.5) follows from (2.6) and (2.4).
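
For a concrete Poisson-binomial law, both sides of (3.4) are easy to evaluate, since the exact distribution of W is obtained by convolving the Bernoulli summands. The sketch below (not from the paper; the success probabilities and the range of k are illustrative, and scipy is assumed) compares the bound with the exact relative error.

```python
# Illustrative sketch (not from the paper): the bound (3.4) versus the exact
# relative error for a Poisson-binomial law with equal success probabilities.
import numpy as np
from scipy.stats import poisson

def poisson_binomial_pmf(p):
    """Exact pmf of W = sum of independent Bernoulli(p_i), by convolution."""
    pmf = np.array([1.0])
    for pi in p:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1 - pi)
        new[1:] += pmf * pi
        pmf = new
    return pmf

def C1(lam, k):                                # C_1(lambda, k) of (2.2)
    F = lambda j: poisson.cdf(j, lam)
    Fbar = lambda j: poisson.sf(j - 1, lam)
    r = min(F(k - 2) / F(k - 1) * lam / (k - 1), Fbar(k + 1) / Fbar(k) * k / lam)
    return F(k - 1) / (k * poisson.pmf(k, lam)) * (1 - r)

p = np.full(200, 0.05)                         # 200 trials, success probability 0.05
mu, mu2 = p.sum(), (p ** 2).sum()
pmf = poisson_binomial_pmf(p)

for k in range(int(mu) + 3, int(mu) + 9):
    exact = abs(pmf[k:].sum() / poisson.sf(k - 1, mu) - 1)
    print(k, "bound (3.4):", C1(mu, k) * mu2, " exact error:", exact)
```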

Proposition 3.2. Define

\begin{equation*}M\coloneqq M(\,p_1,\dots,p_n)= \begin{cases}{\text{e}}^\mu &\text{if $0<\mu<1$,}\\[4pt] {\text{e}}^{13/12}\sqrt{2\pi} (1-\mu_2/\mu )^{-1/2} &\text{if $\mu\ge 1$.}\end{cases}\end{equation*}

Then, for any integer k with $x\coloneqq (k-\mu)/\sqrt{\mu}\ge 1$ , we have

(3.7) \begin{equation} 0>\frac{\mathbb{P}(W\ge k)}{\text{Pn}(\mu)([k,\infty))}-1>- 2M(\mu_2/\mu)\biggl(x^2+{1}+4x{\sqrt{\frac{1-{\text{e}}^{-\mu}}\mu}}\biggr).\end{equation}

The proof relies on more information on the solutions of Stein’s equation and is postponed to the end of Section 4. The bound (3.7) improves [18, (3.1)] in two respects: it contains no unspecified constants and it does not require $\tilde p$ to be small. For the distribution of the number $S_n$ of records, the large deviation results in [8] do not apply. However, recalling that $\lambda_n=\sum_{i=2}^n(1/i)$, we apply Proposition 3.2 together with the harmonic series bound $\lambda_n=\sum_{i=2}^n(1/i)\ge \ln n+\gamma-1$ and the Riemann zeta function bound

\begin{equation*}\sum_{i=2}^n\frac1{i^2}\le \sum_{i=2}^\infty\frac1{i^2}=\frac{\pi^2}6-1\end{equation*}

to get the following estimate.

Corollary 3.1. For any integer k with $x\coloneqq (k-\lambda_n)/\sqrt{\lambda_n}\ge 1$ , we have

\begin{equation*}0>\frac{\mathbb{P}(S_n\ge k)}{\text{Pn}(\lambda_n)([k,\infty))}-1> -\frac{2{\text{e}}^{13/12}\sqrt{2\pi}(\pi^2/6-1)}{\sqrt{(\ln n+\gamma-1)(\ln n+\gamma-\pi^2/6)}}\biggl(x^2+{1}+\frac{4x}{\sqrt{\ln n+\gamma-1}}\biggr),\end{equation*}

where $\gamma$ is Euler’s constant.

Remark 3.1. We conjecture that, with $a=\lfloor{\mu_2}\rfloor$ and $\lambda\coloneqq \mu-a$ , the bound in (3.5) can be significantly improved and the better estimate is likely dependent on the Radon–Nikodým derivative bound

\begin{equation*} \sup_{r\ge 0}\frac{\mathbb{P}(W-a=r)}{\text{Pn}(\lambda)(\{r\})}. \end{equation*}

3.2. Matching problem

For a fixed n, let $\pi$ be a uniform random permutation of $\{1,\dots,n\}$ , and let

\begin{equation*}W=\sum_{j=1}^n\textbf{ 1}_{\{j=\pi(\,j)\}}\end{equation*}

be the number of fixed points in the permutation.

Proposition 3.3. For the random variable W defined above and any integer $k\ge 2$ , we have

(3.8) \begin{equation} \biggl\vert\frac{\mathbb{P}(W\ge k)}{\text{Pn}(1)([k,\infty))}-1\biggr\vert\le{\frac{2}{n}{C}_1(1,k)}. \end{equation}

Proof. In this case, the size-biased distribution $\mathscr{L}(W^s)$ can be coupled with W as follows [14]. Let I be uniformly distributed on $\{1,2,\dots,n\}$, independent of $\pi$, and define

\begin{equation*} \pi^s(\,j)=\begin{cases}I&\text{if $j=I$,}\\[3pt] \pi(I)&\text{if $j=\pi^{-1}(I)$,}\\[3pt] \pi(\,j)&\text{otherwise}.\end{cases}\end{equation*}

Set $W^s=\sum_{j=1}^n\textbf{ 1}_{\{j=\pi^s(\,j)\}}$ . One can easily verify that $W^s$ has the size-biased distribution of W. Also, we can check that $\mathbb{E} W=\text{Var}(W)=1$ , giving $\mathbb{E} W^s=2$ . Let $\Delta=W+1-W^s$ . Using the above construction of $W^s$ , we can conclude that $\Delta$ takes values in $\{-1,0,1\}$ and $\mathbb{P}(\Delta=1\mid W)=W/n$ . Since $\mathbb{E}\Delta=0,$ we have $\mathbb{P}(\Delta=1)=\mathbb{P}(\Delta=-1)$ , and $\mathbb{E}\vert\Delta\vert=2/n$ . On the other hand, $\lambda=\mu$ allows us to get rid of the second term in (2.5). By Theorem 2.2 with $a=0$ , $\lambda=\mu=1$ , the claim follows.
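
The number of fixed points of a uniform random permutation has the explicit distribution $\mathbb{P}(W=j)=(1/j!)\sum_{m=0}^{n-j}(-1)^m/m!$, so the ratio in (3.8) and its bound can be compared directly; a small sketch (not from the paper; the value of n and the levels k are illustrative, and scipy is assumed) follows.

```python
# Illustrative sketch (not from the paper): exact tail ratio for the number of
# fixed points of a uniform permutation of {1, ..., n} versus the bound (3.8).
from math import factorial
from scipy.stats import poisson

def fixed_point_tail(n, k):
    """P(W >= k), using P(W = j) = (1/j!) * sum_{m <= n-j} (-1)^m / m!."""
    pmf = [sum((-1) ** m / factorial(m) for m in range(n - j + 1)) / factorial(j)
           for j in range(n + 1)]
    return sum(pmf[k:])

def C1(lam, k):                                # C_1(lambda, k) of (2.2)
    F = lambda j: poisson.cdf(j, lam)
    Fbar = lambda j: poisson.sf(j - 1, lam)
    r = min(F(k - 2) / F(k - 1) * lam / (k - 1), Fbar(k + 1) / Fbar(k) * k / lam)
    return F(k - 1) / (k * poisson.pmf(k, lam)) * (1 - r)

n = 50
for k in (2, 4, 6):
    ratio = fixed_point_tail(n, k) / poisson.sf(k - 1, 1.0)
    print(k, "|ratio - 1| =", abs(ratio - 1), " bound (3.8):", 2 / n * C1(1.0, k))
```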

Remark 3.2. The bound (3.8) contains no unknown constants and improves the bound of [18, Section 3.3].

3.3. Occupancy problem

The occupancy problem has a long history dating back to the early development of probability theory. General references on this subject can be found in classical treatments, e.g. [24, Vol. 1, Chapter 2] and [9, Chapter 6].

The occupancy problem can be formulated as follows. Let l balls be thrown independently of each other into n boxes uniformly. Let $X_i$ be the indicator variable of the event that the ith box is empty, so the number of empty boxes can be written as $W=\sum_{i=1}^nX_i$ . Noting that $p\coloneqq \mathbb{E} X_i=(1-{{1}/{n}})^l$ , direct computation gives

\begin{gather*}\mu\coloneqq \mathbb{E} W=np, \sigma^2\coloneqq \text{Var}(W)=\mu-\mu^2+\mu(n-1)\biggl(1-\frac{1}{n-1}\biggr)^l.\end{gather*}

Proposition 3.4. For the random variable W defined above and any integer $k>\mu$ , we have

(3.9) \begin{equation} \biggl\vert\frac{\mathbb{P}(W\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert\le{C}_1(\mu,k)\mu\biggl[\mu-(n-1)\biggl(1-\frac{1}{n-1}\biggr)^l\biggr],\end{equation}

where $Y\sim \text{Pn}(\mu)$ .

Proof. For the sake of completeness, we provide the following proof, which is essentially a repeat of [9, page 23]. From the construction of the W-size-biased distribution in (3.1), we can construct a coupling as follows. Let I be uniform on $\{1,\dots,n\}$, that is, we randomly pick one box with equal probability. If the selected box is not empty, we redistribute all balls in the box randomly into the other $n-1$ boxes with equal probability $1/(n-1)$. Define $X_{j}^{(i)}$ as the indicator of the event that box i is the selected box and, after the redistribution, box j is empty. With this coupling in mind, one can verify that $\{X_i\}$ are negatively related, so it follows from (3.2) that

\begin{equation*}\mathbb{E}\vert W+1-W^s\vert=\mu-(n-1)\biggl(1-\frac{1}{n-1}\biggr)^l.\end{equation*}

Now, applying Theorem 2.2 with $a=0$ yields (3.9).
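
The negatively related coupling described in the proof is easy to simulate: empty a uniformly chosen box into the remaining boxes and count the empty boxes before and after. The Monte Carlo sketch below (not from the paper; n, l, and the number of replications are illustrative assumptions) checks that the simulated $\mathbb{E}(W+1-W^s)$ agrees with $\mu-(n-1)(1-1/(n-1))^l$.

```python
# Illustrative Monte Carlo sketch (not from the paper) of the negatively
# related coupling in the proof: empty a uniformly chosen box into the other
# boxes and compare E(W + 1 - W^s) with mu - (n-1)(1 - 1/(n-1))^l.
import numpy as np

rng = np.random.default_rng(2)
n, l, reps = 20, 40, 50_000
vals = np.empty(reps)

for r in range(reps):
    counts = np.bincount(rng.integers(n, size=l), minlength=n)   # balls per box
    W = (counts == 0).sum()                                      # empty boxes
    i = rng.integers(n)                                          # chosen box
    others = np.delete(np.arange(n), i)
    counts_s = counts.copy()
    counts_s[i] = 0
    dest = rng.choice(others, size=counts[i])                    # redistribute box i's balls
    np.add.at(counts_s, dest, 1)
    Ws = (counts_s[others] == 0).sum() + 1                       # W^s as in (3.1)
    vals[r] = W + 1 - Ws

mu = n * (1 - 1 / n) ** l
print("Monte Carlo E(W+1-W^s):", vals.mean())
print("exact value           :", mu - (n - 1) * (1 - 1 / (n - 1)) ** l)
```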

3.4. Birthday problem

The classical birthday problem is essentially a variant of the occupancy problem. For this reason, we again throw l balls independently and equally likely into n boxes, and let $X_{ij}$ be the indicator random variable of the event that ball i and ball j fall into the same box. The number of pairs of balls going into the same boxes (i.e. the number of pairs of people sharing the same birthday) can be written as $W=\sum_{i<j}X_{ij}$. Define $p=\mathbb{E} X_{ij}={{1}/{n}}$, so $\mu=\mathbb{E} W=\binom{l}{ 2}p$. Chatterjee, Diaconis, and Meckes [14] gave the following construction of $W^s$: label the balls from 1 to l, randomly choose two balls $J_1$ and $J_2$, and move ball $J_1$ into the box that $J_2$ is in. Then W is the number of pairs of balls before the move, while $W^s$ is the number of pairs of balls after the move. Let E be the event that $J_1$ and $J_2$ are from the same box. When E occurs, $W^s=W$, so $\vert W+1-W^s\vert=1$; otherwise $J_1$ and $J_2$ are from different boxes with $B_1$ and $B_2$ balls respectively, giving

\begin{equation*} W+1-W^s=B_1-B_2.\end{equation*}

Hence

\begin{align*}\mathbb{E}\vert W+1-W^s\vert&=\mathbb{P}(E)+\mathbb{E}[\vert W+1-W^s\vert\mid E^c]\mathbb{P}(E^c)\\[2pt] &\le\frac1n+\mathbb{E}\vert B_1-B_2\vert\\[2pt] &\le\frac1n+\mathbb{E} (B_1+B_2)\\[2pt] &=\frac{1+2l}{n}.\end{align*}

This, together with Theorem 2.2 and $a=0$ , gives the following proposition.

Proposition 3.5. For the random variable W defined above and any integer $k>\mu$ , we have

\begin{equation*} \biggl\vert\frac{\mathbb{P}(W\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert\le{C}_1(\mu,k)\mu\frac{1+2l}{n},\end{equation*}

where $Y\sim \text{Pn}(\mu)$ .
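
The coupling of [14] described above is straightforward to simulate, which gives a numerical check of the bound $\mathbb{E}\vert W+1-W^s\vert\le(1+2l)/n$. The sketch below (not from the paper; n, l, and the number of replications are illustrative, the two chosen balls are taken to be distinct, and numpy is assumed) compares the simulated expectation with the bound.

```python
# Illustrative Monte Carlo sketch (not from the paper) of the coupling above:
# ball J1 is moved to the box of ball J2, and E|W + 1 - W^s| is compared with
# the bound (1 + 2l)/n of Proposition 3.5.
import numpy as np

rng = np.random.default_rng(3)
n, l, reps = 365, 40, 50_000

def pairs(boxes):
    counts = np.bincount(boxes, minlength=n)
    return (counts * (counts - 1) // 2).sum()          # matching pairs

vals = np.empty(reps)
for r in range(reps):
    boxes = rng.integers(n, size=l)                    # birthdays of the l people
    W = pairs(boxes)
    j1, j2 = rng.choice(l, size=2, replace=False)      # two distinct balls
    boxes_s = boxes.copy()
    boxes_s[j1] = boxes[j2]                            # move J1 to J2's box
    vals[r] = abs(W + 1 - pairs(boxes_s))

print("Monte Carlo E|W+1-W^s|:", vals.mean())
print("bound (1+2l)/n        :", (1 + 2 * l) / n)
```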

3.5. Triangles in the Erdős–Rényi random graph

Let $G=G(n,p)$ be an Erdős–Rényi random graph on n vertices with edge probability p. Let $K_n$ be the complete graph on n vertices, and let $\Gamma$ be the set of all triangles in $K_n$ . For $\alpha\in\Gamma$ , let $X_\alpha$ be the indicator that there is a triangle in G at $\alpha$ , that is,

\begin{equation*} X_\alpha=\textbf{ 1}_{\{\alpha\subset G\}}.\end{equation*}

Therefore the number of triangles in G can be represented as $W=\sum_{\alpha\in\Gamma}X_\alpha$ . Clearly, $X_\alpha$ is independent of $X_\beta$ if $\alpha$ and $\beta$ do not share a common edge. By analysing the numbers of shared edges, we obtain (see [33, page 255])

\begin{gather*} \mu=\mathbb{E} W=\binom{n}{ 3}p^3, \sigma^2=\text{Var}(W)=\binom{n }{ 3}p^3[1-p^3+3(n-3)(\,p^2-p^3)].\end{gather*}

Proposition 3.6. For the random variable W defined above and any integer $k>\mu$ , we have

(3.10) \begin{equation} \biggl\vert\frac{\mathbb{P}(W\ge k)}{\mathbb{P}(Y\ge k)}-1\biggr\vert\le{C}_1(\mu,k)\mu (3(n-3)p^2(1-p)+p^3),\end{equation}

where $Y\sim \text{Pn}(\mu)$ .

Proof. The following proof is a special version of the general argument in [9, page 89]. Since $X_\alpha$ and $X_\beta$ are independent if $\alpha$ and $\beta$ have no common edges, a size-biased distribution of W can be constructed as follows. Let

\begin{equation*} X_\beta^{(\alpha)}\coloneqq \textbf{ 1}_{\{\beta\subset G\cup\alpha\}},\quad \beta\in\Gamma {.} \end{equation*}

Then

\begin{equation*} \mathscr{L}(\{X_\beta^{(\alpha)},\beta\ne\alpha\})=\mathscr{L}(\{X_\beta,\beta\ne\alpha\}\mid X_\alpha=1). \end{equation*}

Here the union of graphs is in the sense of set operations on their vertices and edges. Let I be a random element taking values in $\Gamma$ with equal probability and independent of $\{X_\beta^{(\alpha)}\colon \alpha,\beta\in\Gamma\}$. Then we can write

\begin{equation*} W^s=\sum_{\beta\ne I}X_\beta^{(I)}+1. \end{equation*}

Because $X_\beta^{(\alpha)}\ge X_\beta$ for all $\beta\in\Gamma$ , (3.3) implies

\begin{equation*} \mathbb{E}\vert W+1-W^s\vert \le \mu^{-1}(\sigma^2-\mu+2\mu p^3)=3(n-3)p^2(1-p)+p^3.\end{equation*}

The claim follows from Theorem 2.2 with $a=0$ .

Remark 3.3. Since $\mu=\binom{n}{ 3}p^3$ , if $p = \text{O}(1/n)$ , then the error bound (3.10) is of the same order $\text{O}(1/n)$ .
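
The coupling in the proof of Proposition 3.6 can also be simulated directly: $W^s$ is the number of triangles after the three edges of a uniformly chosen triple are added to G. The Monte Carlo sketch below (not from the paper; n, p, and the number of replications are illustrative, and numpy is assumed) compares the simulated $\mathbb{E}\vert W+1-W^s\vert$ with the bound $3(n-3)p^2(1-p)+p^3$ used in the proof.

```python
# Illustrative Monte Carlo sketch (not from the paper): W^s is the number of
# triangles after the three edges of a uniformly chosen triple are added, and
# E|W + 1 - W^s| is compared with the bound 3(n-3)p^2(1-p) + p^3.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p, reps = 30, 0.1, 10_000
triples = list(combinations(range(n), 3))

def triangles(A):
    return np.trace(np.linalg.matrix_power(A, 3)) / 6

vals = np.empty(reps)
for r in range(reps):
    A = np.triu((rng.random((n, n)) < p).astype(int), 1)
    A = A + A.T                                        # adjacency matrix of G(n, p)
    i, j, l = triples[rng.integers(len(triples))]      # uniformly chosen triple
    As = A.copy()
    for u, v in ((i, j), (i, l), (j, l)):
        As[u, v] = As[v, u] = 1                        # add the edges of the triple
    vals[r] = abs(triangles(A) + 1 - triangles(As))

print("Monte Carlo E|W+1-W^s|:", vals.mean())
print("bound                 :", 3 * (n - 3) * p**2 * (1 - p) + p**3)
```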

3.6. 2-runs

Let $\{\xi_1,\dots,\xi_n\}$ be i.i.d. Bernoulli(p) random variables with $n\ge 9$ and $p<2/3$. For each $1\le i\le n$, define $X_i=\xi_i\xi_{i+1}$ and, to avoid edge effects, we define $\xi_{j+n}=\xi_j$ for $-3\le j\le n$. The number of 2-runs in the Bernoulli sequence is defined as $W=\sum_{i=1}^nX_i$. Then $\mu=\mathbb{E} W=np^2$ and $\sigma^2=\text{Var}(W)=np^2(1-p)(3p+1)$.

Proposition 3.7. For any integer $k>\mu$ ,

(3.11) \begin{equation} \biggl\vert\frac{\mathbb{P}(W\ge k)}{\text{Pn}(\mu)([k,\infty))}-1\biggr\vert\le {C}_1(\mu,k)np^3(2-p).\end{equation}

If $a\coloneqq \lfloor{np^3(3p-2)}\rfloor$ , $\lambda=\mu-a$ , then for any integer $k>\lambda$ ,

(3.12) \begin{equation} \biggl\vert\frac{\mathbb{P}(W-a\ge k)}{\text{Pn}(\lambda)([k,\infty))}-1\biggr\vert\le{C}_2(\lambda,k)\frac{9.2np^2(1+5p)}{\sqrt{(n-8)(1-p)^3}}+{C}_1(\lambda,k)(1\wedge\lambda).\end{equation}

Proof. For (3.11), we apply Theorem 2.2 with $a=0$ ,

\begin{equation*}X_j^{(i)}=\begin{cases}X_j &\text{if $|j-i|\ge 2$,}\\[3pt] \xi_j &\text{if $j=i-1$,}\\[3pt] \xi_{j+1} &\text{if $j=i+1$,}\\[3pt] 1 &\text{if }j=i,\end{cases}\end{equation*}

I a uniform random variable on $\{1,\dots,n\}$ independent of $\{X_j^{(i)}\}$ , and

\begin{equation*}W^s=\sum_{j\ne I}X_j^{(I)}+1,\end{equation*}

giving

\begin{align*}\mathbb{E}\vert W+1-W^s\vert&=\mathbb{E}\vert X_{I-1}+X_I+X_{I+1}-\xi_{I-1}-\xi_{I+2}\vert\notag\\[3pt] &=\mathbb{E}\vert\xi_{i-1}\xi_{i}+\xi_{i}\xi_{i+1}+\xi_{i+1}\xi_{i+2}-\xi_{i-1}-\xi_{i+2}\vert\notag\\[3pt] &=p(2-p).\end{align*}

À propos of (3.12), we make use of Theorem 2.1. To this end, let $A_i=\{i-1,i,i+1\}$, $B_i=\{i-2,i-1,i,i+1,i+2\}$, and $\mathcal{F}_i=\sigma\{\xi_j\colon i-2\le j\le i+3\}$. Then Lemma 5.1 of [7] with $\alpha_j=0$ or 1 for $j=i-2,\ldots,i+5$ gives

\begin{equation*} \theta_i \le{d_{\text{TV}}} (W,W+1\mid \mathcal{F}_i)\le\frac{2.3}{\sqrt{(n-8)p^2(1-p)^3}}.\end{equation*}

On the other hand, $\mathbb{E}(Z_i^{\prime})=5p^2$ , $|\mathbb{E}((X_i-\mu_i)Z_i)|\le \mathbb{E}(Z_i)=3p^2$ ,

\begin{equation*}\mathbb{E}[|X_i-\mu_i|Z_i(Z_i^{\prime}-Z_i/2-1/2)]\le \mathbb{E}[Z_i(Z_i^{\prime}-Z_i/2-1/2)]=4p^3+5p^4,\end{equation*}

and $|\lambda-\sigma^2|\lambda^{-1}\le 1\wedge \lambda^{-1}$, $a=\lfloor np^3(3p-2)\rfloor\le0$, $\lambda\ge \sigma^2$, and hence $\mathbb{P}(W-a<-1)=0$ and (3.12) follows from Theorem 2.1 by collecting these terms.
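
The expectation $p(2-p)$ computed in the proof of (3.11) involves only four Bernoulli variables, so it can be verified by exhaustive enumeration; a tiny sketch (not from the paper; the values of p are arbitrary) follows.

```python
# Illustrative exhaustive check (not from the paper) of the identity
# E|xi1*xi2 + xi2*xi3 + xi3*xi4 - xi1 - xi4| = p(2 - p) used in the proof
# of (3.11), for independent Bernoulli(p) variables xi1, ..., xi4.
from itertools import product

def lhs(p):
    total = 0.0
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        weight = p ** (x1 + x2 + x3 + x4) * (1 - p) ** (4 - x1 - x2 - x3 - x4)
        total += weight * abs(x1 * x2 + x2 * x3 + x3 * x4 - x1 - x4)
    return total

for p in (0.1, 0.3, 0.5):
    print(p, lhs(p), p * (2 - p))
```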

4. The proofs of the main results

The celebrated Stein–Chen method [15] is based on the observation that a non-negative integer-valued random variable $Y\sim\text{Pn}(\lambda)$ if and only if $\mathbb{E}[\lambda f(Y+1)-Yf(Y)]=0$ for all bounded functions $f\colon \mathbb{Z}_+\coloneqq \{0,1,2,\dots\}\rightarrow\mathbb{R}$, leading to a Stein identity for Poisson approximation as

(4.1) \begin{equation} \lambda f(\,j+1)-jf(\,j)=h(\,j)-\text{Pn}(\lambda)\{h\},\quad j\ge 0,\end{equation}

where $\text{Pn}(\lambda)\{h\}\coloneqq \mathbb{E} h(Y)$ . Since f(0) plays no role in Stein’s equation, we set $f(0)=f(1)$ and $f(\,j)=0$ for $j<0$ . The following lemma plays a key role in the proofs of the main results, and it enables us to circumvent checking the moment condition (1.1), which seems to be inevitable in the existing procedure for proving moderate deviation theorems.

Lemma 4.1. For fixed $k\in\mathbb{Z}_+$ , let $h=\textbf{ 1}_{[k,\infty)}$ . With $\pi_\cdot=\text{Pn}(\lambda)(\{\cdot\})$ , $\Delta f(i)=f(i+1)-f(i)$ , and $\Delta^2f=\Delta(\Delta f)$ , the solution $f\coloneqq f_h$ of the Stein equation (4.1) has the following properties:

(i) $\|\,f\|\coloneqq \sup_{i\in\mathbb{Z}_+}\vert\,f(i)\vert={C}_{0}(\lambda,k)\text{Pn}(\lambda)\{h\}$, where

    \begin{equation*} {C}_{0}(\lambda,k)\coloneqq \frac{F(k-1)}{k\pi_k}{,} \end{equation*}
(ii) $\Delta f(i)$ is negative and decreasing in $i\le k-1$ and positive and decreasing in $i\ge k$,

(iii)

    \begin{equation*} \|\Delta f\|_{k-}\coloneqq \sup_{i\le k-1}\vert\Delta f(i)\vert ={C}_{1-}(\lambda,k)\text{Pn}(\lambda)\{h\}\end{equation*}
    and
    \begin{equation*} \|\Delta f\|_{k+}\coloneqq \sup_{i\ge k}\vert\Delta f(i)\vert= {C}_{1+}(\lambda,k)\text{Pn}(\lambda)\{h\}, \end{equation*}
    where
    \begin{equation*}{C}_{1-}(\lambda,k)\coloneqq \frac{F(k-1)}{k\pi_k}\biggl(1-\frac{F(k-2)}{F(k-1)}\cdot\frac{\lambda}{k-1}\biggr)\end{equation*}
    and
    \begin{equation*}{C}_{1+}(\lambda,k)\coloneqq \frac{F(k-1)}{k\pi_k}\biggl(1-\frac{\overline{F}(k+1)}{\overline{F}(k)}\cdot\frac{k}{\lambda}\biggr){,}\end{equation*}
(iv)

    \begin{equation*} \|\Delta f\|\coloneqq \sup_{i\in\mathbb{Z}_+}\vert\Delta f(i)\vert={C}_1(\lambda,k)\text{Pn}(\lambda)\{h\}\end{equation*}
    and
    \begin{equation*} \|\Delta^2 f\|\coloneqq \sup_{i\in\mathbb{Z}_+}\vert\Delta^2 f(i)\vert={C}_2(\lambda,k) \text{Pn}(\lambda)\{h\}, \end{equation*}

where ${C}_1$ and ${C}_2$ are defined in (2.2) and (2.3).
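
The properties (i)–(iv) can be checked numerically by constructing f from (4.4) and (4.6) below; the sketch that follows (not from the paper; $\lambda$, k, and the truncation point are illustrative, and numpy/scipy are assumed) also verifies that the constructed f satisfies the Stein equation (4.1).

```python
# Illustrative numerical check (not from the paper) of Lemma 4.1: f is built
# from (4.4) and (4.6), checked against the Stein equation (4.1), and the
# suprema in (i)-(iv) are compared with C_0, C_{1-}, C_{1+}, C_2 times Pn{h}.
import numpy as np
from scipy.stats import poisson

lam, k, jmax = 5.0, 9, 40                     # illustrative lambda, k, truncation
idx = np.arange(jmax + 2)
pmf, F, Fbar = poisson.pmf(idx, lam), poisson.cdf(idx, lam), poisson.sf(idx - 1, lam)
piA = Fbar[k]                                 # Pn(lam){h} = P(Y >= k)

f = np.zeros(jmax + 1)
for i in range(1, jmax + 1):
    if i <= k:
        f[i] = -F[i - 1] / (lam * pmf[i - 1]) * piA        # -bar{tau}^+_{i-1} * pi(A)
    else:
        f[i] = -Fbar[i] / (i * pmf[i]) * F[k - 1]          # -bar{tau}^-_i * F(k-1)
f[0] = f[1]

h = (np.arange(jmax) >= k).astype(float)
assert np.allclose(lam * f[1:] - np.arange(jmax) * f[:-1], h - piA)   # Stein equation (4.1)

df, d2f = np.diff(f), np.diff(f, 2)
C0 = F[k - 1] / (k * pmf[k])
C1m = C0 * (1 - F[k - 2] / F[k - 1] * lam / (k - 1))
C1p = C0 * (1 - Fbar[k + 1] / Fbar[k] * k / lam)
print("||f||       :", np.abs(f).max(),      " expected:", C0 * piA)
print("||Df||_{k-} :", np.abs(df[:k]).max(), " expected:", C1m * piA)
print("||Df||_{k+} :", np.abs(df[k:]).max(), " expected:", C1p * piA)
print("||D^2 f||   :", np.abs(d2f).max(),    " expected:", (C1m + C1p) * piA)
```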

For $k>\lambda$ , death rates are bigger than the birth rate, so it seems intuitively obvious that $\tau_{k}^-$ is stochastically less than or equal to $\tau_{k-2}^+$ for such k. In view of representation (4.11) and $f(k)<0$ as shown in (4.6), this is equivalent to ${C}_{1-}(\lambda,k)> {C}_{1+}(\lambda,k)$ , leading to the following conjecture.

Conjecture 4.1. We conjecture that ${C}_{1-}(\lambda,k)> {C}_{1+}(\lambda,k)$ for all $k>\lambda$ , and the gap increases exponentially as a function of $k-\lambda$ .

Proof of Lemma 4.1. We build our argument on the birth–death process representation of the solution

(4.2) \begin{equation}f(i)=-\int_0^{\infty}\mathbb{E} [h(Z_i(t))-h(Z_{i-1}(t))] \,{\text{d}} t,\quad\text{for $i\ge 1$,}\end{equation}

where $Z_n(t)$ is a birth–death process with birth rate $\lambda$, unit per capita death rate, and initial state $Z_n(0)=n$ [4, 5, 12]. For convenience we adopt the notation in [12]: for $i,j\in\mathbb{Z}_+$, define

\begin{equation*} \tau_{ij}=\inf\{t\colon Z_i(t)=j\},\quad \tau^+_j=\tau_{j,j+1},\quad \tau^-_j=\tau_{j,j-1}{,}\end{equation*}

and

\begin{equation*} \overline{\tau^+_j}=\mathbb{E}(\tau^+_j){,} \quad \overline{\tau^-_j}=\mathbb{E}(\tau^-_j){,} \quad \pi_i=\text{Pn}(\lambda)(\{i\}).\end{equation*}

Applying Lemmas 2.1 and 2.2 of [12] with birth rate $\lambda$, death rate $\beta_i=i$, $A\coloneqq [k,\infty)$, and $\pi(\!\cdot\!)=\sum_{l\in \cdot}\pi_l$, we have

(4.3) \begin{equation} f(i)=\overline{\tau^-_i}\pi(A\cap[0,i-1])-\overline{\tau^+_{i-1}}\pi(A\cap[i,\infty)),\quad i\ge 1 {,} \end{equation}

and for $j\in\mathbb{Z}_+$ ,

(4.4) \begin{equation} \overline{\tau^+_j}=\frac{F(\,j)}{\lambda\pi_j},\quad \overline{\tau^-_j}=\frac{\overline{F}(\,j)}{j\pi_j},\end{equation}

where, as in Theorem 2.1,

(4.5) \begin{equation} F(\,j)=\sum_{i=0}^j\pi_i{,}\quad \overline{F}(\,j)=\sum_{i=j}^{\infty}\pi_i.\end{equation}

One can easily simplify (4.3) to get

(4.6) \begin{equation} f(i)=\begin{cases}-\overline{\tau^+_{i-1}}\pi(A)&\text{for $i\le k$,}\\[6pt] -\overline{\tau^-_i} F(k-1)&\text{for $i>k$,}\end{cases}\end{equation}

which, together with (4.4) and the balance equations

(4.7) \begin{equation}\lambda \pi_i=(i+1)\pi_{i+1}\quad \text{for all }i\in\mathbb{Z}_+,\end{equation}

implies

(4.8) \begin{equation} \Delta f(i)=\begin{cases}-\pi(A)\biggl(\frac{F(i)}{\lambda\pi_i}-\frac{F(i-1)}{\lambda\pi_{i-1}}\biggr) &\text{for $i\le k-1$,}\\[15pt] -(1-\pi(A))\biggl(\frac{\overline{F}(i+1)}{\lambda\pi_i}-\frac{\overline{F}(i)}{\lambda\pi_{i-1}}\biggr) &\text{for $i\ge k$.}\end{cases}\end{equation}

It follows from [12, Lemma 2.4] that for $i\ge 1$,

\begin{equation*} \frac{F(i)}{F(i-1)}\ge\frac{\lambda}{i}\ge\frac{\overline{F}(i+1)}{\overline{F}(i)},\end{equation*}

which, together with (4.7), ensures

(4.9) \begin{align} \Delta f(i) \le 0\quad \text{for } i \le k-1,\end{align}
(4.10) \begin{align}\Delta f(i) \ge 0 \quad \text{for } i \ge k.\end{align}

Hence $f(k)\le f(i)\le 0$ , and combining (4.4), (4.5), and (4.6) gives

\begin{equation*} \|\,f\|=\vert\,f(k)\vert=\frac{F(k-1)}{k\pi_k}\pi(A), \end{equation*}

as claimed in (i).

À propos of (ii), because of (4.9) and (4.10), it remains to show that $\Delta f$ is decreasing in the two ranges. To this end, we will mainly rely on the properties of the solution (4.2). Let T be an exponential random variable with mean 1, independent of the birth–death process $Z_{i-1}$. Then $Z_i$ can be represented as

\begin{equation*} Z_i(t)=Z_{i-1}(t)+\textbf{ 1}_{\{T>t\}},\end{equation*}

and hence we obtain from (4.2) and the strong Markov property in the second-to-last equality that

\begin{align*} f(i)&=-\int_0^{\infty}\mathbb{E} [\textbf{ 1}_{\{Z_{i-1}(t)+\textbf{ 1}_{\{T>t\}}\ge k\}}-\textbf{ 1}_{\{Z_{i-1}(t)\ge k\}}] \,{\text{d}} t\notag\\[3pt] &=-\mathbb{E}\int_0^{\infty}{\text{e}}^{-t}\textbf{ 1}_{\{Z_{i-1}(t)=k-1\}} \,{\text{d}} t\notag\\[5pt] &=-\mathbb{E}\biggl\{\int_{\tau_{i-1,k-1}}^{\infty}{\text{e}}^{-t}\textbf{ 1}_{\{Z_{i-1}(t)=k-1\}} \,{\text{d}} t\biggr\}\notag\\[3pt] &=-\mathbb{E}\{{\text{e}}^{-\tau_{i-1,k-1}}\}\mathbb{E}\int_0^{\infty}{\text{e}}^{-t}\textbf{ 1}_{\{Z_{k-1}(t)=k-1\}} \,{\text{d}} t\notag\\[3pt] &=\mathbb{E} {\text{e}}^{-\tau_{i-1,k-1}}f(k).\end{align*}

This enables us to give another representation of (4.8) as

(4.11) \begin{equation} \Delta f(i)=f(k)(\mathbb{E} {\text{e}}^{-\tau_{i,k-1}}-\mathbb{E} {\text{e}}^{-\tau_{i-1,k-1}}),\end{equation}

and so

\begin{equation*} \Delta^2f(i)=f(k)(\mathbb{E} {\text{e}}^{-\tau_{i+1,k-1}}-2\mathbb{E} {\text{e}}^{-\tau_{i,k-1}}+\mathbb{E} {\text{e}}^{-\tau_{i-1,k-1}}). \end{equation*}

For $i\ge k$ , using the strong Markov property again in the equalities below, we have

\begin{align*} &\mathbb{E} ({\text{e}}^{-\tau_{i+1,k-1}}-2{\text{e}}^{-\tau_{i,k-1}}+{\text{e}}^{-\tau_{i-1,k-1}})\notag\\[4pt] &\quad =\mathbb{E} {\text{e}}^{-\tau_{i-1,k-1}}(\mathbb{E} {\text{e}}^{-\tau_{i+1,i-1}}-2\mathbb{E} {\text{e}}^{-\tau_{i,i-1}}+1)\notag\\[4pt] &\quad =\mathbb{E} {\text{e}}^{-\tau_{i-1,k-1}}(\mathbb{E} {\text{e}}^{-\tau_{i+1,i}}\mathbb{E} {\text{e}}^{-\tau_{i,i-1}}-2\mathbb{E} {\text{e}}^{-\tau_{i,i-1}}+1)\notag\\[4pt] &\quad \ge \mathbb{E} {\text{e}}^{-\tau_{i-1,k-1}}(\mathbb{E} {\text{e}}^{-\tau_{i,i-1}}-1)^2\ge0,\end{align*}

where the inequality follows from

\begin{align*} \tau_{i,i-1} & = \inf\{t\colon Z_{i}(t)=i-1\}\\[4pt] & =\inf\{t\colon Z_{i}(t)+\textbf{ 1}_{\{T>t\}}=i-1+\textbf{ 1}_{\{T>t\}}\}\\[4pt] & \ge\inf\{t\colon Z_{i+1}(t)=i\}=\tau_{i+1,i}.\end{align*}

Similarly, for $i\le k-2$ , $\tau_{i-1,i}$ is stochastically less than or equal to $\tau_{i,i+1}$ , so

\begin{align*} &\mathbb{E} ({\text{e}}^{-\tau_{i+1,k-1}}-2{\text{e}}^{-\tau_{i,k-1}}+{\text{e}}^{-\tau_{i-1,k-1}})\notag\\[4pt] &\quad =\mathbb{E} {\text{e}}^{-\tau_{i+1,k-1}}(\mathbb{E} {\text{e}}^{-\tau_{i-1,i+1}}-2\mathbb{E} {\text{e}}^{-\tau_{i,i+1}}+1)\notag\\[4pt] &\quad \ge\mathbb{E} {\text{e}}^{-\tau_{i+1,k-1}}(\mathbb{E} {\text{e}}^{-\tau_{i,i+1}}-1)^2\ge 0.\end{align*}

Hence $\Delta^2 f(i)\le 0$ for $i\ge k$ and $i\le k-2$ , which concludes the proof of (ii).

In terms of (iii), we use (ii) to obtain

\begin{align*} \|\Delta f\|_{k-}&=\vert\Delta f(k-1)\vert\\[4pt] &=f(k-1)-f(k)\notag\\[4pt] &=\pi(A)\frac{1}{\lambda}\biggl(\frac{F(k-1)}{\pi_{k-1}}-\frac{F(k-2)}{\pi_{k-2}}\biggr)\notag\\[4pt] &={\pi(A)\frac{F(k-1)}{k\pi_k}\biggl(1-\frac{F(k-2)}{F(k-1)}\cdot\frac{\lambda}{k-1}\biggr)}.\end{align*}

Likewise,

\begin{align*} \|\Delta f\|_{k+}&=\vert\Delta f(k)\vert\\[4pt] &=f(k+1)-f(k)\notag\\[4pt] &=\frac{F(k-1)}{\lambda\pi_{k-1}}\pi(A)-\frac{\overline{F}(k+1)}{\lambda\pi_k}F(k-1)\notag\\[4pt] &={\pi(A)\frac{F(k-1)}{k\pi_k}\biggl(1-\frac{\overline{F}(k+1)}{\overline{F}(k)}\cdot\frac{k}{\lambda}\biggr)}.\end{align*}

Since (iv) is clearly an immediate consequence of (iii), (2.2), and (2.3), the proof of Lemma 4.1 is complete.

Proof of Theorem 2.1. As in the proof of Lemma 4.1, we set $A=[k,\infty)$ and $h=\textbf{ 1}_A$ . Then

\begin{equation*} \mathbb{P}(W-a\ge k)-\mathbb{P}(Y\ge k)=\mathbb{E} h(W-a)-\text{Pn}(\lambda)\{h\}.\end{equation*}

Define

\begin{align*}e_1\coloneqq &\mathbb{E}(h(W-a)-\text{Pn}(\lambda)\{h\})\textbf{ 1}_{\{W-a<0\}}-\lambda f(0)\mathbb{P}(W-a=-1),\\[5pt] e_2\coloneqq &\mathbb{E}(\lambda f(W-a+1)-(W-a)f(W-a)).\end{align*}

Then it follows from (4.1) that

(4.12) \begin{equation} \mathbb{P}(W-a\ge k)-\mathbb{P}(Y\ge k)=e_1+e_2.\end{equation}

For the estimate of $e_1$ , from $f(0)=f(1)$ , we know that $\lambda f(0)=-\text{Pn}(\lambda)\{h\}$ , and thus

\begin{equation*} e_1=-\mathbb{P}(W-a<-1)\text{Pn}(\lambda)\{h\},\end{equation*}

which gives

(4.13) \begin{equation} \vert e_1\vert=\pi(A)\mathbb{P}(W-a<-1).\end{equation}

For the estimate of $e_2$ , denoting ${\tilde f}(\,j)\coloneqq f(\,j-a)$ , we have

(4.14) \begin{equation} e_2=\mathbb{E}\{\lambda\Delta{\tilde f}(W)-(W-\mu){\tilde f}(W)\}.\end{equation}

Using Lemma 4.1(ii), we have that $\Delta^2{\tilde f}(m)$ is negative for all m except $m=a+k-1$ , which implies

\begin{equation*}-\sum_{m\ne k-1}\Delta^2 f(m)\le \Delta^2 f(k-1)=\|\Delta^2 f\|\end{equation*}

and

\begin{align*} \mathbb{E} [\Delta^2{\tilde f}(W_i^{\prime}+l)\mid \mathcal{F}_i] & \le \Delta^2 f(k-1) \mathbb{P} [W_i^{\prime}=k-1+a-l\mid \mathcal{F}_i]\le \|\Delta^2 f\|\theta_i ,\\[6pt] \mathbb{E} [\Delta^2{\tilde f}(W_i^{\prime}+l)\mid \mathcal{F}_i] & \ge \sum_{m\ne k-1}\Delta^2 f(m) \mathbb{P} [W_i^{\prime}=m+a-l\mid \mathcal{F}_i ]\ge -\theta_i \|\Delta^2 f\|,\end{align*}

and hence

(4.15) \begin{equation}|\mathbb{E} [\Delta^2{\tilde f}(W_i^{\prime}+l)\mid \mathcal{F}_i ]|\le \|\Delta^2 f\|\theta_i.\end{equation}

By taking

\begin{equation*} \theta\coloneqq \lambda-\sigma^2,\end{equation*}

we have from (4.14) that

(4.16) \begin{align} e_2&=\theta\mathbb{E}\Delta{\tilde f}(W)+\mathbb{E}\{\sigma^2\Delta{\tilde f}(W)-(W-\mu){\tilde f}(W)\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\mathbb{E}\biggl\{\sigma^2\Delta{\tilde f}(W)-\sum_{i\in\mathcal{I}}(X_i-\mu_i){\tilde f}(W)\biggr\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\sigma^2\mathbb{E}\Delta{\tilde f}(W)-\sum_{i\in\mathcal{I}}\mathbb{E}\{(X_i-\mu_i) ({\tilde f}(W)-{\tilde f}(W_i))\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\sigma^2\mathbb{E}\Delta{\tilde f}(W)-\sum_{i\in\mathcal{I}}\mathbb{E}\biggl\{(X_i-\mu_i)\biggl(\sum_{j=0}^{Z_i-1}\Delta{\tilde f}(W_i+j)\biggr)\biggr\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\sigma^2\mathbb{E}\Delta{\tilde f}(W)-\sum_{i\in\mathcal{I}}\mathbb{E} [(X_i-\mu_i)Z_i ]\mathbb{E}\Delta{\tilde f}(W_i^{\prime})\notag\\[3pt] &\quad\, -\sum_{i\in\mathcal{I}}\mathbb{E}\biggl\{(X_i-\mu_i)\sum_{j=0}^{Z_i-1} [\Delta{\tilde f}(W_i+j)-\Delta{\tilde f}(W_i^{\prime}) ]\biggr\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\sum_{i\in\mathcal{I}}\mathbb{E} [(X_i-\mu_i)Z_i ]\mathbb{E} [\Delta{\tilde f}(W)-\Delta{\tilde f}(W_i^{\prime})]\notag\\[3pt] &\quad\, -\sum_{i\in\mathcal{I}}\mathbb{E}\biggl\{(X_i-\mu_i)\sum_{j=0}^{Z_i-1} [\Delta{\tilde f}(W_i+j)-\Delta{\tilde f}(W_i^{\prime}) ]\biggr\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\sum_{i\in\mathcal{I}}\mathbb{E} [(X_i-\mu_i)Z_i ]\mathbb{E}\Biggl[\sum_{j=0}^{Z_i^{\prime}-1}\Delta^2{\tilde f}(W_i^{\prime}+j)\Biggr]\notag\\[3pt] &\quad\, -\sum_{i\in\mathcal{I}}\mathbb{E}\biggl\{(X_i-\mu_i)\sum_{j=0}^{Z_i-1}\sum_{l=0}^{Z_i^{\prime}-Z_i+j-1}\Delta^2{\tilde f}(W_i^{\prime}+l)\biggr\}\notag\\[3pt] &=\theta\mathbb{E}\Delta{\tilde f}(W)+\sum_{i\in\mathcal{I}}\mathbb{E} [(X_i-\mu_i)Z_i ]\mathbb{E}\Biggl[\sum_{j=0}^{Z_i^{\prime}-1}\mathbb{E}(\Delta^2{\tilde f}(W_i^{\prime}+j)\mid \mathcal{F}_i )\Biggr]\notag\\[3pt] &\quad\, -\sum_{i\in\mathcal{I}}\mathbb{E}\biggl\{(X_i-\mu_i)\sum_{j=0}^{Z_i-1}\sum_{l=0}^{Z_i^{\prime}-Z_i+j-1}\mathbb{E} (\Delta^2{\tilde f}(W_i^{\prime}+l)\mid \mathcal{F}_i )\biggr\},\end{align}

where the third-to-last equality is because $\sum_{i\in\mathcal{I}}\mathbb{E}[(X_i-\mu_i)Z_i]=\sigma^2$ and $(X_i,Z_i)$ is independent of $W_i^{\prime}$ , and the last equality is due to the assumption that $\{X_j\colon j\in B_i\}$ is $\mathcal{F}_i$ -measurable. Using (4.15) in (4.16), we obtain

(4.17) \begin{align}&|e_2|\le \|\Delta f\||\theta| +\|\Delta^2 f\|\sum_{i\in\mathcal{I}}\theta_i\{{|\mathbb{E}(X_i-\mu_i)Z_i|}\mathbb{E} (Z_i^{\prime})+\mathbb{E} [|X_i-\mu_i|Z_i(Z_i^{\prime}-Z_i/2-1/2)]\}.\end{align}

Now, combining Lemma 4.1(iii, iv), (4.12), (4.13), and (4.17) gives (2.1).

Proof of Corollary 2.1. Under the setting of the local dependence, the claim follows from Theorem 2.1 by taking $Z_i=Z_i^{\prime}=X_i$ .

Proof of Theorem 2.2. Recall the Stein representation (4.12) and the estimate (4.13). It remains to tackle (4.14). However,

\begin{align*} e_2&=\mathbb{E} (\lambda {\tilde f}(W+1)-\mu {\tilde f}(W^s)+a{\tilde f}(W) )\notag\\[4pt] &=\mu\mathbb{E}({\tilde f}(W+1)-{\tilde f}(W^s))+(\lambda-\mu)\mathbb{E}\Delta {\tilde f}(W),\end{align*}

and thus

(4.18) \begin{align}\vert e_2\vert&\le\|\Delta f\|(\mu\mathbb{E}\vert W+1-W^s\vert+\vert\lambda-\mu\vert)\notag\\[4pt] &\le\text{Pn}(\lambda)\{h\} [{C}_1(\lambda,k)(\mu\mathbb{E}\vert W+1-W^s\vert+\vert\lambda-\mu\vert)]. \end{align}

Hence, combining (4.12), (4.13), and (4.18) completes the proof.

Proof of Theorem 2.3. Again we make use of the Stein representation (4.12) and the estimate (4.13) so that it suffices to deal with (4.14). To this end, we have

\begin{align*} e_2&=\mathbb{E} (\lambda \Delta{\tilde f}(W)-(W-\mu) {\tilde f}(W) )\notag\\[4pt] &=\mathbb{E} (\lambda \Delta{\tilde f}(W)-\sigma^2 \Delta{\tilde f}(W^\star) )\notag\\[4pt] &=\mathbb{E} ((\lambda-\sigma^2) \Delta{\tilde f}(W)+\sigma^2 (\Delta{\tilde f}(W)-\Delta{\tilde f}(W^\star)) ). \end{align*}

However, with $R=W^\star-W$ ,

\begin{align*} &\mathbb{E} [\Delta{\tilde f}(W)-\Delta{\tilde f}(W^\star)]\notag\\[4pt] &\quad =-\mathbb{E}\biggl\{\sum_{j=0}^{R-1}\mathbb{E} (\Delta^2{\tilde f}(W+j))\textbf{ 1}_{R>0}-\sum_{j=1}^{-R}\mathbb{E} (\Delta^2{\tilde f}(W-j))\textbf{ 1}_{R<0}\biggr\}\notag\\[4pt] &\quad =-\mathbb{E}\biggl\{\sum_{j=0}^{R-1}\mathbb{E} ( \Delta^2{\tilde f}(W+j)\mid R )\textbf{ 1}_{R>0}-\sum_{j=1}^{-R}\mathbb{E} (\Delta^2{\tilde f}(W-j)\mid R )\textbf{ 1}_{R<0}\biggr\},\end{align*}

and a similar argument for (4.15) ensures

\begin{equation*}|\mathbb{E} (\Delta^2{\tilde f}(W+j)\mid R )|\le \|\Delta^2f\|\theta_R,\end{equation*}

and hence

(4.19) \begin{align} \vert e_2\vert&\le|\lambda-\sigma^2|\|\Delta f\|+\sigma^2\|\Delta^2f\|\mathbb{E}[|R|\theta_R].\end{align}

The claim follows from combining (4.12), (4.13), and (4.19) and using Lemma 4.1(iii, iv).

Proof of Proposition 3.2. The first inequality of (3.7) is a direct consequence of [27]. For the second inequality, let $h=\textbf{ 1}_{[k,\infty)}$, let f be the solution of the Stein identity (4.1) with $\lambda=\mu$, and set $W_i=W-X_i$ and $Y\sim\text{Pn}(\mu)$. The following argument is standard (see [9, page 6]) and we repeat it for ease of reading:

(4.20) \begin{align}&\mathbb{P}(W\ge k)-\mathbb{P}(Y\ge k)\notag\\[1pt] &\quad=\mathbb{E}\{\mu f(W+1)-Wf(W)\}\notag\\[1pt] &\quad=\mu \mathbb{E} f(W+1)-\sum_{i=1}^n\mathbb{E} \{X_if(W)\}\notag\\[1pt] &\quad=\mu \mathbb{E} f(W+1)-\sum_{i=1}^np_i\mathbb{E} \{f(W_i+1)\}\notag\\ &\quad=\sum_{i=1}^np_i^2\mathbb{E}\Delta f(W_i+1).\end{align}

For any non-negative integer-valued random variable U such that the following expectations exist, summation by parts gives

\begin{equation*}\mathbb{E} g(U+1)=\sum_{j=1}^\infty \Delta g(\,j)\mathbb{P}(U\ge j)+g(1).\end{equation*}

On the other hand, Proposition 2.1 of [8] ensures that

\begin{equation*}\frac{\mathbb{P}(W_i\ge j)}{\mathbb{P}(Y\ge j)}\le \frac{\mathbb{P}(W\ge j)}{\mathbb{P}(Y\ge j)}\le \sup_{r\ge 0}\frac{\mathbb{P}(W=r)}{\mathbb{P}(Y=r)}\le M,\end{equation*}

so using Lemma 4.1(ii), we have

(4.21) \begin{align} &\mathbb{E}\Delta f(W_i+1)\notag\\[2pt] &\quad = \sum_{j=1}^\infty \Delta^2f(\,j)\mathbb{P}(W_i\ge j)+\Delta f(1)\notag\\[2pt] &\quad \ge M\sum_{j\ge 1, j\ne k-1} \Delta^2f(\,j)\mathbb{P}(Y\ge j)+\Delta f(1)\notag\\[2pt] &\quad = M\biggl\{\sum_{j=1}^\infty \Delta^2f(\,j)\mathbb{P}(Y\ge j)+\Delta f(1)\biggr\} -M\Delta^2f(k-1)\mathbb{P}(Y\ge k-1)+(1-M)\Delta f(1)\notag\\[2pt] &\quad = M\mathbb{E}\Delta f(Y+1)-M\Delta^2f(k-1)\mathbb{P}(Y\ge k-1)+(1-M)\Delta f(1)\notag\\[2pt] &\quad > M\mathbb{E}\Delta f(Y+1)-M\Delta^2f(k-1)\mathbb{P}(Y\ge k-1).\end{align}

However, by (4.2), since $\text{Pn}(\mu)$ is the stationary distribution of $Z_i$ , $Z_{Y}(t)\sim \text{Pn}(\mu)$ , leading to

(4.22) \begin{align}&\mathbb{E}\Delta f(Y+1)\notag\\[2pt] &\quad = -\int_0^\infty \mathbb{E}[h(Z_{Y+2}(t))-2h(Z_{Y+1}(t))+h(Z_{Y}(t))] \,{\text{d}} t\notag\\[2pt] &\quad = -\int_0^\infty \mathbb{E}[h(Y+\textbf{ 1}_{\{T_1>t\}}+\textbf{ 1}_{\{T_2>t\}})-h(Y+\textbf{ 1}_{\{T_1>t\}})-h(Y+\textbf{1}_{\{T_2>t\}})+h(Y)] \,{\text{d}} t\notag\\[2pt] &\quad = -\int_0^\infty {\text{e}}^{-2t}\mathbb{E}[\Delta^2h(Y)] \,{\text{d}} t=-\frac12 (\pi_{k-2}-\pi_{k-1}),\end{align}

where $T_1,T_2$ are i.i.d. $\exp(1)$ random variables independent of Y. Combining (4.20), (4.21), and (4.22), we have

(4.23) \begin{equation}\frac{\mathbb{P}(W\ge k)}{\mathbb{P}(Y\ge k)}-1 > -\frac12M\mu_2\frac{\pi_{k-2}-\pi_{k-1}}{\mathbb{P}(Y\ge k)}-M\mu_2\Delta^2f(k-1)\frac{\mathbb{P}(Y\ge k-1)}{\mathbb{P}(Y\ge k)}.\end{equation}

For the first term of (4.23), using [9, Proposition A.2.1(ii)], we obtain

(4.24) \begin{align}\frac{\pi_{k-2}-\pi_{k-1}}{\mathbb{P}(Y\ge k)}&=\frac k {\mu}\cdot \frac{\pi_k}{\mathbb{P}(Y\ge k)}\cdot\frac{k-1-\mu}{\mu}\notag\\[3pt] &\le \frac {4(k-\mu)} {\mu}\cdot\frac{k-1-\mu}{\mu} \notag \\[3pt] &\le 4x^2/\mu.\end{align}

For the second term of (4.23), we use the crude estimate of

\begin{equation*}\Delta^2 f(k-1)\le 2 \|\Delta f\|\le 2(1-{\text{e}}^{-\mu})/\mu\end{equation*}

(see [9, Lemma 1.1.1] or Remark 2.1), so applying [9, Proposition A.2.1(ii)] again,

(4.25) \begin{align} &\Delta^2 f(k-1)\frac{\mathbb{P}(Y\ge k-1)}{\mathbb{P}(Y\ge k)}\notag\\[5pt] &\quad \le \frac{2(1-{\text{e}}^{-\mu})}\mu\biggl(1+\frac{\pi_k}{\mathbb{P}(Y\ge k)}\cdot \frac k\mu\biggr)\notag\\[5pt] &\quad \le \frac{2(1-{\text{e}}^{-\mu})}\mu\biggl(1+\frac{4(k-\mu)}{\mu}\biggr)\notag \\[5pt] &\quad \le \frac2{\mu}\biggl(1+4x\sqrt{\frac{1-{\text{e}}^{-\mu}}\mu}\biggr).\end{align}

The bound (3.7) follows by collecting (4.23), (4.24), and (4.25).

Acknowledgements

We thank the anonymous referees for suggesting the ‘naive bound’ in Remark 2.1 and for comments leading to the improved version of the paper. We also thank Serguei Novak for email discussions about the quality of the bounds presented in the paper versus the ‘naive bound’. This work was supported in part by the Chinese Scholarship Council and in part by Australian Research Council grant DP190100613.

References

[1] Arratia, R. and Goldstein, L. (2010). Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent? Available at arXiv:1007.3910.
[2] Arratia, R., Goldstein, L. and Gordon, L. (1989). Two moments suffice for Poisson approximations: the Chen–Stein method. Ann. Prob. 17, 9–25.
[3] Arratia, R., Goldstein, L. and Kochman, F. (2013). Size bias for one and all. Available at arXiv:1308.2729.
[4] Barbour, A. D. (1988). Stein’s method and Poisson process convergence. J. Appl. Prob. 25 (A), 175–184.
[5] Barbour, A. D. and Brown, T. C. (1992). Stein’s method and point process approximation. Stoch. Proc. Appl. 43, 9–31.
[6] Barbour, A. D. and Eagleson, G. K. (1984). Poisson convergence for dissociated statistics. J. R. Statist. Soc. B 46, 397–402.
[7] Barbour, A. D. and Xia, A. (1999). Poisson perturbations. ESAIM Prob. Statist. 3, 131–150.
[8] Barbour, A. D., Chen, L. H. Y. and Choi, K. P. (1995). Poisson approximation for unbounded functions, I: Independent summands. Statist. Sinica 5, 749–766.
[9] Barbour, A. D., Holst, L. and Janson, S. (1992). Poisson Approximation. Oxford University Press.
[10] Borovkov, K. A. (1988). Refinement of Poisson approximation. Theory Prob. Appl. 33, 343–347.
[11] Borovkov, K. and Pfeifer, D. (1996). On improvements of the order of approximation in the Poisson limit theorem. J. Appl. Prob. 33, 146–155.
[12] Brown, T. C. and Xia, A. (2001). Stein’s method and birth–death processes. Ann. Prob. 29, 1373–1403.
[13] Čekanavičius, V. and Vellaisamy, P. (2019). On large deviations for sums of discrete m-dependent random variables. Stochastics 91 (8), 1092–1108.
[14] Chatterjee, S., Diaconis, P. and Meckes, E. (2005). Exchangeable pairs and Poisson approximation. Prob. Surv. 2, 64–106.
[15] Chen, L. H. Y. (1975). Poisson approximation for dependent trials. Ann. Prob. 3, 534–545.
[16] Chen, L. H. Y. and Choi, K. P. (1992). Some asymptotic and large deviation results in Poisson approximation. Ann. Prob. 20, 1867–1876.
[17] Chen, L. H. Y. and Shao, Q.-M. (2004). Normal approximation under local dependence. Ann. Prob. 32, 1985–2028.
[18] Chen, L. H. Y., Fang, X. and Shao, Q.-M. (2013). Moderate deviations in Poisson approximation: a first attempt. Statist. Sinica 23, 1523–1540.
[19] Chen, L. H. Y., Fang, X. and Shao, Q.-M. (2013). From Stein identities to moderate deviations. Ann. Prob. 41, 262–293.
[20] Chung, F. and Lu, L. (2006). Complex Graphs and Networks (CBMS Regional Conference Series in Mathematics 107). American Mathematical Society.
[21] Cochran, W. (1977). Sampling Techniques. John Wiley & Sons.
[22] Deheuvels, P. and Pfeifer, D. (1988). On a relationship between Uspensky’s theorem and Poisson approximations. Ann. Inst. Statist. Math. 40 (4), 671–681.
[23] Dwass, M. (1960). Some k-sample rank order tests. In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, ed. I. Olkin, pp. 198–202. Stanford University Press, CA.
[24] Feller, W. (1968). An Introduction to Probability Theory and its Applications, vols 1 and 2, 3rd edn. John Wiley & Sons.
[25] Gnedenko, B. (1943). Sur la distribution limite du terme maximum d’une série aléatoire. Ann. of Math. 44, 423–453.
[26] Goldstein, L. and Xia, A. (2006). Zero biasing and a discrete central limit theorem. Ann. Prob. 34, 1782–1806.
[27] Hoeffding, W. (1956). On the distribution of the number of successes in independent trials. Ann. Math. Statist. 27, 713–721.
[28] Mattner, L. and Roos, B. (2007). A shorter proof of Kanter’s Bessel function concentration bound. Prob. Theory Rel. Fields 139, 191–205.
[29] Petrov, V. V. (1975). Sums of Independent Random Variables. Springer.
[30] Rényi, A. (1962). Théorie des éléments saillants d’une suite d’observations. Ann. Fac. Sci. Univ. Clermont-Ferrand 8, 7–13.
[31] Röllin, A. (2005). Approximation of sums of conditionally independent variables by the translated Poisson distribution. Bernoulli 11, 1115–1128.
[32] Röllin, A. (2007). Translated Poisson approximation using exchangeable pair couplings. Ann. Appl. Prob. 17, 1596–1614.
[33] Ross, N. (2011). Fundamentals of Stein’s method. Prob. Surv. 8, 210–293.
[34] Tan, Y., Lu, Y. and Xia, C. (2018). Relative error of scaled Poisson approximation via Stein’s method. Available at arXiv:1810.04300.