1. Introduction and main results
Branching processes $(Z_n)_{n \ge 0}$ in a varying environment generalize the classical Galton–Watson processes, in that they allow time dependence of the offspring distribution. This natural setting promises relevant applications (e.g. to random walks on trees, as in [Reference Lyons18]) and has recently attracted renewed interest, see e.g. [Reference Bansaye and Simatos2, Reference Braunsteins and Hautphenne4, Reference González, Kersting, Minuesa and del Puerto13, Reference Sagitov and Jagers20]. Former research on branching processes in a varying environment was temporarily affected by the appearance of certain exotic properties, and one could get the impression that it is difficult to grasp some kind of generic behaviour of these processes. Even so, steps in this direction were taken by Peter Jagers [Reference Jagers15]; in particular, he aimed for a classification into supercritical, critical and subcritical regimes in the spirit of ordinary Galton–Watson processes. In this paper we take up this line of research. To this end we prove several theorems ranging from criteria for almost sure extinction to Yaglom-type results. We require only mild regularity assumptions, and in particular we set no restrictions on the sequence of expectations $\mathrm E[Z_n]$, $n \ge 0$, thereby generalizing and unifying a number of individual results from the literature.
In order to define a branching process in a varying environment (BPVE), let the sequence $Y_1, Y_2, \ldots$ denote random variables with values in $\mathbb N_0$, and $f_1,f_2, \ldots$ their distributions. Let $Y_{ni}$, $n,i\in \mathbb N$, be independent random variables such that $Y_{ni}$ and $Y_n$ coincide in distribution for all $n,i\ge 1$. Define the random variables $Z_n$, $n \ge 0$, with values in $\mathbb N_0$ recursively as
$$Z_0 \,:\!= 1, \qquad Z_n \,:\!= \sum_{i=1}^{Z_{n-1}} Y_{ni}, \quad n \ge 1.$$
Then the process $(Z_n)_{n \ge 0}$ is called a branching process in the varying environment $v=(\,f_1,f_2, \ldots\!)$ with initial value $Z_0=1$. These processes may be considered as a model for the development of the size of a population where individuals reproduce independently with offspring distributions $f_n$ potentially changing among generations. Without further mention we always require that $0 \lt \mathrm E[Y_n] \lt \infty$ for all $n \ge 1$.
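The recursive definition translates directly into a simulation. The following is a minimal sketch, not part of the original argument: generation $n$ is the sum of $Z_{n-1}$ independent copies of $Y_n$. The function names, the binary offspring law and the particular means $2p_n = 1 + 1/(n+1)$ are illustrative assumptions only.

```python
import random

def simulate_bpve(samplers, rng):
    """Return one path [Z_0, Z_1, ..., Z_n] of a BPVE.

    samplers[n-1](rng) must return one draw of Y_n (a non-negative integer).
    """
    path = [1]                                  # Z_0 = 1
    for sample_offspring in samplers:
        parents = path[-1]
        # Z_n is the sum of Z_{n-1} independent copies of Y_n.
        path.append(sum(sample_offspring(rng) for _ in range(parents)))
    return path

def binary_sampler(p):
    """Hypothetical helper: Y = 2 with probability p, otherwise Y = 0."""
    return lambda rng: 2 if rng.random() < p else 0

# Illustrative environment: E[Y_n] = 2 p_n = 1 + 1/(n+1) for n = 1..20,
# so that mu_20 = (3/2)(4/3)...(22/21) = 11.
samplers = [binary_sampler(0.5 * (1 + 1 / (n + 1))) for n in range(1, 21)]
rng = random.Random(0)
paths = [simulate_bpve(samplers, rng) for _ in range(2000)]
mean_final = sum(p[-1] for p in paths) / len(paths)
print("empirical mean of Z_20:", mean_final, "(exact mean: 11)")
```

Since the state 0 is absorbing, a path that reaches 0 stays there; the fraction of such paths gives a crude Monte Carlo estimate of $\mathrm P(Z_{20}=0)$.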
There is one non-trivial statement on BPVEs requiring no extra assumption. It says that $Z_n$ is almost surely (a.s.) convergent to a random variable $Z_\infty$ with values in $\mathbb N_0\cup \{\infty\}$. This result is due to Lindvall [Reference Lindvall17] and extends results of Church [Reference Church5] (for a comparatively short proof see [Reference Kersting and Vatutin14, Theorem 1.4]). It also clarifies under which conditions $(Z_n)_{n \ge 0}$ may ‘fall asleep’ at a positive state, meaning that the event $\{0 \lt Z_\infty \lt \infty\}$ occurs with positive probability. Let us call such a branching process asymptotically degenerate. Thus, for a BPVE it is no longer true that the process either becomes extinct a.s. or else converges a.s. to infinity.
As mentioned above, a BPVE may exhibit extraordinary properties that do not show up for ordinary Galton–Watson processes. Thus a BPVE may possess different growth rates, as detected by MacPhee and Schuh [Reference MacPhee and Schuh19]. Here we establish a framework which excludes such exceptional phenomena and elucidates the generic behaviour. As we shall see, this is naturally done in an $\mathcal L^2$ setting.
Our main assumption is a uniformity requirement which reads as follows. There is a constant $c<\infty$ such that for all natural numbers $n \ge 1$ we have
This regularity assumption is notably mild. As we shall explain in the next section, it is fulfilled for distributions $f_n$, $n \ge 1$, belonging to any common class of probability measures, like Poisson, binomial, hypergeometric, geometric, linear fractional or negative binomial distributions, without any restriction on the parameters. It is also satisfied in the case that the random variables $Y_n$, $n \ge 1$, are a.s. uniformly bounded by a constant $c \lt \infty$. To see this, take into account that we have $\mathrm E[Y_{n} \mid Y_n \ge 1] \ge 1$. Since direct verification of (A) may be tedious in examples, we shall present in the next section a third moment condition which implies (A) and which can often be easily checked.
Let us call a BPVE regular if it fulfils condition (A).
Remark 1. (A property of consistency) Observe that together with a BPVE $(Z_n)_{n \ge 0}$, any subsequence $(Z_{n_i})_{i \ge 0}$ with $n_0\,:\!=0 \lt n_1 \lt n_2 <\ldots$ is a BPVE, too. We note that the condition (A) is then transmitted, i.e. any subsequence of a regular BPVE is regular, too. The proof will be given after Lemma 6 below.
Before presenting our results, let us agree on the following notational conventions. Let $\mathcal P$ be the set of all probability measures on $\mathbb N_0$. The weights of $f\in \mathcal P$ are named f[k], $k \in \mathbb N_0$. We set
$$f(s) \,:\!= \sum_{k=0}^\infty s^k f[k], \quad 0 \le s \le 1.$$
Thus, we denote the probability measure f and its generating function by one and the same symbol. This facilitates presentation and will cause no confusion. Keep in mind that each operation applied to these measures has to be understood as an operation applied to their generating functions. Thus, $f_1f_2$ stands not only for the multiplication of the generating functions $f_1,f_2$ but also for the convolution of the respective measures. Also, $f_1 \circ f_2$ expresses the composition of generating functions as well as the resulting probability measure. We shall consider the mean and second factorial moment of a random variable Y with distribution f,
$$\mathrm E[Y] = f^{\prime}(1), \qquad \mathrm E[Y(Y-1)] = f^{\prime\prime}(1),$$
and its normalized second factorial moment and normalized variance
$$\nu \,:\!= \frac{\mathrm E[Y(Y-1)]}{\mathrm E[Y]^2} = \frac{f^{\prime\prime}(1)}{f^{\prime}(1)^2}, \qquad \rho \,:\!= \frac{\mathrm{Var}[Y]}{\mathrm E[Y]^2} = \nu + \frac 1{f^{\prime}(1)} - 1.$$
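We shall discuss branching processes in a varying environment along the lines of ordinary Galton–Watson processes. For $n \ge 1$, let
$$q \,:\!= \mathrm P(Z_\infty = 0), \qquad \mu_n \,:\!= f^{\prime}_1(1) \cdots f^{\prime}_n(1), \qquad \nu_n \,:\!= \frac{f^{\prime\prime}_n(1)}{f^{\prime}_n(1)^2}, \qquad \rho_n \,:\!= \nu_n + \frac 1{f^{\prime}_n(1)} - 1,$$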
and also $\mu_0\,:\!=1$. Thus, q is the probability of extinction and $\mu_n= \mathrm E[Z_n]$, $n \ge 0$. Note that for the standardized factorial moments $ \nu_n$ we have $ \nu_n \lt \infty$ under Assumption (A). This implies $\mathrm E[Z_n^2] \lt \infty$ for all $n \ge 0$ (see Lemma 4 below).
Assumption (A) is a mild requirement with substantial consequences, as seen from the following diverse necessary and sufficient criteria for almost sure extinction.
Theorem 1. Assume (A). Then the conditions
(i) $q=1$,
(ii) $\mathrm E[Z_n]^2 = o(\mathrm E[Z_n^2])$ as $n \to \infty$,
(iii) $ \sum\limits_{k=1}^\infty \frac {\rho_k}{\mu_{k-1}}= \infty$,
(iv) $ \sum\limits_{k=1}^\infty \frac {\nu_k}{\mu_{k-1}}= \infty$ or $ \mu_n \to 0$
are equivalent. Moreover, the conditions
(v) $q<1$,
(vi) $\mathrm E[Z_n^2] = O(\mathrm E[Z_n]^2)$ as $n \to \infty$,
(vii) $ \sum\limits_{k=1}^\infty \frac {\rho_k}{\mu_{k-1}} \lt \infty$,
(viii) $ \sum\limits_{k=1}^\infty \frac {\nu_k}{\mu_{k-1}} \lt \infty$ and there exists $0 \lt r\le \infty$ such that $\mu_n \to r$
are equivalent.
These conditions are useful in different ways. Condition (iii)/(vii) appears to be a particularly suitable criterion for almost sure extinction, whereas conditions (iv) and (viii) will prove helpful for the classification of BPVEs. Condition (vi) will allow us to determine the growth rate of $Z_{n}$ (see Theorem 2). Observe that (ii) can be rewritten as $\mathrm E[Z_n]=o(\sqrt{\mathrm{Var}[Z_n]})$. Briefly speaking, this means that under (A) we have almost sure extinction if and only if the noise dominates the average growth in the long run.
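The rewriting of (ii) is a one-line computation. As a sketch, with the auxiliary notation $x_n$ introduced here only for illustration (and the convention that the ratio is $\infty$ when $\mathrm{Var}[Z_n]=0$):

```latex
\[
  \frac{\mathrm E[Z_n]^2}{\mathrm{Var}[Z_n]}
  = \frac{\mathrm E[Z_n]^2}{\mathrm E[Z_n^2]-\mathrm E[Z_n]^2}
  = \frac{x_n}{1-x_n},
  \qquad
  x_n \,:\!= \frac{\mathrm E[Z_n]^2}{\mathrm E[Z_n^2]} \in (0,1].
\]
```

Since $t \mapsto t/(1-t)$ is increasing on $[0,1)$ and vanishes only at $t=0$, we have $x_n \to 0$ if and only if $\mathrm E[Z_n]^2/\mathrm{Var}[Z_n] \to 0$, which is the stated reformulation.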
We point out that conditions (iii), (iv), (vii) and (viii) access not only the expectations $\mu_n$ but also the second moments. This is a novel aspect in comparison to ordinary Galton–Watson processes and also to Agresti’s classical criterion on BPVEs [Reference Agresti1, Theorem 2]. Agresti’s result provides almost sure extinction if and only if $\sum_{k\ge1} 1/\mu_{k-1} = \infty$. He could do so by virtue of his stronger assumptions, which exclude, e.g., asymptotically degenerate processes. In our setting there is the possibility that we have both $\sum_{k \ge 1} \rho_k/\mu_{k-1}=\infty$ and $\sum_{k \ge 1} 1/\mu_{k-1}<\infty$, and also the other way round. This is shown by the following examples.
Example 1. Let $Y_n$ take just the values $n+2$ and 0, with $\mathrm P(Y_n= n+2)= n^{-1}$. Then $\mathrm E[Y_n(Y_n-1)]\sim n$, $\mathrm E[Y_n]= 1+ 2/n$, $\mathrm E[Y_n-1 \mid Y_n\ge 1]\sim n$, and thus (A) is fulfilled. Also, $\mu_n \sim n^2/2$ and $\rho_n\sim n$, hence $\sum_{k \ge 1} 1/\mu_{k-1}<\infty$ and $\sum_{k \ge 1} \rho_k/\mu_{k-1}=\infty$.
Example 2. Let $Y_n$ take just the values 0, 1 and 2, with $\mathrm P(Y_n=0)=\mathrm P(Y_n=2) = 1/(2n^2)$. Then $\mathrm E[Y_n(Y_n-1)] \sim n^{-2}$, $\mathrm E[Y_n]=1$ and $\mathrm E[Y_n-1\mid Y_n \ge 1]\sim 1/(2n^2)$, and thus (A) is fulfilled. Also, $\mu_n=1$ and $\rho_n \sim n^{-2}$, hence $\sum_{k \ge 1} 1/\mu_{k-1}=\infty$ and $\sum_{k \ge 1} \rho_k/\mu_{k-1}<\infty$.
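The two examples can be checked numerically. The sketch below computes the partial sums of $\sum_k 1/\mu_{k-1}$ and $\sum_k \rho_k/\mu_{k-1}$ in exact rational arithmetic, taking $\rho_k = \mathrm{Var}[Y_k]/\mathrm E[Y_k]^2$. The function names are illustrative, and a finite partial sum can of course only suggest convergence or divergence, not prove it.

```python
from fractions import Fraction

def moments_example1(n):
    """Mean and second factorial moment of Y_n in Example 1."""
    p = Fraction(1, n)                       # P(Y_n = n + 2)
    return (n + 2) * p, (n + 2) * (n + 1) * p

def moments_example2(n):
    """Mean and second factorial moment of Y_n in Example 2."""
    p = Fraction(1, 2 * n * n)               # P(Y_n = 0) = P(Y_n = 2)
    return Fraction(1), 2 * p                # E[Y(Y-1)] = 2 * 1 * P(Y = 2)

def partial_sums(moments, n_max):
    """Return (sum 1/mu_{k-1}, sum rho_k/mu_{k-1}) over k = 1..n_max."""
    mu_prev = Fraction(1)                    # mu_0 = 1
    sum_inv_mu = Fraction(0)
    sum_rho = Fraction(0)
    for k in range(1, n_max + 1):
        mean, fact2 = moments(k)
        rho = (fact2 + mean - mean**2) / mean**2   # Var[Y_k] / E[Y_k]^2
        sum_inv_mu += 1 / mu_prev
        sum_rho += rho / mu_prev
        mu_prev *= mean                      # mu_k = mu_{k-1} * E[Y_k]
    return sum_inv_mu, sum_rho

for n_max in (10, 100, 1000):
    a1, r1 = partial_sums(moments_example1, n_max)
    a2, r2 = partial_sums(moments_example2, n_max)
    print(n_max, float(a1), float(r1), float(a2), float(r2))
```

In Example 1 the first sum stays below 2 while the second grows logarithmically; in Example 2 the first sum equals the number of terms while the second stays bounded.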
The last example exhibits an asymptotically degenerate branching process, as seen from Corollary 1 below.
Next we turn to the normalized population sizes
$$W_n \,:\!= \frac{Z_n}{\mu_n}, \quad n \ge 0.$$
Clearly, $(W_n)_{n \ge 0}$ constitutes a non-negative martingale, and thus there exists an integrable random variable $W\ge 0$ such that we have $W_n \to W$ a.s. as $n\to \infty$. With (A), the random variable W exhibits the dichotomy known for Galton–Watson processes.
Theorem 2. For a regular BPVE we have:
(i) If $q=1$ then $W=0$ a.s.
(ii) If $q\lt1$ then $\mathrm E[W]=1$, $\mathrm E[W^2]<\infty$ and $\mathrm P(W=0)=q$.
In particular, in the case of $q<1$ the martingale $(W_n)_{n\ge 0}$ is convergent in $\mathcal L^2$, implying
This formula goes back to Fearn [Reference Fearn10]. We point out that Assumption (A) excludes the possibility of $\mathrm P(W=0) \gt q$ and, in particular, of different rates of growth as in the examples constructed by MacPhee and Schuh [Reference MacPhee and Schuh19] (see also [Reference D’Souza6, Reference D’Souza and Biggins7]). By means of Theorem 2(ii) we also gain further insight into asymptotically degenerate processes. Under Assumption (A) they are just those processes which fulfil the properties $q<1$ and $0 \lt \lim_{n\to \infty} \mu_n \lt \infty$. Also, taking Theorem 1(v) and (viii) into account we obtain the following corollary.
Corollary 1. A regular BPVE is asymptotically degenerate if and only if both $\sum_{k=1}^\infty \nu_k \lt \infty$ and the sequence $(\mu_n)_{n \ge 0}$ has a positive, finite limit. Then $Z_\infty \lt \infty$ a.s.
Now we address the behaviour of the random variables $Z_n$ conditioned on the events that $Z_n>0$. The next theorem shows that their values largely follow the corresponding conditional expectations $\mathrm E[Z_n \mid Z_n \gt 0]$. For $n \ge 0$ let
Theorem 3. For a regular BPVE, the sequence of random variables $Z_n/a_n$ conditioned on $Z_n \gt 0$, $n \ge 0$, is tight, i.e. for any $\varepsilon \gt 0$ there is a $u<\infty$ such that, for all $n \ge 0$,
moreover, there exist numbers $\theta \gt 0$ and $u \gt 0$ such that, for all $n\ge 0$,
Also, we have
with some constant $\gamma \gt 0$, so that we may replace $a_n$ by $\mathrm E[Z_n\mid Z_n>0]$ in (2) and in (3).
For $q<1$ we do not learn anything new from this theorem; here, Theorem 2(ii) gives much more precise information. Thus, let us focus on the case $q=1$, the situation of almost sure extinction. At first sight one might expect that the constant $\theta$ in (3) can be chosen arbitrarily close to 1, if only u gets sufficiently small. This will apply to many interesting cases, but it is not always true. The following example gives an illustration.
Example 3. For $n \ge 1$ let
It is easy to check that (A) is valid (as well as the conditions (B) and (C) below). We have $f^{\prime}_{2n-1}(1)= 2^{-n}$ and $f^{\prime}_{2n}(1)= 2^{n}$, hence
for all $n \ge 1$. In particular, we have $Z_{2n-1}\to 0$ in probability, which entails $q=1$. Also, $\nu_{2n-1}=0$ and $\nu_{2n} \sim 2$ as $n \to \infty$, implying
and
From Theorem 3 it follows that there is a $z \lt \infty$ such that
for all $n \ge 1$. Therefore,
for all $n \ge 1$, and for any $u>0$,
if $a_{2n} \ge z/u$. Since $a_{2n} \to \infty$, the constant $\theta$ from (3) cannot take a value above $1 - 2^{-z-1}$ in this example.
This example suggests that quite different scenarios may occur for BPVEs with $q=1$, and that their behaviour may abruptly change from one subsequence to the next. We point out that Assumption (A) does not put (e.g. for Poisson distributions) any restrictions on the expectation $\mu_n$, $n \ge 1$, allowing a variety of examples. Of special interest is the case that the numbers $a_n$ are uniformly bounded. Here, Theorem 3 reads as follows.
Corollary 2. Under Assumption (A) the conditions
(i) the sequence of random variables $Z_n$ conditioned on the events that $Z_n \gt 0$, $n \ge 0$, is tight,
(ii) $\sup_{n\ge 0} \mathrm E[Z_n \mid Z_n \gt 0] \lt \infty$,
(iii) $ \sum\limits_{k=1}^n \frac{\nu_k}{\mu_{k-1}} = O \Big( \frac 1{\mu_n}\Big)$ as $n \to \infty$,
are equivalent.
For an ordinary Galton–Watson process these three conditions apply just in the subcritical regime, and then the conditioned random variables $Z_n$ have a limiting distribution. It is easy to see that such a feature will not hold in general for a BPVE. Indeed, there are two offspring distributions $\skew2\hat{f}$ and $\skew2\tilde{f}$ such that the limiting distributions $\skew2\hat{g}$ and $\skew2\tilde{g}$ for the corresponding conditional Galton–Watson processes differ from each other. Choose an increasing sequence $0=n_0 \lt n_1 <n_2 \lt \cdots$ of natural numbers and consider the BPVE $(Z_n)_{n \ge 0}$ in the varying environment $v=(\,f_1,f_2, \ldots\!)$, where $f_n= \skew2\hat{f}$ for $n_{2k} \lt n \le n_{2k+1}$, $k \in \mathbb N_0$, and $f_n = \skew2\tilde{f}$ otherwise. Then it is obvious that $Z_{n_{2k+1}}$ given the event $Z_{n_{2k+1}}>0$ converges in distribution to $\skew2\hat{g}$, and $Z_{n_{2k}}$ given the event $Z_{n_{2k}}>0$ converges in distribution to $\skew2\tilde{g}$, provided that the sequence $(n_k)_{k \ge 0}$ is increasing sufficiently fast.
Thus, it may come as a surprise that in the opposite situation, $\mu_{n}^{-1}=o(\sum_{k=1}^{n} \frac{\nu_k}{\mu_{k-1}})$, we encounter a distinctive behaviour of the conditional-limit distributions of $Z_n$, which is in accordance with Yaglom’s theorem for ordinary Galton–Watson processes. For technical reasons we have to somewhat strengthen Assumption (A). We require that for every $\varepsilon \gt 0$ there is a constant $c_\varepsilon \lt \infty$ such that, for all natural numbers $n \ge 1$,
This condition is again widely satisfied, as we shall explain in the next section. It implies Assumption (A). Namely, for $\varepsilon=1/2$ we have
Since $1+ \mathrm E[Y_n] \le 2\mathrm E[Y_n \mid Y_n \ge 1]$, we obtain (A) with $c=4c_{1/2}$.
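The inequality $1+ \mathrm E[Y_n] \le 2\mathrm E[Y_n \mid Y_n \ge 1]$ used here can be spelled out as follows (a short sketch; note that $\mathrm P(Y_n \ge 1)>0$ because $\mathrm E[Y_n]>0$):

```latex
\[
  \mathrm E[Y_n \mid Y_n \ge 1]
  = \frac{\mathrm E[Y_n]}{\mathrm P(Y_n \ge 1)}
  \ge \max\bigl(1, \mathrm E[Y_n]\bigr)
  \ge \frac{1+\mathrm E[Y_n]}{2}.
\]
```

The first bound holds because the conditional mean is at least 1 on the event $\{Y_n \ge 1\}$ and because $\mathrm P(Y_n \ge 1) \le 1$; the second because a maximum dominates the arithmetic mean.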
Theorem 4. Let (B) be satisfied and let $q=1$. Then the following conditions are equivalent:
(i) There is a sequence $b_n$, $n \ge 0$, of positive numbers such that $Z_n/b_n$ conditioned on the event $Z_n>0$ converges in distribution to a standard exponential distribution as $n \to \infty$;
(ii) $\mathrm E[Z_n \mid Z_n \gt 0] \to \infty$ as $n \to \infty$;
(iii) $\frac1{\mu_n} = o \bigg( \sum\limits_{k=1}^n \frac {\nu_k}{\mu_{k-1}} \bigg)$ as $n \to \infty$.
Under these conditions we may set $b_n\,:\!= \mathrm E[Z_n \mid Z_n \gt 0]$, and we have
or equivalently
as $n \to \infty$.
This theorem covers the classical results of Kolmogorov and Yaglom for critical Galton–Watson processes in the finite variance case (without further moment restrictions), since then (B) is trivially satisfied.
Our results show how to implement a classification of regular BPVEs which connects to the notions used for classical Galton–Watson processes. If $q<1$, then in view of Theorem 2 and Corollary 1 we distinguish two regimes. There is the supercritical regime in the case of $\mathrm E[Z_n]\to \infty$, and the asymptotically degenerate regime otherwise. If, on the other hand, we have $q=1$, then Theorem 4 suggests characterizing the critical regime by the condition ${\mathrm E[Z_n \mid Z_n>0]\to \infty}$ (and not just by some condition on the limiting behaviour of $\mu_n$, as one might do in a first attempt), and to allocate the other BPVEs to the subcritical regime. In this way we differentiate the clear-cut limiting property of critical BPVEs from the indeterminacy of the remaining processes. In this classification a subcritical BPVE $(Z_n)_{n\ge 0}$ exhibits subcritical behaviour in the sense that according to Theorem 3 the random variables $Z_n$ conditioned on $Z_n>0$ are tight, at least along some subsequence in which the $a_n$ stay bounded. The $Z_n$ may diverge with positive probability along some other subsequence, yet this does not in general imply critical behaviour in the sense that along that subsequence the random variables $Z_{n}$, conditioned on $Z_{n}>0$ and suitably scaled, have asymptotically an exponential distribution. For a counterexample we refer to Example 3 and (5).
By means of Theorems 1 and 3 we may streamline the determining conditions of the four regimes, as summarized in the subsequent overview.
Proposition 1. A regular BPVE is
Note that convergence of the means $\mu_n$ is not enforced in the critical case; they may diverge, converge to zero or even oscillate in between.
Example 4. In the case $0 \lt \inf_n \nu_n \le \sup_n \nu_n \lt \infty$ (as e.g. for Poisson variables) the classification simplifies. Here, we are in the supercritical regime if and only if $\sum_{k \ge 0} 1/\mu_k \lt \infty$ (enforcing $\mu_n \to \infty$). Asymptotically degenerate behaviour is excluded, and there is plenty of room for critical processes, i.e. for processes which conform to the conditions $\sum_{k \ge 0} 1/\mu_k = \infty$ and $1/\mu_n =o(\sum_{k=0}^{n-1} 1/\mu_k)$. The second requirement is e.g. fulfilled if we have $\mu_n/\mu_{n-1} \to 1$ as $n \to \infty$. This latter condition covers a variety of scenarios for $\mu_n$ below exponential growth and above exponential decay.
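The claim that $\mu_n/\mu_{n-1} \to 1$ implies $1/\mu_n =o(\sum_{k=0}^{n-1} 1/\mu_k)$ can be verified by the following sketch, in which $m$ is an arbitrary fixed natural number and $n \gt m$:

```latex
\[
  \mu_n \sum_{k=0}^{n-1} \frac 1{\mu_k}
  \;\ge\; \sum_{j=1}^{m} \frac{\mu_n}{\mu_{n-j}}
  \;=\; \sum_{j=1}^{m} \prod_{i=0}^{j-1} \frac{\mu_{n-i}}{\mu_{n-i-1}}
  \;\longrightarrow\; m
  \qquad (n \to \infty).
\]
```

Hence $\liminf_n \mu_n \sum_{k=0}^{n-1} 1/\mu_k \ge m$ for every $m$, so this quantity tends to infinity, which is the second requirement.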
Example 5. In the binary case $\mathrm P(Y_n=2) = p_n$, $\mathrm P(Y_n=0)=1-p_n$ we get $f^{\prime}_{n}(1)= f^{\prime\prime}_{n}(1)=2p_n$. Therefore $\nu_k/\mu_{k-1}=1/\mu_k$, so that the situation conforms to the previous example.
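For completeness, the identity $\nu_k/\mu_{k-1}=1/\mu_k$ follows from $\nu_k = f^{\prime\prime}_k(1)/f^{\prime}_k(1)^2$ and $\mu_k = \mu_{k-1} f^{\prime}_k(1)$:

```latex
\[
  \frac{\nu_k}{\mu_{k-1}}
  = \frac{2p_k}{(2p_k)^2\,\mu_{k-1}}
  = \frac 1{2p_k\,\mu_{k-1}}
  = \frac 1{\mu_k}.
\]
```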
Example 6. In the symmetric case $\mathrm P(Y_n=0)=\mathrm P(Y_n=2)= p_n/2$ and $\mathrm P(Y_n=1)=1-p_n$ we have $\mu_n=1$ and $\nu_n=p_n$. Here, we find critical or asymptotically degenerate behaviour, according to whether $\sum_{n=1}^\infty p_n$ is divergent or convergent.
Example 7. If the $Y_n$ take only the values 0 and 1, then all $\nu_n$ vanish. Now the BPVE is subcritical or asymptotically degenerate, according to whether $\mu_n$ converges to zero or to a positive value.
Our proofs rely largely on analytic considerations. The task is to get a grip on the probability measures $f_1 \circ \cdots \circ f_n$, which are the distributions of the random variables $Z_n$. In order to handle such iterated compositions of generating functions we resort to a device which has been applied from the beginning in the theory of branching processes. For a probability distribution f on $\mathbb N_0$ with positive, finite mean m we define a function $\varphi\,:\,[0,1)\to \mathbb R$ by the equation
$$\frac 1{1-f(s)} = \frac 1{m(1-s)} + \varphi(s), \quad 0 \le s \lt 1.$$
In this way the mean and the ‘shape’ of f are separated to a certain extent. Indeed, Lemma 1 below shows that $\varphi$ takes values which are of the size of the standardized second factorial moment $\nu$. Therefore we briefly name $\varphi$ the shape function of f. As we shall see, these functions are useful to dissolve the generating function $f_1 \circ \cdots \circ f_n$ into a sum (see Lemma 5 below). Here, our contribution consists in obtaining sharp upper and lower bounds for the function $\varphi$ and its derivative. The interaction of these bounds then allows for precise estimates e.g. of the survival probabilities $\mathrm P(Z_n \gt 0)$. The role of Assumption (A) in this interplay is to keep both bounds together uniformly in n.
Concluding this introduction, let us comment on the literature. Agresti in his paper [Reference Agresti1] on almost sure extinction already derived the sharp upper bound for $\varphi$ which we give below in (8). We note that this bound is related to the well-known Paley–Zygmund inequality (compare the proof of Lemma 7). Agresti also obtained a lower bound for the survival probabilities, which, however, in general is away from our sharp bound. Lyons [Reference Lyons18] obtained the equivalence of conditions (v), (vi), (vii) and (somewhat disguised) (viii) from Theorem 1 under the assumption that the random variables $Y_n$ are a.s. bounded by a constant, with methods completely different from ours. He also proved Theorem 2, again under the assumption that the offspring numbers are a.s. uniformly bounded by a constant. D’Souza and Biggins [Reference D’Souza and Biggins7] derived Theorem 2 under a different set of assumptions. They required that there are numbers $a>0$, $b>1$ such that $\mu_{m+n}/\mu_m \ge ab^n$ for all $m,n \ge 1$ (called the uniform supercritical case). They did not need finite second moments, but assumed instead that the random variables $Y_n$ are uniformly dominated by a random variable Y with $\mathrm E[Y\log^+ Y] \lt \infty$. Goettge [Reference Goettge12] obtained $\mathrm E [W]=1$ under the condition $\mu_n \ge an^b$ with $a>0$, $b>1$ (together with a uniform domination assumption), but did not consider the validity of the equation $\mathrm P(W=0)=q$. In order to prove the conditional limit law from Theorem 4, Jagers [Reference Jagers15] drew attention to uniform estimates due to Sevast’yanov [Reference Sevast’yanov21] (see also [Reference Fahady, Quine and Vere Jones9, Lemma 3]). This approach demands, amongst others, the strong assumption that the sequence $\mathrm E[Z_n]$, $n \ge0$, is bounded from above and away from zero. 
Independently, and in parallel to our work, Bhattacharya and Perlman [Reference Bhattacharya and Perlman3] have presented a considerable generalization of Jagers’ result, by a different route and under assumptions which are stronger than ours. For recent results on almost sure extinction and asymptotic exponentiality of multitype BPVEs we refer to [Reference Dolgopyat, Hebbar, Koralov and Perlman8].
The remainder of this paper is organized as follows. In Section 2 we discuss the assumptions and several examples. In Section 3 we analyze the shape function $\varphi$. Section 4 contains the proofs of our theorems.
2. Examples
The following example illustrates the difference in range of the conditions (A) and (B).
Example 8. Let Y have a linear fractional distribution, meaning that
with some $0<p<1$ and some probability $\mathrm P(Y\ge 1)$. Then, from properties of geometric distributions, we have
and it follows that
Thus, for any sequence $Y_n$ of linear fractional random variables, Assumption (A) is fulfilled with $c=4$, whatever the parameters $p_n$ and $\mathrm P(Y_n \ge 1)$ are.
However, for condition (B) the corresponding statement fails. To see this we resort for linear fractional distributions to the formula
If we assume (B), then the inequality (6) is also valid, yielding
For linear fractional distributions this estimate may be rewritten as
which simplifies to
Thus, condition (B) implies $\inf_n (p_n + \mathrm P(Y_n \ge 1)) \gt 0$, and a sequence of linear fractional random variables satisfying $p_n + \mathrm P(Y_n \ge 1)\le1/n$ does not meet (B).
Incidentally, Theorem 4 still holds true for linear fractional $Y_n$, $n \ge 1$, regardless of the validity of (B). Then, as is well known, $Z_n$ is also linear fractional for any $n \ge 1$, and consequently the sequence $Z_n/ \mathrm E[Z_n \mid Z_n\ge 1]$ given the events that $Z_n \ge 1$ converges in distribution to a standard exponential distribution provided that we have $\mathrm E[Z_n \mid Z_n\ge 1] \to \infty$.
In other examples, a direct verification of Assumptions (A) or (B) can be cumbersome. Therefore we introduce another assumption, which is often easier to handle: there is a constant $\skew2\bar{c} \lt \infty$ such that, for all natural numbers $n \ge 1$,
Condition (C) implies (A) and (B), as seen from the following proposition.
Proposition 2. If condition (C) is fulfilled, then (B) holds with $c_\varepsilon \,:\!=\max(3,5\skew2\bar{c}/\varepsilon)$ and (A) holds with $c\,:\!=\max (12,40 \skew2\bar{c})$.
Proof. From $c_\varepsilon \ge 3$ and (C) we obtain
It follows that
which is our first claim. The second one follows by means of (6).
Condition (C) can be easily handled by means of generating functions and their derivatives. Here are some examples.
Example 9. If the $Y_n$ are a.s. uniformly bounded by a constant c, then (C) is satisfied with $\skew2\bar{c}=c$.
Example 10. Let Y be Poisson with parameter $\lambda \gt 0$. Then
Here, (C) is fulfilled with $\skew2\bar{c}=1$.
Example 11. For binomial Y with parameters $m\ge 1$ and $0 \lt p \lt 1$ the situation is analogous, and here
Example 12. For a hypergeometric distribution with parameters (N,K,m) we have, for $N \ge 3$,
and (C) is satisfied with $\skew2\bar{c}=3$. The case $N \le 2$ can immediately be included.
Example 13. For negative binomial distributions the generating function is given by
with $0<p<1$ and a positive integer $\alpha$. Now,
Thus,
Again, (C) is fulfilled with $\skew2\bar{c}=3$.
3. Bounds for the shape function
For $f \in \mathcal P$ with mean $0<m=f^{\prime}(1)<\infty$, define the shape function as the function $\varphi=\varphi_f\,:\, [0,1) \to \mathbb R$ given by the equation
$$\frac 1{1-f(s)} = \frac 1{m(1-s)} + \varphi(s), \quad 0 \le s \lt 1.$$
Due to convexity of f(s) we have $\varphi(s) \ge 0$ for all $0\le s \lt 1$. By means of a Taylor expansion of f around 1, one obtains $\lim_{s \uparrow 1} \varphi(s) = f^{\prime\prime}(1)/(2f^{\prime}(1)^2) $, and thus we extend $\varphi$ by setting
$$\varphi(1) \,:\!= \frac{f^{\prime\prime}(1)}{2f^{\prime}(1)^2} = \frac \nu 2.$$
In this section we prove the following sharp bounds.
Lemma 1. Assume $f^{\prime\prime}(1)<\infty$. Then, for $0\le s \le 1$,
$$\frac 12 \varphi(0) \le \varphi(s) \le 2\varphi(1). \tag{8}$$
Note that $\varphi$ is identically zero if $f[z]=0$ for all $z \ge 2$. Otherwise, $\varphi(0)>0$ and the lower bound of $\varphi$ becomes strictly positive. Choosing $s=1$ and $s=0$ in (8), we obtain $\varphi(0)/2\le \varphi(1)$ and $\varphi(0)\le 2\varphi(1)$. Note that for $f=\delta_k$ (Dirac measure at point k) and $k \ge 2$ we have $\varphi(1)=\varphi(0)/2$, implying that the constants 1/2 and 2 in (8) cannot be improved. The upper bound was derived in [Reference Geiger and Kersting11] using a different method of proof.
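The bounds discussed here can be explored numerically. The following sketch evaluates the shape function in exact rational arithmetic, assuming the defining relation $1/(1-f(s)) = 1/(m(1-s)) + \varphi(s)$ together with $\varphi(1)=f^{\prime\prime}(1)/(2f^{\prime}(1)^2)$, and checks $\varphi(0)/2 \le \varphi(s) \le 2\varphi(1)$ on a grid. The function name and the two test distributions are illustrative choices.

```python
from fractions import Fraction

def shape_function(weights, s):
    """phi(s) for the distribution with weights f[0], f[1], ... (finite support)."""
    m = sum(k * w for k, w in enumerate(weights))
    if s == 1:
        # Continuous extension: phi(1) = f''(1) / (2 f'(1)^2).
        fact2 = sum(k * (k - 1) * w for k, w in enumerate(weights))
        return fact2 / (2 * m * m)
    f_s = sum(w * s**k for k, w in enumerate(weights))
    return 1 / (1 - f_s) - 1 / (m * (1 - s))

uniform = [Fraction(1, 4)] * 4                     # uniform on {0, 1, 2, 3}
dirac3 = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]  # Dirac measure at 3
grid = [Fraction(i, 20) for i in range(21)]        # s = 0, 0.05, ..., 1

for name, f in (("uniform", uniform), ("dirac3", dirac3)):
    lower = shape_function(f, Fraction(0)) / 2
    upper = 2 * shape_function(f, Fraction(1))
    values = [shape_function(f, s) for s in grid]
    print(name, float(lower), float(min(values)), float(max(values)), float(upper))
```

For the Dirac measure the value at $s=1$ equals exactly half the value at $s=0$, which illustrates why the constants cannot be improved.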
The next lemma is based on a close investigation of the derivative of $\varphi(s)$.
Lemma 2. Let Y be a random variable with distribution f and assume $ f^{\prime\prime}(1) \lt \infty$. Then, for $0 \le s \le 1$ and natural numbers $a\ge 1$,
Uniform estimates of $\varphi(1)-\varphi(s)$ based on third moments have already been obtained by Sevast’yanov [Reference Sevast’yanov21] and others (see [Reference Fahady, Quine and Vere Jones9, Lemma 3]). Our lemma implies and generalizes these estimates. For the proof of these lemmas we use the following result.
Lemma 3. Let $g_1,g_2$ be elements of $\mathcal P $ with the same support and satisfying the following property. For any $y\in \mathbb N_0$ with $g_1[y]>0$ we have
Also, let $\alpha\,:\, \mathbb N_0 \to \mathbb R$ be a non-decreasing function. Then
Proof. The lemma’s assumption is called the ‘monotone likelihood ratio property’, which is known to imply our claim. For convenience, we give a short proof. By assumption there is a non-decreasing function h(y), $y \in \mathbb N_0$, such that $h(y)= g_2[y]/g_1[y]$ for all elements y of the support of $g_1$. Then, for any real number c,
For $c\,:\!= \min\{ \alpha(y)\,:\, h(y) \ge 1\}$ we have $\alpha(0) \le c \lt \infty$. For this choice of c, since h and $\alpha$ are non-decreasing, every summand of the right-hand sum is non-negative. Thus, the whole sum is non-negative, too, and our assertion follows.
Proof of Lemma 1. (i) First, we examine a special case of Lemma 3. Consider for $0 \lt s \le 1$ and $r \in \mathbb N_0$ the probability measures
$$g_s[y] \,:\!= \frac{s^{r-y}}{1+s+\cdots+s^r}, \quad 0 \le y \le r.$$
Then, for $0 \lt s \le t\le 1$, $0 \le y \lt z\le r$ we have $g_s[z ]/g_s[y]=s^{y-z}\ge t^{y-z}= g_t[z]/g_t[y]$. Hence, we obtain that
is a decreasing function in s. Also, $\sum_{y=0}^r yg_0[y] = r$ and $\sum_{y=0}^r yg_1[y]= r/2$, and it follows for $0\le s \le 1$ that
$$\frac r2 \le \sum_{y=0}^r y g_s[y] \le r.$$
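Part (i) can be illustrated numerically. The sketch below assumes that $g_s$ is the probability measure on $\{0,\ldots,r\}$ with weights proportional to $s^{r-y}$, which is the form forced by the ratio $g_s[z]/g_s[y]=s^{y-z}$, and checks that its mean decreases from $r$ at $s=0$ to $r/2$ at $s=1$. The function name is illustrative.

```python
from fractions import Fraction

def g_mean(s, r):
    """Mean of g_s on {0, ..., r}, assuming g_s[y] is proportional to s**(r - y)."""
    weights = [s ** (r - y) for y in range(r + 1)]   # s = 0 gives the Dirac measure at r
    total = sum(weights)
    return sum(y * w for y, w in enumerate(weights)) / total

r = 5
grid = [Fraction(i, 10) for i in range(11)]          # s = 0, 0.1, ..., 1
means = [g_mean(s, r) for s in grid]
for s, m in zip(grid, means):
    print(float(s), float(m))
```

The printed means are non-increasing in $s$ and stay between $r/2$ and $r$.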
(ii) Next, we derive a second representation for $\varphi$. We have
and
Therefore,
From (9) it follows that
with
Now consider the probability measures $g_s\in \mathcal P$, $0\le s \le 1$, given by
Then, for $f[y] \gt 0$ and $z>y$, after some algebra,
which is an increasing function in s. Therefore, by Lemma 3, the function $\psi(s)$ is increasing in s. In combination with (10) we get
This gives the claim of the lemma.
Proof of Lemma 2. First, we estimate the derivative of $\varphi$, which is given by
It turns out that this expression becomes more manageable if we replace the squared geometric mean $\sqrt{mf^{\prime}(s)}$ on the right-hand side by the square of the arithmetic mean $(m+f^{\prime}(s))/2$. Therefore, we split the derivative into parts according to
with
We show that both $\psi_1$ and $\psi_2$ are non-negative functions, and estimate them from above.
For $\psi_1$ we accomplish this task by introducing the function
Since
for all $0 \le s \le 1$, and since $\zeta(1)=0$, we see that $\zeta$ is a non-negative, decreasing function. Thus, $\psi_1$ is a non-negative function, too. Also, $\zeta(0)\le m$.
Moreover, we have for $y \ge 3$ the polynomial identity
and consequently
with
The function $\xi $ is non-negative and increasing.
Coming back to $\psi_1$, we rewrite it as
Using $f^{\prime}(s)\le m$, it follows that
By means of Lemma 1, by the monotonicity properties of $\xi$ and $\zeta$, and by $\varphi(1)=\nu/2$, $\zeta(0)\le m$, we obtain
Now we investigate the function $\psi_2$, which we rewrite as
We have
and
Using the notation from (11) it follows that
As above, we may apply Lemma 3 to the probability measures $g_s$ and conclude that the right-hand term is increasing with s. Therefore,
and hence
Coming to our claim, note first that owing to the non-negativity of $\psi_1$ and $\psi_2$ we obtain from (12), for any $s \le u \le 1$,
Equations (13) and (14) entail
It remains to estimate the right-hand integral. We have, for $0 \le s \lt 1$,
The right-hand sum is monotonically decreasing in u, and therefore for natural numbers a we end up with the estimate
Combining this estimate with (15), our claim follows.
Remark 2. We have
and hence from (12), (13), (14) and the monotonicity of $\xi$ for $0 \le s \le 1$,
The quality of these bounds becomes evident from the observation that
as follows by means of Taylor expansions of f and $f^{\prime}$ about 1.
4. Proofs of the theorems
First let us consider some formulas for moments. There exists a clear-cut expression for the variance of $Z_n$ due to Fearn [Reference Fearn10]. It seems to be less noticed that there is a similar appealing formula for the second factorial moment of $Z_n$, which turns out to be more useful for our purpose.
Lemma 4. For a BPVE $(Z_n)_{n \ge 0}$ we have
Proof. The proof follows a standard pattern. Let $v=(\,f_1,f_2, \ldots\!)$ denote a varying environment. For non-negative integers $k \le n$ let us define the probability measures
with the convention $f_{n,n}= \delta_1$ (the Dirac measure at point 1). We have
in particular $f^{\prime}_{n,n}(s)=1$, and after some rearrangements we have
in particular $f^{\prime\prime}_{n,n}(s)=0$. Since the distribution of $Z_n$ is given by $f_{0,n}$, by choosing $k=0$ and $s=1$ Lemma 4 is proved.
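The moment formulas can be checked exactly on a small example. The sketch below computes the distribution of $Z_n$ by composing generating functions as polynomials and compares its moments with the identities $\mathrm E[Z_n]=\mu_n$ and $\mathrm E[Z_n(Z_n-1)]/\mathrm E[Z_n]^2=\sum_{k=1}^n \nu_k/\mu_{k-1}$. The explicit form of the second identity is stated here as an assumption, obtained from the chain rule applied to $f_1\circ\cdots\circ f_n$; the environment and all function names are illustrative.

```python
from fractions import Fraction as F

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compose(f, g):
    """Coefficients of f(g(s)) by Horner's scheme (finite supports only)."""
    result = [F(0)]
    for coeff in reversed(f):
        result = poly_mul(result, g)
        result[0] += coeff
    return result

def mean(w):
    return sum(k * p for k, p in enumerate(w))

def fact2(w):
    return sum(k * (k - 1) * p for k, p in enumerate(w))

# Illustrative environment (f_1, f_2, f_3); weights are listed as f[0], f[1], ...
env = [[F(1, 4), F(1, 4), F(1, 2)],
       [F(1, 2), F(0), F(1, 2)],
       [F(0), F(1, 2), F(1, 4), F(1, 4)]]

dist = [F(0), F(1)]                  # f_{n,n} = delta_1
for f_k in reversed(env):            # f_{k-1,n} = f_k o f_{k,n}
    dist = compose(f_k, dist)

mu, series = F(1), F(0)
for f_k in env:
    series += fact2(f_k) / mean(f_k) ** 2 / mu   # nu_k / mu_{k-1}
    mu *= mean(f_k)                              # mu_k

lhs = fact2(dist) / mean(dist) ** 2
print(mean(dist) == mu, lhs == series)
```

Working with `Fraction` avoids rounding, so both comparisons are exact rather than approximate.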
Next, we recall an expansion of the generating function of $Z_n$ taken from [Reference Jirina16] and [Reference Geiger and Kersting11]. This kind of formula has been used in many investigations of branching processes. Let $\varphi_n$, $ n \ge 1$, be the shape functions of $f_n$, $n \ge 1$. Then, since $f_{k,n}=f_{k+1}\circ f_{k+1,n}$ for $k \lt n$, we have
Iterating the formula we end up with the following identity.
Lemma 5. For $0\le s \lt 1$, $0 \le k \lt n$,
i.e. $\varphi_{k,n}$ is the shape function of $f_{k,n}$.
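The expansion can also be verified numerically for $k=0$. The sketch below assumes that the shape function $\varphi$ of a generating function $f$ is defined through $1/(1-f(s)) = 1/(f'(1)(1-s)) + \varphi(s)$ for $0 \le s \lt 1$, and that Lemma 5 with $k=0$ reads $1/(1-f_{0,n}(s)) = 1/(\mu_n(1-s)) + \sum_{j=1}^n \varphi_j(f_{j,n}(s))/\mu_{j-1}$. Both are assumptions about formulas not displayed here, and the environment and function names are hypothetical.

```python
from fractions import Fraction as F

def pgf(f, s):
    """Value of the generating function of the law f at the point s."""
    return sum(p * s**j for j, p in enumerate(f))

def mean(f):
    return sum(j * p for j, p in enumerate(f))

def shape(f, s):
    """Shape function of f at s < 1 (assumed definition, see lead-in)."""
    return 1 / (1 - pgf(f, s)) - 1 / (mean(f) * (1 - s))

def lemma5_sides(env, s):
    """Both sides of the assumed expansion of 1/(1 - f_{0,n}(s))."""
    n = len(env)
    tail = [None] * (n + 1)  # tail[k] = f_{k,n}(s), with f_{n,n}(s) = s
    tail[n] = s
    for k in range(n - 1, -1, -1):
        tail[k] = pgf(env[k], tail[k + 1])  # env[k] is f_{k+1}
    mu, total = F(1), F(0)
    for k in range(1, n + 1):
        total += shape(env[k - 1], tail[k]) / mu  # here mu equals mu_{k-1}
        mu *= mean(env[k - 1])
    lhs = 1 / (1 - tail[0])
    rhs = 1 / (mu * (1 - s)) + total
    return lhs, rhs

# Hypothetical varying environment (f_1, f_2, f_3).
ENV = [
    [F(1, 4), F(1, 4), F(1, 2)],
    [F(1, 2), F(0), F(1, 2)],
    [F(0), F(1, 2), F(1, 4), F(1, 4)],
]
```

At $s=0$ the left-hand side equals $1/\mathrm P(Z_n \gt 0)$, which is the form used for survival probabilities below.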
In order to estimate survival probabilities, Assumption (A) now comes into play. The next lemma reveals its role.
Lemma 6. Condition (A) is fulfilled if and only if there is a constant $c' \lt \infty$ such that we have $\varphi_n(1)\le c'\varphi_n(0)$ for all $n \ge 1$.
Proof. Recall that $Y_n$ denotes a random variable with distribution $f_n$. We have $\mathrm P(Y_n \ge 2)=0$ if and only if $\varphi_n(1)= \mathrm E[Y_n(Y_n-1)]/(2\mathrm E[Y_n]^2)=0$. Then both inequalities from (A) and from our lemma are valid for all $c \gt 0$ and $c'>0$, respectively. Therefore we may, without loss of generality, assume that $\mathrm P(Y_n \ge 2)>0$ for all $n \ge 1$. Then we have
and therefore, because of (7),
It is not difficult to see that these expressions are bounded uniformly in $n$ if and only if the same holds true for the terms
which in turn is equivalent to condition (A). This gives our claim.
In particular, if $\varphi_n(1)\le c'\varphi_n(0)$ for all $n \ge 1$ then we obtain for the shape functions $\varphi_{k,n}$ of the generating functions $f_{k,n}$ from Lemma 5, by means of Lemmas 6 and 1,
for all $1\le k \le n$. This estimate, together with Lemma 4, proves Remark 1 from Section 1, namely that any subsequence of a regular BPVE is regular, too.
The next lemma has a forerunner in Agresti’s estimate [Reference Agresti1, Theorem 1].
Lemma 7. Under Assumption (A) there is a $\gamma>0$ such that, for all $n \ge 0$,
Proof. The left-hand estimate is just the standard Paley–Zygmund inequality. For the right-hand estimate observe that $\mathrm P(Z_n \gt 0)= 1- f_{0,n}[0]=1-f_{0,n}(0)$. Using Lemma 5 with $k=0$ and $s=0$ we get the representation
and hence, by means of Lemma 1,
and, by Assumption (A), Lemma 6 and (7),
Letting $\gamma\,:\!= \min(1, (4c')^{-1})$, we obtain
On the other hand, Lemma 4 implies that
Combining the last two formulas, our claim follows.
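For completeness, here is a short derivation of the left-hand estimate, assuming it reads $\mathrm P(Z_n \gt 0) \ge \mathrm E[Z_n]^2/\mathrm E[Z_n^2]$ (the displayed statement of Lemma 7 is not reproduced here). It uses only the Cauchy–Schwarz inequality and the fact that $Z_n$ vanishes off the event $\{Z_n \gt 0\}$.

```latex
\begin{align*}
\mathrm E[Z_n]
  = \mathrm E\big[Z_n \, I\{Z_n>0\}\big]
  \le \sqrt{\mathrm E[Z_n^2]\,\mathrm P(Z_n>0)} ,
\qquad\text{hence}\qquad
\mathrm P(Z_n>0) \ge \frac{\mathrm E[Z_n]^2}{\mathrm E[Z_n^2]} .
\end{align*}
```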
Proof of Theorem 1. (i) if and only if (ii): Since $\lim_{n \to \infty} \mathrm P(Z_n>0) = 1-q$, the equivalence follows from Lemma 7.
(ii) if and only if (iii): We have
\begin{align}\sum_{k=1}^n \frac {\rho_k}{\mu_{k-1}} &= \sum_{k=1}^n \frac{\nu_k+ f^{\prime}_k(1)^{-1}-1}{\mu_{k-1}}\notag \\&= \sum_{k=1}^n \frac {\nu_k}{\mu_{k-1}} + \sum_{k=1}^n \bigg(\frac 1{\mu_k}-\frac 1{\mu_{k-1}}\bigg) = \sum_{k=1}^n \frac {\nu_k}{\mu_{k-1}} + \frac 1{\mu_n} -1 ;\tag{19}\end{align}
thus, because of (18),
\begin{align} \frac{\mathrm E[Z_n^2]}{\mathrm E[Z_n]^2} = \sum_{k=1}^n \frac {\rho_k}{\mu_{k-1}} +1 .\tag{20}\end{align}
This gives the claim.
(iii) if and only if (iv): This equivalence is an immediate consequence of (19).
(v) if and only if (vi): This equivalence follows again from Lemma 7.
(vi) if and only if (vii): This is a consequence of (20).
(vii) if and only if (viii): Again, this claim follows from (19).
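The telescoping step behind (19) is easy to check on concrete numbers. The sketch below assumes $\rho_k = \nu_k + 1/\mathrm E[Y_k] - 1$ and $\mu_k = \mu_{k-1}\,\mathrm E[Y_k]$ with $\mu_0=1$. The function name and the sample values are hypothetical; the values need not come from actual offspring laws, since the identity is purely algebraic.

```python
from fractions import Fraction as F

def telescoping_sides(means, nus):
    """Return both sides of (19) for given means E[Y_k] and values nu_k."""
    mu_prev = F(1)            # mu_{k-1}, starting from mu_0 = 1
    lhs, nu_sum = F(0), F(0)
    for m, nu in zip(means, nus):
        rho = nu + 1 / m - 1  # assumed relation between rho_k and nu_k
        lhs += rho / mu_prev
        nu_sum += nu / mu_prev
        mu_prev *= m          # now mu_prev equals mu_k
    rhs = nu_sum + 1 / mu_prev - 1
    return lhs, rhs

# Hypothetical sample values.
MEANS = [F(5, 4), F(1), F(7, 4)]
NUS = [F(16, 25), F(1), F(40, 49)]
```

The identity holds because $(1/\mathrm E[Y_k]-1)/\mu_{k-1} = 1/\mu_k - 1/\mu_{k-1}$, so these terms collapse to $1/\mu_n - 1$.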
Remark 3. From (17) it follows that a sufficient condition for almost sure extinction is given by the single requirement $\sum_{k\ge 1} \varphi_k(0)/\mu_{k-1} = \infty$ (without (A)). This confirms a conjecture of Jirina [Reference Jirina16].
Proof of Theorem 2. Statement (i) is obviously valid. For the first part of statement (ii), note that from Theorem 1(vi) it follows that $\sup_{n \ge 0} \mathrm E[W_n^2] \lt \infty$. Therefore the martingale $(W_n)_{n \ge 0}$ is bounded in $\mathcal L^2$, implying $\mathrm E[W]=\mathrm E[W_0]=1$ and $\mathrm E[W^2] \lt \infty$. From (20) it follows that
This implies (1).
For the proof of the last claim we distinguish two cases. Either $\mu_n \to r$ with $0<r<\infty$, in which case $W_n =Z_n/\mu_n \to Z_\infty/r$ a.s. and consequently $W=Z_\infty/r$ a.s. and $\mathrm P(W=0)= \mathrm P(Z_\infty=0)=q$, or we may assume $\mu_n \to \infty$ in view of Theorem 1(viii). Also, $ \{Z_\infty=0\} \subset \{W=0\}$ a.s., and thus it is sufficient to show that $\mathrm P(Z_\infty \gt 0, W=0)=0$. First, we estimate $\mathrm P(Z_\infty=0 \mid Z_k=1)$ from below. From Lemmas 5 and 1, for $k \lt n$,
as well as
with $\lambda \gt 0$. By means of Lemma 4 this entails
Letting $n \to \infty$ we get
and with $\lambda \to \infty$,
Using ${\rm e}^{-2x} \le 1-x $ for $0 \le x \le 1/2$, it follows for $\mathrm P(W>0 \mid Z_k=1) \le (8c')^{-1}$ that
Now we draw on a martingale which already appears in the work of D’Souza and Biggins [Reference D’Souza and Biggins7]. For $n \ge 0$, let
From standard martingale theory $M_n\to I\{W=0\}$ a.s. In particular, we have
a result which has already been exploited by D’Souza [Reference D’Souza6].
We distinguish two cases. Either there is an infinite sequence of natural numbers such that $\mathrm P(W \gt 0 \mid Z_n=1) \gt (8c')^{-1}$ along this sequence, so (22) implies that $Z_n \to 0$ a.s. on the event $W=0$, or we may apply our estimate (21) to obtain from (22) that
Therefore, given $\varepsilon \gt 0$, we have, for n sufficiently large,
Letting $n \to \infty$ we thus obtain $\mathrm P(Z_\infty \gt 0, W=0) \le \varepsilon$; the claim then follows with $\varepsilon \to 0$.
Proof of Theorem 3. We begin with the proof of the last claim. Note that the assertion from Lemma 7 can be rewritten as
and (18) gives $ \mathrm E[Z_n^2]/\mathrm E[Z_n]= 1+ \mu_n \sum_{k=1}^n \frac{\nu_k}{\mu_{k-1}}=a_n$. This implies (4).
Consequently, by means of Markov’s inequality we obtain
which implies the theorem’s first claim.
Concerning the second claim we remark that for $a_n \lt 2$ we may set $u=1/2$. For $a_n \ge 2$ we have, by means of Lemma 5, the estimate
with $0 \lt s \lt 1$ and $u \gt 0$. Lemmas 1 and 6 along with (7) yield the bound
Moreover, $1-s^{1/a_n} \ge a_n^{-1}(1-s) $, since $1/a_n \le 1$. Hence, choosing $s=1/2$ we get
Finally, from $a_n\ge 2$ it follows that $a_n \le 2 \mu_n \sum_{k=1}^{n} \nu_k/\mu_{k-1}$, and consequently
for all $u>0$. If we now set $\theta =1/(40 c')$ and choose $u>0$ so small that $1-2^{-u} \le \theta$ we obtain $ \mathrm P(Z_n/a_n \gt u) \ge \theta $, which is our second claim.
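The elementary inequality $1-s^{1/a_n} \ge a_n^{-1}(1-s)$ used above can be justified by convexity. Writing $\alpha = 1/a_n \in (0,1]$ (a symbol introduced only for this sketch) and fixing $0 \lt s \lt 1$, the function $x \mapsto s^x$ is convex, so its value at $\alpha$ lies below the chord joining its values at $0$ and $1$:

```latex
\begin{align*}
s^{\alpha} = s^{(1-\alpha)\cdot 0 + \alpha \cdot 1}
  \le (1-\alpha)\, s^{0} + \alpha\, s^{1}
  = 1 - \alpha (1-s),
\qquad\text{hence}\qquad
1 - s^{\alpha} \ge \alpha (1-s) .
\end{align*}
```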
The next lemma prepares the proof of Theorem 4. It clarifies the role of (B).
Lemma 8. Assume condition (B) and let $q=1$. Then the condition
implies
as $n\to \infty$.
Proof. Fix $\varepsilon \gt 0$ and choose $c_{\varepsilon/9}$ according to Assumption (B). Let
with some $0<\eta \lt 1$. Then, from Lemma 3 with $a= \lfloor c_{\varepsilon/9} \rfloor$,
From the estimate in (6) it follows that
Therefore there is an $\eta=\eta_\varepsilon>0$ such that
Now set
Because of $f_{n,n}(0)=0$ this minimum is attained. In view of (24) and Lemma 1 it follows that
From (23) we have
and from Lemma 6,
From (16) it follows that $\mathrm P(Z_n>0 \mid Z_r=1)^{-1} =\mu_r/\mu_n+ \mu_r \sum_{k=r+1}^n\varphi_k(\,f_{k,n}(0))/\mu_{k-1}$ for $n>r$, and hence we may proceed to
Putting our estimates together, we get
Now the assumption $1/\mu_n=o(\sum_{k=1}^{n} \nu_k/\mu_{k-1})$ comes into play. It implies that there is a positive integer $r_\varepsilon$ such that, for all r, n with $ r_\varepsilon \lt r\le n $,
Also, from the assumptions that $q=1$ and $1/\mu_n=o(\sum_{k=1}^{n} \nu_k/\mu_{k-1})$, together with Theorem 1(iv) and (7), we have
as $n \to \infty$, which implies that (26) holds for all $r \le r_\varepsilon$ and thus for all $r \le n$, if only n is large enough. Thereby we may combine (25) and (26) to obtain
for sufficiently large n. This proves our claim.
Proof of Theorem 4. (i) implies (ii): We argue by contradiction. If assertion (ii) fails, then there is an increasing sequence $(n_i)_{i\ge 0}$ in $\mathbb N$ fulfilling $\sup_i \mathrm E[Z_{n_i}\mid Z_{n_i}>0] \lt \infty$. From Theorem 3 it follows that the random variables $Z_{n_i}$, $i \ge 0$, conditioned on $Z_{n_i}>0$ are tight. This contradicts assertion (i), which proves the implication.
(ii) implies (iii): This implication follows from Theorem 3, since assertion (iii) just states that $a_n\to \infty$.
(iii) implies (i): For the proof, let
From Lemma 5 we have
Since $b_n \to \infty$, from Lemma 8 and the theorem’s assumption we have
as $n \to \infty$. From the definition of $b_n$ we get
This implies assertion (i).
Moreover, from (16), Lemma 8 and assertion (iii) it follows that
This formula gives the extra claims, which concludes the proof.
Proof of Proposition 1. By Theorem 1(viii) the condition $q \lt 1$ is equivalent to the requirements of both $\sum_{k=1}^\infty \nu_k/\mu_{k-1} \lt \infty$ and $0 \lt \lim_n \mu_n \le \infty$. As already explained, the division between the supercritical regime and the asymptotically non-degenerate regime corresponds to the cases $\lim_n \mu_n=\infty $ and $0 \lt \lim_n \mu_n<\infty$. This gives the first two assertions of the proposition.
Next, the critical regime is given by the requirements that both $\mathrm E[Z_n \mid Z_n>0] \to \infty$ and $q=1$. By Theorems 3 and 1(iv) we may equivalently require that $1/\mu_n =o(\sum_{k=1}^n \nu_k/\mu_{k-1})$ together with either $\sum_{k=1}^\infty \nu_k/\mu_{k-1}=\infty$ or $\mu_n \to 0$. However, the third and the first of these conditions imply the second one; therefore the third condition can be skipped, and we end up with the requirements $1/\mu_n =o(\sum_{k=1}^n \nu_k/\mu_{k-1})$ and $\sum_{k=1}^\infty \nu_k/\mu_{k-1}=\infty$, as stated in the proposition.
Finally, the subcritical regime is characterized by the conditions $\mathrm E[Z_n \mid Z_n \gt 0] \not\to \infty$ and $q=1$. Because of Theorem 3, the first condition is equivalent to the requirement $a_n \not\to \infty$, respectively to $\liminf_n \mu_n \sum_{k=1}^n \nu_k/\mu_{k-1} \lt \infty$. Moreover, $\liminf_n \mu_n=0$ implies $q=1$, and therefore the conditions stated in the proposition imply subcriticality. Conversely, if $q=1$ then by Theorem 1(iv) we have $\lim_n \mu_n=0$ or $\sum_{k=1}^\infty \nu_k/\mu_{k-1} = \infty$. The former of these conditions trivially yields $\liminf_n \mu_n=0$, whereas the latter, together with $\liminf_n \mu_n \sum_{k=1}^n \nu_k/\mu_{k-1} \lt \infty$, implies $\liminf_n \mu_n=0$. Therefore the two conditions stated in the proposition are also necessary for subcriticality.