
The number of individuals alive in a branching process given only times of deaths

Published online by Cambridge University Press:  31 January 2025

Frank Ball*
Affiliation:
University of Nottingham
Peter Neal*
Affiliation:
University of Nottingham
* Postal address: School of Mathematical Sciences, University of Nottingham, NG7 2RD, United Kingdom.

Abstract

The study of many population growth models is complicated by only partial observation of the underlying stochastic process driving the model. For example, in an epidemic outbreak we might know when individuals show symptoms of a disease and are removed, but not when individuals are infected. Motivated by the above example and the long-established approximation of epidemic processes by branching processes, we explore the number of individuals alive in a time-inhomogeneous branching process with a general phase-type lifetime distribution given only (partial) information on the times of deaths of individuals. Deaths are detected independently with a detection probability that can vary with time and type. We show that the number of individuals alive immediately after the kth detected death can be expressed as a mixture of random variables, each of which consists of the sum of k independent zero-modified geometric random variables. Furthermore, in the case of an Erlang lifetime distribution, we derive an easy-to-compute mixture of negative binomial distributions as an approximation of the number of individuals alive immediately after the kth detected death.

Type
Original Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

For many population growth models, both mathematical modelling and statistical inference are complicated by only partial observation of the underlying stochastic process driving the model. For example, in epidemic models we might have information on when individuals show symptoms of the disease and hence are removed from the active population, but rarely have information on when an individual is infected. Therefore, a question of interest at time t, say, is, given information on the times at which individuals exhibited symptoms (and were removed) up to time t, how many infectives are there in the population at time t? This is the question considered and answered for the Markovian susceptible–infectious–removed (SIR) epidemic in Ball and Neal [Reference Ball and Neal2], where the distribution of the number of infectives was derived given only information on removal times assuming an exponential infectious period. In [Reference Ball and Neal2] a time-inhomogeneous birth–death process was considered as an approximation of the epidemic process, with the number of individuals alive in the birth–death process approximating the number of infectives in the epidemic process; cf. Whittle [Reference Whittle11]. In particular, it was shown that the distribution of the number of individuals alive in a time-inhomogeneous birth–death process at time t, given the times of deaths of individuals up to and including time t, can be expressed as a mixture of negative binomial distributions.

The time-inhomogeneous birth–death process assumes that the rate $\mu_t$ at which an individual dies does not depend on how long the individual has been alive. In the case where $\mu_t = \gamma$ $(t \in \mathbb{R})$ , this corresponds to individuals having exponentially distributed lifetimes with mean duration $1/\gamma$ . However, exponentially distributed lifetimes (infectious periods) are unrealistic in most biological settings. Therefore phase-type distributions, and in particular, the Erlang distribution, are commonly chosen to model infectious period distributions in SIR epidemic models; see, for example, Walker et al. [Reference Walker, Black and Ross10], Black and Ross [Reference Black and Ross3], and House et al. [Reference House, Ross and Sirl5].

In this paper we consider the situation where individuals’ lifetimes are independent and identically distributed (i.i.d.) according to an arbitrary, but specified, phase-type distribution L; see, for example, Chapter III of Asmussen [Reference Asmussen1]. Phase-type distributions are presented in Section 2, and L can be represented as the distribution of the time to absorption in a continuous-time Markov chain with $J( \lt \infty)$ transient states and one absorbing state 0. Thus the case $J=1$ corresponds to an exponential distribution. Special cases of the phase-type distribution include the hyper-exponential distribution (mixture of $J ( \gt 1)$ exponential distributions) and the Erlang distribution (sum of J i.i.d. exponential distributions). Moreover, the class of phase-type distributions is dense, so any lifetime distribution on $[0,\infty)$ can be approximated arbitrarily closely by a phase-type distribution and there is essentially no loss of generality in focussing on phase-type distributions.

We study time-inhomogeneous branching processes where all individuals alive at time t have birth rate $\beta_t$ . We place no restriction on the form $\beta_t$ can take, and therefore we can use the branching process model to approximate epidemic models with $\beta_t$ denoting the effective infection rate at time t. For example, in the modelling of an emerging disease such as Covid-19 in a large population (e.g. a country), the infection rate $\beta_t$ varies over time with the implementation and relaxation of control measures. We assume that individuals are potentially detected (observed) at death (removal in an epidemic context). We allow the detection probability to depend on both the time of death and the individual’s type at death, where, during their lifetime, an individual’s type is defined by the state j ( $j=1,2,\ldots, J$ ) of the lifetime Markov chain to which they belong. In an epidemic context we allow for asymptomatic individuals and more general under-reporting of infectious cases, which can fluctuate through time to account for differing levels of surveillance and disease awareness.

Given the times of observed deaths (removals) in the population, we show in Theorem 1 that the number of individuals alive in the population immediately after the kth detected death, $X_k^\ast$ say ($k=1,2,\ldots$), can be expressed as the mixture of $(k-1)!$ random variables, each of which consists of the sum of k independent zero-modified geometric random variables; see Definition 1. Computing the mixture of $(k-1)!$ random variables to obtain $X_k^\ast$ using Theorem 1 is only possible for small k. However, under certain conditions it is shown in Corollary 1 that $X_k^\ast$ can be expressed as a mixture of k negative binomial distributions, although computing the mixing weights is non-trivial. In Section 3.5.2, we present an approximation for $X_k^\ast$ using a mixture of k negative binomial distributions with a simple-to-compute recursive relationship for $X_k^\ast$ in terms of $X_{k-1}^\ast$ and the proportion of individuals of each type at the $(k-1)$th detected death, in the spirit of [Reference Ball and Neal2, Theorem 3.2]. This approximation is applicable for Erlang lifetime distributions and is successfully applied in a simulation study in Section 7.

As noted above, the current work extends the results of [Reference Ball and Neal2]. Previous work in a similar vein but for a slightly different model can be found in Trapman and Bootsma [Reference Trapman and Bootsma9], Lambert and Trapman [Reference Lambert and Trapman6], Lefèvre and Picard [Reference Lefèvre and Picard7], and Lefèvre et al. [Reference Lefèvre, Picard and Utev8]. In these papers it is assumed that each individual, whilst alive, can be detected at a constant rate, $\delta$ say. The papers [Reference Lambert and Trapman6, Reference Trapman and Bootsma9] focus on the distribution of individuals alive at the first detection time, with [Reference Lambert and Trapman6] allowing for a general lifetime distribution. The papers [Reference Lefèvre and Picard7, Reference Lefèvre, Picard and Utev8] restrict attention to Markovian models but allow for epidemic dynamics and multiple births, and they study the size of the population at the jth detection $(j \geq 1)$ . Probability generating functions are also a key tool in [Reference Lefèvre and Picard7, Reference Lefèvre, Picard and Utev8] and in themselves provide no insight as to the distribution of the size of the population (number of infectives) at a given detection time.

The remainder of the paper is structured as follows. In Section 2 we define the time-inhomogeneous phase-type branching process model. In Section 3, we present the main results (Theorem 1 and the approximation of Section 3.5.2), along with an overview of the exploration process (Section 3.2) between two time points t and $t+\tau$ , which plays an important role in identifying how the population evolves between detected deaths. We also introduce key notation (Section 3.3). In Section 4 we present a more detailed analysis of the exploration process, including the computation of the key quantities required for Theorem 1. In Section 5 we prove the main result, Theorem 1. In Section 6 we use results from the proof of Theorem 1 to establish the approximation given in Section 3.5.2. Finally, in Section 7 we illustrate the approximation given in Section 3.5.2 through a numerical example. Further examples are provided in the supplementary material.

2. Phase-type branching process model

In this section we define the phase-type branching process model considered in this paper. We consider a time-inhomogeneous branching process where the birth rate, $\beta_t$ , depends on t but all individuals have i.i.d. lifetimes distributed according to a phase-type distribution L. Examples of L include the exponential, hyper-exponential (mixture of exponentials), and Erlang distributions. We begin by considering the relationship between L and an associated Markov process.

Consider a Markov process with J transient states, labelled $1,2,\ldots,J$ , and an absorbing state, labelled state 0. For $j=1,2,\ldots, J$ , let $\chi_j$ be the probability that the Markov process starts at time 0 in state j with $\sum_{i=1}^J \chi_i =1$ . The Markov process moves between the states $1,2,\ldots, J$ according to a sub-stochastic transition-rate matrix $\Lambda = (\lambda_{ij})$ . That is, for all $j =1,2,\ldots, J$ , we have $\sum_{i \neq j} \lambda_{ji} \leq - \lambda_{jj}$ , with strict inequality for at least one j. For $t \geq 0$ , let

(2.1) \begin{align} \textbf{P} (t) = \exp\!(t \Lambda),\end{align}

so $P_{ij} (t)$ is the probability that if the Markov process starts in state i at time 0 then it is in state j at time t. Since the states $1,2,\ldots, J$ are transient, we have that $\textbf{P} (t) \rightarrow \textbf{0}_{J \times J}$ ( $J \times J$ matrix of zeros) as $t \rightarrow \infty$ . The probability the Markov process has entered state 0 (absorption) by time t given that it starts in state i is $P_{i0} (t) = 1- \sum_{j=1}^J P_{ij} (t)$ . The distribution of the time to absorption, L, is a phase-type distribution with cumulative distribution function

(2.2) \begin{align} \mathbb{P} (L \leq t) = \sum_{i=1}^J \chi_i P_{i0} (t) = 1 - \sum_{i=1}^J \chi_i \sum_{j=1}^J P_{ij} (t) \qquad (t \geq 0)\end{align}

and $\mathbb{P} (L \leq t) =0$ $(t \lt 0)$ .
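
As a concrete illustration of (2.1) and (2.2), the following is a minimal numerical sketch (not from the paper) that evaluates $\mathbb{P}(L \leq t)$ via a matrix exponential; the Erlang choice of $\Lambda$ and ${\boldsymbol{\chi}}$, and the function name `lifetime_cdf`, are illustrative assumptions.

```python
# A minimal sketch (not from the paper): the phase-type CDF (2.2) via the
# matrix exponential (2.1). Lambda and chi below encode an illustrative
# Erlang(J, J*mu) lifetime (phases 1 -> 2 -> ... -> J -> absorption in 0).
import numpy as np
from scipy.linalg import expm

J, mu = 3, 1.0
rate = J * mu
Lambda = -rate * np.eye(J) + rate * np.eye(J, k=1)  # sub-generator of the lifetime chain
chi = np.eye(J)[0]                                  # lifetime starts in phase 1

def lifetime_cdf(t):
    """P(L <= t) = 1 - sum_i chi_i sum_j P_ij(t), with P(t) = expm(t * Lambda)."""
    if t < 0:
        return 0.0
    return 1.0 - chi @ expm(t * Lambda).sum(axis=1)

# Sanity check against the closed-form Erlang CDF.
from scipy.stats import gamma
print(lifetime_cdf(0.7), gamma.cdf(0.7, a=J, scale=1.0 / rate))
```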

Let us turn to the branching process. Suppose that there is one initial individual in the population. We define time 0 to be the time at which the first detected death occurs in the population, so the initial individual enters the population sometime before time 0. Note that it is possible for the branching process to die out without any deaths being detected. However, we are interested in situations where at least one death is detected. We assume that there are J types of individual in the population, corresponding to the J states of the Markov process with transition matrix $\Lambda$ to which L is associated. Therefore, for $t, \tau \gt 0$ , an individual of type j at time t has probability $P_{jl} (\tau)$ of being an individual of type l at time $t + \tau$ .

For all $t \in \mathbb{R}$, let $\beta_t$ denote the birth rate at time t. We assume that there exists $\alpha_1 \gt 0$ such that $\beta_t = \alpha_1$ $(t \leq 0)$. That is, there is a constant, positive birth rate up to and including the time of the first detected death. At time t, we assume that all individuals alive, irrespective of their type, give birth to new individuals at rate $\beta_t$. Similarly, for $j=1,2,\ldots, J$ and $t \in \mathbb{R}$, let $d_{t,j}$ denote the probability that an individual of type j who dies at time t is detected, and write $\textbf{d}_t = (d_{t,1}, d_{t,2}, \ldots, d_{t,J})$. Therefore we allow the probability of a death being detected to depend on type, although it should be noted that, for $l=1,2,\ldots,J$, $d_{t,l}$ is only required if $\gamma_l \gt 0$, where $\gamma_l = -\sum_{i=1}^J \lambda_{li}$ is the death rate of a type-$l$ individual. We require that $d_{t,j} \gamma_j \gt 0$ for at least one $j \in \{1,2,\ldots,J\}$, to ensure that some deaths are detected. The proofs presented in this paper do not extend to the case where the birth rate is type-dependent. For $j=1,2,\ldots, J$, we assume that there exists $0 \leq \epsilon_j \leq 1$ such that $d_{t,j} = \epsilon_j$ $(t \leq 0)$. That is, there is a constant probability of detecting a death of a type-j individual up to, and including, the first detected death. The assumptions of time-homogeneity in the branching process and its observation process prior to the first detected death are in agreement with [Reference Ball and Neal2, Theorems 3.1 and 3.2] and ensure that the number of individuals alive immediately following the first detected death follows a geometric distribution.

Given the number of individuals of each type, the above time-inhomogeneous phase-type branching process is a Markov process. For $t \in \mathbb{R}$ and $j=1,2,\ldots, J$ , let $Y_j (t)$ denote the number of individuals of type j in the population at time t, and let $\textbf{Y} (t) = (Y_1 (t), Y_2 (t), \ldots, Y_J (t))$ and $Y^\ast (t) = \sum_{j=1}^J Y_j (t)$ . Let $S_0$ denote the (unknown) time at which the initial individual enters the population, with $\textbf{Y} (S_0)$ denoting the initial state of the population at time $S_0$ . Then $\mathbb{P} (\textbf{Y} (S_0) = \textbf{e}_j) = \chi_j$ ( $j=1,2,\ldots, J$ ), where $\textbf{e}_j$ is a vector of length J in which the jth element is equal to 1 and all other elements are 0. For $t \gt S_0$ , the phase-type branching process satisfies

(2.3) \begin{align} \mathbb{P}( \textbf{Y} (t+h) = \textbf{y} + \textbf{v} | \textbf{Y} (t) = \textbf{y}) = \left\{ \begin{array}{l@{\quad}l}\left[ \sum_{l=1}^J y_l \right] \beta_t \chi_j h +o(h), & \textbf{v} = \textbf{e}_j, \\[5pt] y_j \gamma_j h +o(h), & \textbf{v} = - \textbf{e}_j, \\[3pt] y_l \lambda_{lj} h +o(h), & \textbf{v} = \textbf{e}_j - \textbf{e}_l \; (l \neq j), \\[5pt] 1 - \sum_{l=1}^J y_l (\beta_t - \lambda_{ll}) h + o(h), & \textbf{v} = \textbf{0}, \end{array} \right. \end{align}

where all other events occur with probability o(h). The events in (2.3) correspond to the birth of a type-j individual, the death of a type-j individual, the transition of an individual from type l to type j, and nothing happening.

Suppose that there exists $\alpha \gt 0$ such that $\beta_t =\alpha$ for all $t \in \mathbb{R}$ . Then we have a time-homogeneous branching process where individuals have i.i.d. lifetimes distributed according to L and reproduce during their lifetime at the points of a homogeneous Poisson point process with rate $\alpha$ . Moreover, if there exists ${\boldsymbol{\epsilon}} =(\epsilon_1, \epsilon_2, \ldots, \epsilon_J)$ such that $\textbf{d}_t = {\boldsymbol{\epsilon}}$ , for all $t \in \mathbb{R}$ , then the observation process of the branching process is also time-homogeneous.
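
To make the dynamics concrete, the following is a minimal Gillespie-style simulation sketch of the transition rates in (2.3), restricted for simplicity to the time-homogeneous case $\beta_t = \alpha$ and $\textbf{d}_t = {\boldsymbol{\epsilon}}$; the function `simulate` and the Erlang example at the end are illustrative assumptions, not the authors' code.

```python
# A minimal Gillespie-style sketch (not the authors' code) of the rates in
# (2.3) in the time-homogeneous case beta_t = alpha, d_t = eps; records the
# times of detected deaths.
import numpy as np

def simulate(Lambda, chi, alpha, eps, t_max, rng):
    J = len(chi)
    gamma = -Lambda.sum(axis=1)                 # death rate of each type
    y = rng.multinomial(1, chi)                 # one initial individual, type ~ chi
    t, detections = 0.0, []
    while t < t_max and y.sum() > 0:
        birth = alpha * y.sum()                 # total birth rate
        death = y * gamma                       # per-type death rates
        move = y[:, None] * np.where(np.eye(J, dtype=bool), 0.0, Lambda)  # l -> j
        total = birth + death.sum() + move.sum()
        t += rng.exponential(1.0 / total)
        u = rng.uniform(0.0, total)
        if u < birth:                           # birth of a type-j individual
            y[rng.choice(J, p=chi)] += 1
        elif u < birth + death.sum():           # death of a type-j individual
            j = rng.choice(J, p=death / death.sum())
            y[j] -= 1
            if rng.uniform() < eps[j]:          # detected with probability eps_j
                detections.append(t)
        else:                                   # transition from type l to type j
            l, j = np.unravel_index(
                rng.choice(J * J, p=move.ravel() / move.sum()), (J, J))
            y[l] -= 1
            y[j] += 1
    return y, detections

rng = np.random.default_rng(1)
J = 3
Lambda = -3.0 * np.eye(J) + 3.0 * np.eye(J, k=1)    # Erlang(3, 3) lifetimes
y_end, times = simulate(Lambda, np.eye(J)[0], alpha=1.5,
                        eps=np.array([0.0, 0.0, 0.8]), t_max=5.0, rng=rng)
print(y_end, times)
```

Note that, in this Erlang example, only type-3 individuals have a positive death rate, illustrating the remark above that $d_{t,l}$ is only required when $\gamma_l \gt 0$.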

Similar to [Reference Ball and Neal2], the motivation for the phase-type branching process model is as an approximation for SIR epidemic models. Consider a SIR epidemic in a population of size N with one initial infective, with infection rate at time t given by $\alpha_t$ (that is, we allow for a time-inhomogeneous infection rate), and with individuals having infectious periods that are i.i.d. according to L. At time t, if we can estimate the proportion of the population that is susceptible, $s_t$ , the dynamics of the SIR epidemic can be approximated by a phase-type branching process with birth rate $\beta_t = \alpha_t s_t$ , the rate at which infectious contacts are made with susceptible individuals. The assumption that, prior to the first detected death, the birth (infection) rate and detection probabilities are constant is not unreasonable, at least from an epidemic perspective. For example, until a disease is detected it is unlikely that any control measures will be put in place. In the supplementary material we discuss the scenario where the time at which the initial individual is born (infected) is known and results similar to those obtained in Theorem 1 can be derived without restrictions on $\beta_t$ or $\textbf{d}_t$ $(t \lt 0)$ .

3. Main results

3.1. Introduction

In this section, we present the main results of the paper. In Theorem 1, presented in Section 3.4, we derive the distribution of the number of individuals of each type immediately following the kth detected death for $k=1,2,\ldots$ . The distribution obtained in Theorem 1 is a mixture of $(k-1)!$ components, so is impractical for calculating the distribution of the number of individuals of each type for moderate k. However, it is informative about the distribution of the total number of individuals alive immediately following the kth detected death, which can be expressed as sums of zero-modified geometric random variables; see Definition 1 below.

Definition 1. Suppose that X is a non-negative discrete random variable with probability mass function

\begin{align*}\mathbb{P} (X=0) & =a, \\[3pt] \mathbb{P} (X=x) & = (1-a) (1-b)^{x-1} b \qquad (x=1,2,\ldots),\end{align*}

for some $0 \leq a,b \leq 1$ . Then we say that X has a zero-modified geometric distribution.
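
A minimal sketch (not from the paper) of the probability mass function in Definition 1, together with a corresponding sampler; the parameter values are illustrative.

```python
# A minimal sketch (not from the paper) of the zero-modified geometric
# distribution of Definition 1: pmf and sampler, with illustrative a and b.
import numpy as np

def zmg_pmf(x, a, b):
    """P(X = 0) = a and P(X = x) = (1 - a) (1 - b)^(x-1) b for x = 1, 2, ...."""
    return a if x == 0 else (1.0 - a) * (1.0 - b) ** (x - 1) * b

def zmg_sample(a, b, rng):
    if rng.uniform() < a:
        return 0
    return rng.geometric(b)   # numpy's geometric has support {1, 2, ...}

rng = np.random.default_rng(0)
draws = np.array([zmg_sample(0.3, 0.4, rng) for _ in range(100_000)])
print(np.mean(draws == 0), zmg_pmf(0, 0.3, 0.4))   # both close to 0.3
```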

In Section 3.2 we present an overview of the exploration process; technical details of, and results for, the exploration process are presented in Section 4. This is followed in Section 3.3 by the introduction of the key notation required to state Theorem 1. The notation is defined in a similar manner to that of [Reference Ball and Neal2, Section 3], to allow for comparisons where appropriate. In Section 3.5 we provide an approximation for the distribution of the number of individuals of each type immediately following the kth detected death which consists of only k components.

3.2. Overview of exploration process

A key component in understanding the dynamics of a branching process through partial observation of its death process is to study the behaviour of the branching process between two given time points t and $t + \tau$ $(t \in \mathbb{R}, \tau \gt 0)$ . In particular, we consider the cases where (i) there are no detected deaths in the interval $(t, t+ \tau]$ , and (ii) the first detected death after time t is at time $t+ \tau$ . In this section we focus on (i) and use an approach similar to that of [Reference Lambert and Trapman6, Section 2] to explore the branching process between two time points from a single progenitor. Note that if we have multiple individuals alive at time t then the branching processes evolving from each individual are independent. In Section 4 we present a more technical description of the exploration process, along with the methodology for calculating the key quantities required for Theorem 1.

Consider a single progenitor of type j, say, at time t. For $\tau \gt 0$, we start with the forward exploration process from time t to time $t+\tau$. Specifically, we are interested in the probability of the joint event that there is at least one individual alive at time $t+ \tau$ and there are no detected deaths in the interval $(t, t+\tau]$, and also in the probability of the joint event that there are no individuals alive at time $t+ \tau$ and no detected deaths in the interval $(t, t+\tau]$. The forward exploration process is as follows. If the progenitor is still alive at time $t+\tau$, we identify the type of the progenitor at time $t +\tau$. If the progenitor dies, undetected, at time $t + u$ say, we then explore back in time through the progenitor’s life from time $t+u$ to reveal offspring. For each offspring revealed, we check whether they or one of their descendants is alive at time $t +\tau$. Otherwise we continue to explore the progenitor’s life history to reveal any other offspring born earlier in time and study their descendants. The exploration ends in one of three ways: (i) an individual being found alive at time $t+\tau$, whose type is identified; (ii) the exploration process being exhausted with no individuals alive at time $t+\tau$; or (iii) a death being detected. For $l=1,2,\ldots,J$, let $p_{jl} (t, 0, \tau)$ be the probability that the forward exploration process results in an individual of type l alive at time $t+\tau$ with no detected deaths along the way, given that we start from a progenitor of type j. Similarly, let $q_j (t, 0, \tau)$ be the probability of the branching process going extinct, with no detected deaths, by time $t+\tau$ given a progenitor of type j. For $0 \lt u \lt \tau$, we give definitions of $p_{jl} (t, u, \tau)$ and $q_j (t, u, \tau)$ in (4.1) and (4.2), respectively, in Section 4.2.

In Figure 1A, we present a successful realisation of the forward exploration process which results in an individual alive at time $t + \tau$ with no detected deaths. In the event of the forward exploration process being successful, there is an unexplored path from time t to $t+\tau$ , consisting of a single individual alive at each timepoint, along which births can occur. The set of individuals comprising the path may consist of only the progenitor or of the progenitor and some of their descendants. The path presented in Figure 1A consists of the progenitor’s lifetime from t to $t+v$ and the lifetime of a child of the progenitor from $t+v$ to $t+\tau$ . Given the existence of an unexplored path from time t to $t+\tau$ , we initiate the backward exploration process, which consists of successive i.i.d. cycles until we encounter a failure. The backward exploration process takes the unexplored path from time t to $t+\tau$ and starting at time $t+\tau$ explores back in time to reveal the births of individuals. Each time we reveal an individual we check whether or not they, or one of their descendants, is alive at time $t + \tau$ , and whether there are any detected deaths by time $t+\tau$ of the revealed individual or their offspring. If we reveal an individual alive at time $t+ \tau$ , without encountering any detected deaths, then the current cycle of the exploration process is over and has been successful. An example of the outcome of a successful cycle in the backward exploration process is given in Figure 1B. A cycle fails if either it reveals a death in the interval $(t, t+\tau]$ or the exploration process hits time t without revealing any additional individuals alive at time $t+\tau$ . For $l=1,2,\ldots,J$ , let $\psi_l (t;\,\tau)$ be the probability that a cycle is successful and results in an individual of type l alive at time $t+ \tau$ . Similarly, let $\zeta (t;\,\tau)$ be the probability that a cycle results in failure, with the exploration process hitting time t without revealing any additional individuals alive at time $t+\tau$ . A formal construction of $\psi_l (t;\,\tau)$ and $\zeta (t;\,\tau)$ is given in Section 4.2; see (4.5) and (4.6), respectively. An example of a realisation of the full exploration process with no detected deaths is given in Figure 1C.

Figure 1. Exploration process from time t to $t+ \tau$ in three stages, A–C, with one progenitor. Horizontal dotted lines denote start (t) and end $(t+\tau)$ points. Vertical lines denote lifetimes of individuals, with black lines denoting explored lifetimes (births revealed) and grey lines denoting unexplored lifetimes. Black squares denote births, with horizontal dashed lines identifying parent–child. Open circles denote undetected deaths. Note that individuals are revealed in the exploration process from left to right.

3.3. Key notation

The description of the exploration process points towards the role of zero-modified geometric random variables in the branching process. Conditional on no detected deaths, the forward exploration process reveals whether there is no individual or at least one individual alive at time $t+\tau$ . If there is at least one individual alive we consider the backward process which consists of i.i.d. cycles with binary outcomes (success means an additional individual added to those alive in the branching process at time $t+ \tau$ ; failure means there are no more individuals to add to the branching process at time $t+ \tau$ ). Therefore we start our summary of key notation with a simple yet useful lemma concerning zero-modified geometric random variables.

Lemma 1. For $0 \lt q \lt 1$ and $A \geq -(1-q)$ , a random variable X with probability generating function

\begin{align}\mathbb{E} \left[s^X \right] = \frac{1 + As}{1+A} \times \frac{q}{1-(1-q) s} \left(0 \leq s \lt \frac{1}{1-q} \right) \nonumber\end{align}

is a zero-modified geometric random variable with probability mass function given by $\mathbb{P} (X=0) = q/(1+A)$ and $\mathbb{P} (X=k) =[1- q/(1+A)] (1-q)^{k-1} q$ $(k=1,2,\ldots)$ .

Proof. The result follows trivially from properties of probability generating functions.

In Lemma 1, for $A =0$ , $A \lt 0$ , and $A \gt 0$ , X corresponds to a $\text{Geom} (q)$ random variable, a mixture of a $\text{Geom} (q)$ random variable and a point mass at 0, and a sum of a $\text{Geom} (q)$ and an independent Bernoulli random variable, $\text{Bin} (1,A/(1+A))$ , respectively, where $G \sim \text{Geom} (q)$ has probability mass function $\mathbb{P} (G=k) = (1-q)^k q$ ( $k=0,1,\ldots$ ).
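
The decomposition can be checked numerically: expanding the probability generating function of Lemma 1 as a power series in s recovers the stated probability mass function. A short sketch (illustrative values of q and A, with $A \geq -(1-q)$):

```python
# A quick numerical check (not in the paper) of Lemma 1: expand the pgf
# (1 + A s)/(1 + A) * q/(1 - (1 - q) s) as a power series in s and compare
# its coefficients with the stated pmf.
import numpy as np

q, A, K = 0.35, -0.2, 8
geom = q * (1.0 - q) ** np.arange(K + 1)                # q / (1 - (1-q) s)
pgf = np.convolve([1.0, A], geom)[: K + 1] / (1.0 + A)  # times (1 + A s)/(1 + A)

pmf = np.empty(K + 1)
pmf[0] = q / (1.0 + A)
pmf[1:] = (1.0 - q / (1.0 + A)) * (1.0 - q) ** np.arange(K) * q
print(np.allclose(pgf, pmf))   # True
```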

For $k=2,3,\ldots$ , let $T_k$ denote the inter-arrival time from the $(k-1)$ th to the kth detected death, with the convention that $T_k = \infty$ if fewer than k detected deaths occur. Also, let $S_k = \sum_{j=2}^k T_j$ denote the time of the kth detected death, with $S_1=0$ . Let $\textbf{T}_{2:k} = (T_2, T_3, \ldots, T_k)$ . Let $t_k$ and $s_k = \sum_{j=2}^k t_j$ $(k=2,3, \ldots)$ denote the observed inter-arrival time from the $(k-1)$ th to the kth detected death and the time of the kth detected death, respectively. For $k=1,2,\ldots$ and $j=1,2,\ldots,J$ , let $X_k^j$ denote the number of individuals alive of type j immediately after the kth detected death, with $X^\ast_k = \sum_{j=1}^J X_k^j$ denoting the total number of individuals alive. Let $\textbf{X}_k = (X_k^1, X_k^2, \ldots, X_k^J)$ . We observe that, for $k=2,3,\ldots$ ,

\begin{align}\{ \textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} \} = \{ \textbf{Y} (s_k) | \textbf{T}_{2:k} = \textbf{t}_{2:k} \}, \nonumber\end{align}

with $\textbf{X}_1 = \textbf{Y} (0)$ .

As noted in Section 2, we assume that prior to the first detected death the branching process (and its observation process) is time-homogeneous with birth rate $\alpha_1$ and death detection probabilities ${\boldsymbol{\epsilon}}$ . For $t \geq 0$ , let $\bar{\psi}_j (t) = \psi_j ({-}t;\,t)$ , where $\psi_j ({\cdot}; {\cdot})$ , defined in Section 3.2, is the probability that the backward exploration process is successful and reveals an individual of type j. Let $\bar{\psi} (t) = \sum_{j=1}^J \bar{\psi}_j (t)$ and set $\pi_0 = 1- \bar{\psi} (\infty)$ . For $j=1,2,\ldots, J$ , let $\eta_j^0 = \bar{\psi}_j (\infty)/\bar{\psi} (\infty)$ , the probability that an individual alive at the first detected death is of type j. The distribution of $\textbf{X}_1$ is defined in terms of $\pi_0$ and ${\boldsymbol{\eta}}^0 = (\eta_1^0, \eta_2^0, \ldots, \eta_J^0 )$ , and in Lemma 6 we show that

\begin{align} X_1^\ast &\sim \text{Geom} (\pi_0), \nonumber \\[3pt] \textbf{X}_1 |X_1^\ast =x &\sim \text{Multinomial} (x, {\boldsymbol{\eta}}^0). \nonumber \end{align}

For $n \in \mathbb{N}$ and $\textbf{q} \in [0,1]^J$ with $\sum_{i=1}^J q_i =1$ , $\textbf{Y} \sim \text{Multinomial} (n, \textbf{q})$ denotes a multinomial distribution with J categories and n trials, having probability mass function

\begin{align} \mathbb{P} (\textbf{Y} = \textbf{y}) = \frac{n!}{\prod_{j=1}^J y_j!} \prod_{j=1}^J q_j^{y_j} \left(y_j \geq 0 \; (j=1,2,\ldots,J);\; \sum_{j=1}^J y_j = n \right). \nonumber\end{align}

The distribution of the number of individuals alive of each type at time $t \geq 0$ is determined by $\pi_t$ and ${\boldsymbol{\eta}}^t$ , which are updated from the initial conditions $\pi_0$ and ${\boldsymbol{\eta}}^0$ as follows. Let $\psi (t;\,\tau) = \sum_{j=1}^J \psi_j (t;\,\tau)$ , and recall that $\zeta (t;\,\tau)$ is the probability that a cycle of the backward exploration process reveals no further individuals in the branching process at time $t+\tau$ . For the forward exploration process (see Section 3.2), from a type-j progenitor, the probability of having at least one individual alive at time $t +\tau$ and the individual being of type l is $p_{jl} (t, 0, \tau)$ , and the probability of having no individuals alive at time $t+\tau$ with no detected deaths is $q_j (t, 0, \tau)$ . For $t, \tau \geq 0$ , we set

(3.1) \begin{align} \pi_{t+\tau}&= 1 - \psi (t;\,\tau) - \frac{(1-\pi_t) \zeta (t;\,\tau) \sum_{i=1}^J \sum_{j=1}^J \eta_i^t p_{ij} (t, 0, \tau)}{1- (1- \pi_t) \sum_{i=1}^J \eta_i^t q_i (t, 0, \tau)}\end{align}

(cf. [Reference Ball and Neal2, (3.8) and (3.25)]), and for $j=1,2,\ldots, J$ ,

(3.2) \begin{align} \eta_j^{t+\tau} &= \frac{\psi_j (t;\,\tau)}{1-\pi_{t+\tau}} + \frac{(1-\pi_t) \zeta (t;\,\tau) \sum_{i=1}^J \eta_i^t p_{ij} (t, 0, \tau)}{[ 1-\pi_{t+\tau}] \left[ 1- (1- \pi_t) \sum_{i=1}^J \eta_i^t q_i (t, 0, \tau)\right]} \\[-24pt] \nonumber \end{align}
(3.3) \begin{align} &= \frac{\psi_j (t;\,\tau) + \phi_j (t;\,\tau)}{1-\pi_{t+\tau}}, \qquad \mbox{say}.\end{align}
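
The updates (3.1)–(3.3) are straightforward to implement once the exploration-process quantities are available. A minimal sketch, assuming a vector `psi` ($\psi_j(t;\,\tau)$), a scalar `zeta` ($\zeta(t;\,\tau)$), a matrix `p` ($p_{ij}(t,0,\tau)$), and a vector `qv` ($q_i(t,0,\tau)$) have already been computed, e.g. from the ODEs of Lemma 3:

```python
# A minimal sketch (assumed precomputed inputs, e.g. from Lemma 3's ODEs):
#   psi[j]  = psi_j(t; tau),   zeta    = zeta(t; tau),
#   p[i, j] = p_ij(t, 0, tau), qv[i]   = q_i(t, 0, tau).
import numpy as np

def update_pi_eta(pi_t, eta_t, psi, zeta, p, qv):
    denom = 1.0 - (1.0 - pi_t) * (eta_t @ qv)         # denominator in (3.1) and (3.2)
    phi = (1.0 - pi_t) * zeta * (eta_t @ p) / denom   # phi_j(t; tau) of (3.3)
    pi_next = 1.0 - psi.sum() - phi.sum()             # (3.1)
    eta_next = (psi + phi) / (1.0 - pi_next)          # (3.2)-(3.3)
    return pi_next, eta_next, phi
```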

The distribution of $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ is given in Theorem 1 in terms of random variables $\textbf{W} (t, \textbf{a})$ which are now defined. For $\textbf{a}, \textbf{b} \in \mathbb{R}^J$ , $\textbf{a} \gt \textbf{b}$ ( $\textbf{a} \geq \textbf{b}$ ) if and only if $a_j \gt b_j$ $(a_j \geq b_j)$ $(j=1,2,\ldots,J)$ . For $t \geq 0$ and $\textbf{a} > -(1-\pi_t) {\boldsymbol{\eta}}^t$ , let $\textbf{W} (t, \textbf{a}) = (W_1 (t, \textbf{a}), W_2 (t, \textbf{a}), \ldots , W_J (t, \textbf{a})) $ denote a J-dimensional random variable with, for ${\boldsymbol{\theta}} \in [0,1]^J$ , probability generating function

(3.4) \begin{align} \varphi ({\boldsymbol{\theta}};\,t, \textbf{a}) & = \mathbb{E} \left[ \prod_{j=1}^J \theta_j^{W_j (t,\textbf{a})} \right] \nonumber \\[3pt] & = \frac{1 + \sum_{j=1}^J a_j \theta_j}{1 + \sum_{j=1}^J a_j} \times \frac{\pi_t}{1- (1-\pi_t) \sum_{j=1}^J \eta_j^t \theta_j}.\end{align}

Let $W_\ast (t, \textbf{a}) = \sum_{j=1}^J W_j (t, \textbf{a})$ , the sum of the components of $\textbf{W} (t, \textbf{a})$ . Then it follows immediately from (3.4) that, for $0 \leq \vartheta \leq 1$ ,

\begin{align} \mathbb{E} \left[ \vartheta^{W_\ast (t,\textbf{a})} \right]& = \frac{1 + \vartheta \sum_{j=1}^J a_j }{1 + \sum_{j=1}^J a_j} \times \frac{\pi_t}{1- (1-\pi_t) \vartheta}, \nonumber\end{align}

and since $\sum_{j=1}^J a_j > \sum_{j=1}^J \{ - (1- \pi_t) \eta_j^t\} = - (1-\pi_t)$ , by Lemma 1 $W_\ast (t,\textbf{a})$ is a zero-modified geometric random variable. Throughout the following we assume that, for $t \geq 0$ , $\textbf{a} > -(1-\pi_t) {\boldsymbol{\eta}}^t$ .

In the special case $\textbf{a} = \textbf{0}$ , it follows immediately that

(3.5) \begin{align} W_\ast (t,\textbf{0}) \sim \text{Geom} (\pi_t), \\[-24pt] \nonumber \end{align}
(3.6) \begin{align} \textbf{W} (t, \textbf{0}) |W_\ast (t,\textbf{0}) =n &\sim \text{Multinomial} (n, {\boldsymbol{\eta}}^t). \end{align}

Hence, $\textbf{X}_1 \stackrel{D}{=} \textbf{W} (0, \textbf{0})$ .
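
A minimal sketch (not from the paper) of sampling $\textbf{W} (t, \textbf{0})$ via (3.5) and (3.6): a geometric total split multinomially over the J types. Note that numpy's geometric distribution has support $\{1,2,\ldots\}$, whereas $\text{Geom}(\pi_t)$ here has support $\{0,1,\ldots\}$.

```python
# A minimal sketch (not from the paper) sampling W(t, 0) via (3.5)-(3.6).
import numpy as np

def sample_W0(pi_t, eta_t, rng):
    n = rng.geometric(pi_t) - 1         # Geom(pi_t) on {0, 1, ...}; numpy starts at 1
    return rng.multinomial(n, eta_t)    # split the total over the J types, as in (3.6)

rng = np.random.default_rng(2)
print(sample_W0(0.4, np.array([0.7, 0.3]), rng))
```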

In determining the distribution of $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ from the $\{ \textbf{W} (t, \textbf{a}) \}$ random variables, we need to update the values of $\textbf{a}$ in an iterative manner. The key relationships are defined in Lemma 2.

Lemma 2. For $t, \tau \geq 0$ and $\textbf{a} \in \mathbb{R}^J$ , let $\textbf{b} (t, \tau ;\,\textbf{a}), \textbf{c} (t, \tau ;\,\textbf{a}) \in \mathbb{R}^J$ satisfy, for $i=1,2,\ldots, J$ ,

(3.7) \begin{align} b_i (t, \tau ;\,\textbf{a}) &= \frac{- \psi_i (t;\,\tau) + \sum_{h=1}^J a_h \left[ \zeta (t;\,\tau) p_{hi} (t,0,\tau) - \psi_i (t;\,\tau) q_h (t,0, \tau) \right]}{1 + \sum_{h=1}^J a_h q_h (t,0, \tau)}\end{align}

and

(3.8) \begin{align} c_i (t, \tau ;\,\textbf{a}) &= \frac{(1- \pi_{t + \tau}) \sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ b_i (t, \tau ;\,\textbf{a}) \eta_j^{t + \tau} - b_j (t, \tau ;\,\textbf{a}) \eta_i^{t + \tau}\right]}{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[(1- \pi_{t + \tau}) \eta_j^{t+\tau} + b_j (t, \tau ;\,\textbf{a}) \right]}.\end{align}

Then provided $\textbf{a} > -(1-\pi_t) {\boldsymbol{\eta}}^t$ , we have

(3.9) \begin{align} \textbf{b} (t, \tau ;\,\textbf{a}), \textbf{c} (t, \tau ;\,\textbf{a}) > -(1-\pi_{t+\tau}) {\boldsymbol{\eta}}^{t+\tau}.\end{align}

Proof. The proof of (3.9) is given in Appendix A.

The coefficients $b_i (t, \tau ;\,\textbf{a})$ and $c_i (t, \tau ;\,\textbf{a})$ defined in (3.7) and (3.8), respectively, arise from the two cases noted in Section 3.2: (i) there are no detected deaths in the interval $(t, t+ \tau]$ , and (ii) the first detected death after time t is at time $t+ \tau$ .
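
A minimal sketch of the coefficient updates (3.7) and (3.8), with the same assumed inputs as the earlier update sketch plus vectors `d` ($d_{t+\tau,j}$) and `gam` ($\gamma_j$), and the updated `pi_next` ($\pi_{t+\tau}$) and `eta_next` (${\boldsymbol{\eta}}^{t+\tau}$):

```python
# A minimal sketch of (3.7) and (3.8); inputs as in the earlier update sketch,
# plus d[j] = d_{t+tau, j}, gam[j] = gamma_j, and the updated pi_next, eta_next.
import numpy as np

def update_b_c(a, psi, zeta, p, qv, d, gam, pi_next, eta_next):
    b = (-psi + zeta * (a @ p) - psi * (a @ qv)) / (1.0 + a @ qv)     # (3.7)
    w = d * gam                                                        # d_{t+tau,j} gamma_j
    c = ((1.0 - pi_next) * (b * (w @ eta_next) - eta_next * (w @ b))
         / (w @ ((1.0 - pi_next) * eta_next + b)))                     # (3.8)
    return b, c
```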

For $t \in \mathbb{R}$ , $\tau \geq 0$ , and $\textbf{a} \in \mathbb{R}^J$ , let $\tilde{E}_{t,\tau}^{\textbf{a}}$ be the event that there are no detected deaths in the interval $(t, t+ \tau]$ given that the number of individuals of each type in the branching process at time t is distributed according to $\textbf{W} (t, \textbf{a})$ . Similarly, let $\tilde{D}_{t,\tau}^\textbf{a}$ be the event that the first detected death after time t is at time $t+\tau$ , given that the number of individuals of each type in the branching process at time t is distributed according to $\textbf{W} (t, \textbf{a})$ . Let

(3.10) \begin{align} U_E (t, \tau;\,\textbf{a}) & = \mathbb{P} \!\left(\tilde{E}_{t,\tau}^\textbf{a}\right), \\[-24pt] \nonumber\end{align}
(3.11) \begin{align} U_D (t, \tau;\,\textbf{a}) & = - \frac{\partial \;}{\partial \tau} U_E (t, \tau;\,\textbf{a}).\end{align}

Hence, $U_D (t, \tau;\,\textbf{a})$ is the probability density function for the time until the next detected death after time t. We obtain expressions for $U_E (t, \tau;\,\textbf{a})$ and $U_D (t, \tau;\,\textbf{a})$ in (5.17) and (5.27), respectively.

3.4. Statement and discussion of Theorem 1

We are now in a position to state Theorem 1.

Theorem 1. (a) For $k=1$, $\textbf{X}_1 \stackrel{D}{=} \textbf{W} (0, \textbf{0})$.

(b) For $k=1,2,\ldots$ and $m=1,2,\ldots, M_k (=(k-1)!)$ , suppose that

(3.12) \begin{align} \left\{ \textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} \right\} \stackrel{D}{=} \sum_{l=1}^k \textbf{W}_{ml}\big (s_k, \textbf{a}_{ml}^k \big)\end{align}

with probability $w_m^k$ , where $\{\textbf{W}_{ml} (s_k, \textbf{a}_{ml}^k)\}$ are mutually independent and $\textbf{a}_{ml}^k >-(1-\pi_{s_k}) {\boldsymbol{\eta}}^{s_k}$ . Then

(3.13) \begin{align} \left\{ \textbf{X}_{k+1} | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \right\} \stackrel{D}{=} \sum_{l=1}^{k+1} \textbf{W}_{vl} \big(s_{k+1}, \textbf{a}_{vl}^{k+1}\big)\end{align}

with probability $w_v^{k+1}$ $(v=1,2,\ldots,M_{k+1} (=k!))$ , where, for $m=1,2,\ldots, M_k$ and $i=1,2,\ldots,k$ ,

(3.14) \begin{align} w_{(m-1)k+i}^{k+1} & = \frac{w_m^k U_D \big(s_k, t_{k+1};\, \textbf{a}_{mi}^k\big) \prod_{l \neq i} U_E \big(s_k, t_{k+1};\, \textbf{a}_{ml}^k \big)}{\sum_{u=1}^{M_k} \sum_{j=1}^k w_u^k U_D \big(s_k, t_{k+1};\, \textbf{a}_{uj}^k \big) \prod_{l \neq j} U_E \big(s_k, t_{k+1};\, \textbf{a}_{ul}^k \big)}, \\[-8pt] \nonumber\end{align}
(3.15) \begin{align} \textbf{a}_{((m-1)k+i)i}^{k+1} &= \textbf{c}_{mi}^k, \qquad \textbf{a}_{((m-1)k+i)j}^{k+1} = \textbf{b}_{mj}^k \; \; (j \neq i), \qquad \textbf{a}_{((m-1)k+i)(k+1)}^{k+1} = \textbf{0}. \\[6pt] \nonumber\end{align}

Here we employ the shorthand notation $\textbf{b}_{ml}^k \big(= \textbf{b} \big(s_k, t_{k+1} ;\, \textbf{a}_{ml}^k \big)\big)$ and $\textbf{c}_{ml}^k \big(= \textbf{c} \big(s_k, t_{k+1};\, \textbf{a}_{ml}^k \big)\big)$ . Furthermore, $\big\{\textbf{W}_{vl} \big(s_{k+1}, \textbf{a}_{vl}^{k+1}\big) \big\}$ are mutually independent and $\textbf{a}_{vl}^{k+1} \gt -\big(1-\pi_{s_{k+1}}\big) {\boldsymbol{\eta}}^{s_{k+1}}$ .

For $0 < \tau < t_{k+1}$ , it is straightforward to show that if $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ satisfies (3.12), then, for $m=1,2,\ldots,M_k$ ,

\begin{align} \left\{ \textbf{Y} (s_k + \tau) | \textbf{T}_{2:k} = \textbf{t}_{2:k} \right\} \stackrel{D}{=} \sum_{l=1}^{k} \textbf{W}_{ml} \big(s_k + \tau, \textbf{b} \big(s_k, \tau;\, \textbf{a}_{ml}^k \big)\big) \nonumber\end{align}

with probability

\begin{align} \tilde{w}_m^k = \frac{w_m^k \prod_{l=1}^k U_E \big(s_k, \tau;\,\textbf{a}_{ml}^k \big)}{\sum_{u=1}^{M_k} w_u^k \prod_{l=1}^k U_E \big(s_k, \tau;\,\textbf{a}_{ul}^k \big)}. \nonumber\end{align}

An immediate consequence of Theorem 1 is that $X_k^\ast | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ is the sum of k independent zero-modified geometric random variables. If the components of the $\{ \textbf{a}_{ml}^k\}$ are all less than or equal to 0, we can instead express $X_k^\ast | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ as a random sum of $R_k^\ast$ i.i.d. $\text{Geom} (\pi_{s_k})$ random variables, where $R_k^\ast$ has support on $\{1,2,\ldots, k\}$.

Corollary 1. For $k=1,2,\ldots$, suppose that the distribution of $\textbf{X}_k |\textbf{T}_{2:k} = \textbf{t}_{2:k}$ satisfies the mixture distribution given in Theorem 1(b) with, for all $m=1,2,\ldots,M_k$ and $l=1,2,\ldots, k$,

(3.16) \begin{align} - \big(1-\pi_{s_k}\big) {\boldsymbol{\eta}}^{s_k} \lt \textbf{a}_{ml}^k \leq \textbf{0}.\end{align}

For $m=1,2,\ldots,M_k$ and $l=1,2,\ldots,k$ , let

\begin{align} A_{ml}^k = \sum_{i=1}^J a_{mli}^k \qquad \mbox{and} \qquad Q_{ml}^k = \frac{1-\pi_{s_k} + A_{ml}^k}{\big(1-\pi_{s_k} \big)\big(1+ A_{ml}^k \big)}, \nonumber\end{align}

so $- (1-\pi_{s_k}) <A_{ml}^k \leq 0$ and $0 \leq Q_{ml}^k \leq 1$ .

Let $G_1 (s_k), G_2 (s_k), \ldots, G_k (s_k)$ be i.i.d. according to $\text{Geom} (\pi_{s_k})$ . Then

\begin{align} X_k^\ast |\textbf{T}_{2:k} = \textbf{t}_{2:k} \sim \sum_{i=1}^{R_k^\ast} G_i (s_k) = \text{NegBin} \big(R_k^\ast, \pi_{s_k}\big), \nonumber\end{align}

where $R_k^\ast$ has support $\{1,2,\ldots,k\}$ , and for $i=1,2,\ldots,k$ , $\mathcal{V}_{k,i}$ is the set of vectors of length k consisting of i ones and $k-i$ zeros, with

(3.17) \begin{align} \mathbb{P} \big(R_k^\ast =i \big) = \sum_{m=1}^{M_k} w_m^k \sum_{\textbf{v} \in \mathcal{V}_{k,i}} \prod_{l=1}^k \big(Q_{ml}^k \big)^{v_l} \big[1-Q_{ml}^k \big]^{1-v_l}.\end{align}

Proof. From Lemma 1 it is straightforward to show that, for $\textbf{a}_{ml}^k$ satisfying (3.16),

(3.18) \begin{align} W_\ast \big(s_k, \textbf{a}_{ml}^k \big) \sim \left\{ \begin{array}{l@{\quad}l} \text{Geom} (\pi_{s_k}) \qquad & \mbox{with probability } Q_{ml}^k, \\[3pt] 0 & \mbox{with probability } 1-Q_{ml}^k. \end{array} \right.\end{align}

From (3.15), for $m=1,2,\ldots, M_k$, $\textbf{a}_{mk}^k = \textbf{0}$, so $Q_{mk}^k =1$ and $\mathbb{P} \big(R_k^\ast =0 \big)=0$. The corollary then follows from (3.18).

Corollary 1 is similar to [Reference Ball and Neal2, Theorems 3.1 and 3.2] in deriving the total size of the branching process immediately after the kth death as a mixture of negative binomial distributions. The main difference between Corollary 1 and the theorems in [Reference Ball and Neal2] is that the probability mass function of $R_k^\ast$ given by (3.17) is expensive to compute for moderate k, whereas, in [Reference Ball and Neal2], $R_k^\ast$ is closely related to a binomial distribution, which makes computing its probability mass function much quicker. These observations motivate the following derivation of an approximation of $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$, which provides an approximation for (3.17) and a multinomial approximation for the number of individuals of each type.
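
For a fixed mixture component m, the inner sum in (3.17) is the probability mass function of a Poisson binomial distribution with success probabilities $Q_{m1}^k, \ldots, Q_{mk}^k$, so it can be computed by the standard $O(k^2)$ dynamic programme rather than by enumerating the vectors in $\mathcal{V}_{k,i}$; the remaining cost lies in the $M_k = (k-1)!$ mixture components. A minimal sketch (not from the paper):

```python
# A minimal sketch (not from the paper): for fixed m, the inner sum of (3.17)
# is the pmf of a Poisson binomial distribution with success probabilities
# Q_{m1}, ..., Q_{mk}, computable by the standard O(k^2) dynamic programme.
import numpy as np

def poisson_binomial_pmf(Q):
    """pmf[i] = P(i successes among independent Bernoulli(Q_l) trials)."""
    pmf = np.array([1.0])
    for q in Q:
        pmf = np.convolve(pmf, [1.0 - q, q])   # incorporate one more trial
    return pmf

# Q_{mk}^k = 1 forces at least one success, so P(R_k^* = 0) = 0, as in the proof.
print(poisson_binomial_pmf([0.2, 0.5, 1.0]))
```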

3.5. Approximation to the distribution of $\textbf{X}_k |\textbf{T}_{2:k} = \textbf{t}_{2:k}$

This section is structured as follows. In Section 3.5.1 we present the additional notation necessary for the approximation. We state the approximation in Section 3.5.2, with the justification for the approximation given in Section 3.5.3.

3.5.1. Notation.

In order to construct the approximation we require the following additional notation. For $t, \tau \geq 0$, let $\textbf{C}_{t;\tau} \big(= \big(C_{t;\tau}^1, C_{t;\tau}^2, \ldots, C_{t;\tau}^J \big)\big)= \textbf{c} (t, \tau;\,\textbf{0})$. Setting $\textbf{a} = \textbf{0}$ in Lemma 2, from (3.7) we obtain $b_i (t, \tau;\,\textbf{0}) = - \psi_i (t;\,\tau)$ $(i=1,2,\ldots, J)$. It then follows from (3.8) and (3.3) that

(3.19) \begin{align}C_{t;\tau}^i = \frac{(1 - \pi_{t+\tau}) \sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ \psi_j (t;\,\tau) \eta_i^{t+\tau} - \psi_i (t;\,\tau) \eta_j^{t+\tau} \right]}{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \phi_j (t;\,\tau)}. \end{align}

Hence,

\begin{align}C_{t;\tau}^\ast = \sum_{i=1}^J C_{t;\tau}^i = \frac{(1 - \pi_{t+\tau}) \sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ \psi_j (t;\,\tau) - \psi (t;\,\tau) \eta_j^{t+\tau} \right]}{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \phi_j (t;\,\tau)}. \nonumber\end{align}

Let

(3.20) \begin{align} \tilde{h} (t;\,\tau) = 1 + \frac{\pi_{t+ \tau} C_{t;\tau}^\ast}{(1-\pi_{t+ \tau})(1+ C_{t;\tau}^\ast)} = \frac{1 -\pi_{t+ \tau} +C_{t;\tau}^\ast}{(1-\pi_{t+ \tau})(1+ C_{t;\tau}^\ast)}.\end{align}

Then, provided that $C_{t;\tau}^\ast$ is non-positive, we have from (3.9) that $- (1-\pi_{t+\tau}) < C_{t;\tau}^\ast \leq 0$ , and hence $0 \leq \tilde{h} (t;\,\tau) \leq 1$ is a valid probability.
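
A minimal sketch of (3.19) and (3.20), with `phi` denoting $(\phi_1(t;\,\tau), \ldots, \phi_J(t;\,\tau))$ from (3.3) (the quantity `phi` returned by the earlier update sketch) and the remaining inputs as before:

```python
# A minimal sketch of (3.19)-(3.20); phi is the vector returned by the earlier
# update sketch, and the other inputs are as in the coefficient-update sketch.
import numpy as np

def C_and_h(psi, phi, d, gam, pi_next, eta_next):
    w = d * gam
    C = ((1.0 - pi_next) * (eta_next * (w @ psi) - psi * (w @ eta_next))
         / (w @ phi))                                                  # (3.19)
    Cs = C.sum()
    h = (1.0 - pi_next + Cs) / ((1.0 - pi_next) * (1.0 + Cs))          # (3.20)
    return C, h
```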

It is difficult to make general statements concerning the form of $C_{t;\tau}^\ast $ , although in the fully detected, time-homogeneous case, where $\beta_t = \alpha$ and $\textbf{d}_t=\textbf{1}$ $(t \in \mathbb{R})$ , we can show that $C_{t;\tau}^\ast \leq 0$ when the lifetime L follows an Erlang distribution (i.e. $\text{Gamma} (J, J \mu)$ with $J \geq 2$ ) and $C_{t;\tau}^\ast \geq 0$ when L is a mixture of $J(\geq 2)$ exponential distributions. Moreover, in the fully detected, time-homogeneous case, it can be shown that $\{X_2^\ast | T_2 = \tau\}$ is stochastically decreasing in $\tau$ for an Erlang distribution and stochastically increasing in $\tau$ for a mixture of exponential distributions. The proofs are straightforward but somewhat tedious, and are therefore given in the supplementary material.

For $k = 1, 2, \ldots$, let $\textbf{M}_k (s_k;\,t_{k+1})$ be the $k \times k$ matrix with (i,j)th element

(3.21) \begin{align} \left[\textbf{M}_k (s_k;\,t_{k+1}) \right]_{i,j} = \left\{ \begin{array}{l@{\quad}l} i \binom{i-1}{j-1} \left\{ \frac{\pi_{s_k} \phi (s_k;\,t_{k+1})}{\pi_{s_{k+1}}(1- \pi_{s_{k+1}}) L (s_k, t_{k+1}) } \right\}^{j-1}\left\{ \frac{\pi_{s_k} \psi (s_k;\,t_{k+1})}{(1- \pi_{s_{k+1}})L (s_k, t_{k+1}) } \right\}^{i-j} & \mbox{for } j \leq i, \\[3pt] 0 & \mbox{otherwise}, \end{array} \right. \nonumber \\[3pt] \end{align}

where $\phi (s_k;\, t_{k+1}) = \sum_{j=1}^J \phi_j (s_k;\, t_{k+1}) $ and

(3.22) \begin{align} L (s_k, t_{k+1}) = 1- (1-\pi_{s_k}) \sum_{i=1}^J \eta_i^{s_k} q_i (s_k ,0, t_{k+1}).\end{align}

We define $ \textbf{M}_k^1 (s_k;\,t_{k+1})$ and $ \textbf{M}_k^2 (s_k;\,t_{k+1})$ to be $k \times (k+1)$ matrices constructed from $\textbf{M}_k (s_k;\,t_{k+1})$ by the addition of a column of zeros before (after) $\textbf{M}_k (s_k;\,t_{k+1})$ in $ \textbf{M}_k^1 (s_k;\,t_{k+1})$ ( $ \textbf{M}_k^2 (s_k;\,t_{k+1})$ ). The matrix $\textbf{M}_k (s_k;\,t_{k+1})$ is of a similar form to the matrices defined in [Reference Ball and Neal2, (3.9) and the equation between (3.26) and (3.27) in Theorem 3.2], and is used to derive an approximation for the probability mass function of $R_k^\ast$.

For $t \geq 0$ and $\textbf{q} \in [0,1]^J$ such that $\sum_{i=1}^J q_i =1$ , let $\hat{\textbf{W}} (t;\,\textbf{q})$ be the J-dimensional random variable with

(3.23) \begin{align} \hat{W}_\ast (t;\,\textbf{q}) \left(= \sum_{j=1}^J \hat{W}_j (t;\,\textbf{q}) \right) &\sim \text{Geom} (\pi_t),\\[-24pt] \nonumber \end{align}
(3.24) \begin{align} \hat{\textbf{W}} (t;\,\textbf{q}) | \hat{W}_\ast (t;\,\textbf{q}) =n & \sim \text{Multinomial} (n, \textbf{q}).\end{align}

The relationship between the random variables $\hat{\textbf{W}} (t;\,\textbf{q})$ and $\textbf{W} (t, \textbf{a})$ is that, for $t \geq 0$,

(3.25) \begin{align} \hat{\textbf{W}} (t;\, {\boldsymbol{\eta}}^t) \stackrel{D}{=} \textbf{W} (t, \textbf{0}).\end{align}

Therefore it follows from Theorem 1(a) that $\textbf{X}_1 \stackrel{D}{=} \hat{\textbf{W}} (0;\, {\boldsymbol{\eta}}^0)$ , which is the starting point for the approximation presented in Section 3.5.2.

3.5.2. Statement of approximation.

Suppose that for all $t, \tau \geq 0$, $C_{t;\tau}^i \leq 0$ $(i=1,2,\ldots,J)$.

Set $\hat{\textbf{X}}_1 = \textbf{X}_1 \stackrel{D}{=} \textbf{W} (0, \textbf{0}) \equiv \hat{\textbf{W}} (0;\, {\boldsymbol{\eta}}^0)$.

For $k \geq 2$ , we approximate $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} $ by $\hat{\textbf{X}}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ satisfying

(3.26) \begin{align} \left\{ \hat{\textbf{X}}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} \right\} \stackrel{D}{=} \sum_{i=1}^{\hat{R}_k} \hat{\textbf{W}}_i \big(s_k;\, \hat{{\boldsymbol{\zeta}}}^k \big),\end{align}

where $\hat{R}_k$ has support $\{1,2,\ldots, k\}$ with $\hat{R}_1 \equiv 1$ and probability mass function given by (3.27), $\hat{{\boldsymbol{\zeta}}}^k$ is given by (3.29) with $\sum_{j=1}^J \hat{\zeta}^k_j = 1$ , and $\hat{\textbf{W}}_1 (s_k;\,\hat{{\boldsymbol{\zeta}}}^k), \hat{\textbf{W}}_2 (s_k;\,\hat{{\boldsymbol{\zeta}}}^k), \ldots$ are i.i.d. according to $\hat{\textbf{W}} (t;\; \hat{{\boldsymbol{\zeta}}}^k)$ defined by (3.23) and (3.24).

Let $\hat{P}_k^m = \mathbb{P} (\hat{R}_k = m)$ $(k=1,2,\ldots; m=1,2,\ldots,k)$ and $\hat{\textbf{P}}_k = (\hat{P}_k^1, \hat{P}_k^2, \ldots, \hat{P}_k^k)$. Then $\hat{\textbf{P}}_1 = (1)$ and $\hat{\textbf{P}}_{k+1}$ can be obtained recursively using

(3.27) \begin{align} \hat{\textbf{P}}_{k+1} = \frac{1}{K_k} \hat{\textbf{P}}_k \left[ \tilde{h} (s_k;\,t_{k+1}) \textbf{M}_k^1 (s_k;\,t_{k+1}) + \big\{1-\tilde{h} (s_k;\,t_{k+1}) \big\} \textbf{M}_k^2(s_k;\,t_{k+1}) \right],\end{align}

where $\textbf{M}_k^1 (s_k;\,t_{k+1})$ and $\textbf{M}_k^2(s_k;\,t_{k+1})$ are defined after (3.21) and

(3.28) \begin{align} K_k &= \sum_{m=1}^k m \left[ \frac{\hat{\pi}_k (1- \psi (s_k;\,t_{k+1}))}{\hat{\pi}_{k+1}} \right]^{m-1}\hat{P}_k^m \nonumber \\[3pt] &= \hat{\textbf{P}}_k \left[ \tilde{h} (s_k;\,t_{k+1}) \textbf{M}_k^1 (s_k;\,t_{k+1}) + \big\{1-\tilde{h} (s_k;\,t_{k+1}) \big\} \textbf{M}_k^2(s_k;\,t_{k+1}) \right] {\cdot} \textbf{1}_{k+1}^\top.\end{align}

For $k =1,2,\ldots$ , let $\nu_k = \mathbb{E} [ \hat{R}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}]$ . Set ${\boldsymbol{\eta}}^{s_k} = \hat{{\boldsymbol{\zeta}}}^k$ and compute ${\boldsymbol{\psi}} (s_k;\,t_{k+1}) = (\psi_1 (s_k;\,t_{k+1}), \psi_2 (s_k;\,t_{k+1}), \ldots, \psi_J (s_k;\,t_{k+1}))$ and ${\boldsymbol{\eta}}^{s_{k+1}}$ using (3.2) with $t=s_k$ and $\tau =t_{k+1}$ . For $k=1,2,\ldots$ and $j=1,2,\ldots, J$ , set

(3.29) \begin{align} \hat{\zeta}^{k+1}_j &= (\nu_k+1) \eta_j^{s_{k+1}} \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}} - (\nu_k -1) \frac{\psi_j (s_k;\, t_{k+1})}{1- \psi(s_k;\,t_{k+1})} + \frac{C_{s_k; t_{k+1}}^j}{1 + C_{s_k;t_{k+1}}^\ast } \nonumber \\[3pt] & \qquad \times \left[(\nu_k+1) \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}} - (\nu_k -1) \frac{\psi (s_k;\, t_{k+1})}{1- \psi(s_k;\,t_{k+1})} + \frac{C_{s_k; t_{k+1}}^\ast}{1+ C_{s_k;t_{k+1}}^\ast } \right]^{-1}. \nonumber \\[3pt] \end{align}

For $j=1,2,\ldots,J$, the marginal distribution of $\hat{X}_k^j$ satisfies

(3.30) \begin{align} \hat{X}_k^j | \textbf{T}_{2:k} = \textbf{t}_{2:k} &\sim \text{NegBin} \left(\hat{R}_k, \frac{\pi_{s_k}}{\pi_{s_k} + \hat{\zeta}^k_j - \pi_{s_k} \hat{\zeta}^k_j} \right).\end{align}
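
A minimal sketch of the recursion (3.27), assuming the scalar `h` ($\tilde{h}(s_k;\,t_{k+1})$) and the $k \times k$ matrix `M` ($\textbf{M}_k(s_k;\,t_{k+1})$ of (3.21)) have been computed; the normalisation by $K_k$ of (3.28) reduces to dividing by the total mass.

```python
# A minimal sketch of the recursion (3.27) for the mixing weights of R_hat_k,
# assuming h = h_tilde(s_k; t_{k+1}) and the k x k matrix M of (3.21) are given.
import numpy as np

def update_P_hat(P_k, M, h):
    k = len(P_k)
    M1 = np.hstack([np.zeros((k, 1)), M])   # column of zeros before M
    M2 = np.hstack([M, np.zeros((k, 1))])   # column of zeros after M
    unnorm = P_k @ (h * M1 + (1.0 - h) * M2)
    return unnorm / unnorm.sum()            # P_hat_{k+1}, length k + 1
```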

3.5.3. Justification of approximation.

From Section 3.5.2, we note that $\hat{X}_k^\ast | \textbf{T}_{2:k} = \textbf{t}_{2:k} $ takes an almost identical form to the distribution of the number of individuals alive in a birth–death process immediately after the kth detected death given in [Reference Ball and Neal2, Theorem 3.2, (3.28)]. The only difference is that in the phase-type distribution case, $\hat{R}_k$ can take the value 1 for any k. The recursive equation for the probability mass function for $\hat{R}_k$ given in (3.27), and utilising (3.21), is similar to [Reference Ball and Neal2, Lemma 4.5, (4.17)] and is derived along similar lines.

The approximation in Section 3.5.2 works as follows. If $\{\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} \} \stackrel{D}{=} \{\hat{\textbf{X}}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} \}$ , then

$$ \big\{X_{k+1}^\ast | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \big\} \stackrel{D}{=} \big\{\hat{X}_{k+1}^\ast | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \big\}$$

with $\hat{\boldsymbol{\zeta}}^{k+1} = \Big(\hat{\zeta}^{k+1}_1, \hat{\zeta}^{k+1}_2, \ldots, \hat{\zeta}^{k+1}_J \Big)$ satisfying

\[ \mathbb{E} \big[ X_{k+1}^j | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \big] = \hat{\zeta}^{k+1}_j \mathbb{E} \big[ X_{k+1}^\ast| \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \big] \big( = \mathbb{E} \big[ \hat{X}_{k+1}^j | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \big] \big). \]

Details of the proof of the above are provided in Section 6.

4. Exploration process

4.1. Introduction

In this section we show how the key probabilities from the exploration process for Theorem 1 can be computed. We use an approach similar to that of [Reference Lambert and Trapman6, Section 2] to explore the branching process between two time points from a single progenitor. Note that the branching processes studied in [Reference Lambert and Trapman6] are referred to as splitting trees; see Geiger and Kersting [Reference Geiger and Kersting4]. There are minor modifications to the exploration process and analysis in [Reference Lambert and Trapman6, Section 2] to take account of the time-inhomogeneous birth rate and detection probability, differences in the detection process, and the phase-type lifetime distribution. The phase-type lifetime distribution enables us to obtain more explicit results than those presented in [Reference Lambert and Trapman6, Section 2].

In Section 4.2 we define in (4.1)–(4.4) the probabilities of the exploration process. These probabilities are expressed in terms of systems of ordinary differential equations (ODEs) in Lemma 3, which enable analytical solutions in some important special cases; see Sections 4.3 (all deaths detected) and 4.4 (a piecewise time-homogeneous branching process).

4.2. Exploration process equations

To obtain the key probabilities of the exploration process, it is helpful to use the jump chronological contour process (JCCP) of the branching process, as defined in [Reference Lambert and Trapman6], but with a time shift t in the starting time and time-dependent birth rate, $\beta_{t+u}$ . Let $\mathcal{X}^{(t,\tau)}$ denote the JCCP, started with one unique progenitor at time t, truncated up to height (time) $t+ \tau$ of the birth process. The JCCP is associated with the exploration process between times t and $t + \tau$ as follows. The process $\mathcal{X}^{(t,\tau)}$ starts at $t + (\tilde{L} \wedge \tau)$ , where $\tilde{L}$ is the residual lifetime, after time t, of the single progenitor. The JCCP has its own timescale and explores backward in time at rate 1 until it encounters a birth in the branching (exploration) process, at which point the JCCP jumps. If a jump occurs in the process $\mathcal{X}^{(t,\tau)}$ at position $t+v$ $(0 < v <\tau)$ , then the process jumps to $(t + v +L) \wedge (t+\tau)$ , where L is the length of the lifetime of the individual born at time $t+v$ . That is, the JCCP jumps to whichever occurs first, the end of the individual’s lifetime or time $t+\tau$ . The process $\mathcal{X}^{(t,\tau)}$ terminates when it reaches time t, corresponding to the exploration process being over, so $\mathcal{X}^{(t,\tau)}$ remains within the interval $[t,t+\tau]$ . The JCCP associated with the realisation of the branching process in Figure 1C is given in Figure 2.

Figure 2. The JCCP associated with the branching process in Figure 1C. The open circles denote potential detected deaths which in the given realisation of the branching process are all undetected.

For $x \in \mathbb{R}$ , let $\sigma_x = \sigma_{\{x\}}$ denote the first hitting time of x by $\mathcal{X}^{(t,\tau)}$ . Let $\sigma = \sigma_t \wedge \sigma_{t + \tau}$ , the first time the process hits either endpoint of the interval, corresponding to either the exploration process being over or an individual being identified as alive at time $t+\tau$ . Two useful quantities associated with the JCCP at time $\sigma$ are $D (\sigma)$ , the number of detected deaths in $[0, \sigma]$ , and $J (\sigma)$ , the type of the individual at time $\sigma$ , when $\sigma = \sigma_{t+\tau}$ .

We are now in a position to define the probabilities of the exploration process required to obtain $p_{ij} (t,0,\tau)$, $q_i (t, 0,\tau)$, $\psi_j (t;\,\tau)$, and $\zeta (t;\,\tau)$, as described in Section 3.2.

For $t \in \mathbb{R}$ , $0 \leq u \leq \tau$ , and $i,j =1,2, \ldots, J$ , let

(4.1) \begin{align} p_{ij} (t, u, \tau) = \mathbb{P}_{u,i} ( \sigma_{t +\tau} \lt \sigma_t, J (\sigma) = j, D (\sigma) =0) \end{align}

and

(4.2) \begin{align} q_i (t, u, \tau) = \mathbb{P}_{u,i} ( \sigma_t \lt \sigma_{t +\tau}, D (\sigma) =0), \end{align}

where $\mathbb{P}_{u,i}$ denotes the probability conditional upon $\tilde{L} \geq u$ and the progenitor being of type i at time $t+u$ . The equations (4.1) and (4.2) are associated with the forward exploration process, and the special case $u=0$ yields the $p_{ij} (t,0,\tau)$ and $q_i (t, 0,\tau)$ required for Theorem 1.

For $t \in \mathbb{R}$ , $0 \leq u \leq \tau$ , and $j =1,2, \ldots, J$ , let

(4.3) \begin{align} p_j (t, u, \tau) = \mathbb{P}_{u} \big( \sigma_{t +\tau} \lt \sigma_t, J (\sigma) = j, D (\sigma) =0 \big)\end{align}

and

(4.4) \begin{align} q(t, u, \tau) = \mathbb{P}_{u} \big( \sigma_t \lt \sigma_{t +\tau}, D (\sigma) =0 \big),\end{align}

where $\mathbb{P}_u$ denotes probability conditional upon $\tilde{L} =u$ and the death of the progenitor being unobserved.

In contrast to (4.1), the probabilities in (4.3) are obtained by starting at time $t+u$ and exploring backward in time for births which deliver an undetected pathway to an individual of type j alive at time $t+\tau$. Since an individual’s type at birth does not depend on the type of their parent, we do not need to specify the type of the progenitor in (4.3) and (4.4). Then $\psi_j (t;\,\tau)$ $(j=1,2,\ldots, J)$ and $\zeta (t;\,\tau)$ can be defined as special cases of (4.3) and (4.4), respectively, with

(4.5) \begin{align} \psi_j (t;\,\tau) & = \lim_{u \uparrow \tau} p_j (t, u, \tau),\end{align}

the probability that a cycle of the backward exploration process is successful and produces an individual of type j at time $t + \tau$ , and

(4.6) \begin{align} \zeta(t;\,\tau) & = \lim_{u \uparrow \tau} q (t, u, \tau),\end{align}

the probability that a cycle of the backward exploration process reveals no further individuals alive at time $t+\tau$ with no detected deaths.

The main reason for giving the probabilities of the exploration process in the form of (4.1)–(4.4) is that we can express the probabilities via systems of ODEs. These ODEs can be solved exactly in terms of matrix exponentials in some special cases (see, for example, Section 4.4), or otherwise solved numerically in an efficient manner using an ODE solver.

Lemma 3. Fix $t \in \mathbb{R}$ and $\tau \gt 0$ .

For $j=1,2,\ldots,J$ , let $\textbf{p}_{{\cdot} j} (t, u, \tau) = (p_{1j} (t, u, \tau), p_{2j} (t, u, \tau), \ldots, p_{Jj} (t,u,\tau))$ .

Then $\{ (\textbf{p}_{{\cdot} j} (t, u, \tau) , p_j (t, u,\tau) ); 0 \leq u \leq \tau \}$ solves the system of $J+1$ ODEs with

(4.7) \begin{align} p_{ij}^\prime (t, u, \tau) &= - \sum_{l=1}^J \lambda_{il} p_{lj} (t, u, \tau) - \gamma_i (1-d_{t+u,i}) p_j (t, u, \tau),\\[-24pt] \nonumber \end{align}
(4.8) \begin{align} p_j^\prime (t, u, \tau)&= \beta_{t+u} \left\{ \sum_{l=1}^J \chi_l p_{lj} (t, u, \tau) - p_j (t, u, \tau) \right\},\end{align}

and boundary conditions $\textbf{p}_{{\cdot} j} (t,\tau,\tau) = \textbf{e}_j$ and $p_j (t, 0, \tau) =0$ .

Similarly, let $\textbf{q} (t, u, \tau) = (q_1 (t, u, \tau), q_2 (t, u, \tau), \ldots, q_J (t,u,\tau))$ .

Then $\{ (\textbf{q} (t, u, \tau) , q (t, u,\tau) );\ 0 \leq u \leq \tau \}$ solves the system of $J+1$ ODEs with

(4.9) \begin{align} q_i^\prime (t, u, \tau)&= - \sum_{l=1}^J \lambda_{il} q_l (t, u, \tau) - \gamma_i (1-d_{t+u,i}) q (t, u, \tau), \\[-24pt] \nonumber \end{align}
(4.10) \begin{align} q^\prime (t, u, \tau) & =\beta_{t+u} \left\{ \sum_{l=1}^J \chi_l q_l (t, u, \tau) - q (t, u, \tau) \right\},\end{align}

with boundary conditions $\textbf{q} (t, \tau, \tau) =\textbf{0}$ and $q(t,0, \tau) =1$ .

Proof. Fix $j =1,2, \ldots, J$ . The boundary conditions $\textbf{p}_{{\cdot} j} (t,\tau,\tau) = \textbf{e}_j$ and $p_j (t, 0, \tau) =0$ follow immediately from the construction of the JCCP.

For $i=1,2,\ldots, J$ , conditioning on the time the progenitor remains in state i, we have

(4.11) \begin{align} p_{ij} (t, u, \tau) &= \exp\!\big( \lambda_{ii} [\tau -u] \big)\delta_{ij} + \int_u^\tau - \lambda_{ii} \exp\!\big( \lambda_{ii} [z -u] \big) \nonumber \\[3pt] & \qquad \qquad \times \left\{ \sum_{l \neq i} \frac{\lambda_{il}}{-\lambda_{ii}} p_{lj} (t, z, \tau) + \frac{\gamma_i}{-\lambda_{ii}} (1- d_{t+z,i}) p_j (t, z, \tau) \right\} \, dz,\end{align}

where $\delta_{ij}$ denotes the Kronecker delta with $\delta_{ij} =1$ if $i=j$ and $\delta_{ij} =0$ otherwise. Differentiating (4.11) with respect to u and substituting in $p_{ij} (t, u, \tau)$ , we have that

(4.12) \begin{align} p_{ij}^\prime (t, u, \tau) &= - \lambda_{ii} \exp\!\big( \lambda_{ii} [\tau -u] \big)\delta_{ij} - \lambda_{ii} \int_u^\tau - \lambda_{ii} \exp\!\big( \lambda_{ii} [z -u] \big) \nonumber \\[3pt] & \qquad \qquad \times \left\{ \sum_{l \neq i} \frac{\lambda_{il}}{ - \lambda_{ii}} p_{lj} (t, z, \tau) + \frac{\gamma_i}{ - \lambda_{ii}} (1- d_{t+z,i}) p_j (t, z, \tau) \right\} \, dz \nonumber \\[3pt] & \qquad - \sum_{l \neq i} \lambda_{il} p_{lj} (t, u, \tau) - \gamma_i (1-d_{t+u,i}) p_j (t, u, \tau) \nonumber \\[3pt] &= - \sum_{l=1}^J \lambda_{il} p_{lj} (t, u, \tau) - \gamma_i (1-d_{t+u,i}) p_j (t, u, \tau), \end{align}

which agrees with (4.7).

Similarly, by conditioning on the time of the first birth when exploring back from u, we have

(4.13) \begin{align} p_j (t, u, \tau) = \int_0^u \exp\!\left({-}\int_z^u \beta_{t+s} \, ds \right) \beta_{t+z} \sum_{l=1}^J \chi_l p_{lj} (t, z , \tau) \, dz.\end{align}

Differentiating (4.13) with respect to u yields

\begin{align} p_j^\prime (t, u, \tau) &= \beta_{t+u} \sum_{l=1}^J \chi_l p_{lj} (t, u, \tau) - \int_0^u \beta_{t+u} \exp\!\left({-}\int_z^u \beta_{t+s} \, ds \right) \beta_{t+z} \sum_{l=1}^J \chi_l p_{lj} (t, z, \tau) \, dz \nonumber \\[3pt] &= \beta_{t+u} \left\{ \sum_{l=1}^J \chi_l p_{lj} (t, u, \tau) - p_j (t, u, \tau) \right\}, \nonumber\end{align}

which agrees with (4.8).

Turning to $\{ (\textbf{q} (t,u,\tau), q (t, u,\tau) );\,0 \leq u \leq \tau \}$, we see that the boundary conditions $\textbf{q} (t, \tau, \tau) =\textbf{0}$ and $q(t,0, \tau) =1$ follow immediately from the construction of the JCCP.

For $i=1,2,\ldots, J$ , conditioning on the time the progenitor remains in state i, we have

\begin{align} q_i (t, u, \tau) & = \int_u^\tau - \lambda_{ii} \exp\!\big(\lambda_{ii} [z-u] \big) \nonumber \\[3pt] & \qquad \qquad \times \left\{ \sum_{l \neq i} \frac{\lambda_{il}}{-\lambda_{ii}} q_l (t,z, \tau) + \frac{\gamma_i}{-\lambda_{ii}} (1- d_{t+z,i})q (t, z, \tau) \right\} dz. \nonumber \end{align}

By an argument similar to the derivation of (4.12), it is straightforward to show that $q_i^\prime (t, u, \tau)$ satisfies (4.9). Also, we have that

\begin{align} q (t, u, \tau) & = \int_0^u \exp\!\left( - \int_z^u \beta_{t+s} \,ds \right) \beta_{t+z} \sum_{l=1}^J \chi_l q_l (t, z, \tau) dz, \nonumber \end{align}

from which it is straightforward to show that $q^\prime (t, u, \tau)$ satisfies (4.10), completing the proof of the lemma.

4.3. All deaths detected

The first special case we consider is the one in which all deaths are detected, $\textbf{d}_{t+u} =\textbf{1}$ $(0 \leq u \leq \tau)$ . In this case $p_{ij} (t,u,\tau) = P_{ij} (\tau-u)$ , $q_i (t,u,\tau) =0$ , and in (4.3) and (4.4) the conditioning event in $\mathbb{P}_u$ has probability 0. However, the exploration process is simpler if all deaths are detected, since any individual born in the process will either be alive at time $t+\tau$ or have been detected at their time of death. We can slightly modify the definitions of $\psi_j (t;\,\tau)$ and $\zeta (t;\,\tau)$ given in (4.5) and (4.6), respectively, and provide expressions for these quantities.

Let $\psi_j (t;\,\tau)$ be the probability that the backward exploration process from time $t+\tau$ to time t reveals a birth with the individual who is born being alive at time $t+\tau$ and of type j. Then, conditioning on the time of the birth and the type of the individual at birth, we have by the law of total probability that

(4.14) \begin{align} \psi_j (t;\,\tau) & = \int_t^{t +\tau} \beta_u \exp\!\left({-}\int_u^{t+\tau} \beta_s \, ds \right) \left[ \sum_{i=1}^J \chi_i P_{ij} (t+\tau -u) \right] \, du. \end{align}

Similarly, let $\zeta (t;\,\tau)$ be the probability that the backward exploration process from time $t+\tau$ to time t reveals no births. Then we simply have that

(4.15) \begin{align} \zeta (t;\,\tau) & =\exp\!\left({-}\int_t^{t+\tau} \beta_s \, ds \right).\end{align}
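The quantities in (4.14) and (4.15) are straightforward to evaluate numerically. The following minimal sketch (in Python; the Erlang(3) lifetime rates and the constant birth rate are illustrative assumptions, echoing Section 7) does so, using the fact that, when $\textbf{d}_{t+u} = \textbf{1}$, the final term in (4.7) vanishes, so that $P(s) = \exp(s \Lambda)$.

\begin{verbatim}
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# Illustrative rates: Erlang(3) lifetime (three Exp(3) phases), newborns
# of type 1, constant birth rate beta_u = 2.5; all are assumptions.
Lam = np.array([[-3.0, 3.0, 0.0],
                [0.0, -3.0, 3.0],
                [0.0, 0.0, -3.0]])
chi = np.array([1.0, 0.0, 0.0])
beta = lambda u: 2.5

def psi_zeta(t, tau):
    """psi_j(t; tau) via (4.14) and zeta(t; tau) via (4.15)."""
    def integrand(u, j):
        w = np.exp(-quad(beta, u, t + tau)[0]) * beta(u)
        return w * (chi @ expm((t + tau - u) * Lam))[j]
    psi = np.array([quad(integrand, t, t + tau, args=(j,))[0]
                    for j in range(len(chi))])
    zeta = np.exp(-quad(beta, t, t + tau)[0])
    return psi, zeta

psi, zeta = psi_zeta(0.0, 1.0)
print(psi, zeta)   # psi.sum() + zeta <= 1
\end{verbatim}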

4.4. Time-homogeneous equations

An important special case is when the branching process and its observation process are time-homogeneous on the interval $(t, t+\tau]$ with $\beta_{t+u} =\alpha$ and $\textbf{d}_{t+u} = {\boldsymbol{\epsilon}}$ $(0 \leq u \leq \tau)$ . In this case, explicit solutions to the systems of ODEs given in Lemma 3 exist, and since we assume that the branching process is time-homogeneous prior to the first detected death, we utilise the explicit solution in deriving the size of the branching process at the first detected death in Lemma 6.

For $\alpha, \tau > 0$ and $\textbf{0} \leq {\boldsymbol{\epsilon}} \leq \textbf{1}$ , let $\Omega \; (= \Omega (\alpha, {\boldsymbol{\epsilon}}))$ and $\Sigma \; (= \Sigma (\alpha, {\boldsymbol{\epsilon}}, \tau))$ denote the $(J+1) \times (J+1)$ matrices given by

(4.16) \begin{align} \Omega \; (= \Omega (\alpha, {\boldsymbol{\epsilon}})) &= \left(\begin{array}{l@{\quad}l} \Lambda & {\boldsymbol{\rho}}^T \\[3pt] - \alpha {\boldsymbol{\chi}} & \alpha \end{array}\right),\end{align}

and

(4.17) \begin{align} \Sigma \; (= \Sigma (\alpha, {\boldsymbol{\epsilon}}, \tau)) &= \exp\!\left[ \tau \Omega \right],\end{align}

where ${\boldsymbol{\rho}} = ([1-\epsilon_1]\gamma_1,[1-\epsilon_2] \gamma_2, \ldots,[1-\epsilon_J] \gamma_J)$ and ${\boldsymbol{\chi}} = (\chi_1, \chi_2, \ldots, \chi_J)$ are row vectors of length J. Note that for $j=1,2,\ldots, J$ , $\rho_j =[1-\epsilon_j] \gamma_j$ is the rate at which a type-j individual dies undetected. In Lemma 4, we show that $\textbf{p}_{{\cdot} j} (t,0,\tau)$ , $q_j (t,0,\tau)$ , $\psi_j (t;\,\tau)$ $(j=1,2,\ldots,J)$ , and $\zeta (t;\,\tau)$ can be obtained from $\Sigma$ given in (4.17).

Lemma 4. Fix $t \in \mathbb{R}$ and $\tau \gt 0$ . For a branching process with $\beta_{t+u} =\alpha$ and $\textbf{d}_{t+u} = {\boldsymbol{\epsilon}}$ $(0 \leq u \leq \tau)$ , we have that $\zeta (t;\,\tau) = \Sigma_{J+1,J+1}^{-1}$ , $\psi_j (t;\,\tau) = - \Sigma_{J+1,j}/\Sigma_{J+1,J+1}$ ,

(4.18) \begin{align} \textbf{p}_{{\cdot} j} (t,0,\tau) = \Sigma_{1:J,j} + \psi_j (t;\,\tau) \Sigma_{1:J,J+1}, \end{align}

and

(4.19) \begin{align} \textbf{q} (t,0,\tau) = \zeta (t;\,\tau) \Sigma_{1:J,J+1}. \end{align}

Proof. For $i,j=1,2,\ldots, J$ and $0 \leq u \leq \tau$ , let $\tilde{p}_{ij} (t,u,\tau) = p_{ij} (t,\tau -u, \tau)$ , $\tilde{p}_{j} (t,u,\tau) = p_{j} (t,\tau -u, \tau)$ , $\tilde{q}_j (t,u,\tau) = q_j (t,\tau -u, \tau)$ , and $\tilde{q} (t,u,\tau) = q (t,\tau -u, \tau)$ with $\tilde{\textbf{p}}_{{\cdot} j} (t, u, \tau) =\textbf{p}_{{\cdot} j} (t, \tau-u,\tau)$ and $\tilde{\textbf{q}} (t, u, \tau) =\textbf{q} (t, \tau-u, \tau)$ . Thus the tilde probabilities are time-reversed versions of the probabilities defined in (4.1)–(4.4), starting at time $t+ \tau$ when $u=0$ and finishing at time t when $u=\tau$ .

Using (4.7) and (4.8), it is straightforward to show that

(4.20) \begin{align} \tilde{p}_{ij}^\prime (t, u, \tau) &= - p_{ij}^\prime(t, \tau- u, \tau) = \sum_{l=1}^J \lambda_{il} \tilde{p}_{lj} (t, u, \tau) + [1-\epsilon_i]\gamma_i \tilde{p}_j (t, u, \tau), \\[-24pt] \nonumber \end{align}
(4.21) \begin{align}\tilde{p}_{j}^\prime (t, u, \tau) &= - p_{j}^\prime(t, \tau - u, \tau) = - \alpha \sum_{l=1}^J \chi_l \tilde{p}_{lj} (t, u, \tau) + \alpha \tilde{p}_j (t, u, \tau), \end{align}

with boundary conditions $\tilde{\textbf{p}}_{{\cdot} j} (t, 0, \tau) =\textbf{e}_j$ and $\tilde{p}_j (t, \tau, \tau) =0$ . Therefore, for $j=1,2,\ldots, J$ , the system of $J+1$ ODEs $\{ \tilde{\textbf{p}}_{{\cdot} j} (t,u,\tau) , \tilde{p}_j (t,u,\tau)\}$ has J initial conditions and one final boundary condition. Similarly, using (4.9) and (4.10), we have a system of $J+1$ ODEs satisfying

(4.22) \begin{align} \tilde{q}_i^\prime (t, u, \tau) &= - q_i^\prime(t, \tau-u, \tau) = \sum_{l=1}^J \lambda_{il} \tilde{q}_{l} (t, u, \tau) + [1-\epsilon_i]\gamma_i \tilde{q} (t, u, \tau), \end{align}
(4.23) \begin{align} \tilde{q}^\prime (t, u, \tau) &= - q^\prime(t, \tau-u, \tau) = -\alpha \sum_{l=1}^J \chi_l \tilde{q}_l (t, u, \tau) + \alpha \tilde{q} (t, u, \tau), \end{align}

with initial conditions $\tilde{\textbf{q}} (t, 0, \tau) =\textbf{0}$ and final boundary condition $\tilde{q}(t, \tau, \tau) =1$.

From (4.5) and (4.6) we have that $\psi_j (t;\,\tau) = \lim_{u \downarrow 0} \tilde{p}_j (t,u,\tau)$ and $\zeta (t;\,\tau) = \lim_{u \downarrow 0} \tilde{q} (t,u,\tau)$ . Therefore it follows from (4.20)–(4.23) and (4.17) that

(4.24) \begin{align} \begin{pmatrix} \tilde{\textbf{p}}_{{\cdot} j} (t, \tau, \tau) \\[3pt] 0 \end{pmatrix} &= \Sigma \begin{pmatrix} \textbf{e}_j \\[3pt] \tilde{p}_j (t,0,\tau) \end{pmatrix} = \Sigma \begin{pmatrix} \textbf{e}_j \\[3pt] \psi_j (t;\,\tau) \end{pmatrix} \qquad (j=1,2,\ldots, J) \end{align}

and

(4.25) \begin{align} \begin{pmatrix} \tilde{\textbf{q}} (t, \tau, \tau) \\[3pt] 1 \end{pmatrix} &= \Sigma \begin{pmatrix} \textbf{0} \\[3pt] \tilde{q} (t,0,\tau) \end{pmatrix} = \Sigma \begin{pmatrix} \textbf{0} \\[3pt] \zeta (t;\,\tau) \end{pmatrix}.\end{align}

The lemma follows immediately from (4.24) and (4.25).
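As a concrete illustration of Lemma 4, the following minimal sketch (in Python; the Erlang(3) lifetime and parameter values are illustrative assumptions, echoing Section 7) builds $\Omega$ from (4.16), computes $\Sigma = \exp[\tau \Omega]$, and reads off $\zeta (t;\,\tau)$, $\psi_j (t;\,\tau)$, $\textbf{p}_{{\cdot} j} (t,0,\tau)$, and $\textbf{q} (t,0,\tau)$ via (4.18) and (4.19). For time-varying $\beta$ and $\textbf{d}$, the same boundary algebra carries over with $\exp[\tau \Omega]$ replaced by the fundamental matrix of the time-reversed system (4.20)–(4.23), which can be obtained with a standard ODE solver.

\begin{verbatim}
import numpy as np
from scipy.linalg import expm

J = 3
Lam = np.array([[-3.0, 3.0, 0.0],        # sub-generator (lambda_il)
                [0.0, -3.0, 3.0],
                [0.0, 0.0, -3.0]])
gam = np.array([0.0, 0.0, 3.0])          # death rate from each type
chi = np.array([1.0, 0.0, 0.0])          # type-at-birth distribution
alpha, tau = 2.5, 1.0
eps = np.full(J, 0.3)                    # detection probabilities

rho = (1.0 - eps) * gam                  # rate of dying undetected
Omega = np.zeros((J + 1, J + 1))         # (4.16)
Omega[:J, :J] = Lam
Omega[:J, J] = rho
Omega[J, :J] = -alpha * chi
Omega[J, J] = alpha
Sigma = expm(tau * Omega)                # (4.17)

zeta = 1.0 / Sigma[J, J]
psi = -Sigma[J, :J] / Sigma[J, J]
p = Sigma[:J, :J] + np.outer(Sigma[:J, J], psi)   # (4.18), columns p_{.j}
q = zeta * Sigma[:J, J]                           # (4.19)
\end{verbatim}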

5. Proof of Theorem 1

The outline of the proof of Theorem 1 is as follows. A key step is to study the distribution of the number of individuals alive at time $t+\tau$, and their types, conditioning on the number of individuals of each type alive in the population at time t. As noted in Section 4, we focus on two scenarios: (i) no detected deaths in the time interval $(t, t +\tau]$, and (ii) precisely one detected death in the time interval $(t, t +\tau]$, at time $t+\tau$. The probability generating functions of the required distributions are given in Lemma 5 and Corollary 2. This enables us to show in Lemma 6 that $\textbf{X}_1 \sim \textbf{W} (0,\textbf{0})$, provided that the branching process and its observation process are time-homogeneous before the first detected death. This provides the initial conditions, $\textbf{X}_1$, for Theorem 1. Thus $X_1^\ast \sim \text{Geom} (\pi_0)$, which complements results in [Reference Lambert and Trapman6, Reference Trapman and Bootsma9], where individuals have independent exponential clocks for detection. In Lemmas 7 and 8, we establish the distribution of the number of individuals alive at time $t + \tau$ in the two scenarios, given that the number of individuals alive at time t follows a $\textbf{W} (t, \textbf{a})$ distribution defined by its probability generating function (3.4). Then, given that $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} $ is the mixture of sums of k independent $\textbf{W} (s_k, {\cdot})$ random variables, it is straightforward to conclude the proof of Theorem 1 using induction and Lemmas 7 and 8.

Suppose that the $\kappa$ th detected death occurs at time t. The precise value of $\kappa$ is not important in the following arguments. Let $E_{t;\tau}^{\textbf{y}}$ be the event that there are no detected deaths in the interval $(t, t+\tau)$ given that $\textbf{Y} (t) = \textbf{y}$ . Similarly, let $D_{t,\tau}^{\textbf{y}}$ be the event that the first detected death in the interval $(t, t+ \tau]$ is at time $t+\tau$ given that $\textbf{Y} (t) = \textbf{y}$ . Then for $t, \tau \geq0$ , $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_J) \in [0,1]^J$ , and $\textbf{y} = (y_1, y_2, \ldots, y_J) \in \mathbb{Z}_+^J$ , let

\begin{align}H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} ) &= \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{Y_j (t+\tau)} 1_{\{E_{t;\tau}^{\textbf{y}}\}} \right| \textbf{Y} (t) = \textbf{y} \right] \nonumber\end{align}

and

\begin{align}H_D (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} ) &= \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_{\kappa+1, j}} \right| \textbf{X}_\kappa = \textbf{y}, T_{\kappa +1}= \tau \right] f_{T_{\kappa+1}} (\tau | \textbf{X}_\kappa =\textbf{y}). \nonumber \end{align}

Note that $f_{T_{\kappa+1}} (\tau | \textbf{X}_\kappa =\textbf{y}) $ satisfies

(5.1) \begin{align} f_{T_{\kappa+1}} (\tau | \textbf{X}_\kappa =\textbf{y}) = H_D (\textbf{1};\, t, \tau;\, \textbf{y} ) .\end{align}

Lemma 5. For any $\tau \gt 0$ , $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_J) \in [0,1]^J$ , and $\textbf{y} = (y_1, y_2, \ldots, y_J) \in \mathbb{Z}_+^J$ ,

(5.2) \begin{align}H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} )&=\prod_{i=1}^J \left( q_i (t, 0, \tau) + \frac{\zeta (t;\,\tau) \big[\sum_{j=1}^J p_{ij} (t,0, \tau) \theta_j \big]}{1 - \sum_{j=1}^J \psi_j (t;\,\tau) \theta_j} \right)^{y_i}.\end{align}

Proof. For $i=1,2,\ldots, J$, let $\textbf{e}_i$ denote the vector of length J in which the ith entry is 1 and all other entries are equal to 0. Since the branching processes initiated by the $\textbf{y}$ individuals alive in the population at time t evolve independently, we have that

(5.3) \begin{align}H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} )&=\prod_{i=1}^J H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{e}_i )^{y_i}.\end{align}

Using the exploration process described in Section 4, the joint probability of the number of individuals alive at time $t+\tau$ from a single progenitor of type i at time t, $\hat{Y}_i^\ast (t;\,\tau)$, and the event of no detected deaths in the interval $(t, t+\tau]$, denoted by $\hat{E}_{t,\tau}^i$, is given by

\begin{align}\mathbb{P} \big(\hat{Y}_i^\ast (t;\,\tau) =0, \hat{E}_{t,\tau}^i \big)&=q_i (t, 0, \tau), \nonumber\\[3pt] \mathbb{P} \big(\hat{Y}_i^\ast (t;\,\tau) =n, \hat{E}_{t,\tau}^i \big) &= \left[ \sum_{j=1}^J p_{ij} (t, 0, \tau) \right]\psi (t;\,\tau)^{n-1} \zeta (t;\,\tau) \qquad (n=1,2,\ldots). \nonumber\end{align}

Note that if all deaths are detected then $q_i (t, 0, \tau)=0$ .

Given that $\hat{Y}_i^\ast (t;\,\tau) =n \gt 0$, there is a single individual, either the progenitor or its descendant derived from the forward pathway, whose probability of being of type j at time $t +\tau$ is $p_{ij} (t, 0, \tau) /\sum_{l=1}^J p_{il} (t, 0, \tau)$. By the strong Markov property, each of the other $n-1$ individuals alive at time $t+\tau$ is independently of type j with probability $\psi_j (t;\,\tau)/\psi (t;\,\tau)$. It is then straightforward to show that

(5.4) \begin{align}H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{e}_i ) &= q_i(t, 0, \tau) + \sum_{n=1}^\infty \zeta (t;\,\tau) \left[ \sum_{j=1}^J p_{ij} (t,0,\tau) \theta_j \right] \left[ \sum_{j=1}^J \psi_j (t;\,\tau) \theta_j \right]^{n-1} \nonumber \\[3pt] &= q_i(t, 0, \tau) + \frac{\zeta (t;\,\tau) \left[ \sum_{j=1}^J p_{ij} (t,0,\tau) \theta_j \right] }{1 - \sum_{j=1}^J \psi_j (t;\,\tau) \theta_j}. \end{align}

Finally, (5.2) follows from substituting (5.4) into (5.3).

We now derive the relationship between $H_D (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} )$ and $H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} ) $ . Recall that, for $i=1,2,\ldots,J$ , $d_{t,i}$ denotes the probability that a death of a type-i individual at time t is detected.

Corollary 2. For any $t, \tau \gt 0$ , $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_J) \in [0,1]^J$ , and $\textbf{y} = (y_1, y_2,\ldots, y_J) \in \mathbb{Z}_+^J$ ,

(5.5) \begin{align} H_D (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} ) &= \sum_{i=1}^J d_{t+\tau,i} \gamma_i\frac{\partial \;}{\partial \theta_i} H_E (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} ) . \end{align}

Proof. The proof follows arguments similar to those used for [Reference Ball and Neal2, Lemma 4.4]. Let $f_{T_{\kappa+1}}^i ({\cdot})$ denote the joint density of $T_{\kappa+1}$ and the $(\kappa+1)$ th detected death being of type i. Hence, for $t \geq 0$ , $f_{T_{\kappa+1}} (t) = \sum_{i=1}^J f_{T_{\kappa+1}}^i (t)$ .

We observe that

\begin{align*}& f (\textbf{X}_{\kappa+1} = \textbf{x}, T_{\kappa +1} = \tau | \textbf{X}_\kappa = \textbf{y}) \nonumber \\[3pt] & \qquad= \lim_{\sigma \uparrow \tau} \sum_{i=1}^J f_{T_{\kappa+1}}^i (\tau | \textbf{Y} (t+\sigma) = \textbf{x} + \textbf{e}_i, T_{\kappa+1} \gt \sigma) \, \mathbb{P} \left( \textbf{Y} (t+\sigma) =\textbf{x} + \textbf{e}_i, T_{\kappa+1} > \sigma |\textbf{Y} (t) = \textbf{y} \right) \nonumber \\[3pt] & \qquad = \sum_{i=1}^J d_{t+\tau,i}\gamma_i (x_i+1) \mathbb{P} \left( \textbf{Y} (t+\tau) =\textbf{x} + \textbf{e}_i, T_{\kappa+1} > \tau |\textbf{Y} (t) = \textbf{y} \right). \nonumber\end{align*}

Therefore, letting $\textbf{v}^{(i)} = \textbf{x} + \textbf{e}_i$ , we have that

\begin{align} H_D (\boldsymbol{\theta};\, t, \tau;\, \textbf{y} ) &= \sum_{i=1}^J d_{t+\tau,i} \gamma_i \sum_{\textbf{x}} (x_i +1) \prod_{j=1}^J \theta_j^{x_j} \mathbb{P} \left( \textbf{Y} (t+\tau) =\textbf{x} + \textbf{e}_i, T_{\kappa+1} > \tau |\textbf{Y} (t) =\textbf{X}_\kappa = \textbf{y} \right) \nonumber \\[3pt] &= \sum_{i=1}^J d_{t+\tau,i}\gamma_i \sum_{\textbf{v}^{(i)}} v^{(i)}_i \theta_i^{v^{(i)}_i-1} \prod_{j \neq i}^J \theta_j^{v_j^{(i)}} \mathbb{P} \left( \textbf{Y} (t+\tau) =\textbf{v}^{(i)}, T_{\kappa+1} > \tau |\textbf{Y} (t) =\textbf{X}_\kappa = \textbf{y} \right) \nonumber \\[3pt] &= \sum_{i=1}^J d_{t+\tau,i} \gamma_i \frac{\partial \;}{\partial \theta_i} H_E (\boldsymbol{\theta};\, t,\tau;\, \textbf{y} ), \nonumber \end{align}

as required.

Suppose that we let $\textbf{X}_0$ denote the state of the population when the initial individual enters the population. For $i=1,2,\ldots, J$, $\mathbb{P} (\textbf{X}_0 = \textbf{e}_i) = \chi_i$. We can use Lemma 5 and Corollary 2 to obtain the distribution of $\textbf{X}_1$ by integrating over $T_1$, the inter-arrival time from the initial individual entering the population to the first detected death.

Lemma 6. Given a single initial individual who starts in type l with probability $\chi_l$ , the joint probability generating function of $\textbf{X}_1$ is given by

(5.6) \begin{align} \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_1^j} \right| T_1 <\infty \right] &= \frac{\pi_0}{1- (1-\pi_0) \sum_{j=1}^J \eta^0_j \theta_j}.\end{align}

Therefore, $\textbf{X}_1 \sim \textbf{W} (0, \textbf{0})$ as defined in (3.5) and (3.6), so

\begin{align}X_1^\ast = \sum_{j=1}^J X_1^j &\sim \text{Geom} (\pi_0), \nonumber \\[3pt] \{\textbf{X}_1 | X_1^\ast = x^\ast\} &\sim \text{Multinomial} \left(x^\ast, \boldsymbol{\eta}^{0} \right)\!, \nonumber\end{align}

where $\boldsymbol{\eta}^{0} = (\eta^0_1, \eta^0_2, \ldots, \eta^0_J)$ , and for $j=1,2,\ldots, J$ , $X_1^j \sim \text{Geom} \big(\pi_0/\{\pi_0 + \eta^0_j - \eta^0_j \pi_0\}\big)$ .

Proof. By conditioning upon $T_1$ and the type of the initial individual, we have, for any $\boldsymbol{\theta} \in [0,1]^J$ , that

(5.7) \begin{align} \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_1^j} \right| T_1 \lt \infty \right] &= \frac{ \sum_{i=1}^J \chi_i \int_0^\infty H_D (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) \, dt}{ \sum_{i=1}^J \chi_i \int_0^\infty H_D (\textbf{1}; -t,t;\, \textbf{e}_i) \, dt}. \end{align}

Using $\textbf{d}_t ={\boldsymbol{\epsilon}}$ $(t \leq 0)$ , we have that

(5.8) \begin{align} \sum_{i=1}^J \chi_i \int_0^\infty H_D (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) \, dt &= \sum_{j=1}^J \gamma_j \epsilon_j \frac{\partial \;}{\partial \theta_j} \left\{\sum_{i=1}^J \chi_i \int_0^\infty H_E (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) \, dt \right\}.\end{align}

Since the branching (and detection) process is time-homogeneous prior to the first detected death at time $t=0$ , it follows from Lemma 5 and Section 4 that

(5.9) \begin{align} H_E (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) = q_i ({-}t,0,t) + \frac{\bar{\zeta} (t) \big[ \sum_{j=1}^J p_{ij} ({-}t,0,t) \theta_j \big]}{1- \sum_{j=1}^J \bar{\psi}_j (t) \theta_j},\end{align}

where $\bar{\zeta} (t) = \zeta ({-}t;\,t)$ and $\bar{\psi}_j (t) = \psi_j ({-}t;\,t)$ ; cf. Section 4.2.

Let $\bar{\Omega} = \Omega (\alpha_1, {\boldsymbol{\epsilon}})$ and $\bar{\Sigma} (t) = \Sigma (\alpha_1, {\boldsymbol{\epsilon}},t)$ $(t \geq 0)$ , where $\Omega ({\cdot}, {\cdot})$ and $\Sigma ({\cdot}, {\cdot}, {\cdot})$ are defined in (4.16) and (4.17), respectively. Then

\begin{align} \frac{d \;}{dt} \bar{\Sigma} (t) = \bar{\Omega} \bar{\Sigma} (t). \nonumber\end{align}

Hence, for $i,j = 1,2,\ldots, J+1$ ,

(5.10) \begin{align} \frac{d \;}{dt} \bar{\Sigma}_{ij} (t) = \sum_{l=1}^{J+1} \bar{\Omega}_{il} \bar{\Sigma}_{lj} (t).\end{align}

It follows from Lemma 4 that $\bar{\psi}_j (t) = - \bar{\Sigma}_{J+1,j} (t) /\bar{\Sigma}_{J+1,J+1} (t)$ and $\bar{\zeta} (t) = 1/\bar{\Sigma}_{J+1,J+1} (t)$ . Therefore, from (5.10), we have that

(5.11) \begin{align} \frac{d \;}{dt}\bar{\psi}_j (t) &= \frac{-\bar{\Sigma}_{J+1,J+1} (t) \sum_{l=1}^{J+1} \bar{\Omega}_{J+1,l} \bar{\Sigma}_{l,j} (t) + \bar{\Sigma}_{J+1,j} (t) \sum_{l=1}^{J+1} \bar{\Omega}_{J+1,l} \bar{\Sigma}_{l,J+1} (t) }{\bar{\Sigma}_{J+1,J+1} (t)^2} \nonumber \\[3pt] & =- \left(\frac{ \sum_{l=1}^{J+1} \bar{\Omega}_{J+1,l} \left[ \bar{\Sigma}_{J+1,J+1} (t) \bar{\Sigma}_{l,j} (t) - \bar{\Sigma}_{J+1,j} (t) \bar{\Sigma}_{l,J+1} (t) \right]}{\bar{\Sigma}_{J+1,J+1} (t)^2} \right).\end{align}

From (4.18), $p_{lj} ({-}t,0,t) = \bar{\Sigma}_{lj} (t) + \bar{\psi}_j (t) \bar{\Sigma}_{l,J+1} (t)$ $(j,l =1,2,\ldots,J)$. Then, since the term in the sum on the right-hand side of (5.11) corresponding to $l=J+1$ is 0 and $\bar{\Omega}_{J+1,l} = - \alpha_1 \chi_l$ $(l=1,2,\ldots, J)$, we have that

(5.12) \begin{align} \frac{d \;}{dt}\bar{\psi}_j (t) & = \alpha_1 \bar{\zeta} (t) \sum_{l=1}^J \chi_l p_{lj} ({-}t,0,t).\end{align}

Therefore, substituting (5.9) and (5.12) into the integral on the right-hand side of (5.8), we have that

(5.13) \begin{align} & \sum_{i=1}^J \chi_i \int_0^\infty H_E (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) \, dt \nonumber \\[3pt] & \qquad = \sum_{i=1}^J \chi_i \int_0^\infty \left\{ q_i ({-}t,0,t) + \frac{\bar{\zeta} (t) \big[ \sum_{j=1}^J p_{ij} ({-}t,0,t) \theta_j \big]}{1- \sum_{j=1}^J \bar{\psi}_j (t) \theta_j} \right\} \, dt \nonumber \\[3pt] & \qquad = \int_0^\infty \left\{ \sum_{i=1}^J \chi_i q_i ({-}t,0,t) \right\} \,dt + \int_0^\infty \frac{\bar{\zeta} (t) \sum_{j=1}^J \theta_j \left[ \sum_{i=1}^J \chi_i p_{ij} ({-}t,0,t) \right] }{1 - \sum_{j=1}^J \theta_j \bar{\psi}_j (t)} \, dt \nonumber \\[3pt] & \qquad =K+ \frac{1}{\alpha_1} \int_0^\infty \frac{ \frac{d\;}{dt} \sum_{j=1}^J \theta_j \bar{\psi}_j (t)}{1 - \sum_{j=1}^J \theta_j \bar{\psi}_j (t)} \, dt,\end{align}

where $K = \sum_{i=1}^J \chi_i \int_0^\infty q_i ({-}t,0,t) \, dt $ does not depend on ${\boldsymbol{\theta}}$ .

Applying a change of variable $z= \sum_{j=1}^J \theta_j \bar{\psi}_j (t)$ in (5.13) and noting that $\bar{\psi}_j (\infty) = (1-\pi_0) \eta_j^0$ , we have that

(5.14) \begin{align} \sum_{i=1}^J \chi_i \int_0^\infty H_E (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) \, dt &= K + \frac{1}{\alpha_1} \int_0^{\sum_{j=1}^J \theta_j (1-\pi_0) \eta_j^0} \frac{1}{1-z} \, dz \nonumber \\[3pt] &= K - \frac{1}{\alpha_1} \log \left( 1- (1-\pi_0 ) \sum_{j=1}^J \theta_j \eta_j^0\right). \end{align}

Substituting (5.14) into (5.8) and taking partial derivatives yields

(5.15) \begin{align} \sum_{i=1}^J \chi_i \int_0^\infty H_D (\boldsymbol{\theta}; -t,t;\, \textbf{e}_i) \, dt &= \frac{\sum_{i=1}^J\gamma_i \epsilon_i (1-\pi_0) \eta^0_i}{\alpha_1 \big(1- (1-\pi_0) \sum_{j=1}^J \eta^0_j \theta_j \big)}.\end{align}

It follows from (5.15) and (5.7) that

\begin{align} \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_1^j} \right| T_1 \lt \infty \right] &= \frac{\sum_{i=1}^J\gamma_i\epsilon_i (1-\pi_0) \eta^0_i}{\alpha_1 \big(1- (1-\pi_0) \sum_{j=1}^J \eta^0_j \theta_j \big)} \times \frac{\alpha_1 \pi_0}{\sum_{i=1}^J\gamma_i \epsilon_i (1-\pi_0) \eta^0_i} \nonumber \\[3pt] &= \frac{ \pi_0}{1- (1-\pi_0) \sum_{j=1}^J \eta^0_j \theta_j}, \nonumber\end{align}

as required.

The distributions of $X_1^\ast$ , $\textbf{X}_1 |X_1^\ast$ , and $X_1^j$ can all be obtained straightforwardly from (5.6).
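For simulation purposes, sampling $\textbf{X}_1$ is then immediate from the structure of Lemma 6; a minimal sketch (in Python; the values of $\pi_0$ and ${\boldsymbol{\eta}}^0$ are placeholder assumptions, in practice computed from the quantities in Section 4.4) is as follows.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
pi0 = 0.4                                # placeholder for pi_0
eta0 = np.array([0.5, 0.3, 0.2])         # placeholder for eta^0

def sample_X1():
    # numpy's geometric is supported on {1, 2, ...}; subtracting 1 gives
    # P(X = x) = pi0 (1 - pi0)^x on {0, 1, ...}, i.e. Geom(pi0)
    x_star = rng.geometric(pi0) - 1
    return rng.multinomial(x_star, eta0)  # X_1 | X_1^* = x^*
\end{verbatim}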

The assumption that the birth rate and detection probabilities are constant prior to the first detected death is required in Lemma 6 to show that $\textbf{X}_1 \sim \textbf{W} (0,\textbf{0})$. In the supplementary material we show that the distribution of $\textbf{X}_1$ can also be obtained in the case where the birth time of the initial individual, $S_0$, is known and equal to $-t_0$, say.

We proceed by studying the dynamics of a branching process from time t onwards with

(5.16) \begin{align} \textbf{X}_\kappa = \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a}).\end{align}

Remember that (5.16) implies that $X_\kappa^\ast = Y^\ast (t) \sim W_\ast (t;\,\textbf{a})$, a zero-modified geometric random variable with success probability $\pi_t$. In Lemma 7 we consider the distribution of $\textbf{Y} (t+\tau)$ given that there are no detected deaths in the interval $(t, t +\tau]$, the event $\tilde{E}_{t;\tau}^\textbf{a}$ defined in Section 3, and in Lemma 8 we consider the distribution of $\textbf{X}_{\kappa+1}$ given that $T_{\kappa+1} = \tau$.

In the first instance, Lemmas 7 and 8 are useful in the case $t=0 (=s_1)$ and $\tau = t_2 (=s_2)$, with $\textbf{X}_1 = \textbf{Y} (0) \sim \textbf{W} (0, \textbf{0})$, where we consider the behaviour of the branching process between the first and second detected deaths and at the second detected death.

Lemma 7. For $t \geq 0$ and $\textbf{a} > - (1-\pi_t) {\boldsymbol{\eta}}^t$ , suppose that $ \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a})$ . In addition, for $\tau \geq 0$ , let $\textbf{b} (= \textbf{b} (t, \tau;\,\textbf{a}))$ satisfy (3.7).

Then $U_E (t, \tau ;\,\textbf{a}) = \mathbb{P}(\tilde{E}_{t;\tau}^\textbf{a})$ (see (3.10)) satisfies

(5.17) \begin{align} &U_E (t, \tau ;\,\textbf{a}) = \frac{\pi_t \times \left[1 + \sum_{i=1}^J a_i q_i (t, 0, \tau)\right] \times \left[1 + \sum_{j=1}^J b_j \right]}{\pi_{t+\tau} \times \left[1 -(1 - \pi_t) \sum_{i=1}^J \eta_i^t q_i (t, 0, \tau)\right] \times \left[1 + \sum_{j=1}^J a_j \right]}, \end{align}
(5.18) \begin{align} &\mathbb{E} \left[ H_E ({\boldsymbol{\theta}};\, t , \tau;\, \textbf{W} (t, \textbf{a})) \right] = U_E (t, \tau ;\,\textbf{a}) \varphi ({\boldsymbol{\theta}};\, t+ \tau, \textbf{b}),\end{align}

and

(5.19) \begin{align} \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{Y^j (t +\tau)} \right| \tilde{E}_{t;\tau}^\textbf{a}, \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a}) \right] = \varphi ({\boldsymbol{\theta}};\, t+ \tau, \textbf{b}).\end{align}

Thus, $ \{\textbf{Y} (t + \tau) | \tilde{E}_{t;\tau}^\textbf{a}, \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a}) \} \sim \textbf{W} (t + \tau, \textbf{b})$ .

Proof. For $t, \tau \geq 0$ and ${\boldsymbol{\theta}} \in [0,1]^J$ , let

\begin{align} \xi_i ({\boldsymbol{\theta}};\, t, \tau) = q_i (t,0, \tau) + \frac{\zeta (t;\,\tau) \sum_{j=1}^J p_{ij} (t,0, \tau) \theta_j }{1 - \sum_{j=1}^J \psi_j (t;\,\tau) \theta_j} . \nonumber\end{align}

Therefore it follows from Lemma 5 and (3.4) that, for ${\boldsymbol{\theta}} \in [0,1]^J$ ,

(5.20) \begin{align} \mathbb{E} \left[ H_E ({\boldsymbol{\theta}};\, t , \tau;\, \textbf{W} (t, \textbf{a})) \right] &= \mathbb{E} \left[ \prod_{j=1}^J \xi_j ({\boldsymbol{\theta}};\, t, \tau)^{W_j (t, \textbf{a})} \right] \nonumber \\[3pt] &= \frac{1 + \sum_{i=1}^J a_i \xi_i({\boldsymbol{\theta}};\, t, \tau)}{1+ \sum_{i=1}^J a_i} \times \frac{\pi_t}{1- (1-\pi_t) \sum_{j=1}^J \eta_j^t \xi_j ({\boldsymbol{\theta}};\, t, \tau) }.\end{align}

Note that

(5.21) \begin{align} & \frac{1 + \sum_{i=1}^J a_i \xi_i ({\boldsymbol{\theta}};\, t, \tau)}{1 - (1-\pi_t) \sum_{i=1}^J \eta_i^t \xi_i ({\boldsymbol{\theta}};\, t, \tau)} \nonumber \\[3pt] & =\frac{1- \sum_{i=1}^J \psi_i (t;\,\tau) \theta_i + \sum_{i=1}^J a_i\left(q_i (t, 0, \tau) [1- \sum_{j=1}^J \psi_j (t;\,\tau) \theta_j] + \zeta (t;\,\tau)\left[ \sum_{j=1}^J \theta_j p_{ij} (t,0,\tau)\right] \right) }{1- \sum_{i=1}^J \psi_i (t;\,\tau) \theta_i - (1-\pi_t) \sum_{i=1}^J \eta_i^t\left(q_i (t, 0, \tau) [ 1- \sum_{j=1}^J \psi_j (t;\,\tau) \theta_j] + \zeta (t;\,\tau)\left[ \sum_{j=1}^J \theta_j p_{ij} (t,0,\tau)\right] \right) }.\nonumber \\[3pt] \end{align}

Using (3.2), we have that the denominator in (5.21) satisfies

(5.22) \begin{align} & \left[ 1-(1-\pi_t) \sum_{j=1}^J \eta_j^t q_j (t, 0, \tau)\right] \left\{ 1- \sum_{i=1}^J \theta_i \left[ \psi_i (t;\,\tau) + \frac{ (1-\pi_t)\zeta (t;\,\tau)\sum_{j=1}^J \eta_j^t p_{ji} (t,0,\tau)}{1-(1-\pi_t) \sum_{j=1}^J \eta_j^t q_j (t, 0, \tau)} \right] \right\} \nonumber \\[3pt] & \qquad = \left[ 1-(1-\pi_t) \sum_{j=1}^J \eta_j^t q_j (t, 0, \tau)\right] \times \left[ 1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t + \tau} \right]. \end{align}

Thus, substituting (5.22) into (5.21) and using (3.7) yields

(5.23) \begin{align} & \frac{1 + \sum_{i=1}^J a_i \xi_i ({\boldsymbol{\theta}};\, t, \tau)}{1 - (1-\pi_t) \sum_{i=1}^J \eta_i^t \xi_i ({\boldsymbol{\theta}};\, t, \tau)} \nonumber \\[3pt] & = \frac{1+ \sum_{i=1}^J a_i q_i (t,0,\tau) + \sum_{j=1}^J \theta_j \left[ - \psi_j (t;\,\tau) \left\{1+ \sum_{i=1}^J a_i q_i (t,0,\tau) \right\} + \zeta (t;\,\tau) \sum_{i=1}^J a_i p_{ij} (t,0,\tau)\right]}{\left[ 1-(1-\pi_t) \sum_{j=1}^J \eta_j^t q_j (t, 0, \tau)\right] \times \left[ 1- (1-\pi_{t +\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+ \tau} \right]} \nonumber \\[3pt] & = \frac{ \left[1 + \sum_{i=1}^J a_i q_i (t,0,\tau) \right] \times \left[ 1+ \sum_{j=1}^J b_j \theta_j\right] }{\left[ 1- (1-\pi_t) \sum_{j=1}^J \eta_j^t q_j (t,0, \tau) \right] \times \left[ 1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+\tau}\right]}.\end{align}

Then (5.18) follows from (5.20) and (5.23) by straightforward algebraic manipulation.

By setting ${\boldsymbol{\theta}} = \textbf{1}$ in (5.18), we obtain (5.17).

Finally, (5.19) follows immediately from

\begin{align} \varphi ({\boldsymbol{\theta}};\, t+\tau, \textbf{b}) U_E (t, \tau ;\,\textbf{a}) &= \mathbb{E} \left[ H_E ({\boldsymbol{\theta}};\, t , \tau;\, \textbf{W} (t, \textbf{a})) \right] \nonumber \\[3pt] & = \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{Y^j (t +\tau)} 1_{\{\tilde{E}_{t,\tau}^{\textbf{a}} \}} \right| \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a})\right] \nonumber \\[3pt] & = \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{Y^j (t +\tau)} \right| \tilde{E}_{t,\tau}^{\textbf{a}}, \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a})\right] \mathbb{P} \left(\tilde{E}_{t,\tau}^{\textbf{a}} |\textbf{Y} (t) \sim \textbf{W} (t, \textbf{a}) \right) \nonumber \\[3pt] & = \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{Y^j (t +\tau)} \right| \tilde{E}_{t,\tau}^{\textbf{a}}, \textbf{Y} (t) \sim \textbf{W} (t, \textbf{a})\right] U_E (t, \tau ;\,\textbf{a}). \nonumber\end{align}

The following observations, which are consequences of Lemma 7, prove useful in the derivation of the approximation in Section 3.5.2. For $t \geq 0$ , let $G_t \sim \text{Geom} (\pi_t)$ . Then if $\textbf{Y} (t) \sim \textbf{W} (t, \textbf{0})$ , we have that $Y^\ast (t) \sim G_t$ . As noted prior to (3.19), $b_i (t, \tau;\, \textbf{0}) = -\psi_i (t;\,\tau)$ $(i=1,2,\ldots,J)$ , so it follows from (5.19) that, for $0 \leq \vartheta \leq 1$ ,

(5.24) \begin{align} \mathbb{E} \left[ \left. \vartheta^{Y^\ast (t +\tau)} \right| \tilde{E}_{t;\tau}^{\textbf{0}}, \textbf{Y} (t) \sim \textbf{W} (t;\, \textbf{0}) \right] & = \frac{1 - \psi (t;\,\tau) \vartheta}{1- \psi (t;\,\tau)} \times \frac{\pi_{t+\tau}}{1 - (1-\pi_{t+\tau}) \vartheta}.\end{align}

Since $0 < \psi (t;\,\tau) < 1 - \pi_{t+\tau}$ (see (3.1)), it follows from (5.24) and Lemma 1 that

(5.25) \begin{align} \left\{ \left. Y^\ast (t+ \tau)\right| E_{t,\tau}^{\textbf{Y} (t)}, Y^\ast (t) \sim G_t \right\} \sim \left\{ \begin{array}{l@{\quad}l} G_{t+\tau} & \mbox{with probability } h (t;\,\tau), \\[3pt] 0 & \mbox{with probability } 1-h(t;\,\tau), \end{array} \right. \end{align}

where, for $t, \tau \geq 0$ ,

(5.26) \begin{align} h (t;\,\tau) = \frac{1-\pi_{t+\tau}- \psi (t;\,\tau) }{(1-\pi_{t+\tau} )(1-\psi (t;\,\tau) )}.\end{align}

We turn to the distribution of $\textbf{X}_{\kappa +1}$ given that $\textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a})$ . Recall from (3.11) that $U_D (t, \tau;\,\textbf{a}) = - \frac{\partial \;}{\partial \tau} U_E (t, \tau;\,\textbf{a})$ , where $U_E (t, \tau;\,\textbf{a})$ is defined in (3.10) and derived in (5.17).

Lemma 8. For some $\kappa =1,2, \ldots$ , suppose that $s_\kappa =t$ with $\textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a})$ . For $\tau \geq 0$ , let $\textbf{b} \; (= \textbf{b} (t, \tau;\,\textbf{a}))$ and $\textbf{c} \;(= \textbf{c} (t, \tau;\,\textbf{a}))$ satisfy (3.7) and (3.8). Then

(5.27) \begin{align} & f_{T_{\kappa+1}} \left(\tau \left| S_\kappa =t, \textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a}) \right. \right) \nonumber \\[3pt] & \qquad = \frac{U_E (t, \tau;\,\textbf{a}) \sum_{j=1}^J d_{t+\tau,j}\gamma_j \left[b_j \pi_{t+\tau} + (1-\pi_{t+\tau}) \eta_j^{t +\tau} (1 + \sum_{i=1}^J b_i) \right]}{\pi_{t+\tau} (1 + \sum_{i=1}^J b_i)} \nonumber \\[3pt] & \qquad = - \frac{\partial \;}{\partial \tau} U_E (t, \tau;\,\textbf{a}) = U_D (t, \tau;\,\textbf{a}),\end{align}

and

(5.28) \begin{align} \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_{\kappa+1}^j} \right| T_{\kappa +1} = \tau, \textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a}) \right] & = \varphi ({\boldsymbol{\theta}};\, t +\tau, \textbf{0}) \, \varphi ({\boldsymbol{\theta}};\, t+\tau, \textbf{c}). \end{align}

Thus,

(5.29) \begin{align} \big\{ \textbf{X}_{\kappa +1} | S_\kappa =t, T_{\kappa +1} = \tau, \textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a}) \big\} \sim \textbf{W} (t +\tau, \textbf{c}) + \textbf{W} (t +\tau, \textbf{0}),\end{align}

where $\textbf{W} (t +\tau, \textbf{c})$ and $\textbf{W} (t +\tau, \textbf{0})$ are independent random variables.

Proof. For ${\boldsymbol{\theta}} \in [0,1]^J$ , it follows from Corollary 2, (5.5), and Lemma 7, (5.18), that

(5.30) \begin{align} \mathbb{E} \left[ H_D ({\boldsymbol{\theta}};\, t, \tau;\, \textbf{W} (t, \textbf{a})) \right]&= \sum_{j=1}^J \gamma_j d_{t+\tau,j} \frac{\partial \;}{\partial \theta_j} \mathbb{E} \left[ H_E ({\boldsymbol{\theta}};\, t, \tau;\, \textbf{W} (t, \textbf{a})) \right] \nonumber \\[3pt] &= \sum_{j=1}^J \gamma_j d_{t+\tau,j} U_E (t, \tau ;\,\textbf{a}) \frac{\partial \;}{\partial \theta_j} \varphi ({\boldsymbol{\theta}};\, t+\tau, \textbf{b}).\end{align}

The partial derivative of $\varphi ({\boldsymbol{\theta}};\, t+\tau, \textbf{b})$ with respect to $\theta_j$ $(j=1,2,\ldots, J)$ is given by

(5.31) \begin{align} & \frac{\partial \;}{\partial \theta_j} \varphi ({\boldsymbol{\theta}};\, t+\tau, \textbf{b}) = \frac{\pi_{t+\tau}}{1 + \sum_{i=1}^J b_i } \nonumber\\[3pt] & \quad \times \frac{b_j \left[ 1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+\tau} \right] + (1-\pi_{t+\tau}) \eta_j^{t+\tau} \left(1 + \sum_{i=1}^J b_i \theta_i \right) }{\left[1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+\tau} \right]^2}.\end{align}

By setting ${\boldsymbol{\theta}} = \textbf{1}$ in (5.31) and using (5.1) and (5.30), we obtain (5.27).

Since

\begin{align} \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_{\kappa+1}^j} \right| T_{\kappa +1} = \tau, \textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a}) \right] &= \frac{\mathbb{E} \left[ H_D ({\boldsymbol{\theta}};\, t, \tau;\, \textbf{W} (t, \textbf{a})) \right] }{f_{T_{\kappa+1}} \left(\tau \left| S_\kappa =t, \textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a}) \right. \right)} \nonumber\end{align}

(cf. (5.1)), it follows that

(5.32) \begin{align} & \mathbb{E} \left[ \left. \prod_{j=1}^J \theta_j^{X_{\kappa+1}^j} \right| T_{\kappa +1} = \tau, \textbf{X}_\kappa \sim \textbf{W} (t, \textbf{a}) \right] \nonumber \\[3pt] & = \frac{\pi_{t+\tau}^2}{\left[1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+\tau} \right]^2} \nonumber \\[3pt] & \quad \times \frac{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left(b_j \left[ 1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+\tau} \right] + (1-\pi_{t+\tau}) \eta_j^{t+\tau} \left(1 + \sum_{i=1}^J b_i \theta_i \right) \right)}{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[b_j \pi_{t+\tau} + (1-\pi_{t+\tau}) \eta_j^{t +\tau} (1 + \sum_{i=1}^J b_i) \right]}.\end{align}

The second fraction on the right-hand side of (5.32) simplifies to

(5.33) \begin{align} & \frac{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left(b_j \left[ 1- (1-\pi_{t+\tau}) \sum_{l=1}^J \theta_l \eta_l^{t+\tau} \right] + (1-\pi_{t+\tau}) \eta_j^{t+\tau} \left(1 + \sum_{i=1}^J b_i \theta_i \right) \right)}{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[b_j \pi_{t+\tau} + (1-\pi_{t+\tau}) \eta_j^{t +\tau} (1 + \sum_{i=1}^J b_i) \right]}\nonumber \\[3pt] & \qquad = \frac{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ (1-\pi_{t+\tau}) \eta_j^{t +\tau} +b_j \right] + \sum_{l=1}^J \theta_l (1-\pi_{t+\tau}) \sum_{j=1}^J d_{t+\tau,j} \gamma_j \left( b_l \eta_j^{t +\tau} - b_j \eta_l^{t+\tau}\right) }{\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ (1-\pi_{t+\tau}) \eta_j^{t +\tau} +b_j \right] + \sum_{l=1}^J (1-\pi_{t+\tau}) \sum_{j=1}^J d_{t+\tau,j}\gamma_j \left( b_l \eta_j^{t +\tau} - b_j \eta_l^{t+\tau}\right) } \nonumber \\[3pt] & \qquad = \frac{ \left(\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ (1-\pi_{t+\tau}) \eta_j^{t +\tau} +b_j \right] \right) \times \left( 1 + \sum_{l=1}^J c_l \theta_l \right)}{ \left(\sum_{j=1}^J d_{t+\tau,j} \gamma_j \left[ (1-\pi_{t+\tau}) \eta_j^{t +\tau} +b_j \right] \right) \times \left( 1 + \sum_{l=1}^J c_l \right)} = \frac{ 1 + \sum_{l=1}^J c_l \theta_l}{ 1 + \sum_{l=1}^J c_l}.\end{align}

Since $\textbf{c} > - (1- \pi_{t+\tau}) {\boldsymbol{\eta}}^{t+\tau}$ , we can combine (5.32) and (5.33) to obtain (5.28), completing the proof of the lemma.

The following observations, which are consequences of Lemma 8, prove useful in the derivation of the approximation in Section 3.5.2. The distribution of the size of the population immediately following the $(\kappa +1)$th detected death, given that $\textbf{X}_\kappa \sim \textbf{W} (s_\kappa;\, \textbf{0})$, depends on the sign of

$$C_{s_\kappa; t_{\kappa+1}}^\ast = \sum_{i=1}^J C_{s_\kappa; t_{\kappa+1}}^i,$$

where $C_{s_\kappa; t_{\kappa+1}}^i$ $(i=1,2,\ldots,J)$ are given by (3.19). In particular, it follows from (5.29) and Lemma 1 that if $C_{s_\kappa; t_{\kappa+1}}^\ast \leq 0$ with $s_{\kappa+1} = s_\kappa + t_{\kappa+1}$ , then

(5.34) \begin{align} \left\{ \left. X_{\kappa+1}^\ast \right| T_{\kappa+1} = t_{\kappa+1}, \textbf{X}_\kappa \sim \textbf{W} (s_\kappa;\, \textbf{0}) \right\} \sim \left\{ \begin{array}{ll} G_{s_{\kappa+1}}^1 + G_{s_{\kappa+1}}^2 & \mbox{with probability } \tilde{h} (s_\kappa, t_{\kappa +1}), \\[3pt] G_{s_{\kappa+1}} & \mbox{with probability } 1 - \tilde{h} (s_\kappa, t_{\kappa +1}), \end{array} \right. \end{align}

where $\tilde{h} (s_\kappa, t_{\kappa +1})$ is defined in (3.20) and $G_{s_{\kappa+1}}^1$ and $G_{s_{\kappa+1}}^2$ are i.i.d. copies of $G_{s_{\kappa+1}}$.

We are now in a position to prove Theorem 1.

Proof of Theorem 1. The theorem is proved by induction.

By Lemma 6, we have that $\textbf{X}_1 \sim \textbf{W} (0, \textbf{0})$ , and by Lemma 8, we have that $\textbf{X}_2|T_2 =t_2 \sim \textbf{W} (s_2, \textbf{C}_{0;t_2}) + \textbf{W} (s_2, \textbf{0})$ , where the components of $ \textbf{C}_{0;t_2}$ are given by (3.19) and $\textbf{W} (s_2, \textbf{C}_{0;t_2})$ and $\textbf{W} (s_2, \textbf{0})$ are independent.

For $k=2, 3, \ldots$ , we have that $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ satisfies (3.12). Therefore suppose that $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k} \sim \sum_{l=1}^k \textbf{W}_{ml} (s_k;\, \textbf{a}_{ml}^k)$ for some $m=1,2,\ldots, M_k (=(k-1)!)$ and note that this occurs with probability $w_m^k$ . We say that the population immediately after the kth detected death consists of k independent family groups, where the number of individuals of each type in the lth family group is given by $\textbf{W}_{ml} (s_k;\; \textbf{a}_{ml}^k)$ . Note that a family group can contain 0 individuals. Each family group evolves independently with births of new individuals, individuals changing type, and deaths of individuals. Given that the first detected death after time $s_k$ is at time $s_{k+1} = s_k + t_{k+1}$ , one of the family groups is responsible for the death, whilst the other $k-1$ family groups do not experience a detected death in the interval $(s_k, s_{k+1}]$ . Hence, it follows that the joint probability density for $T_{k+1} = t_{k+1}$ and the ith family group being responsible for the death is

(5.35) \begin{align} U_D (s_k, t_{k+1};\, \textbf{a}_{mi}^k) \prod_{l \neq i} U_E (s_k, t_{k+1};\, \textbf{a}_{ml}^k).\end{align}

Since $\textbf{X}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ is a mixture of $ \big\{\sum_{l=1}^k \textbf{W}_{ml} (s_k;\, \textbf{a}_{ml}^k)\big\}$ with weights $w_m^k$, we have that

(5.36) \begin{align} f_{T_{k+1}} (t_{k+1} | S_k =s_k, \textbf{X}_k) = \sum_{m=1}^{M_k} w_m^k \sum_{i=1}^k U_D \big(s_k, t_{k+1};\, \textbf{a}_{mi}^k \big) \prod_{l \neq i} U_E \big(s_k, t_{k+1};\, \textbf{a}_{ml}^k \big).\end{align}

It follows from (5.35) and (5.36) that, given $T_{k+1} = t_{k+1}$ , the probability of the detected death coming from the ith family group in the mth mixture distribution is $w_{mk+i}^{k+1}$ as defined in (3.14).

Given that the ith family group in the mth mixture distribution is responsible for the $(k+1)$th detected death at time $s_{k+1}$, we have $k+1$ family groups at time $s_{k+1}$. The $k-1$ family groups not responsible for the detected death will have evolved by Lemma 7 into family groups having different distributions. Specifically, the lth family group $(l \neq i)$, which had numbers of individuals of each type distributed according to $\textbf{W} (s_k , \textbf{a}_{ml}^k)$, will have, given no detected death in $(s_k, s_{k+1}]$, numbers of individuals of each type distributed according to $\textbf{W} (s_{k+1} , \textbf{b}_{ml}^k)$. By Lemma 8, we can split the ith family group, in which the detected death occurs at $s_{k+1}$, into two independent family groups, with the numbers of individuals of each type in the two family groups distributed according to $\textbf{W} (s_{k+1} , \textbf{c}_{mi}^k)$ and $\textbf{W} (s_{k+1} , \textbf{0})$, respectively. The labellings of $\{\textbf{a}_{(mk+i)j}^{k+1} \}$ given in (3.15) follow.

Hence, (3.13) is proved and the theorem follows.

The description of family groups given in Theorem 1 differs from the description in [Reference Ball and Neal2, Theorem 3.2]. In [Reference Ball and Neal2], the number of family groups immediately after the kth detected death is random with support $\{2,3,\ldots, k\}$ for $k \geq 2$ , and its distribution is dependent on the times of the detected deaths. The size of each family group is i.i.d. according to $\text{Geom} (\pi_{s_k})$ . In Theorem 1, there are k family groups immediately after the kth detected death, with the sizes of the family groups being independent but not identically distributed, and their distributions depending on the times of the detected deaths.

It is possible to obtain a result similar to Theorem 1 if the time $S_0$ at which the first individual is born is known and is equal to $-t_0$, say; the details are provided in the supplementary material. In that case it can be shown using Corollary 2 and (5.2) that the distribution of $\textbf{X}_1$ is the sum of two independent zero-modified geometric random variables with $\pi_0 = 1- \psi ({-}t_0;\,t_0)$; cf. the discussion after the proof of [Reference Ball and Neal2, Theorem 3.1]. The distribution of $\textbf{X}_k |\textbf{T}_{2:k} = \textbf{t}_{2:k}$ can then be obtained as above, except that it consists of a mixture of $k!$ random variables, each consisting of $k+1$ independent zero-modified geometric random variables.

6. Derivation of the approximation in Section 3.5.2

In this section we construct the approximations $\hat{\textbf{X}}_k$ and $\hat{X}_k^\ast = \sum_{j=1}^J \hat{X}_k^j$ to $\textbf{X}_k$ and $X_k^\ast$, where $\hat{X}_k^\ast$ is a mixture of negative binomial distributions and $\hat{\textbf{X}}_k | \hat{X}_k^\ast$ follows a multinomial distribution. As stated in Section 3.5.2, we set $\hat{\textbf{X}}_1 = \textbf{X}_1 \sim \hat{\textbf{W}} (0;\, {\boldsymbol{\eta}}^0)$.

For $k=1,2,\ldots$ , we outline the construction of $\hat{X}_{k+1}^\ast| \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} $ and $\hat{\textbf{X}}_{k+1}| \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} $ given that $\hat{\textbf{X}}_k| \textbf{T}_{2:k} = \textbf{t}_{2:k} \sim \sum_{i=1}^{\hat{R}_k} \hat{\textbf{W}}_i \big(s_k;\,\hat{{\boldsymbol{\zeta}}}^k \big)$ , where $\hat{R}_k$ has support $\{1,2,\ldots, k\}$ , $\hat{P}_k^m = \mathbb{P} \big(\hat{R}_k = m \big)$ $(k=1,2,\ldots;\ m=1,2,\ldots,k)$ , and the $\hat{\textbf{W}}_i \big(s_k;\,\hat{{\boldsymbol{\zeta}}}^k \big)$ are i.i.d. according to $\hat{\textbf{W}} \big(s_k;\,\hat{{\boldsymbol{\zeta}}}^k \big)$ given by (3.23) and (3.24). That is,

(6.1) \begin{align} \hat{X}_k^\ast | \textbf{T}_{2:k} = \textbf{t}_{2:k} &\sim \text{NegBin} \big(\hat{R}_k, \pi_{s_k} \big),\\[-24pt] \nonumber \end{align}
(6.2) \begin{align} \hat{\textbf{X}}_k |\hat{X}_k^\ast = x^\ast, \textbf{T}_{2:k} = \textbf{t}_{2:k} &\sim \text{Multinomial} \big(x^\ast, \hat{{\boldsymbol{\zeta}}}^k \big).\end{align}
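Sampling from (6.1) and (6.2) is immediate; the following minimal sketch (in Python; $\hat{\textbf{P}}_k$, $\pi_{s_k}$, and $\hat{{\boldsymbol{\zeta}}}^k$ are placeholder assumptions) illustrates the two-stage structure of the approximation.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
P_hat_k = np.array([0.2, 0.5, 0.3])      # P(R_hat_k = 1, 2, 3); placeholder
pi_sk = 0.35                             # pi_{s_k}; placeholder
zeta_hat_k = np.array([0.5, 0.3, 0.2])   # hat zeta^k; placeholder

def sample_X_hat_k():
    m = rng.choice(len(P_hat_k), p=P_hat_k) + 1
    # negative_binomial(n, p) counts failures before n successes, i.e. the
    # sum of n independent Geom(p) variables on {0, 1, ...}, as in (6.1)
    x_star = rng.negative_binomial(m, pi_sk)
    return rng.multinomial(x_star, zeta_hat_k)   # (6.2)
\end{verbatim}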

Let $\tilde{\textbf{X}}_{k+1} | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} $ denote the distribution of the number of individuals alive of each type immediately after the $(k+1)$ th detected death, given that immediately after the kth detected death, at time $s_k$ , the distribution of the number of individuals alive of each type is given by $\hat{\textbf{X}}_k | \textbf{T}_{2:k} = \textbf{t}_{2:k}$ defined in (6.1) and (6.2). We obtain the distribution of $\tilde{\textbf{X}}_{k+1} | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} $ using minor modifications of Lemmas 7 and 8, by arguments along similar lines to those used in the proof of Theorem 1 and [Reference Ball and Neal2, Section 4].

Suppose that $\hat{R}_k = j$ for some $j=1,2, \ldots, k$ . Then at time $s_k$ , the population consists of j i.i.d. family groups, where the number of individuals of each type within a family group is distributed according to $\hat{\textbf{W}} \big(s_k;\,\hat{{\boldsymbol{\zeta}}}^k \big)$ . (As in the proof of Theorem 1, a family group may contain no individuals.) One of these j family groups will be responsible for the next detected death in the population, at time $s_{k+1}$ , and the distribution of the number of individuals alive in that family group and their types are obtained from Lemma 8 by setting $\textbf{a} = \textbf{0}$ and ${\boldsymbol{\eta}}^{s_k} = \hat{{\boldsymbol{\zeta}}}^k$ . The other $j-1$ family groups will not experience a detected death in the interval $(s_k, s_{k+1}]$ , and the distribution of the number of individuals alive in these family groups and their types are obtained by setting $\textbf{a} = \textbf{0}$ and ${\boldsymbol{\eta}}^{s_k} = \hat{{\boldsymbol{\zeta}}}^k$ in Lemma 7.

For each of the family groups which do not experience a detected death in the interval $(s_k, s_{k+1}]$ , the distribution of the number of individuals alive at time $s_{k+1}$ is given by (5.25) with $t=s_k$ and $\tau = t_{k+1}$ , a mixture distribution of $G_{s_{k+1}}$ with probability $h (s_k;\,t_{k+1})$ and a point mass at 0 with probability $1- h (s_k;\,t_{k+1})$ , where $h(t;\,\tau)$ is defined in (5.26). Similarly, for the family group which does experience the death, the distribution of the number of individuals alive at time $s_{k+1}$ , immediately after the death, is, by (5.34), a mixture of the sum of two independent copies of $G_{s_{k+1}}$ with probability $\tilde{h} (s_k;\,t_{k+1})$ and $G_{s_{k+1}}$ with probability $1-\tilde{h} (s_k;\,t_{k+1})$ . Hence,

\begin{align}\left. \tilde{X}_{k+1}^\ast = \sum_{j=1}^J \tilde{X}_{k+1}^j \right| \hat{R}_k=l,T_{k+1} = t_{k+1} \sim \text{NegBin} \big(\tilde{R}_{k+1}, \pi_{s_{k+1}} \big), \nonumber\end{align}

where

(6.3) \begin{align} \tilde{R}_{k+1} | \hat{R}_k = l, T_{k+1} = t_{k+1} \sim 1 + \text{Bin} (1, \tilde{h} (s_k;\,t_{k+1}) ) + \text{Bin} (l-1, h (s_k;\,t_{k+1}) )\end{align}

and the two binomial random variables are independent. We observe that (6.3) is similar to [Reference Ball and Neal2, (4.15)], and hence, we follow [Reference Ball and Neal2, Lemma 4.5] in the derivation of the probability mass function of $\{\tilde{R}_{k+1} | \hat{R}_k = l, T_{k+1} = t_{k+1}\}$ .

Remember that we set ${\boldsymbol{\eta}}^{s_k} = \hat{{\boldsymbol{\zeta}}}^k$ and that $L(s_k,\tau)$ is defined in (3.22). Therefore, from (5.17) and (5.27), with $\textbf{a} = \textbf{0}$ and noting from (3.7) that $b_i (t, \tau;\, \textbf{0}) = - \psi_i (t;\,\tau)$, we have, for $\tau \geq 0$, that

(6.4) \begin{align} U_E (s_k, \tau ;\; \textbf{0}) & = \frac{\pi_{s_k} (1 - \psi (s_k;\tau))}{\pi_{s_k+\tau} L(s_k,\tau)},\end{align}

and (also using (3.3))

(6.5) \begin{align} & U_D (s_k, \tau ;\; \textbf{0}) \nonumber \\ & \quad = \frac{\pi_{s_k}\left[\sum_{j=1}^J d_{s_k+\tau,j} \gamma_j \phi_j (s_k ;\tau) + (1-\pi_{s_k+\tau}) \sum_{j=1}^J d_{s_k+\tau,j} \gamma_j \left( \psi_j (s_k;\tau) - \eta_j^{s_k +\tau} \psi (s_k;\tau) \right) \right]}{\pi_{s_k+\tau}^2 L(s_k,\tau)}.\end{align}

Therefore, it follows from (6.4) and (6.5) that the probability density function of $T_{k+1}$, given that $\hat{\textbf{X}}_k$ satisfies (3.26), is

(6.6) \begin{align} \hat{f}_{T_{k+1}} ( \tau | \textbf{T}_{2:k} = \textbf{t}_{2:k}, \hat{R}_k =m ) = m \left[ \frac{\pi_{s_k} (1- \psi (s_k;\,\tau))}{\pi_{s_{k+1}}L(s_k,\tau)} \right]^{m-1} U_D (s_k, \tau;\; \textbf{0}) \qquad (\tau \geq 0).\end{align}

We use (6.3) and (6.6) to derive $\tilde{P}_{k+1}^j = \mathbb{P} (\tilde{R}_{k+1} =j |\textbf{T}_{2:k+1} = \textbf{t}_{2:k+1})$ , which is given by

(6.7) \begin{align} \tilde{P}_{k+1}^j &= \mathbb{P} (\tilde{R}_{k+1} = j | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1}) \nonumber \\[3pt] &= \sum_{l=1}^k \mathbb{P} (\tilde{R}_{k+1} = j, \hat{R}_k = l | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} ) \nonumber \\[3pt] &= \sum_{l=1}^k \frac{\mathbb{P} (\tilde{R}_{k+1} = j| \hat{R}_k = l, T_{k+1} = t_{k+1}) \hat{f}_{T_{k+1}} ( t_{k+1} |\textbf{T}_{2:k} = \textbf{t}_{2:k}, \hat{R}_k =l ) \hat{P}_k^l }{\hat{f}_{T_{k+1}} ( t_{k+1} |\textbf{T}_{2:k} = \textbf{t}_{2:k} )}. \end{align}

Using (6.6), we have that the denominator in (6.7) is given by

(6.8) \begin{align} \hat{f}_{T_{k+1}} ( t_{k+1} |\textbf{T}_{2:k} = \textbf{t}_{2:k} ) &= \sum_{m=1}^k \hat{f}_{T_{k+1}} ( t_{k+1} |\textbf{T}_{2:k} = \textbf{t}_{2:k}, \hat{R}_k =m ) \hat{P}_k^m \nonumber \\[3pt] &= \sum_{m=1}^k m \left[ \frac{\pi_{s_k} (1- \psi (s_k;\,t_{k+1}))}{\pi_{s_{k+1}} L(s_k,t_{k+1})} \right]^{m-1} U_D (s_k, t_{k+1} ;\, \textbf{0}) \hat{P}_k^m \nonumber \\[3pt] &= U_D (s_k, t_{k+1} ;\, \textbf{0}) \sum_{m=1}^k m \left[ \frac{\pi_{s_k} (1- \psi (s_k;\,t_{k+1}))}{\pi_{s_{k+1}} L(s_k, t_{k+1})} \right]^{m-1}\hat{P}_k^m. \end{align}

Similarly, the numerator in (6.7) is given by

(6.9) \begin{align} & \sum_{l=1}^k \mathbb{P} (\tilde{R}_{k+1} = j| \hat{R}_k = l, T_{k+1} = t_{k+1}) \hat{f}_{T_{k+1}} ( t_{k+1} |\textbf{T}_{2:k} = \textbf{t}_{2:k}, \hat{R}_k =l ) \hat{P}_k^l \nonumber \\[3pt] & = \sum_{l=1}^k \left[\tilde{h} (s_k;\,t_{k+1}) \binom{l-1}{j-2} h(s_k;\,t_{k+1})^{j-2} \{1-h(s_k;\,t_{k+1}) \}^{l+1-j} \right. \nonumber \\[3pt] & \left. + \left\{ 1-\tilde{h} (s_k;\,t_{k+1})\right\} \binom{l-1}{j-1} h(s_k;\,t_{k+1})^{j-1} \{1-h(s_k;\,t_{k+1}) \}^{l-j} \right] \nonumber \\[3pt] & \times l \left[ \frac{\pi_{s_k} (1- \psi (s_k;\,t_{k+1}))}{\pi_{s_{k+1}} L(s_k, t_{k+1})} \right]^{l-1} U_D (s_k, t_{k+1};\, \textbf{0}) \hat{P}_k^l.\end{align}

We note that the term $U_D (s_k, t_{k+1};\, \textbf{0})$ appears both in the numerator, (6.9), and the denominator, (6.8), and hence cancels in the calculation of $\tilde{P}_{k+1}^j$ .
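The computation of $\tilde{\textbf{P}}_{k+1}$ from (6.7)–(6.9) thus reduces to a finite mixture update. A minimal sketch (in Python) is given below; the inputs $h = h(s_k;\,t_{k+1})$, $\tilde{h} = \tilde{h}(s_k;\,t_{k+1})$, and $r = \pi_{s_k}(1-\psi(s_k;\,t_{k+1}))/(\pi_{s_{k+1}} L(s_k,t_{k+1}))$ are assumed to have been computed from (5.26), (3.20), and (3.22), and $U_D (s_k, t_{k+1};\,\textbf{0})$ is omitted since it cancels.

\begin{verbatim}
import numpy as np
from scipy.stats import binom

def update_weights(P_hat_k, h, h_tilde, r):
    k = len(P_hat_k)
    P_tilde = np.zeros(k + 1)
    for j in range(1, k + 2):            # support of R_tilde_{k+1}
        for l in range(1, k + 1):        # sum over R_hat_k = l as in (6.9)
            # P(R_tilde_{k+1} = j | R_hat_k = l), from (6.3); binom.pmf
            # returns 0 outside the support, covering the edge cases
            pj = (h_tilde * binom.pmf(j - 2, l - 1, h)
                  + (1.0 - h_tilde) * binom.pmf(j - 1, l - 1, h))
            P_tilde[j - 1] += pj * l * r ** (l - 1) * P_hat_k[l - 1]
    return P_tilde / P_tilde.sum()       # normalise, cf. (6.10)

print(update_weights(np.array([0.3, 0.7]), h=0.6, h_tilde=0.5, r=0.8))
\end{verbatim}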

Using (5.26) and (3.3), summed over j, we have

\begin{align} h (s_k;\,t_{k+1}) \frac{\pi_{s_k} (1- \psi (s_k;\, t_{k+1}))}{\pi_{s_{k+1}} L(s_k, t_{k+1})} = \frac{\pi_{s_k} \phi (s_k;\,t_{k+1})}{\pi_{s_{k+1}} (1-\pi_{s_{k+1}}) L(s_k, t_{k+1})}, \nonumber\end{align}

where $\phi (s_k;\,t_{k+1}) = \sum_{j=1}^J \phi_j (s_k;\,t_{k+1})$ , and

\begin{align} (1-h (s_k;\,t_{k+1})) \frac{\pi_{s_k} (1- \psi (s_k;\, t_{k+1}))}{\pi_{s_{k+1}} L(s_k,t_{k+1})} = \frac{\pi_{s_k} \psi (s_k;\,t_{k+1})}{(1-\pi_{s_{k+1}} )L(s_k,t_{k+1})}. \nonumber\end{align}

It then follows from (6.7), using (6.8) and (6.9), that, given $\hat{\textbf{P}}_k$, the vector $\tilde{\textbf{P}}_{k+1} = \left(\tilde{P}_{k+1}^1, \tilde{P}_{k+1}^2, \ldots, \tilde{P}_{k+1}^{k+1} \right)$ satisfies

(6.10) \begin{align} \tilde{\textbf{P}}_{k+1} = \frac{1}{K_k} \hat{\textbf{P}}_k \left[ \tilde{h} (s_k;\,t_{k+1}) \textbf{M}_k^1 (s_k;\,t_{k+1}) + \{1-\tilde{h} (s_k;\,t_{k+1}) \} \textbf{M}_k^2(s_k;\,t_{k+1}) \right],\end{align}

where $K_k$ is the normalising constant given by (3.28). We note that the right-hand side of (6.10) is identical to the right-hand side of (3.27), and hence $\left\{\tilde{X}_{k+1}^\ast | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1}\right\} \stackrel{D}{=} \left\{\hat{X}_{k+1}^\ast | \textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} \right\} $ .

We complete the construction of $\hat{\textbf{X}}_{k+1}$ by deriving $\hat{{\boldsymbol{\zeta}}}^{k+1}$ . For $j=1,2,\ldots, J$ , we set

(6.11) \begin{align} \hat{\zeta}_j^{k+1} = \frac{\mathbb{E} [ \tilde{X}_{k+1}^j |\textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} ]}{\mathbb{E} [ \tilde{X}_{k+1}^\ast |\textbf{T}_{2:k+1} = \textbf{t}_{2:k+1} ]}\end{align}

and show that $\hat{\zeta}_j^{k+1}$ satisfies (3.29).

Suppose that $\hat{R}_k = l$ , i.e. there are l family groups immediately after the kth detected death. We have that $l-1$ of these family groups will not produce the $(k+1)$ th detected death. For these family groups, by Lemma 7, the distribution of the number of individuals of each type alive immediately after the $(k+1)$ th detected death at time $s_{k+1}$ is given by $\textbf{W} (s_{k+1}; - {\boldsymbol{\psi}} (s_k, t_{k+1}))$ . By differentiating (3.4) with respect to $\theta_j$ and setting ${\boldsymbol{\theta}} = \textbf{1}$ , we have, for $\tilde{\textbf{a}} \in \mathbb{R}^J$ , that

(6.12) \begin{align} \mathbb{E} [ W_j (s_{k+1};\, \tilde{\textbf{a}})] &= \left. \frac{\partial \;}{\partial \theta_j} \left[ \frac{1+ \sum_{i=1}^J \theta_i \tilde{a}_i}{1 +\sum_{i=1}^J \tilde{a}_i} \times \frac{\pi_{s_{k+1}}}{1 - (1-\pi_{s_{k+1}}) \sum_{l=1}^J \theta_l \eta_l^{s_{k+1}}} \right] \right|_{{\boldsymbol{\theta}} =\textbf{1}} \nonumber \\[3pt] & = \left. \left[ \frac{\tilde{a}_j \pi_{s_{k+1}}}{(1 +\sum_{i=1}^J \tilde{a}_i) (1 - (1-\pi_{s_{k+1}}) \sum_{l=1}^J \theta_l \eta_l^{s_{k+1}})} \right. \right. \nonumber\\[3pt] & \left. \left. \qquad + \frac{1+ \sum_{i=1}^J \theta_i \tilde{a}_i}{1 +\sum_{i=1}^J \tilde{a}_i} \times \frac{ \eta_j^{s_{k+1}} \pi_{s_{k+1}} ({-}1)^2 (1-\pi_{s_{k+1}})}{[1 - (1-\pi_{s_{k+1}}) \sum_{l=1}^J \theta_l \eta_l^{s_{k+1}}]^2} \right] \right|_{{\boldsymbol{\theta}} =\textbf{1}} \nonumber \\[3pt] & = \eta_j^{s_{k+1}} \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}} + \frac{\tilde{a}_j}{1 +\sum_{i=1}^J \tilde{a}_i}.\end{align}

Hence, it follows that

(6.13) \begin{align} \mathbb{E} [ W_j (s_{k+1}; - {\boldsymbol{\psi}} (s_k, t_{k+1}))] = \eta_j^{s_{k+1}} \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}}- \frac{\psi_j (s_k;\, t_{k+1})}{1- \psi(s_k;\,t_{k+1})}.\end{align}

Similarly, we can derive the expected number of individuals of type j immediately after the $(k+1)$ th detected death at time $s_{k+1}$ in the family group which experienced the death, using (5.29) in Lemma 8 with $\textbf{c} = \textbf{C}_{s_k;t_{k+1}}$ defined in (3.19). By (6.12), it follows that

(6.14) \begin{align} \mathbb{E} [ W_j (s_{k+1};\, \textbf{0})] +\mathbb{E} [ W_j (s_{k+1};\, \textbf{C}_{s_k;t_{k+1}})] =2 \eta_j^{s_{k+1}} \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}}+ \frac{C_{s_k,t_{k+1}}^j}{1 +C_{s_k,t_{k+1}}^\ast}.\end{align}

It follows from (6.13), (6.14), and $\nu_k=\mathbb{E} [\hat{R}_k |\textbf{T}_{2:k} = \textbf{t}_{2:k}]$ that

(6.15) \begin{align} \mathbb{E} [ \tilde{X}_{k+1}^j |\textbf{T}_{2:k+1} = \textbf{t}_{2:k+1}] = (\nu_k+1) \eta_j^{s_{k+1}} \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}} - (\nu_k -1) \frac{\psi_j (s_k;\, t_{k+1})}{1- \psi(s_k;\,t_{k+1})} + \frac{C_{s_k; t_{k+1}}^j}{1 + C_{s_k;t_{k+1}}^\ast }. \end{align}

By summing (6.15) over j, we have that

(6.16) \begin{align} \mathbb{E} [ \tilde{X}_{k+1}^\ast |\textbf{T}_{2:k+1} = \textbf{t}_{2:k+1}] = (\nu_k+1) \frac{1-\pi_{s_{k+1}}}{\pi_{s_{k+1}}} - (\nu_k -1) \frac{\psi (s_k;\, t_{k+1})}{1- \psi(s_k;\,t_{k+1})} + \frac{C_{s_k; t_{k+1}}^\ast}{1 + C_{s_k;t_{k+1}}^\ast }, \end{align}

and (3.29) follows immediately from inserting (6.15) and (6.16) into (6.11).
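In practice, the ratio (6.11) can therefore be evaluated directly from the closed forms (6.15) and (6.16). A minimal Python sketch, assuming the coordinate-sum conventions $\psi (s_k;\,t_{k+1}) = \sum_{j=1}^J \psi_j (s_k;\,t_{k+1})$ and $C_{s_k;t_{k+1}}^\ast = \sum_{j=1}^J C_{s_k;t_{k+1}}^j$ used above:

```python
import numpy as np

def zeta_hat(nu_k, eta, pi, psi, C):
    # nu_k : scalar E[R_k | T_{2:k} = t_{2:k}].
    # eta  : array (eta_1^{s_{k+1}}, ..., eta_J^{s_{k+1}}); pi : pi_{s_{k+1}}.
    # psi  : array (psi_1(s_k; t_{k+1}), ..., psi_J(s_k; t_{k+1})).
    # C    : array (C^1_{s_k;t_{k+1}}, ..., C^J_{s_k;t_{k+1}}).
    mean_Xj = ((nu_k + 1.0) * eta * (1.0 - pi) / pi      # first term of (6.15)
               - (nu_k - 1.0) * psi / (1.0 - psi.sum())  # second term
               + C / (1.0 + C.sum()))                    # third term
    return mean_Xj / mean_Xj.sum()  # (6.11); the denominator is (6.16)
```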

7. Numerical results

In this section we demonstrate the usefulness of the approximate distribution given in Section 3.5.2 using an example parameter set in which both the birth rate and the probability of detecting a death change over time. We simulated branching processes with $L \sim \text{Gamma} (3,3)$ up until the 500th detected death. For $k=1,2,\ldots, 500$ , let $\alpha_k$ and $\varepsilon_k$ denote the birth rate and the probability of detecting the death of a type-3 individual, respectively, between the $(k-1)$ th and kth detected deaths. (Since $\gamma_1 =\gamma_2 =0$ , it suffices to state the probability of detecting the death of a type-3 individual.) The birth rate and detection probability change after every 100 detected deaths, with $(\alpha_k,\varepsilon_k) = (2.5,0.3)$ , $(\alpha_{100+k},\varepsilon_{100+k}) = (1.5,0.4)$ , $(\alpha_{200+k},\varepsilon_{200+k}) =(1.0,0.5)$ , $(\alpha_{300+k},\varepsilon_{300+k}) = (0.6,0.6)$ , and $(\alpha_{400+k},\varepsilon_{400+k}) = (0.4,0.7)$ $(k=1,2,\ldots,100)$ . Thus the birth rate decreases over time, taking the branching process from supercritical to subcritical, while the detection probability increases. (Note that since $\mathbb{E} [L] =1$ , the basic reproduction number is equal to the birth rate.) In Figure 3, for a single realisation of the branching process, we plot the number of individuals alive of each type, and the total number alive, immediately after each detected death against the number of detected deaths, together with the median ( $\hat{X}_k^j, j=1,2,3; \; \hat{X}_k^\ast)$ of the approximate distribution derived in Section 3.5.2. We also include the 5% and 95% quantiles of the approximate distribution, denoted by $l_k$ and $u_k$ , with $[l_k, u_k]$ shaded for $k=1,2, \ldots, 500$ . Throughout, we observe good agreement between the median of the approximate distribution and the actual number of individuals alive of each type.
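For reproducibility, the piecewise-constant parameter schedule just described is easily encoded. The Python sketch below does only that (the simulation of the branching process itself is not shown, and the function name `schedule` is ours).

```python
def schedule(k):
    # (alpha_k, varepsilon_k) for k = 1, ..., 500: the parameters change
    # after every 100 detected deaths, as in the text.
    blocks = [(2.5, 0.3), (1.5, 0.4), (1.0, 0.5), (0.6, 0.6), (0.4, 0.7)]
    return blocks[(k - 1) // 100]

# e.g. schedule(1) == (2.5, 0.3), schedule(101) == (1.5, 0.4),
#      schedule(500) == (0.4, 0.7)
```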

Figure 3. Number of individuals alive (solid line) and median of approximate distribution $\hat{X}_k^z |\textbf{t}_{2:k}$ $(z=\ast,1,2,3)$ (dashed line) up to the 500th detected death with $L \sim \text{Gamma} (3,3)$ , $(\alpha_k,\varepsilon_k) = (2.5,0.3)$ , $(\alpha_{100+k},\varepsilon_{100+k}) = (1.5,0.4)$ , $(\alpha_{200+k},\varepsilon_{200+k}) =(1.0,0.5)$ , $(\alpha_{300+k},\varepsilon_{300+k}) = (0.6,0.6)$ , and $(\alpha_{400+k},\varepsilon_{400+k}) = (0.4,0.7)$ $(k=1,2,\ldots,100)$ . The shaded area represents the probability mass between the 5% and 95% quantiles of $\hat{X}_k^z |\textbf{t}_{2:k}$ . Top left: total size of the population. Top right: number of Type 1 individuals. Bottom left: number of Type 2 individuals. Bottom right: number of Type 3 individuals.

Figure 4. P–P plots based on 100 simulations of the ordered quantiles of $\tilde{\textbf{u}}_{{\cdot},500,j}$ $(j=\ast,1,2,3)$ , where $L \sim \text{Gamma} (3,3)$ , $(\alpha_k,\varepsilon_k) = (2.5,0.3)$ , $(\alpha_{100+k},\varepsilon_{100+k}) = (1.5,0.4)$ , $(\alpha_{200+k},\varepsilon_{200+k}) =(1.0,0.5)$ , $(\alpha_{300+k},\varepsilon_{300+k}) = (0.6,0.6)$ , and $(\alpha_{400+k},\varepsilon_{400+k}) = (0.4,0.7)$ $(k=1,2,\ldots,100)$ . Top left: total number of individuals alive. Top right: number of Type 1 individuals alive. Bottom left: number of Type 2 individuals alive. Bottom right: number of Type 3 individuals alive.

We now turn to assessing the performance of the approximate distribution based on 100 realisations of the branching process described above. For $i=1,2,\ldots, 100$ , let $\textbf{t}_{2:500}^i$ denote the inter-arrival times of the detected deaths in simulation i, and let $x_{i,k,j}$ $(k=1,2,\ldots, 500; j=1,2,3,\ast)$ denote the number of individuals of type j alive (with $x_{i,k,\ast} =\sum_{j=1}^3 x_{i,k,j}$ ) immediately after the kth detected death in the ith simulation. (Note that we include only simulations which reach at least 500 detected deaths.) For $i=1,2,\ldots,100$ , $k=1,2,\ldots,500$ , and $j=1,2,3, \ast$ , let

(7.1) \begin{align} u_{i,k,j} = 0.5 [ \mathbb{P} (\hat{X}_k^j < x_{i,k,j} | \textbf{T}_{2:k} = \textbf{t}_{2:k}^i) + \mathbb{P} (\hat{X}_k^j \leq x_{i,k,j} | \textbf{T}_{2:k} = \textbf{t}_{2:k}^i)],\end{align}

the midpoint of the left- and right-hand limits of the cumulative distribution function of $\hat{X}_k^j| \textbf{T}_{2:k} = \textbf{t}_{2:k}^i$ evaluated at $x_{i,k,j}$ . Let $\textbf{u}_{{\cdot},k,j} = (u_{1,k,j}, u_{2,k,j}, \ldots, u_{100,k,j})$ , and let $\tilde{u}_{i,k,j}$ denote the ith-smallest element of $\textbf{u}_{{\cdot},k,j}$ , with $\tilde{\textbf{u}}_{{\cdot},k,j} = (\tilde{u}_{1,k,j}, \tilde{u}_{2,k,j}, \ldots, \tilde{u}_{100,k,j})$ . If the approximate distribution of Section 3.5.2 were exact, the elements of $\tilde{\textbf{u}}_{{\cdot},k,j}$ would be distributed (approximately, owing to the discreteness of $\hat{X}_k^j$ ) as the order statistics of 100 independent $U(0,1)$ random variables. In Figure 4, we present P–P plots of $\tilde{\textbf{u}}_{{\cdot},500,j}$ $(j=1,2,3,\ast)$ against the quantiles of the $U(0,1)$ distribution. The P–P plots demonstrate excellent performance of the approximate distribution for the number of each type alive and the total number alive after the 500th detected death, with similar performance observed for other values of k.
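For illustration, the mid-p values (7.1) are straightforward to compute from the probability mass function of the approximate distribution. In the sketch below, `pmf` is a hypothetical NumPy array with `pmf[n]` equal to $\mathbb{P} (\hat{X}_k^j = n | \textbf{T}_{2:k} = \textbf{t}_{2:k}^i)$ , truncated at some large value and obtained however the approximate distribution of Section 3.5.2 is evaluated.

```python
import numpy as np

def mid_p(pmf, x):
    # Midpoint of P(X < x) and P(X <= x), as in (7.1); pmf is a NumPy array.
    return 0.5 * (pmf[:x].sum() + pmf[:x + 1].sum())

# P-P plot data across 100 simulations: sort the mid-p values and plot
# against the uniform quantiles i/101 (i = 1, ..., 100), e.g.
# u_sorted  = np.sort([mid_p(pmf_i, x_i) for pmf_i, x_i in results])
# uniform_q = np.arange(1, 101) / 101.0
```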

In the supplementary material two further examples are provided, including an epidemic example in which the birth rate of the branching process is adjusted to account for the expected number of non-susceptible individuals in the population, in a similar manner to [Reference Ball and Neal2, Section 7]. The results there suggest that the approximate distribution tends to deteriorate as the number of phases in the Erlang distribution increases and as the changes in the parameters, especially the detection probability, become more dramatic.

Appendix A. Proof of Lemma 2.

Note from (3.7) that for $i=1,2,\ldots,J$ , $b_i \; (=b_i (t, \tau ;\,\textbf{a}) )> - (1-\pi_{t+\tau}) \eta_i^{t+\tau}$ is equivalent to

(A.1) \begin{align} & - \psi_i (t;\,\tau) + \sum_{h=1}^J a_h \left[ \zeta (t;\,\tau) p_{hi} (t, 0, \tau) - \psi_i (t;\,\tau) q_h(t, 0, \tau) \right] \nonumber \\[3pt] & \qquad > -(1- \pi_{t+\tau}) \eta_i^{t+\tau} \left[1 + \sum_{h=1}^J a_h q_h(t, 0, \tau) \right].\end{align}

By (3.3), $\phi_i (t;\,\tau) = (1-\pi_{t+\tau}) \eta_i^{t+\tau} - \psi_i (t;\,\tau)$ , so we have that (A.1) is equivalent to

(A.2) \begin{align} \phi_i (t;\,\tau) > \sum_{h=1}^J a_h \left[ - \zeta (t;\,\tau) p_{hi} (t, 0, \tau) - \phi_i (t;\,\tau) q_h(t, 0, \tau) \right].\end{align}
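To spell out the equivalence, substitute $(1-\pi_{t+\tau}) \eta_i^{t+\tau} = \phi_i (t;\,\tau) + \psi_i (t;\,\tau)$ (a rearrangement of (3.3)) into the right-hand side of (A.1), giving

\begin{align*} & - \psi_i (t;\,\tau) + \sum_{h=1}^J a_h \left[ \zeta (t;\,\tau) p_{hi} (t, 0, \tau) - \psi_i (t;\,\tau) q_h(t, 0, \tau) \right] \\ & \qquad > - \phi_i (t;\,\tau) - \psi_i (t;\,\tau) - \left[ \phi_i (t;\,\tau) + \psi_i (t;\,\tau) \right] \sum_{h=1}^J a_h q_h(t, 0, \tau);\end{align*}

cancelling the terms $- \psi_i (t;\,\tau)$ and $- \psi_i (t;\,\tau) \sum_{h=1}^J a_h q_h(t, 0, \tau)$ , which appear on both sides, and rearranging gives (A.2).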

Since $ - \zeta (t;\,\tau) p_{hi} (t, 0, \tau) - \phi_i (t;\,\tau)q_h(t, 0, \tau) < 0$ $(h=1,2,\ldots, J)$ and $a_h> - (1-\pi_t) \eta_h^t$ , we have that

(A.3) \begin{align} & \sum_{h=1}^J a_h \left[ - \zeta (t;\,\tau) p_{hi} (t, 0, \tau) - \phi_i (t;\,\tau) q_h(t, 0, \tau) \right] \nonumber \\[3pt] & \qquad < \sum_{h=1}^J \left[- (1-\pi_t ) \eta_h^t \right] \left[ - \zeta (t;\,\tau) p_{hi} (t, 0, \tau) - \phi_i (t;\,\tau) q_h(t, 0, \tau) \right] \nonumber \\[3pt] & \qquad = (1- \pi_t) \zeta (t;\,\tau) \sum_{h=1}^J \eta_h^t p_{hi} (t, 0, \tau)+ \phi_i (t;\,\tau) (1-\pi_t ) \sum_{h=1}^J \eta_h^t q_h(t, 0, \tau).\end{align}

Using (3.2) and (3.3), it is straightforward to show that the right-hand side of (A.3) is equal to $\phi_i (t;\,\tau)$ . Therefore (A.2) holds, and consequently $b_i > - (1-\pi_{t+\tau}) \eta_i^{t+\tau}$ $(i=1,2,\ldots, J)$ .

Given that $b_j > - (1-\pi_{t+\tau}) \eta_j^{t+\tau}$ $(j=1,2,\ldots, J)$ , the denominator in (3.8) is positive. Using (3.8), a little algebra shows that for $i=1,2,\ldots, J$ , the inequality $c_i > - (1- \pi_{t+\tau}) \eta_i^{t+\tau}$ is equivalent to

\begin{align*}\left( b_i + (1-\pi_{t+\tau}) \eta_i^{t+\tau} \right) \sum_{j=1}^J d_{t+\tau,j} \gamma_j \eta_j^{t+\tau} > 0,\end{align*}

which clearly holds.

Funding information

There are no funding bodies to thank in relation to the creation of this article.

Competing interests

There were no competing interests to declare that arose during the preparation or publication of this article.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/apr.2024.65.

References

[1] Asmussen, S. (2003). Applied Probability and Queues, 2nd edn. Springer, New York.
[2] Ball, F. and Neal, P. (2023). The size of a Markovian SIR epidemic given only removal data. Adv. Appl. Prob. 55, 895–926.
[3] Black, A. J. and Ross, J. V. (2015). Computation of epidemic final size distributions. J. Theoret. Biol. 367, 159–165.
[4] Geiger, J. and Kersting, G. (1997). Depth-first search of random trees, and Poisson point processes. In Classical and Modern Branching Processes, Springer, New York, pp. 111–126.
[5] House, T., Ross, J. V. and Sirl, D. (2013). How big is an outbreak likely to be? Methods for epidemic final-size calculation. Proc. R. Soc. A 469, article no. 20120436, 22 pp.
[6] Lambert, A. and Trapman, P. (2013). Splitting trees stopped when the first clock rings and Vervaat’s transformation. J. Appl. Prob. 50, 208–227.
[7] Lefèvre, C. and Picard, P. (2017). On the outcome of epidemics with detections. J. Appl. Prob. 54, 890–904.
[8] Lefèvre, C., Picard, P. and Utev, S. (2020). On branching models with alarm triggerings. J. Appl. Prob. 57, 734–759.
[9] Trapman, P. and Bootsma, M. C. J. (2009). A useful relationship between epidemiology and queueing theory: the distribution of the number of infectives at the moment of the first detection. Math. Biosci. 219, 15–22.
[10] Walker, J. N., Black, A. J. and Ross, J. V. (2019). Bayesian model discrimination for partially-observed epidemic models. Math. Biosci. 317, article no. 108266, 13 pp.
[11] Whittle, P. (1955). The outcome of a stochastic epidemic—a note on Bailey’s paper. Biometrika 42, 116–122.
