
Exact sampling for some multi-dimensional queueing models with renewal input

Published online by Cambridge University Press:  15 November 2019

Jose Blanchet*
Affiliation:
Stanford University
Yanan Pei*
Affiliation:
Columbia University
Karl Sigman*
Affiliation:
Columbia University
*Postal address: Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA.
**Postal address: Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 10027, USA.

Abstract

Using a result of Blanchet and Wallwater (2015) for exactly simulating the maximum of a negative drift random walk endowed with independent and identically distributed (i.i.d.) increments, we extend it to a multi-dimensional setting and then give a new algorithm for simulating exactly the stationary distribution of a first-in–first-out (FIFO) multi-server queue in which the arrival process is a general renewal process and the service times are i.i.d.: the FIFO GI/GI/c queue with $ 2 \leq c < \infty$ . Our method utilizes dominated coupling from the past (DCFP) as well as the random assignment (RA) discipline, and complements the earlier work in which Poisson arrivals were assumed, such as the recent work of Connor and Kendall (2015). We also consider the models in continuous time, and show that with mild further assumptions, the exact simulation of those stationary distributions can also be achieved. Using our FIFO algorithm, we also give a new exact simulation algorithm for the stationary distribution of the infinite-server case, the GI/GI/ $\infty$ model. Finally, we show how to handle fork–join queues, in which each arriving customer brings c jobs, one for each server.

Type
Original Article
Copyright
© Applied Probability Trust 2019 

1. Introduction

In recent years, the method of exact simulation has evolved as a powerful way of sampling from stationary distributions of queueing models for which such distributions cannot be derived explicitly. The main method itself is referred to as coupling from the past (CFP), as introduced in Propp and Wilson [Reference Propp and Wilson17] for finite-state discrete-time Markov chains. Since then, the method has been generalized to cover general state-space Markov chains by using dominating processes; this is known as dominated coupling from the past (DCFP), as in Kendall [Reference Kendall and Møller15]. The main purpose of such methods is to produce a copy by simulation that has exactly (not approximately) the stationary distribution desired. These methods involve simulating processes backwards in time. In the present paper we consider using such methods for the first-in–first-out (FIFO) multi-server queue, denoted as the FIFO GI/GI/c queue, $ 2\le c<\infty$ , where c denotes the number of servers working in parallel, and arriving customers wait in one common queue (line).

The first algorithms yielding exact simulation in stationarity of the FIFO GI/GI/c queue are found in [Reference Sigman20] and [Reference Sigman21], in which Poisson arrivals are assumed, i.e. the M/G/c case. In [Reference Sigman20], a DCFP method is used, but the strong condition of super-stability is assumed, $ \rho<1$ , instead of $ \rho{<}c$ ( $ \rho=\mathbb{E}[S]/\mathbb{E}[T]$ , where T and S denote an interarrival time and service time respectively; stability only requires that $ \rho{<}c$ ). As a dominating process, the M/G/1 queue under processor sharing (PS) is used, together with (this is key) its time-reversibility properties. In PS there is no queue: all customers are served simultaneously, each at rate $ 1/n$ when there are $ n\ge1$ customers in service. Then, in [Reference Sigman21], the general $ \rho{<}c$ case is covered by using a forward-in-time regenerative method (a general method developed in [Reference Asmussen, Glynn and Thorisson2]) with the M/G/c model under a random assignment (RA) discipline as an upper bound – a model in which each arrival joins the ith queue with probability $ 1/c$ independently. (The general forward-in-time regenerative method of [Reference Asmussen, Glynn and Thorisson2] unfortunately always yields infinite expected termination time.) Then, in [Reference Connor and Kendall8], Connor and Kendall generalized the DCFP/PS method of [Reference Sigman20] by using the RA model. They accomplished this by first exactly simulating the RA model in stationarity backwards in time under PS at each node, then reconstructing it to obtain the RA model with FIFO at each node, doing so in such a way that a sample-path upper bound for the FIFO M/G/c queue is achieved.

As for renewal arrivals (the general FIFO GI/GI/c queue considered here), the methods used above break down for various reasons, primarily because, while under Poisson arrivals the c stations under RA become independent, they are not independent for general renewal arrivals. Also, the time-reversibility property of PS no longer holds, nor does the ‘Poisson arrivals see time averages’ (PASTA) property. Finally, under general renewal arrivals, the system may never empty once it begins operating. New methods are needed. Blanchet, Dong, and Pei [Reference Blanchet, Dong and Pei7] solved the problem by utilizing a vacation model as an upper bound. In the present paper, however, we utilize DCFP by directly simulating the RA model in reverse time (under FIFO at each node). Our method involves extending, to a multi-dimensional setting, a recent result of Blanchet and Wallwater [Reference Blanchet and Wallwater6] for exactly simulating the maximum of a negative drift random walk endowed with independent and identically distributed (i.i.d.) increments. We also remark on how our approach can lead to new results for other models, such as multi-server queues under the last-in–first-out (LIFO) discipline, or the randomly-choose-next discipline, and even fork–join models (also called split and match models).

2. The FIFO GI/GI/c model

Here we set up the classic first-in–first-out (FIFO) multi-server queueing model and its associated Markov chain known as the Kiefer–Wolfowitz workload vector (for further details see e.g. [Reference Asmussen1, Chapter 12, page 341], and the original paper [Reference Kiefer and Wolfowitz16]). In what follows, as input to a parallel multi-server queue with $ c\ge2$ servers, we have i.i.d. service times $ \{S_{n}\colon n\ge0\}$ distributed as $ G(x) = \mathbb{P}(S \le x),\ x\ge0$ , with finite and non-zero mean $ 0<\mathbb{E}[S]=1/\mu<\infty$ . Independently, the arrival times $ \{t_{n}\colon n\ge0\}$ ( $ t_{0}=0$ ) of customers to the model form a renewal process with i.i.d. interarrival times $ T_{n} = t_{n+1}-t_{n}$ , $ n\ge0$ , distributed as $ A(x)=\mathbb{P}(T\le x),\ x\ge0$ , and finite non-zero arrival rate $ 0<\lambda={\mathbb{E}[T]}^{-1}<\infty$ . The FIFO GI/GI/c model has only one queue: customers upon arrival join the end of the queue and then attend service at the station that becomes free first (just like a United States post office). Because of the FIFO assumption, the nth service time initiated for use by a server, $ S_{n}$ , is used on the nth arrival (arriving at time $ t_{n}$ ), so one could equivalently imagine/assume that $ S_{n}$ is brought to the system by the nth arrival. In this equivalent form, we say that at time $ t_{n}$ the workload in the system jumps upward by the amount $ S_{n}$ . Each server is identical in that they all process service times at rate 1. We let $ \mathbf{W}_{n}= (W_{n}(1),\ldots, W_{n} (c))^{T}$ denote the Kiefer–Wolfowitz workload vector, defined recursively by

(1) \begin{equation}\mathbf{W}_{n+1}=\mathcal{R} (\mathbf{W}_{n} +S_{n} \mathbf{e}-T_{n}\mathbf{f})^{+},\quad n\ge0,\end{equation}

where $ \mathbf{e}=(1,0,\ldots,0)^{T}$ , $ \mathbf{f}=(1,1,\ldots, 1)^{T}$ , $ \mathcal{R}$ places a vector in ascending order, and + takes the positive part of each coordinate. The vector $ \mathbf{W}_{n}$ shows, in ascending order, how much work each server will process from among all work in the system at time $ t_{n}$ , not including $ S_{n}$ . Letting $ C_{n}$ denote the nth arriving customer, $ D_{n}=W_{n}(1)$ is the delay in queue of $ C_{n}$ , because the server with the least amount of work will be the first to become free to serve $ C_{n}$ . Recursion (1) defines a c-dimensional Markov chain due to the given i.i.d. assumptions. The great importance of the recursion is that it yields $ \{D_{n}\colon n\ge0\}$ , which is thus a function of a Markov chain.
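As an illustration, one step of recursion (1) is easy to code. The following is a minimal sketch, not from the paper; the function name is ours, and the exponential distributions in the driver are illustrative stand-ins for G and A.

```python
import random

def kiefer_wolfowitz_step(W, S, T):
    """One step of recursion (1): the arriving customer's service time S
    is added to the least-loaded server W[0], every coordinate drains by
    the interarrival time T, negative parts are clipped at zero, and the
    vector is re-sorted into ascending order (the operator R)."""
    updated = [W[0] + S - T] + [w - T for w in W[1:]]
    return sorted(max(w, 0.0) for w in updated)

# Illustrative driver: c = 2 servers, exponential service and
# interarrival times standing in for G and A (an assumption for the demo).
random.seed(1)
W = [0.0, 0.0]
for _ in range(5):
    S = random.expovariate(1.0)
    T = random.expovariate(0.8)
    W = kiefer_wolfowitz_step(W, S, T)
```

The delay of the nth customer is then read off as `W[0]` just before its service time is added.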

With stability condition $ \rho=\lambda/\mu{<}c$ , it is well known that $ \mathbf{W}_{n}$ converges in distribution as $ n\to\infty$ to a proper stationary distribution (hence so does $ D_{n}$ ). Let $ \pi$ denote this stationary distribution. Our main objective in the present paper is to provide a simulation algorithm for sampling exactly from $ \pi$ .

3. The RA GI/GI/c model

Given a c-server queueing system, the random assignment model (RA) is the case when each of the c servers forms its own FIFO single-server queue, and each arrival to the system, independent of the past, randomly chooses queue i to join with equal probability $ 1/c,\ 1\le i\le c$ . In the GI/GI/c case, we refer to this as the RA GI/GI/c model. The following is a special case of Lemma 1.3 of [Reference Asmussen1, page 342]. (Such results and others even more general are based on [Reference Wolff22], [Reference Foss11], and [Reference Foss and Chernova12].)

Lemma 1. Let $ Q_{F}(t)$ denote the total number of customers in the system at time $ t\ge0$ for the FIFO GI/GI/c model, and let $ Q_{RA}(t)$ denote the total number of customers in the system at time $ t\ge0$ for the corresponding RA GI/GI/c model in which both models are initially empty and fed with exactly the same input of renewal arrivals $ \{t_{n}\colon n\ge0\}$ and i.i.d. service times $ \{S_{n}\colon n\ge0\}$ . Assume further that for both models the service times are used by the servers in the order in which service initiations occur ( $ S_{n}$ is the service time used for the nth such initiation). Then

(2) \begin{equation} \mathbb{P}(Q_{F}(t)\le Q_{RA}(t)\ \hbox{for all}\ t\ge0)=1.\end{equation}

The importance of Lemma 1 is that it allows us to jointly simulate versions of the two stochastic processes $ \{Q_{F}(t)\colon t\ge0\}$ and $ \{Q_{RA}(t)\colon t\ge0\}$ while achieving a coupling such that (2) holds. In particular, whenever an arrival finds the RA model empty, the FIFO model is found empty as well. (But we need to impose further conditions if we wish to ensure that indeed the RA GI/GI/c queue will empty with certainty.) Letting time t be sampled at arrival times of customers, $ \{t_{n} \colon n\ge0\}$ , we thus also have

(3) \begin{equation} \mathbb{P} (Q_{F}(t_{n}\!-\!)\le Q_{RA}(t_{n}\!-\!)\ \hbox{for all}\ n\ge0) =1.\end{equation}

In other words, the total number in the system as found by the nth arrival is sample-path ordered as well. Note that for the FIFO model, the nth arriving customer $ C_{n}$ initiates the nth service since FIFO means ‘first-in-queue–first-out-of-queue’, where by ‘queue’ we mean the line before entering service. This means that for the FIFO model we can attach $ S_{n}$ to $ C_{n}$ upon arrival if we so wish when applying Lemma 1. For the RA model, however, customers are not served in the order in which they arrive. For example, consider $ c=2$ servers (system initially empty) and suppose $ C_{1}$ is assigned to node 1 with service time $ S_{1}$ , and $ C_{2}$ is also assigned to node 1 (before $ C_{1}$ departs) with service time $ S_{2}$ . Meanwhile, before $ C_{1}$ departs, suppose $ C_{3}$ arrives and is assigned to the empty node 2 with service time $ S_{3}$ . Then $ S_{3}$ is used for the second service initiation. For RA, the service times in order of initiation are a random permutation of the originally assigned $ \{S_{n}\}$ .

To use Lemma 1, it is crucial that the servers simply hand out service times one at a time, as they are needed for service initiations. Thus, customers waiting in a queue before starting service do not have a service time assigned until they enter service. In simulation terminology, this amounts to generating the service times in the order in which they are needed.

Define the total workload at any time t as the sum of all whole and remaining service times in the system at time t. One disadvantage of generating service times only when they are needed is that it does not allow the workload to be defined – only the amount of work in service. To get around this if need be, one can simply generate service times upon arrival of customers, and give them to the server to be used in order of service initiation. The point is that when $ C_{n}$ arrives, the total work in the system jumps up by the amount $ S_{n}$ . But $ S_{n}$ is not assigned to $ C_{n}$ : it is assigned (perhaps later) to whichever customer initiates the nth service. This allows Lemma 1 to hold true for the total amount of work in the system. If we let $ \{V_{F}(t)\colon t\ge0\}$ and $ \{V_{RA}(t)\colon t\ge0\}$ denote the total workload in the two models with the service times used in the manner just explained, then in addition to Lemma 1 we have

(4) \begin{align} \mathbb{P} (V_{F}(t)\le V_{RA}(t)\ \hbox{for all}\ t\ge0) = 1,\end{align}
(5) \begin{align} \mathbb{P} (V_{F}(t_{n}\!-\!)\le V_{RA}(t_{n}\!-\!)\ \hbox{for all}\ n\ge0) = 1.\end{align}

Note, however, that one cannot use this device to define the workload at the individual nodes i, because that forces us to assign $ S_{n}$ to $ C_{n}$ so that the workload at the node that $ C_{n}$ attends (i say) jumps by $ S_{n}$ and $ C_{n}$ enters service using $ S_{n}$ ; that destroys the coupling needed to obtain Lemma 1. We can only handle the total (sum over all c nodes) workload. In the present paper, our use of Lemma 1 is via a kind of reversal.

Lemma 2. Let $ \{S_{n}^{\prime}\}$ be an i.i.d. sequence of service times distributed as G, and assign $ S_{n}^{\prime}$ to $ C_{n}$ in the RA model. Define $ S_{n}$ as the service time used in the nth service initiation. Then $ \{S_{n}\}$ is also i.i.d. distributed as G.

Proof. The key is to note that we are reordering based only on the order in which service times begin to be used, not when they are completed (which would introduce a bias). The service time chosen for the next initiation either enters service immediately (e.g. it is one that is routed to an empty queue by an arriving customer) or is chosen from among those waiting in line, and all those waiting are i.i.d. distributed as G. Let $ \skew2\hat{t}_{n}$ denote the time at which the nth service initiation begins. The value $ S_{n}$ of the nth service time chosen (at time $ \skew2\hat{t}_{n}$ ) by a server is independent of the past service time values used before time $ \skew2\hat{t}_{n}$ , and is distributed as G (the choice of which service time is used next is not based on its value, only on its position in the lines). Letting k(n) denote the index of the $ \{S^{\prime}_{n}\}$ that is chosen, i.e. $ S_{n}= S^{\prime}_{k(n)}$ , it is this index (a random variable) that depends on the past, but the value $ S_{n}$ is independent of k(n) since its value has not previously been revealed. Thus the $ \{S_{n}\}$ are i.i.d. distributed as G.

The point of the above Lemma 2 is that we can, if we so wish, simulate the RA model by assigning $ S_{n}^{\prime}$ to $ C_{n}$ (to be used as their service time), but then assigning $ S_{n}$ , i.e. $ S^{\prime}_{k(n)}$ , to $ C_{n}$ in the FIFO model. By doing so the requirements of Lemma 1 are satisfied and (2), (3), (4) and (5) hold. Interestingly, however, it is not possible to first simulate the RA model up to a fixed time t, and then stop and reconstruct the FIFO model up to this time t. At time t, there may still be RA customers waiting in lines and hence not enough of the $ S_{n}$ have been determined yet to construct the FIFO model. But all we have to do, if need be, is to continue the simulation of the RA model beyond t until enough $ S_{n}$ have been determined to construct fully the FIFO model up to time t.
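The reordering in Lemma 2 can be made concrete with a small sketch: pre-assign $ S^{\prime}_{n}$ to $ C_{n}$ , run the RA model, and record the (random) indices k(n) in order of service initiation. The function name and data layout below are our own illustrative choices.

```python
def initiation_order(t, S_prime, U, c):
    """Simulate the RA model with service time S_prime[n] pre-assigned to
    customer C_n (arriving at time t[n], joining queue U[n]); return the
    indices k(0), k(1), ... such that the nth service initiation uses
    S_prime[k(n)], as in Lemma 2."""
    free = [0.0] * c        # time at which each server next becomes free
    inits = []              # (service initiation time, arrival index)
    for n, (tn, s, u) in enumerate(zip(t, S_prime, U)):
        start = max(tn, free[u])   # FIFO single-server queue at node u
        free[u] = start + s
        inits.append((start, n))
    inits.sort()
    return [n for _, n in inits]

# The c = 2 example from the text (0-indexed nodes): the first two
# customers join node 0; the third joins the empty node 1, so its
# service time is used for the second service initiation.
order = initiation_order([0.0, 1.0, 2.0], [10.0, 5.0, 3.0], [0, 0, 1], 2)
```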

4. Simulating exactly from the stationary distribution of the RA GI/GI/c model

By Lemma 1, the RA GI/GI/c queue, which shares the same arrival stream $ \{t_{n}\colon n\ge0\}$ ( $ t_{0}=0$ ) and the same service times in order of service initiations $ \{S_{n}\colon n\ge0\}$ , will serve as a sample-path upper bound (in terms of total number of customers in the system and total workload) of the target FIFO GI/GI/c queue. Independent of $ \{T_{n}\colon n\ge0\}$ and $ \{S_{n}\colon n\ge0\}$ , we let $ \{U_{n}\colon n\ge0\}$ be an i.i.d. sequence of random variables from the discrete uniform distribution on $ \{1,2,\ldots,c\}$ ; $ U_{n}$ represents the choice that customer $ C_{n}$ makes about which single-server queue to join under the RA discipline. Let $ \mathbf{V}_{n}= (V_{n}(1),\ldots,V_{n}(c))^{T}$ denote the workload vector as found by $ C_{n}$ in the RA GI/GI/c model; for $ i=1,\ldots,c$ , $ V_{n}(i)$ is the waiting time of $ C_{n}$ if it joins the FIFO single-server queue of server i. So $ V_{0}(i)=0$ and

(6) \begin{equation} V_{n+1}(i)= (V_{n}(i)+S_{n}I (U_{n}=i) -T_{n})^{+}, \quad n\ge0.\end{equation}

These c processes are dependent through the common arrival times $ \{t_{n}\colon n\ge0\}$ (equivalently common interarrival times $ \{T_{n}\colon n\ge0\}$ ) and the common $ \{U_{n}\colon n\ge0\}$ random variables. Because of all the i.i.d. assumptions, $ \{\mathbf{V}_{n}\colon n\ge0\}$ forms a Markov chain. Define $ \tilde{\mathbf{S}}_{n} = (S_{n}I (U_{n}=1) ,\ldots,S_{n}I (U_{n}=c))^{T}$ and $ \mathbf{T}_{n} =T_{n}\mathbf{f}$ . Then we can express (6) in vector form as

(7) \begin{equation} \mathbf{V}_{n+1} = (\mathbf{V}_{n} +\tilde{\mathbf{S}}_{n} -\mathbf{T}_{n})^{+}, \quad n\ge0.\end{equation}

Here $ \mathbf{V}_{n}$ uses the same interarrival times $ \{T_{n}\colon n\ge0\}$ and service times $ \{S_{n}\colon n\ge0\}$ as we fed $ \mathbf{W}_{n}$ in (1). However, the coordinates of $ \mathbf{V}_{n}$ are not in ascending order, though all of them are non-negative.
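A single transition of (6)/(7) can be sketched as follows (a minimal illustration; servers are 0-indexed purely as a code convention):

```python
def ra_step(V, S, T, U):
    """One step of recursion (6): the arrival joins queue U with service
    time S, every queue drains by the interarrival time T, and each
    coordinate is clipped at zero."""
    return [max(v + (S if i == U else 0.0) - T, 0.0)
            for i, v in enumerate(V)]
```

Iterating `ra_step` with i.i.d. draws of (S_n, T_n, U_n) produces the Markov chain $ \{\mathbf{V}_{n}\}$ .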

Each node i as expressed in (6) can be viewed as a FIFO GI/GI/1 queue with common renewal arrival process $ \{t_{n}\colon n\ge0\}$ , but with i.i.d. service times $ \{\tilde{S}_{n}(i)=S_{n}I(U_{n}=i)\colon n\ge0\}$ . Across i, the service times $ (\tilde{S}_{n}(1),\ldots,\tilde{S}_{n}(c))$ are not independent, but they are identically distributed: marginally, with probability $ 1/c$ , $ \tilde{S}_{n}(i)$ is distributed as G, and with probability $ (c-1)/c$ it is distributed as the point mass at 0, i.e. $ \mathbb{E}[\tilde{S}(i)] = \mathbb{E}[S]/c$ . The point here is that we are not treating node i as a single-server queue endowed only with its own arrivals (a thinning of the $ \{t_{n}\colon n\ge0\}$ sequence) and its own service times i.i.d. distributed as G. Defining i.i.d. increments $ \Delta_{n}(i) = \tilde{S}_{n}(i)-T_{n}$ for $ n\ge0$ , each node i has an associated negative drift random walk $ \{R_{n}(i) \colon n\ge0\}$ with $ R_{0}(i)=0$ and

\begin{equation*} R_{n} (i)=\sum_{j=1}^{n} \Delta_{j} (i), \quad n\ge1.\end{equation*}

With $ \rho= \lambda \mathbb{E} [S] {<}c$ , we define $ \rho_{i}=\lambda \mathbb{E} [\tilde{S}(i)] =\lambda \mathbb{E} [S] /c=\rho/c {<} 1$ ; equivalently $ \mathbb{E} [\Delta(i)] {<} 0$ for all $ i=1,\ldots,c$ . Let $ V^{0}(i)$ denote a random variable with the limiting (stationary) distribution of $ V_{n}(i)$ as $ n\to\infty$ ; it is well known (due to the i.i.d. assumptions) that $ V^{0}(i)$ has the same distribution as

\begin{equation*}M(i)\triangleq \max_{m\ge 0} R_{m}(i)\end{equation*}

for $ i=1,\ldots,c$ .

More generally, even when the increment sequence is just stationary ergodic, not necessarily i.i.d. (hence not time-reversible as in the i.i.d. case), it is the backward-in-time maximum that is used in constructing a stationary version of $ \{V_{n}(i)\}$ . We will need this backwards approach in our simulation, so we go over it here; it is usually referred to as Loynes’ lemma. We extend the arrival point process $ \{t_{n}\colon n\ge0\}$ to be a two-sided point-stationary renewal process $ \{t_{n}\colon n\in\mathbb{Z}\}$ ,

\begin{equation*}\cdots<t_{-2}<t_{-1}<0=t_{0}<t_{1}<t_{2}<\cdots.\end{equation*}

Equivalently, $ T_{n}=t_{n+1}-t_{n}$ , $ n\in\mathbb{Z}$ , form i.i.d. interarrival times; $ \{T_{n}\colon n\in\mathbb{Z}\}$ forms a two-sided i.i.d. sequence.

Similarly, the i.i.d. sequences $ \{S_{n}\colon n\ge0\}$ and $ \{U_{n}\colon n\ge0\}$ are extended to be two-sided i.i.d., $ \{S_{n}\colon n\in\mathbb{Z}\}$ and $ \{U_{n}\colon n\in\mathbb{Z}\}.$ These extensions further allow two-sided extension of the i.i.d. increment sequences $ \{\Delta_{n}(i) \colon n\in\mathbb{Z}\}$ for $ i=1,\ldots,c$ , that is,

\begin{equation*}\Delta_{n}(i)=\tilde{S}_{n}(i)-T_{n}=S_{n}I (U_{n}=i) -T_{n},\quad n\in\mathbb{Z}.\end{equation*}

Then we define c random walks with time-reversed increments, $ \{R^{(r)}_{n}(i)\colon n\ge0\}$ for $ i=1,\ldots,c$ , by $ R_{0}^{(r)}(i)=0$ and

\begin{equation*} R_{n}^{(r)} (i)=\sum_{j=1}^{n} \Delta_{-j}(i), \quad n\ge1.\end{equation*}

A (from-the-infinite-past) stationary version of $ \{V_{n}(i)\}$ denoted by $ \{V_{n}^{0}(i)\colon n\le0\}$ is then constructed via

\begin{align*}V_{0}^{0}(i) & = \max_{m \ge 0} R_{m}^{(r)} (i),\\V_{-1}^{0}(i) & = \max_{m \ge 1} R_{m}^{(r)} (i)- R_{1}^{(r)} (i),\\V_{-2}^{0}(i) & = \max_{m \ge 2} R_{m}^{(r)} (i)- R_{2}^{(r)} (i),\\& \ \ \vdots\\V_{-n}^{0}(i) & = \max_{m \ge n} R_{m}^{(r)} (i)- R_{n}^{(r)} (i),\end{align*}

for all $ i=1,\ldots,c$ .
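Over a finite horizon, the Loynes construction reduces to partial sums and suffix maxima of the reversed walk. The sketch below truncates the maxima at the horizon N, so it only illustrates the construction; exactness over the infinite horizon is precisely what the algorithm of [Reference Blanchet and Wallwater6] supplies.

```python
def loynes_backward(deltas_rev):
    """Given the reversed increments [Delta_{-1}, ..., Delta_{-N}] of one
    node, return (R, V) where R[n] = R^(r)_n and V[n] = V^0_{-n}, i.e.
    max_{n <= m <= N} R^(r)_m - R^(r)_n, with the maxima truncated at N."""
    R = [0.0]
    for d in deltas_rev:
        R.append(R[-1] + d)
    suffix_max = R[:]            # suffix_max[n] = max over m >= n of R[m]
    for n in range(len(R) - 2, -1, -1):
        suffix_max[n] = max(suffix_max[n], suffix_max[n + 1])
    V = [suffix_max[n] - R[n] for n in range(len(R))]
    return R, V
```

Running this for each node i with increments $ \Delta_{-j}(i)$ yields the vectors $ \mathbf{V}^{0}_{-n}$ over the truncated horizon.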

By construction, the process $ \mathbf{V}_{n}^{0} = (V_{n}^{0}(1),\ldots,V_{n}^{0}(c))^{T}$ , $ n\le0$ , is jointly stationary representing a (from-the-infinite-past) stationary version of $ \{\mathbf{V}_{n}\colon n\le0\}$ , and satisfies the forward-in-time recursion (7):

(8) \begin{equation} \mathbf{V}_{n+1}^{0} =(\mathbf{V}_{n}^{0}+\tilde{\mathbf{S}}_{n} -\mathbf{T}_{n})^{+}, \quad n\le-1.\end{equation}

Thus, by starting at $ n =0$ and walking backwards in time, we have (theoretically) a time-reversed copy of the RA model. Furthermore, $ \{\mathbf{V}_{n}^{0}\colon n\le0\}$ can be extended to include forward time $ n\ge1$ by applying the recursion further:

(9) \begin{equation} \mathbf{V}_{n}^{0} =(\mathbf{V}_{n-1}^{0}+\tilde{\mathbf{S}}_{n-1} -\mathbf{T}_{n-1})^{+}, \quad n\ge1,\end{equation}

where $ \tilde{\mathbf{S}}_{n}= (S_{n}I(U_{n}=1),\ldots,S_{n}I(U_{n}=c))^{T}$ for $ n\in\mathbb{Z}$ .

In fact, once we have a copy of just $ \mathbf{V}_{0}^{0}$ , we can start off the Markov chain with it as initial condition and use (9) to obtain a forward-in-time stationary version $ \{\mathbf{V}_{n}^{0}\colon n\ge0\}$ .

The above ‘construction’, however, is theoretical. We do not yet have any explicit way of obtaining a copy of $ \mathbf{V}_{0}^{0}$ , let alone an entire from-the-infinite-past sequence $ \{\mathbf{V}_{n}^{0}\colon n\le0 \}$ . In Blanchet and Wallwater [Reference Blanchet and Wallwater6], a simulation algorithm is given that yields (when applied to each of our random walks), for each $ 1\le i\le c$ , a copy of

\begin{equation*}\{(R_{n}^{(r)} (i), V_{-n}^{0}(i)) \colon 0\le n\le N\}\end{equation*}

for any desired $ 0\le N < \infty$ including stopping times N. We modify the algorithm so that it can do the simulation jointly across the c systems, that is, we extend it to a multi-dimensional form.

In particular, it yields an algorithm for obtaining a copy of $ \mathbf{V}_{0}^{0}$ , as well as a finite segment (of length N) of a backward-in-time copy of the RA model, $ \{\mathbf{V}_{-n}^{0}\colon 0\le n\le N\}$ , a stationary into-the-past construction up to discrete time $ n=-N$ .

Finite exponential moments are not required (because only truncated exponential moments are needed, $ \mathbb{E} [{\textrm{e}}^{\gamma\Delta(i)}I\{|\Delta(i)|\le a\}]$ , which in turn allow for the simulation of the exponential tilting of truncated $ \Delta(i)$ , via acceptance/rejection). To get a finite expected termination time (at each individual node), one needs the service distribution to have finite moment slightly beyond 2: for some (explicitly known) $ \epsilon>0$ ,

\begin{equation*} \mathbb{E} [S^{2+\epsilon}] <\infty.\end{equation*}

As our first case, we consider a stopping time N such that $ \mathbf{V}^{0}_{-N}=\mathbf{0}$ . Before giving the definition of the stopping time N, we introduce the main idea of our simulation algorithm.

Let us define the maximum of a sequence of vectors. Suppose we have $ \mathbf{Z}_{1},\ldots,\mathbf{Z}_{k}$ , where $ \mathbf{Z}_{i}\in\mathbb{R}^{d}$ with $ d\ge1$ and $ k\in\mathbb{N}_{+}\cup\{\infty\}$ . Define

\begin{equation*} \max(\mathbf{Z}_{1},\ldots,\mathbf{Z}_{k}) =\Big(\max_{1\le i\le k}Z_{i}(1),\ldots,\max_{1\le i\le k}Z_{i}(d)\Big)^{T}.\end{equation*}

Next define, for $ n\in\mathbb{Z}$ ,

\begin{equation*}\mathbf{U}_{n}= (I(U_{n}=1),\ldots,I(U_{n}=c))^{T}\quad \mbox{and}\quad \boldsymbol{\Delta}_{n}=\tilde{\mathbf{S}}_{n}-\mathbf{T}_{n}=S_{n}\mathbf{U}_{n}-T_{n}\mathbf{f},\end{equation*}

where $ \{U_{n}\colon n\in\mathbb{Z}\}$ are i.i.d. from discrete uniform distribution over $ \{1,2,\ldots,c\}$ , and independently $ \{T_{n}\colon n\in\mathbb{Z}\}$ are i.i.d. from distribution A (as introduced in Section 2). Our goal is to simulate the stopping time $ N\in\mathbb{N}$ such that $ \mathbf{V}^{0}_{-N}=\mathbf{0}$ , defined as

(10) \begin{equation} N=\inf\Big\{n\ge 0\colon \mathbf{V}^{0}_{-n}= \max_{k\ge n}\,\mathbf{R}_{k}^{(r)}-\mathbf{R}_{n}^{(r)}=\mathbf{0}\Big\},\end{equation}

that is, the first time walking in the past, that all coordinates of the workload vector are 0, jointly with $ \{(\mathbf{R}_{n}^{(r)},\mathbf{V}^{0}_{-n})\colon 0\le n\le N\}$ . (By convention, the value of any empty sum of numbers is zero, i.e. $ \sum_{j=1}^{0}a_{j}=0$ .)
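Given the c reversed walks, detecting N as in (10) amounts to computing the suffix maxima of each coordinate and scanning for the first index at which all c workloads vanish. The sketch below is a finite-horizon illustration with hypothetical input arrays; the exact algorithm must handle the infinite-horizon maxima, which is what our multi-dimensional extension of [Reference Blanchet and Wallwater6] provides.

```python
def first_empty_time(R_rev):
    """R_rev[i] lists R^(r)_0(i), ..., R^(r)_N(i) for node i, over a
    horizon assumed long enough to contain the stopping time (10).
    Return the first n with V^0_{-n}(i) = 0 for every i, else None."""
    V = []
    for R in R_rev:
        suf = list(R)                       # suffix maxima for this node
        for n in range(len(R) - 2, -1, -1):
            suf[n] = max(suf[n], suf[n + 1])
        V.append([suf[n] - R[n] for n in range(len(R))])
    for n in range(len(R_rev[0])):
        if all(v[n] == 0.0 for v in V):     # all c workloads are zero
            return n
    return None
```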

To ensure that $ \mathbb{E} [N] <\infty$ , in addition to $ \rho{<}c$ (stability), it is required that $ \mathbb{P} (T >S) >0$ (see the proof of Theorem 2 in [Reference Sigman18]); the most common sufficient conditions are that T has unbounded support, $ \mathbb{P} (T > t) > 0$ for all $ t \ge0$ , or that S has mass arbitrarily close to 0, $ \mathbb{P} (S < t) > 0$ for all $ t > 0$ . But as we shall show in Section 6, given we know that $ \mathbb{P} (T >S) >0$ , we can assume without loss of generality that interarrival times are bounded. It is that assumption which makes the extension of [Reference Blanchet and Wallwater6] to a multi-dimensional form easier to accomplish. Then we show (in Sections 4.2 and 9) how to simulate from $ \pi$ even when $ \mathbb{P} (T>S) =0$ . We do that in two different ways, one via a sandwiching argument and the other involving Harris-recurrent Markov chain regenerations.

4.1. Algorithm for simulating exactly from $ \pi $ for the FIFO GI/GI/c queue: the case $ \mathbb{P} (T > S) >0$

As mentioned earlier, we will assume that $ \mathbb{P} (T > S) >0$ , so that the stable ( $ \rho< c$ ) RA and FIFO GI/GI/c Markov chains (7) and (1) will visit 0 infinitely often with certainty. (That the RA model empties infinitely often when $ \mathbb{P} (T > S) >0$ is proved, for example, in [Reference Sigman18].) We imagine that at the infinite past $ n=-\infty$ , we start both (7) and (1) from empty. We construct the RA model forwards in time, while using Lemma 2 for the service times for the FIFO model, so that Lemma 1 applies and we have it in the form of (3), for all $ t_{n}\le0$ up to and including at time $ t_{0}=0$ , at which time both models are in stationarity. We might have to continue the construction of the RA model so that $ \mathbf{W}_{0}$ (distributed as $ \pi$ ) can be constructed (e.g. enough service times have been initiated by the RA model for using Lemmas 1 and 2). Formally, one can theoretically justify the existence of such infinite-from-the-past versions (that obey Lemma 1), by use of Loynes’ lemma. Each model (when started empty) satisfies the monotonicity required to use Loynes’ lemma. In particular, noting that $ Q_{RA}(t_{n}\!-\!) = 0$ if and only if $ \mathbf{V}_{n} = \mathbf{0}$ , we conclude that if at any time n it holds that $ \mathbf{V}_{n} = \mathbf{0}$ , then $ \mathbf{W}_{n} = \mathbf{0}$ . By the Markov property, given that $ \mathbf{V}_{n} = \mathbf{0}=\mathbf{W}_{n}$ , the future is independent of the past for each model, or said differently, the past is independent of the future. This remains valid if n is replaced by a stopping time (strong Markov property).

We outline the simulation algorithm steps as follows.

  1. Simulate $ \{\{(R_{n}^{(r)} (i), V_{-n}^{0}(i)) \colon 0\le n \le N\}, 1\le i\le c\}$ with N as defined in (10). If $ N=0$ , go to the next step. Otherwise, having stored all data, reconstruct $ \mathbf{V}_{n}^{0}$ forwards in time from $ n=-N$ (initially empty) until $ n=0$ , using the recursion (8). During this forward-in-time reconstruction, redefine $ S_{j}$ as the jth service time initiated by the RA model (i.e. we use Lemma 2 to gather the service times in the proper order to feed into the FIFO model, which is why we do the reconstruction). If at time $ n=0$ there have not yet been N service initiations, then continue simulating the RA model forwards in time until the Nth service initiation occurs, and then stop. This will require, at most, simulating out to $ t_{n}$ with $ n = N^{(+)} = \min\{n\ge0 \colon\mathbf{V}_{n}^{0} =\mathbf{0}\}$ . Take the vector $ (S_{-N},S_{-N+1},\ldots,S_{-1})$ and reset $ (S_{0},S_{1},\ldots,S_{N-1}) = (S_{-N},S_{-N+1},\ldots,S_{-1})$ . Also, store the interarrival times $ (T_{-N},T_{-N+1},\ldots, T_{-1})$ and reset $ (T_{0},\ldots,T_{N-1}) = (T_{-N},T_{-N+1},\ldots, T_{-1})$ .

  2. If $ N=0$ , then set $ \mathbf{W}_{0}=\mathbf{0}$ and stop. Otherwise use (1) with $ \mathbf{W}_{0}=\mathbf{0}$ and recursively go forwards in time for N steps until obtaining $ \mathbf{W}_{N}$ , via the N reset service times $ (S_{0},S_{1},\ldots,S_{N-1})$ and interarrival times $ (T_{0},\ldots,T_{N-1})$ . Reset $ \mathbf{W}_{0}=\mathbf{W}_{N}$ .

  3. Output $ \mathbf{W}_{0}$ .

Detailed simulation steps are discussed in Appendix A. Let $ \tau$ denote the total number of interarrival times and service times to simulate in order to detect the stopping time N. The following proposition shows that our algorithm will terminate in finite expected time, i.e. $ \mathbb{E} [\tau] <\infty$ . The proof is given in Appendix B.

Proposition 1. If $ \rho=\lambda/\mu{<}c$ , $ \mathbb{P} (T{>}S) {>}0$ , and there exists some $ \epsilon{>}0$ such that $ \mathbb{E} [S^{2+\epsilon}] <\infty$ , then

\begin{equation*}\mathbb{E} [N] <\infty \quad \hbox{and} \quad \mathbb{E} [\tau] <\infty.\end{equation*}

4.2. A more efficient algorithm: sandwiching

In this section we no longer even need to assume that $ \mathbb{P} (T>S) >0$ . (Another method allowing for $ \mathbb{P} (T>S) =0$ involving Harris-recurrent regeneration is given later in Section 9.) Instead of waiting for the workload vector of the GI/GI/c queue under RA discipline to become $ \mathbf{0}$ , we choose an ‘inspection time’ $ t_{-\kappa}<0$ for some $ \kappa\in\mathbb{Z}_{+}$ to stop the backward simulation of the RA GI/GI/c queue, then construct two bounding processes of the target FIFO GI/GI/c queue and evolve them forwards in time, using the same stream of arrivals and service time requirements (in order of service initiations), until coalescence or time zero. In particular, we let the upper bound process be a FIFO GI/GI/c queue starting at time $ t_{-\kappa}$ with workload vector $ \mathbf{V}^{0}_{-\kappa}$ , and let the lower bound process be a FIFO GI/GI/c queue starting at the same time $ t_{-\kappa}$ from empty, i.e. with workload vector $ \mathbf{0}$ .

Let $ \mathbf{W}(t)$ denote the ordered (ascendingly) workload vector of the original FIFO GI/GI/c queueing process, starting from the infinite past, evaluated at time t. For $ t\ge t_{-\kappa}$ , we define $ \mathbf{W}^{u}_{-\kappa}(t)$ and $ \mathbf{W}^{l}_{-\kappa}(t)$ to be the ordered (ascendingly) workload vectors of the upper bound and lower bound processes, initiated at the inspection time $ t_{-\kappa}$ , evaluated at time t. By our construction and Theorem 3.3 in [Reference Connor and Kendall8],

\begin{equation*}\mathbf{W}_{-\kappa}^{u}(t_{-\kappa})=\mathcal{R} (\mathbf{V}^{0}_{-\kappa})\ge\mathbf{W}(t_{-\kappa})\ge\mathbf{W}_{-\kappa}^{l}(t_{-\kappa})=\mathbf{0},\end{equation*}

and for all $ t>t_{-\kappa}$ ,

\begin{equation*}\mathbf{W}^{u}_{-\kappa}(t)\ge\mathbf{W}(t)\ge\mathbf{W}^{l}_{-\kappa}(t),\end{equation*}

where all the above inequalities hold coordinate-wise.

Note that we can evolve the ordered workload vectors of the two bounding processes as follows: for $ t_{n-1}\le t< t_{n}$ when $ -\kappa{<}n\le-1$ ,

(11) \begin{equation}\begin{split}\mathbf{W}_{-\kappa}^{u}(t) & =\mathcal{R}(\mathbf{W}^{u}_{-\kappa}(t_{n-1})+S_{n-1}\mathbf{e}-(t-t_{n-1})\mathbf{f})^{+},\\\mathbf{W}_{-\kappa}^{l}(t) & =\mathcal{R}(\mathbf{W}^{l}_{-\kappa}(t_{n-1})+S_{n-1}\mathbf{e}-(t-t_{n-1})\mathbf{f})^{+}.\end{split}\end{equation}

Similarly, let Q(t) denote the number of customers in the original FIFO GI/GI/c queueing process, starting from the infinite past, evaluated at time t. For $ t\ge t_{-\kappa}$ , we let $ Q^{u}_{-\kappa}(t)$ and $ Q^{l}_{-\kappa}(t)$ denote the number of customers in the upper and lower bound queueing processes respectively, both initiated at the inspection time $ t_{-\kappa}$ , evaluated at time t. If at some time $ \tau\in[t_{-\kappa},0]$ we observe that $ \mathbf{W}^{u}_{-\kappa}(\tau)=\mathbf{W}^{l}_{-\kappa}(\tau)$ , then it must be true that $ \mathbf{W}(\tau)=\mathbf{W}^{u}_{-\kappa}(\tau)=\mathbf{W}^{l}_{-\kappa}(\tau)$ and $ Q(\tau)=Q_{-\kappa}^{u}(\tau)=Q_{-\kappa}^{l}(\tau)$ (because the ordered remaining workload vectors of the two bounding processes can only meet when both have idle servers). We call such a time $ \tau$ a ‘coalescence time’; from then on we have full information about the target FIFO GI/GI/c queue, and hence can continue to simulate it forwards in time until time 0.

However, if coalescence does not happen by time 0, we can adopt the so-called ‘binary back-off’ method by letting the arrival time $ t_{-2\kappa}$ be our new inspection time and redo the above procedure to detect coalescence. Theorem 3.3 in [Reference Connor and Kendall8] ensures that, for any $ t_{-\kappa}\le t\le0$ ,

\begin{equation*}\mathbf{W}_{-\kappa}^{u}(t)\ge\mathbf{W}_{-2\kappa}^{u}(t)\ge\mathbf{W}(t)\ge\mathbf{W}_{-2\kappa}^{l}(t)\ge\mathbf{W}_{-\kappa}^{l}(t).\end{equation*}

We summarize the sandwiching algorithm as follows.

  1. Simulate $ \{(\mathbf{R}_{n}^{(r)},\mathbf{V}_{-n}^{0})\colon 0\le n\le\kappa\}$ with all data stored.

  2. Use the stored data to reconstruct $ \mathbf{V}_{n}^{0}$ forwards in time from $ n=-\kappa$ until $ n=0$ , using (8), and redefine $ S_{j}$ as the service time of the jth service initiation used by the RA model.

  3. Set $ \mathbf{W}_{-\kappa}^{u}(t_{-\kappa})=\mathcal{R} (\mathbf{V}_{-\kappa}^{0})$ and $ \mathbf{W}_{-\kappa}^{l}(t_{-\kappa})=\mathbf{0}$ . Then use the same stream of interarrival times $ (T_{-\kappa},T_{-\kappa+1},\ldots,T_{-1}) $ and service times $ (S_{-\kappa},S_{-\kappa+1},\ldots,S_{-1}) $ to simulate $ \mathbf{W}^{u}_{-\kappa}(t)$ , $ \mathbf{W}^{l}_{-\kappa}(t)$ forwards in time using (11).

  4. If at some time $ t\in[t_{-\kappa},0]$ we detect $ \mathbf{W}^{u}_{-\kappa}(t)=\mathbf{W}^{l}_{-\kappa}(t)$ , set $ \tau=t$ , $ \mathbf{W} (\tau)=\mathbf{W}^{u}_{-\kappa}(\tau)$ , $ Q(\tau)=\sum_{i=1}^{c}I(\mathbf{W}(\tau;\, i)>0)$ , where $ \mathbf{W}(t;\, i)$ is the ith entry of vector $ \mathbf{W}(t)$ . Then use the remaining interarrival times and service times to evolve the original FIFO GI/GI/c queue forwards in time until time $ t_{0}=0$ , output $ (\mathbf{W}(0),Q(0))$ and stop.

  5. If no coalescence is detected by time 0, set $ \kappa=2\kappa$ , then continue to simulate the backward RA GI/GI/c process until the $ (-\kappa)$ th arrival, i.e. obtain $ \{(\mathbf{R}_{n}^{(r)},\mathbf{V}_{-n}^{0})\colon 0\le n\le\kappa\}$ with all data stored. Go to step 2.

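The sandwiching loop in steps 3 and 4 can be sketched as follows in Python (an illustrative sketch only, not our MATLAB implementation; both bounds are driven by (11) on the same input, and the index of the first arrival epoch at which they coalesce is returned):

```python
def kw_step(w, s, t):
    # Kiefer-Wolfowitz update as in (11): service s to the least-loaded
    # server, elapse time t, clip at zero, keep sorted ascending
    w = sorted(w)
    w[0] += s
    return sorted(max(x - t, 0.0) for x in w)

def sandwich(upper0, services, interarrivals):
    """Drive the upper bound (started from upper0) and the lower bound
    (started from empty) with the same service and interarrival
    sequences; return the index of the first epoch at which the two
    ordered workload vectors agree, or None if they never do."""
    wu, wl = sorted(upper0), [0.0] * len(upper0)
    for n, (s, t) in enumerate(zip(services, interarrivals)):
        wu, wl = kw_step(wu, s, t), kw_step(wl, s, t)
        if wu == wl:
            return n
    return None
```

If the sketch returns `None`, the back-off step 5 would double $\kappa$ and retry with a longer driving sequence.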
Next we analyze properties of the coalescence time. Define

\begin{equation*}\kappa^{*}_{-}=\inf\Big\{n\ge0\colon \inf_{t_{-n}\le t\le0}\lVert\mathbf{W}^{u}_{-n}(t)-\mathbf{W}^{l}_{-n}(t)\rVert_{\infty}=0\Big\} .\end{equation*}

If at time $ t_{-\kappa^{*}_{-}}$ we start an upper bound FIFO GI/GI/c queue with workload vector $ \mathbf{W}^{u}_{-\kappa_{-}^{*}}(t_{-\kappa_{-}^{*}})$ and a lower bound FIFO GI/GI/c queue with workload vector $ \mathbf{0}$ , they will coalesce by time $ t_{0}=0$ . Therefore if we simulate the RA system backwards in time to $ t_{-\kappa_{-}^{*}}$ , we will be able to detect a coalescence. We next show that $ \mathbb{E} [-t_{-\kappa_{-}^{*}}] <\infty$ .

By stationarity we have that $ \kappa^{*}_{-}$ is equal in distribution to

\begin{equation*}\kappa^{*}_{+}=\inf\Big\{n\ge0\colon \inf_{0\le t\le t_{n}}\lVert\mathbf{W}^{u}_{0}(t)-\mathbf{W}^{l}_{0}(t)\rVert_{\infty}=0\Big\} ,\end{equation*}

hence $ -t_{-\kappa^{*}_{-}}\overset{d}{=}t_{\kappa^{*}_{+}}$ .

Proposition 2. If $ \rho=\mathbb{E} [S] /\mathbb{E} [T] {<}c$ and there exists some $ \epsilon{>}0$ such that $ \mathbb{E} [S^{2+\epsilon}] {<}\infty$ and $ \mathbb{E} [T^{2+\epsilon}] {<}\infty$ , then

\begin{equation*}\mathbb{E}[t_{\kappa^{*}_{+}}]{<}\infty.\end{equation*}

The proof follows the same argument as in the proof of Proposition 3 in [Reference Blanchet, Dong and Pei7], so we give a brief proof outline in Appendix B.

5. Continuous-time stationary constructions

For a stable FIFO GI/GI/1 queue, let D denote stationary customer delay (time spent in queue); that is, it has the limiting distribution of $ D_{n+1} = (D_{n} + S_{n}-T_{n})^{+}$ as $ n\to\infty$ .

Independently, let $ S_{e}$ denote a random variable distributed as the equilibrium distribution $ G_{e}$ of service time distribution G,

\begin{equation*}G_{e}(x)=\mu\int_{0}^{x} \mathbb{P} (S>y) \,{\textrm{d}} y,\quad x\ge 0,\end{equation*}

where $ S\sim G$ . Let V(t) denote the total work in the system at time t, that is, the sum of all whole or remaining service times in the system at time t. Then $ D_{n} = V(t_{n}\!-\!)$ , and one can construct $ \{V(t)\}$ via

\begin{equation*}V(t)=(D_{n} +S_{n}-(t-t_{n}))^{+},\quad t_{n}\le t < t_{n+1}.\end{equation*}

(It is to be continuous from the right with left limits.) Let V denote stationary workload; that is, it has the limiting distribution

\begin{equation*}\mathbb{P}(V\le x)=\lim_{t\to\infty} {\dfrac{1}{t}} \int_{0}^{t} \mathbb{P}(V(s)\le x)\,{\textrm{d}} s,\quad x\ge0.\end{equation*}

It is well known that the following holds (see e.g. [Reference Sigman19, Sections 6.3 and 6.4]):

\begin{equation*}\mathbb{P} (V >x) =\rho \mathbb{P} (D+S_{e} >x) ,\quad x\ge0.\end{equation*}

Letting $ F_{D}(x)=\mathbb{P} (D\le x) $ denote the probability distribution of D, letting $ \delta_{0}$ be the point mass at 0, and letting $ \ast$ be convolution of distributions, this means that the distribution of V can be written as a mixture:

\begin{equation*}(1-\rho)\delta_{0} +\rho F_{D}\ast G_{e}.\end{equation*}

This leads to the following.

Proposition 3. For a stable $ (0<\rho<1)$ FIFO GI/GI/1 queue, if $ \rho$ is explicitly known, and one can exactly simulate from D and $ G_{e}$ , then one can exactly simulate from V.

Proof.

  1. Simulate a Bernoulli $ (\rho)$ random variable B.

  2. If $ B=0$ , then set $ V=0$ . Otherwise, if $ B=1$ , then simulate D and independently simulate a copy $ S_{e}\sim G_{e}$ . Set $ V = D + S_{e}$ . Stop.

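The two steps of the proof translate directly into a sampler. A minimal Python sketch, assuming user-supplied exact samplers for D and $G_{e}$ (the function names are illustrative, not from the paper):

```python
import random

def sample_V(rho, sample_D, sample_Se, rng=random):
    """Draw from the mixture (1-rho)*delta_0 + rho*(F_D * G_e):
    with probability rho return D + S_e, otherwise return 0."""
    if rng.random() < rho:            # Bernoulli(rho) coin
        return sample_D() + sample_Se()
    return 0.0
```

Here `sample_D` would be the exact FIFO delay sampler and `sample_Se` an exact sampler from the equilibrium distribution $G_e$; both are assumed available, as in the statement of Proposition 3.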
Another algorithm requiring the ability to simulate from $ A_{e}$ (equilibrium distribution of the interarrival time distribution A) instead of $ G_{e}$ follows from another known relation:

(12) \begin{equation} V\overset{d}{=} (D+S-T_{e})^{+},\end{equation}

where D, S, and $ T_{e}\sim A_{e}$ are independent (see e.g. equation (88) of [Reference Wolff23, page 426]). Thus, by simulating D, S, and $ T_{e}$ , simply set $ V=(D+S-T_{e})^{+}$ . Equation (12) extends analogously to the FIFO GI/GI/c model, where our objective is to exactly simulate from the time-stationary distribution of the continuous-time Kiefer–Wolfowitz workload vector, $ \mathbf{W}(t) = (W(t;1),\ldots,W(t;c))^{T},\ t\ge0$ , where it can be constructed via

\begin{equation*}\mathbf{W}(t)=\mathcal{R}(\mathbf{W}_{n} +S_{n} \mathbf{e}-(t-t_{n})\mathbf{f})^{+},\quad t_{n}\,\le\, t\, <\, t_{n+1}.\end{equation*}

It is to be continuous from the right with left limits, and $ \mathbf{W}_{n} =\mathbf{W}(t_{n}\!-\!)$ . The total workload V(t), for example, is obtained from this via

\begin{equation*}V(t)=\sum_{i=1}^{c} W(t;\, i).\end{equation*}

Letting $ \mathbf{W}^{*}$ have the time-stationary distribution of $ \mathbf{W}(t)$ as $ t\to\infty$ , and letting $ \mathbf{W}_{0}$ have the discrete-time stationary distribution $ \pi$ and letting S, $ T_{e}$ , and $ \mathbf{W}_{0}$ be independent, then

(13) \begin{equation} \mathbf{W}^{*}\overset{d}{=} \mathcal{R}(\mathbf{W}_{0}+S\mathbf{e}-T_{e}\mathbf{f})^{+}.\end{equation}

So once we have a copy of $ \mathbf{W}_{0}$ (distributed as $ \pi$ ) from our algorithm in Section 4.1 or Section 4.2, we can easily construct a copy of $ \mathbf{W}^{*}$ as long as we can simulate from $ A_{e}$ . Of course, if arrivals are Poisson then the distribution of $ \mathbf{W}^{*}$ is identical to that of $ \mathbf{W}_{0}$ by PASTA, but otherwise we can use (13).
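Given a draw $\mathbf{W}_0\sim\pi$ together with independent $S$ and $T_e\sim A_e$, relation (13) is a one-line transformation. A Python sketch (the function name is illustrative):

```python
def time_stationary_W(w0, s, te):
    """Construct a copy of W* from a draw w0 ~ pi via (13):
    W* = R(w0 + s*e - te*f)^+, where e adds s to the smallest
    coordinate and f subtracts te from every coordinate."""
    w = sorted(w0)
    w[0] += s                      # S e: service to the least-loaded server
    return sorted(max(x - te, 0.0) for x in w)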

5.1. Numerical results

As a sanity check, we have implemented our perfect sampling algorithm in MATLAB® for the case of an $ \operatorname{Erlang}(k_{1},\lambda)/\operatorname{Erlang}(k_{2},\mu)/c$ queue. We provide our implementation code for both algorithms in the online appendix of this paper, available at https://github.com/yanan-pei/exact-sampling-multiserver-queue.

First we consider M/M/c queues, which are special cases of $ \operatorname{Erlang}(k_{1},\lambda)/\operatorname{Erlang}(k_{2},\mu)/c$ with $ k_{1}=k_{2}=1$ . For the quantity of interest, the number of customers in the FIFO M/M/c queue in stationarity, we obtain its empirical distribution from a large number of independent runs of our algorithm and compare it to the theoretical distribution, which has a well-established closed form:

\begin{align*} \pi_{0} & =\bigg(\sum_{k=0}^{c-1}\dfrac{\rho^{k}}{k!}+\dfrac{\rho^{c}}{(c-1)!} \dfrac{1}{c-\rho}\bigg)^{-1},\\\pi_{k} & =\begin{cases}\pi_{0}\cdot\rho^{k}/k! & \textrm{if}\ 0<k<c,\\[5pt] \pi_{0}\cdot\rho^{k}c^{c-k}/c! & \textrm{if}\ k\ge c,\end{cases}\end{align*}

where $ \rho=\lambda/\mu{<}c$ .
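The closed form above is straightforward to evaluate numerically. A Python sketch of the theoretical pmf used in this comparison (an illustrative reimplementation, not our MATLAB code):

```python
from math import factorial

def mmc_pmf(lam, mu, c, kmax):
    """Theoretical stationary distribution pi_0, ..., pi_kmax of the
    number of customers in an M/M/c queue (requires rho = lam/mu < c)."""
    rho = lam / mu
    pi0 = 1.0 / (sum(rho ** k / factorial(k) for k in range(c))
                 + rho ** c / factorial(c - 1) / (c - rho))
    return [pi0 * rho ** k / factorial(k) if k < c
            else pi0 * rho ** k * c ** (c - k) / factorial(c)
            for k in range(kmax + 1)]
```

For the first test case below ($\lambda=3$, $\mu=2$, $c=2$) this gives $\pi_0=1/7$.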

As an example, Figure 1 shows the result of such a test when $ \lambda=3$ , $ \mu=2$ , and $ c=2$ . Gray bars are the empirical results of $ 5\,000$ draws using our algorithm and black bars give the theoretical distribution of the number of customers in the system in stationarity. A Pearson’s chi-squared test between the theoretical and empirical distributions gives a p-value equal to $ 0.8781$ , indicating close agreement (i.e. we cannot reject the null hypothesis that there is no difference between these two distributions). For another set of parameters $ \lambda=10$ , $ \mu=2$ , and $ c=10$ , the results are shown in Figure 2 with a p-value of $ 0.6069$ for the chi-squared fitness test.

Figure 1: Number of customers for an M/M/c queue in stationarity when λ=3, μ=2, c=2.

Figure 2: Number of customers for an M/M/c queue in stationarity when λ=10, μ=2, c=10.

For the general $ \operatorname{Erlang}\!(k_{1},\lambda)/\operatorname{Erlang}\!(k_{2},\mu)/c$ queue with $ k_{1}>1$ and $ k_{2}>1$ and traffic intensity $ \rho/c=\lambda k_{2}/(c\mu k_{1})=0.9$ , we compare the empirical distribution of the number of customers in the system in stationarity, obtained from a large number of runs of our perfect sampling algorithm, to the numerical results (with precision at least $ 10^{-4}$ ) provided in Table III of [Reference Hillier and Lo14]. The results for an $ \operatorname{Erlang}\!(2,9)/\operatorname{Erlang}\!(2,5)/c$ queue are given in Figure 3. Gray bars are the empirical results of $ 5\,000$ draws using our algorithm and black bars are the numerical values given in [Reference Hillier and Lo14], and they are very close to each other. The Pearson’s chi-squared test gives a p-value of $ 0.9464$ , so we cannot reject the null hypothesis that these two distributions agree well.

Figure 3: Number of customers for an $ \operatorname{Erlang}(k_{1},\lambda)/\operatorname{Erlang}(k_{2},\mu)/c$ queue in stationarity when $ k_{1}=2$ , $ \lambda=9$ , $ k_{2}=2$ , $ \mu=5$ , $ c=2$ , and $ \rho/c=0.9$ .

Next we run a numerical experiment comparing the computational efficiency of the first algorithm in Section 4.1 and the second (sandwiching) algorithm in Section 4.2. We measure computational efficiency in two ways. The first is how far in the past we need to simulate the dominating process to detect coalescence (counting the total number of arrivals sampled backwards). The second is the actual computation time in seconds. Figure 4 depicts such a comparison for an M/M/c queue with parameters $ \lambda=10$ , $ \mu=2$ , and $ c=10$ , from $ 5\,000$ runs. Both results indicate that the second algorithm (sandwiching) is significantly more efficient than the first one.

Figure 4: Computational efficiency comparison between two algorithms for an M/M/c queue.

Finally we study how the computational complexity of our sandwiching algorithm compares to the algorithm given in [Reference Blanchet, Dong and Pei7]. Note that these two algorithms look similar: both use back-off strategies to run two bounding processes from some inspection time and check whether they meet before time 0. The difference is that [Reference Blanchet, Dong and Pei7] uses a so-called ‘vacation system’ to construct the upper bound process, whereas we use the same queue under the RA discipline instead. In the following numerical experiment, we define the computational complexity as the total number of arrivals each algorithm samples backwards to detect coalescence. Table 1 shows how they vary with traffic intensity, $ \rho/c=\lambda/(c\mu)$ , based on $ 5\,000$ independent runs of both algorithms using the same back-off strategy with the same initial $ \kappa=1$ . The result suggests that our second algorithm (sandwiching) outperforms the one proposed in [Reference Blanchet, Dong and Pei7], as the magnitude of our computational complexity does not increase as fast as theirs when the traffic intensity increases.

Table 1. Simulation results for computational complexities with varying traffic intensities for an M/M/c queue with fixed $ \mu=5$ and $ c=2$ .

6. Why we can assume that interarrival times are bounded

Lemma 3. Consider the recursion

\begin{equation*} D_{n+1}=(D_{n}+S_{n}-T_{n})^{+},\quad n\ge0,\end{equation*}

where both $ \{T_{n}\}$ and $ \{S_{n}\}$ are non-negative random variables, and $ D_{0}=0$ .

Suppose that, for another sequence of non-negative random variables $ \{\skew2\hat{T}_{n}\}$ ,

\begin{equation*}\mathbb{P} (\skew2\hat{T}_{n}\le T_{n},\ n\ge0) =1.\end{equation*}

Then, for the recursion

\begin{equation*} \skew2\hat{D}_{n+1}=(\skew2\hat{D}_{n}+S_{n}-\skew2\hat{T}_{n})^{+},\quad n\ge0,\end{equation*}

with $ \skew2\hat{D}_{0}=0$ ,

\begin{equation*} \mathbb{P} (D_{n}\le\skew2\hat{D}_{n},\ n\ge0) =1.\end{equation*}

Proof. The proof is by induction on $ n\ge0$ : because (with probability 1 in the following arguments) $ \skew2\hat{T}_{0}\le T_{0}$ , we have

\begin{equation*}D_{1}=(S_{0}-T_{0})^{+}\le(S_{0}-\skew2\hat{T}_{0})^{+}=\skew2\hat{D}_{1}.\end{equation*}

Now suppose the result holds for some $ n\ge0$ . Then $ D_{n}\le\skew2\hat{D}_{n}$ and by assumption $ \skew2\hat{T}_{n}\le T_{n}$ ; hence

\begin{equation*}D_{n+1}=(D_{n}+S_{n}-T_{n})^{+}\le(\skew2\hat{D}_{n}+S_{n}-\skew2\hat{T}_{n})^{+}=\skew2\hat{D}_{n+1},\end{equation*}

and the proof is complete.
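The induction argument can be checked numerically by running the two Lindley recursions on a common service sequence; a Python sketch (illustrative only, with `delays` a hypothetical helper name):

```python
def delays(services, interarrivals):
    """Lindley recursion D_{n+1} = (D_n + S_n - T_n)^+ with D_0 = 0;
    returns the whole trajectory (D_0, ..., D_N)."""
    d, out = 0.0, [0.0]
    for s, t in zip(services, interarrivals):
        d = max(d + s - t, 0.0)
        out.append(d)
    return out
```

Truncating the interarrival times at any level b (so that $\hat T_n=\min\{T_n,b\}\le T_n$) can only increase the delays, coordinate by coordinate along the trajectory, exactly as Lemma 3 asserts.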

Proposition 4. Consider the stable RA GI/GI/c model in which $ \mathbb{P} (T>S) >0$ . In order to use this model to simulate from the corresponding stationary distribution of the FIFO GI/GI/c model as explained in Section 4.1, without loss of generality we can assume that the interarrival times $ \{T_{n}\}$ are bounded: there exists $ b>0$ such that

\begin{equation*}\mathbb{P} (T_{n}\le b,\ n\ge0) =1.\end{equation*}

Proof. By stability, $ c\mathbb{E} [T] >\mathbb{E} [S] $ , and by assumption $ \mathbb{P} (T>S) >0$ . If the $ \{T_{n}\}$ are not bounded, then for $ b>0$ , define $ \skew2\hat{T}_{n}=\min\{T_{n},b\}$ , $ n\ge0$ , i.e. truncated $ T_{n}$ . Choose b sufficiently large that $ c\mathbb{E}[\skew2\hat{T}]>\mathbb{E} [S] $ and $ \mathbb{P} (\skew2\hat{T}>S) >0$ still hold. Now use the $ \{\skew2\hat{T}_{n}\}$ in place of the $ \{T_{n}\}$ to construct an RA model, denoted by $ \widehat{RA}$ . Denote this by

\begin{equation*}{\hat{\mathbf{V}}}_{n} = (\skew2\hat{V}_{n}(1),\ldots, \skew2\hat{V}_{n} (c)) ,\end{equation*}

where it satisfies the recursion (7) in the form

\begin{equation*}{\hat{\mathbf{V}}}_{n+1} = ({\hat{\mathbf{V}}}_{n} + {\tilde{\mathbf{S}}}_{n}-{\hat{\mathbf{T}}}_{n})^{+},\quad n\ge0,\end{equation*}

where $ {\hat{\mathbf{T}}}_{n}=\hat{T}_{n}\cdot\mathbf{f}$ .

Starting from $ \mathbf{V}_{0}={\hat{\mathbf{V}}}_{0}=\mathbf{0}$ , it follows from Lemma 3 that (coordinate-wise)

\begin{equation*}\mathbf{V}_{n}\le {\hat{\mathbf{V}}}_{n},\quad n\ge0,\end{equation*}

and thus, if for some $ n\ge0$ it holds that $ {\hat{\mathbf{V}}}_{n}=\mathbf{0}$ , then $ \mathbf{V}_{n}=\mathbf{0}$ and hence $ \mathbf{W}_{n}=\mathbf{0}$ (as explained in our previous section). Since b was chosen ensuring that $ c\mathbb{E} [\skew2\hat{T}] >\mathbb{E} [S] $ and $ \mathbb{P} (\skew2\hat{T}>S) >0$ , $ \{{\hat{\mathbf{V}}}_{n}\}$ is a stable RA GI/GI/c queue that will indeed empty infinitely often. Thus we can use it to do the backward-in-discrete-time stationary construction until it empties, at time (say) $ -\skew2\hat{N}$ , where $ \skew2\hat{N}=\min \{n\ge0\colon{\hat{\mathbf{V}}}_{-n}=0\} $ . Then, we can reconstruct the original RA model (starting empty at time $ -\skew2\hat{N}$ ) using the (original untruncated) $ \skew2\hat{N}$ interarrival times $ (T_{-\skew2\hat{N}},T_{-\skew2\hat{N}+1},\ldots, T_{-1}) $ in lieu of $ (\skew2\hat{T}_{-\skew2\hat{N}},\skew2\hat{T}_{-\skew2\hat{N}+1},\ldots, \skew2\hat{T}_{-1}) $ , so as to collect $ \skew2\hat{N}$ reordered $ S_{n}$ needed in the construction of $ \mathbf{W}_{0}$ for the FIFO model.

Remark 1. One would expect the reconstruction of the original RA model in the above proof to be unnecessary: instead, we would only need to reconstruct the $ \widehat{RA}$ model until we have $ \skew2\hat{N}$ service initiations from it, as opposed to $ \skew2\hat{N}$ service initiations from the original RA model. Although this might be true, the subtle problem is that the order in which service times are initiated in the $ {\widehat{RA}}$ model will typically differ from that in the original RA model; they have different arrival processes (counter-examples are easy to construct). Thus it is not clear how one can utilize Lemma 1 and Lemma 2 and so on. One would need to generalize Lemma 1 to account for truncated arrival times used in the RA model, but not the FIFO model, perhaps in the form of a variation of (3),

\begin{equation*} \mathbb{P} (Q_{F}(t_{n}\!-\!)\le Q_{\widehat{RA}}(\skew2\hat{t}_{n}\!-\!)\ \hbox{for all}\ n\ge0) =1,\end{equation*}

where $ \{\skew2\hat{t}_{n}\}$ is the truncated renewal process. We have not explored this further.

7. Infinite server systems and other service disciplines

In this section we sketch how one can utilize our FIFO GI/GI/c results to obtain exact sampling of some other models including the infinite server queue, and the multi-server queue under other disciplines.

In [Reference Blanchet and Dong4] an exact simulation algorithm is presented for simulating from the stationary distribution of the infinite server queue, the GI/GI/ $ \infty$ . Here we sketch how to utilize our new FIFO GI/GI/c results to accomplish this by using a FIFO GI/GI/c model as an upper bound. The GI/GI/ $ \infty$ model has an infinite number of servers and no waiting line: every arrival enters service immediately upon arrival; the nth customer arrives at time $ t_{n}$ and departs at time $ t_{n}+S_{n}$ .

For $ 0<\rho=\lambda/\mu{<}\infty$ , this model is always stable. Note that any $ c>\rho$ can be chosen for this construction. A larger value of c will result in a smaller number of arrivals necessary to detect coalescence, up to a certain point, since there will be less congestion in the bounding systems and the customers will leave faster. On the other hand, the actual simulation time may increase just as a consequence of simulating a c-dimensional random walk. We suggest a rule consistent with square-root staffing, $ c=\rho+\sqrt{\rho}$ , since this is well known to trade quality and capacity costs, which in our setting precisely translate to trading faster coalescence with cost-per-replication costs (see [Reference Halfin and Whitt13]).

Letting $ V_{\infty}(t)$ denote the total amount of work in the GI/GI/ $ \infty$ model, and letting $ V_{c}(t)$ denote the total amount of work in the (necessarily stable) FIFO GI/GI/c model being fed exactly the same input (of service times and interarrival times), and both starting initially empty, the following is easily established:

\begin{equation*}\mathbb{P} (V_{\infty}(t)\le V_{c}(t)\ \hbox{for all}\ t\ge0) =1,\end{equation*}

hence

\begin{equation*}\mathbb{P} (V_{\infty}(t_{n}\!-\!)\le V_{c}(t_{n}\!-\!)\ \hbox{for all}\ n\ge0) =1.\end{equation*}

(Note that both models use the service times in the same order of initiation, which makes the coupling easy from the start.)
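This coupling inequality is easy to check numerically at arrival epochs. The following Python sketch (illustrative only) computes the total work in the infinite-server system from the remaining service times of all earlier jobs, and in the c-server system from the sum of the Kiefer–Wolfowitz coordinates, per the identity $V(t)=\sum_{i} W(t;\,i)$ from Section 5:

```python
from itertools import accumulate

def workloads_at_arrivals(services, interarrivals, c):
    """Total work just before each arrival epoch in the GI/GI/inf system
    and in a FIFO GI/GI/c system fed the same input, both starting empty."""
    arrivals = [0.0] + list(accumulate(interarrivals))
    w = [0.0] * c            # Kiefer-Wolfowitz vector for the c-server queue
    v_inf, v_c = [], []
    for n, t in enumerate(interarrivals):
        # infinite server: every earlier job is in service on its own server
        v_inf.append(sum(max(services[j] - (arrivals[n + 1] - arrivals[j]), 0.0)
                         for j in range(n + 1)))
        w = sorted(w)
        w[0] += services[n]
        w = [max(x - t, 0.0) for x in w]
        v_c.append(sum(w))
    return v_inf, v_c
```

On any common input the first list is dominated, entry by entry, by the second, as the display above asserts.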

Thus, if, for example, $ \mathbb{P} (T>S) >0$ , then the FIFO model will empty and can be used to detect times when the GI/GI/ $ \infty$ model will empty. Let $ L_{\infty}(t_{n}\!-\!)$ denote the total number of busy servers in the GI/GI/ $ \infty$ model as found by $ C_{n}$ .

Simulating the FIFO model backwards in time in stationarity (using our previous algorithm), until it first empties, can then be used to detect a time when the GI/GI/ $ \infty$ model is empty, and then one can construct it back up to time $ t=0$ to obtain a stationary copy of $ V_{\infty}(t_{n}\!-\!)$ and of $ L_{\infty}(t_{n}\!-\!)$ .

Now we consider alternative disciplines to FIFO for the GI/GI/c model. It is immediate that, when service times are generated only when needed by a server, the total number of customers in the system process $ \{Q(t)\}$ remains the same under FIFO as under last-in–first-out (LIFO), in which the next customer to enter service is the one at the bottom of the line, or random selection next (RS), in which the next customer to enter service from the line is selected at random by the server. Thus they all share the same stationary distribution of Q(t) as $ t\to\infty$ , as well as the stationary distribution of $ Q(t_{n}\!-\!)$ as $ n\to\infty$ . Let $ Q_{0}$ have this limiting (as $ n\to\infty$ ) distribution. This fact can be used to exactly simulate, for example, the stationary delay D under LIFO or RS (which is not the same as under FIFO). The method (sketch) is as follows. Simulate a copy of $ Q_{0}$ , jointly with the remaining service times of those in service, by assuming FIFO. This represents the distribution of the system as found in stationarity (at time 0) by arrival $ C_{0}$ . Consider RS, for example. If the line is empty, then define $ D_{RS}=0$ ; $ C_{0}$ enters service immediately. Otherwise, place $ C_{0}$ in the line, and continue simulating but now using RS instead of FIFO. As soon as $ C_{0}$ enters service, stop and define $ D_{RS}$ as that length of time.

8. Fork–join models

The RA recursion (7),

(14) \begin{equation} \mathbf{V}_{n+1} =(\mathbf{V}_{n} +\mathbf{{S}}_{n} -\mathbf{T}_{n})^{+},\quad n\ge0,\end{equation}

is actually a special case for the modeling of fork–join (FJ) queues (also called split and match) with c nodes. In an FJ model, each arrival is a ‘job’ with c components, the ith component requiring service at the ith FIFO queue. So upon arrival at time $ t_{n}$ , the job splits into its c components to be served. As soon as all c components have completed service, then and only then does the job depart. Such models are useful in manufacturing applications. The nth job ( $ C_{n}$ ) thus arrives with a service time vector attached of the form $ \mathbf{S}_{n} =({S}_{n}(1),\ldots,{S}_{n}(c))$ . Let us assume that the vectors are i.i.d., but otherwise each vector’s coordinates can have a general joint distribution; for then (14) still forms a Markov chain. We will denote this model as the GI/GI/c-FJ model. The sojourn time of the ith component is given by $ V_{n}(i)+S_{n}(i)$ , and thus the sojourn time of the nth job, $ C_{n}$ , is given by

\begin{equation*} H_{n}= \max_{1\le i\le c} \{V_{n}(i)+S_{n}(i)\}.\end{equation*}

Of great interest is obtaining the limiting distribution of $ H_{n}$ as $ n\to\infty$ ; we denote a random variable with this distribution as $ H^{0}$ . FJ models are notoriously difficult to analyze analytically. Even the special case of Poisson arrivals and i.i.d. exponential service times is non-trivial because of the dependence of the c queues through the common arrival process. (A classic paper is that of Flatto [Reference Flatto and Hahn10].) In fact, when $ c\ge3$ , only bounds and approximations are available. As for exact simulation, there is a paper by Hongsheng Dai [Reference Dai9] in which Poisson arrivals and independent exponential service times are assumed. Because of the continuous-time Markov chain (CTMC) model structure, Dai was able to construct (simulate) the time-reversed CTMC to use in a coupling from the past algorithm. But with general renewal arrivals and or general distribution service times, such CTMC methods can no longer be used.
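Recursion (14) together with the sojourn time formula above is easy to simulate forwards in time. A Python sketch (illustrative only; the function name is hypothetical):

```python
def fj_sojourn_times(service_vectors, interarrivals):
    """Fork-join recursion (14): V_{n+1} = (V_n + S_n - T_n f)^+,
    with the nth job's sojourn time H_n = max_i (V_n(i) + S_n(i))."""
    c = len(service_vectors[0])
    v = [0.0] * c                  # workload vector V_n, one entry per node
    H = []
    for s, t in zip(service_vectors, interarrivals):
        H.append(max(vi + si for vi, si in zip(v, s)))
        v = [max(vi + si - t, 0.0) for vi, si in zip(v, s)]
    return H
```

Note that this only simulates the transient dynamics; obtaining a copy of the stationary $H^{0}$ requires the exact backward construction described next.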

Our simulation method for the RA model outlined in Section 4, however, yields an exact copy of $ H^{0}$ for the general GI/GI/c-FJ model, under the condition that there exists $ {\boldsymbol{\theta}}>\mathbf{0}$ , $ {\boldsymbol{\theta}}\in\mathbb{R}^{c}$ such that

\begin{equation*}\mathbb{E}[\!\exp({\boldsymbol{\theta}}^{T}(\mathbf{S}_{1}-\mathbf{T}_{1}))]<\infty.\end{equation*}

First we simulate $ \mathbf{V}_{0}^{0}$ exactly using the exponential change of measure method introduced in [Reference Blanchet and Chen3] (we use the same technique for multi-dimensional simulation in Section 4.1), then simulate a vector of service times $ \mathbf{S}= (S(1),\ldots,S(c)) $ independently and set

\begin{equation*}H^{0}= \max_{1\le i\le c} \{V_{0}^{0}(i)+S(i)\}.\end{equation*}

Even in the case when the service time components within $ \mathbf{S}$ are independent, or when the service time distributions are assumed to have a finite moment generating function (in a neighborhood of the origin), such results are new and non-trivial.

9. The case when $ \mathbb{P}(T>S)=0$ : Harris-recurrent regeneration

For a stable FIFO GI/GI/c queue, the stability condition can be rewritten as $ \mathbb{E} [T_{1}+\cdots+ T_{c}] > \mathbb{E}[S]$ , which implies also that $ \mathbb{P} (T_{1}+\cdots+ T_{c}>S) >0$ . Thus assuming that $ \mathbb{P} (T>S) >0$ is not necessary for stability. When $ \mathbb{P} (T>S) =0$ , the system will never empty again after starting, and so using consecutive visits to $ \mathbf{0}$ as regeneration points is not possible. But the system does regenerate in a more general way via the use of Harris-recurrent Markov chain theory; see [Reference Sigman18] for details and history of this approach. The main idea is that while the system will not empty infinitely often, the number in system process $ \{Q_{F}(t_{n}\!-\!)\colon n\ge0\}$ will visit an integer $ 1\le j\le c-1$ infinitely often.

For illustration here, we will consider the $ c=2$ case (for the general case $ c\ge2$ the specific regeneration points analogous to what we present here are carefully given in equation (4.6) of [Reference Sigman18, page 396]). Let us assume that $ 1<\rho<2$ . (Note that if $ \rho<1$ , then equivalently $ \mathbb{E} [T] >\mathbb{E} [S] $ and so $ \mathbb{P} (T>S) >0$ ; that is why we rule out $ \rho<1$ here.) We now assume that $ \mathbb{P} (T>S) =0$ . This implies that, for $ {\underline{s}}\triangleq\inf\{s>0\colon\mathbb{P} (S>s) >0\}$ and $ {\overline t}\triangleq\sup\{t>0\colon \mathbb{P} (T>t) >0\}$ , we must have $ 0<{\overline t}<{\underline{s}}<\infty$ . It is shown in [Reference Sigman18] that for $ \epsilon>0$ sufficiently small, the following event will happen infinitely often (in n) with probability 1:

(15) \begin{equation} \{Q_{RA}(t_{n}\!-\!)=1,\ V_{n}(1)=0,\ V_{n}(2)\le\epsilon,\ T_{n}>\epsilon,\ U_{n}=1\} .\end{equation}

If n is such a time, then at time $ n+1$ we have

(16) \begin{equation} \{Q_{RA}(t_{n+1}\!-\!)=1,\ V_{n+1}(2)=0,\ V_{n+1}(1)=(S_{n}-T_{n})\}.\end{equation}

The point is that $ C_{n}$ finds one server (server 1) empty, and the other queue with only one customer in it, and that customer is in service with a remaining service time $ \le\epsilon$ . $ C_{n}$ then enters service at node 1 with service time $ S_{n}$ ; but since $ T_{n}>\epsilon$ , $ C_{n+1}$ arrives finding the second queue empty, and the first server has remaining service time $ S_{n}-T_{n}$ conditional on $ T_{n}>\epsilon$ . Under the coupling of Lemma 1, the same will be so for the FIFO model (see Remark 2 below). At such a time n,

(17) \begin{equation} \{Q_{F}(t_{n}\!-\!)=1,\ W_{n}(1)=0,\ W_{n}(2)\le\epsilon,\ T_{n}>\epsilon\} ,\end{equation}

and at time $ n+1$ we have

(18) \begin{equation} \{Q_{F}(t_{n+1}\!-\!)=1,\ W_{n}(1)=0,\ W_{n}(2)=(S_{n}-T_{n})\} .\end{equation}

Equations (16) and (18) define positive recurrent regeneration points for the two models (at time $ n+1$ ); the consecutive times at which regenerations occur form a (discrete-time) positive recurrent renewal process (see [Reference Sigman18]).

To put this to use, we change the stopping time N given in (10) to

\begin{align*} N+1 & =\min\{n\ge1\colon Q_{RA}^{0}(t_{-(n+1)}\!-\!)=1,\ V_{-(n+1)}^{0}(1)=0,\\ & \qquad\quad V_{-(n+1)}^{0}(2)\le\epsilon,\ T_{-(n+1)} > \epsilon,\ U_{-(n+1)}=1\}.\end{align*}

Then we do our reconstructions for the algorithm in Section 4.1 by starting at time $ -N$ , with both models starting with the same starting value

(19) \begin{equation}\{Q_{RA}(t_{-N}\!-\!)=1,\ V_{-N}^{0}(2)=0,\ V_{-N}^{0}(1)=(S_{-(N+1)}-T_{-(N+1)}) \mid T_{-(N+1)}>\epsilon\},\end{equation}
(20) \begin{equation}\{Q_{F}(t_{-N}\!-\!)=1,\ W_{-N}(1)=0,\ W_{-N}(2)=(S_{-(N+1)}-T_{-(N+1)}) \mid T_{-(N+1)}>\epsilon\}.\end{equation}

Remark 2. The service time used in (19) and (20) for coupling via Lemma 2, $ S_{-(N+1)}$ , is in fact identical for both systems for the following subtle reason. At time $ -(N+1)$ , both systems have only one customer in the system, and thus the total work is in fact equal to the remaining service time, so we use (5) to conclude that both remaining service times (even if different) are $ \leq\epsilon$ (for example, that is why (17) follows from (15)). Meanwhile, $ C_{-(N+1)}$ enters service immediately across both systems, so it is indeed the same service time $ S_{-(N+1)}$ used for this initiation. Coalescence is detected in finite expected time because of the positive recurrence property underlying the definition of the regeneration points from (16) and (18).

Appendix A. Detailed algorithm steps in Section 4.1

To simulate the process $ \{(\mathbf{R}_{n}^{(r)},\mathbf{V}_{-n}^{0})\colon 0\leq n\leq N\}$ with the time N defined in (10) as

\begin{equation*}N=\inf \bigg\{n\geq0\colon \mathbf{V}_{-n}^{0}= \max_{k\geq n}\, \mathbf{R}_{k}^{(r)}-\mathbf{R}_{n}^{(r)}=\mathbf{0}\bigg\} ,\end{equation*}

we must sample the running time maxima (entry by entry) of the c-dimensional random walk

\begin{equation*}\mathbf{R}_{n}^{(r)}=\sum_{i=1}^{n}\boldsymbol{\Delta}_{-i}=\sum_{i=1}^{n}(\tilde{\mathbf{S}}_{-i}-\mathbf{T}_{-i}),\quad n\geq0.\end{equation*}

We will find a sequence of random times $ \{N_{n}\colon n\geq1\}$ such that

\begin{equation*}\max_{n\leq k\leq N_{n}}\,\mathbf{R}_{k}^{(r)}\geq \max_{k\geq N_{n}}\, \mathbf{R}_{k}^{(r)}.\end{equation*}

Hence, we will be able to find the running time maxima by only sampling the random walk on a finite time interval, that is, $ N_{n}$ is such that

\begin{equation*}\max_{k\geq n}\,\mathbf{R}_{k}^{(r)}= \max_{n\leq k\leq N_{n}}\,\mathbf{R}_{k}^{(r)}.\end{equation*}

To achieve this, we first decompose the random walk into two random walks and then construct a sequence of ‘milestone’ events for each of these two random walks to detect the $ N_{n}$ . We elaborate on the implementation details in what follows.

Because of the stability condition $ \rho=\lambda/\mu{<}c$ , we can find some value $ a\in(1/\mu,c/\lambda)$ . For any $ n\ge0$ , define

(21) \begin{align} \mathbf{X}_{-n} & =\sum_{j=1}^{n} (S_{-j}-a) \mathbf{U}_{-j},\end{align}
(22) \begin{align} \mathbf{Y}_{-n} & =\sum_{j=1}^{n} (a\mathbf{U}_{-j}-\mathbf{T}_{-j}) ,\end{align}

hence $ \mathbf{R}^{(r)}_{n}=\sum_{j=1}^{n}\boldsymbol{\Delta}_{-j}=\mathbf{X}_{-n}+\mathbf{Y}_{-n}$ and $ \max_{k\ge n}\mathbf{R}_{k}^{(r)}=\max_{k\ge n}(\mathbf{X}_{-k}+\mathbf{Y}_{-k})$ .
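As a small numerical illustration of this decomposition (a sketch only; the function and variable names are ours, the server labels are 0-based, and the inputs are arbitrary finite sequences rather than the simulated i.i.d. streams), the partial sums of the two pieces recombine to the original walk coordinate-wise:

```python
def decompose_walk(S, T, U, a, c):
    """Accumulate X_{-n} = sum_j (S_{-j} - a) U_{-j} as in (21),
    Y_{-n} = sum_j (a U_{-j} - T_{-j}) as in (22), and the original walk
    R_n = sum_j (S_{-j} U_{-j} - T_{-j}); U[j] is the (0-based) server
    label of the (j+1)th arrival counting backwards in time."""
    X, Y, R = [0.0] * c, [0.0] * c, [0.0] * c
    pathX, pathY, pathR = [], [], []
    for s, t, u in zip(S, T, U):
        for i in range(c):
            ind = 1.0 if u == i else 0.0
            X[i] += (s - a) * ind   # X part: centred service times
            Y[i] += a * ind - t     # Y part: negative-drift piece
            R[i] += s * ind - t     # original increment S*U - T
        pathX.append(list(X))
        pathY.append(list(Y))
        pathR.append(list(R))
    return pathX, pathY, pathR
```

By construction, `pathX[n][i] + pathY[n][i] == pathR[n][i]` for every step n and coordinate i.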

For all $ n\ge0$ , let

(23) \begin{align}N_{n}^{X} & =\inf\Big\{n^{\prime}\ge n\colon \max_{k\ge n^{\prime}}\,\mathbf{X}_{-k}\le\mathbf{X}_{-n}\Big\},\end{align}
(24) \begin{align} N_{n}^{Y} & =\inf\Big\{n^{\prime}\ge n\colon \max_{k\ge n^{\prime}}\,\mathbf{Y}_{-k}\le\mathbf{Y}_{-n}\Big\},\end{align}
(25) \begin{align} N_{n} & =\max\{N_{n}^{X},N_{n}^{Y}\}.\end{align}

Then, by the definitions above,

\begin{equation*}\max_{k\ge N_{n}}\,\mathbf{R}_{k}^{(r)}\le\max_{k\ge N_{n}}\,\mathbf{X}_{-k}+\max_{k\ge N_{n}}\,\mathbf{Y}_{-k}\le\mathbf{X}_{-n}+\mathbf{Y}_{-n}=\mathbf{R}_{n}^{(r)}.\end{equation*}

Therefore, to get the running time maximum $ \textrm{max}_{k\ge n}\mathbf{R}^{(r)}_{k}$ for each $ n\ge0$ , we only need to sample the random walk from step n to $ N_{n}$ , because

\begin{equation*}\max_{k\ge n}\,\mathbf{R}_{k}^{(r)}=\max\bigg\{\max_{n\le k\le N_{n}}\,\mathbf{R}_{k}^{(r)},\,\max_{k\ge N_{n}}\,\mathbf{R}_{k}^{(r)}\bigg\} = \max_{n\le k\le N_{n}} \mathbf{R}^{(r)}_{k}.\end{equation*}
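Once $ N_{n}$ is known, the infinite-horizon maximum therefore reduces to a finite scan. A minimal sketch (the names are ours; `path[k]` holds the c-vector $ \mathbf{R}^{(r)}_{k}$ as a Python list):

```python
def window_max(path, n, Nn):
    """Coordinate-wise maximum of the walk over steps n..Nn; by the
    identity above this equals max_{k >= n} R_k when Nn = N_n."""
    c = len(path[0])
    return [max(path[k][i] for k in range(n, Nn + 1)) for i in range(c)]
```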

Next we describe how to sample $ N_{n}$ along with the multi-dimensional random walks

\begin{equation*}\{\mathbf{X}_{-n}\colon n\ge0\}\quad\text{and}\quad\{\mathbf{Y}_{-n}\colon n\ge0\}.\end{equation*}

A.1. Simulation algorithm for the process $ \{\mathbf{Y}_{-n}\colon n \ge 0\}$

We first consider simulating the c-dimensional random walk $ \{\mathbf{Y}_{-n}\colon n\geq0\}$ with $ \mathbf{Y}_{0}=\mathbf{0}$ . Since $ \mathbb{E} [a\mathbf{U}_{-j}-\mathbf{T}_{-j}] <\mathbf{0}$ for each $ j\geq1$ , we can simulate the running time maximum $ \max_{k\geq n}\mathbf{Y}_{-k}$ jointly with the path $ \{\mathbf{Y}_{-k}\colon 0\leq k\leq n\}$ via the method developed in [Reference Blanchet and Chen3], under the following assumptions.

Assumption (A1). There exists $ {\boldsymbol{\theta}}>\mathbf{0}$ , $ {\boldsymbol{\theta}}\in\mathbb{R}^{c}$ such that

\begin{equation*}\mathbb{E} [\exp ({\boldsymbol{\theta}}^{T}(a\mathbf{U}_{-j}-\mathbf{T}_{-j})) ]<\infty.\end{equation*}

Assumption (A1b). Suppose that in every dimension $ i=1,\ldots,c$ , there exists $ \theta^{*}\in(0,\infty)$ such that

\begin{equation*}\phi_{i}(\theta^{*})\,:\!=\log \mathbb{E} [\exp (\theta^{*} (aI(U_{-j}=i)-T_{-j})) ] =0.\end{equation*}

Because, for each $ j\ge1$ , the variables $ aI(U_{-j}=i)-T_{-j}$ are identically distributed across i, the same $ \theta^{*}$ works for all $ i=1,\ldots,c$ .

Remark 3. In our setting, since $ \mathbf{U}_{-j}$ is bounded, assumption (A1) always holds. Assumption (A1b) is known as Cramer’s condition in the large deviations literature, and it is a strengthening of assumption (A1). We shall explain briefly at the end of this section that it is possible to relax this assumption to (A1) by modifying the algorithm without affecting the exactness or computational cost of the algorithm. For the moment we continue to describe the main algorithmic idea under assumption (A1b).

For any $ \mathbf{s}\in\mathbb{R}^{c}$ and $ \mathbf{b}\in\mathbb{R}^{c}_{+}$ , define

\begin{align*} T_{\mathbf{b}} & =\inf\{n\ge0\colon Y_{-n}(i)>b(i)\ \mbox{for some } i\in\{1,\ldots,c\}\},\\ T_{-\mathbf{b}}& =\inf\{n\ge0\colon Y_{-n}(i)<-b(i)\ \mbox{for all } i=1,\ldots,c\},\\ P_{\mathbf{s}}({\cdot})& =\mathbb{P} (\cdot\mid \mathbf{Y}_{0}=\mathbf{s}) .\end{align*}

We will use these definitions in Algorithm LTGM given in Section A.1.1.

We next construct a sequence of upward and downward ‘milestone’ events for this multi-dimensional random walk. The construction is completely analogous to the classical ladder height decomposition of one-dimensional random walks: we introduce a parameter, m, in order to facilitate a certain acceptance/rejection step to be explained in the next subsection. Let

(26) \begin{equation}m=\lceil\log(c)/\theta^{\ast}\rceil.\end{equation}

Define $ D_{0}=0$ and $ \Gamma_{0}=\infty$ . For $ k\geq1$ , let

(27) \begin{align}D_{k} & =\inf\{n\geq D_{k-1}\vee\Gamma_{k-1}I (\Gamma_{k-1}<\infty) \colon Y_{-n}(i){<}Y_{-D_{k-1}}(i)-m\ \mbox{for all } i\},\end{align}
(28) \begin{align} \Gamma_{k} & =\inf\{n\geq D_{k}\colon Y_{-n}(i){>}Y_{-D_{k}}(i)+m\ \mbox{for some } i\}.\end{align}

Note that, by convention, $ \Gamma_{k}I (\Gamma_{k}<\infty) =0$ if $ \Gamma_{k}=\infty$ for any $ k\geq0$ . We let $ \mathbf{B}\in\mathbb{R}^{c}$ , initially set to $ (\infty,\ldots,\infty)^{T}\in\mathbb{R}^{c}$ , be a running upper bound for the process $ \{\mathbf{Y}_{-n}\colon n\geq0\}$ . Let $ \mathbf{m}=m\mathbf{f}$ , where $ \mathbf{f}=(1,\ldots,1)^{T}$ as defined in Section 2. From the construction of ‘milestone’ events in (27) and (28), we know that if $ \Gamma_{k}=\infty$ for some $ k\geq1$ , the process never crosses the level $ \mathbf{Y}_{-D_{k}}+\mathbf{m}$ in any coordinate after time $ D_{k}$ , that is, for $ i=1,\ldots,c$ ,

\begin{equation*}Y_{-n}(i)\leq Y_{-D_{k}}(i)+m \quad \text{for all } n\geq D_{k}.\end{equation*}

Hence, in this case we update the upper bound vector $ \mathbf{B}=\mathbf{Y}_{-D_{k}}+\mathbf{m}$ .
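The scan for the milestone times in (27) and (28) can be sketched as follows (a simplified illustration under our own naming, operating on a finite prefix of the path, with `Y[n]` standing for the c-vector $ \mathbf{Y}_{-n}$ ; a finite prefix can only report the milestones it actually contains, whereas deciding $ \Gamma_{k}=\infty$ requires the Bernoulli step of the algorithm below):

```python
def milestones(Y, m):
    """Scan a finite prefix Y[0], Y[1], ... of the path and return the
    downward milestone indices D_k of (27) and the upward milestone
    indices Gamma_k of (28) that occur within the prefix."""
    c = len(Y[0])
    D, Gamma = [0], []      # D_0 = 0; Gamma_0 = infinity is implicit
    last_D = 0
    looking_down = True     # True: searching for the next D_k
    for n in range(1, len(Y)):
        if looking_down:
            # (27): drop below Y_{-D_{k-1}} - m in every coordinate
            if all(Y[n][i] < Y[last_D][i] - m for i in range(c)):
                D.append(n)
                last_D = n
                looking_down = False
        else:
            # (28): climb above Y_{-D_k} + m in some coordinate
            if any(Y[n][i] > Y[last_D][i] + m for i in range(c)):
                Gamma.append(n)
                looking_down = True
    return D, Gamma
```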

A.1.1. Global maximum simulation

Define

\begin{equation*}\Lambda=\inf\{D_{k}\colon \Gamma_{k}=\infty,k\geq1\}.\end{equation*}

By the construction of ‘milestone’ events, for all $ n\geq\Lambda$ ,

\begin{equation*}\mathbf{Y}_{-n}\leq\mathbf{Y}_{-\Lambda}+\mathbf{m}<\mathbf{0}=\mathbf{Y}_{0}.\end{equation*}

Hence, we can evaluate the global maximum level of the process $ \{\mathbf{Y}_{-n}\colon n\geq0\}$ to be

\begin{equation*} \mathbf{M}_{0}\,:\!= \max_{k\geq0}\,\mathbf{Y}_{-k}=\max_{0\leq k\leq\Lambda}\,\mathbf{Y}_{-k},\end{equation*}

and we give the detailed sampling procedure in the following algorithm. The algorithm has elements, such as sampling a Bernoulli with success probability $ P_{\mathbf{0}}(T_{\mathbf{m}}<\infty)$ , which will be explained below.

Algorithm 1. (LTGM.) Simulate the global maximum of the c-dimensional process $ \{\mathbf{Y}_{-n}\colon n\ge0\}$ jointly with the subpath and the subsequence of ‘milestone’ events.

Input: $ a\in(1/\mu,c/\lambda)$ satisfying assumption (A1b), m as in (26).

  1. (Initialization) Set $ n=0$ , $ \mathbf{Y}_{0}=\mathbf{0}$ , $ \mathbf{D}=[0]$ , $ \boldsymbol{\Gamma}=[\infty]$ , $ \mathbf{L}=\mathbf{0}$ , and $ \mathbf{B}=\infty\mathbf{f}$ .

  2. Generate $ U\sim \textrm{Unif}\{1,\ldots,c\}$ and let $ \mathbf{U}=(I(U=1),\ldots,I(U=c))^{T}$ . Independently sample $ T\sim A$ and let $ \mathbf{T}=T\mathbf{f}$ . Set $ n=n+1$ , $ \mathbf{Y}_{-n}=\mathbf{Y}_{-(n-1)}+a\mathbf{U}-\mathbf{T}$ , $ U_{-n}=U$ , and $ T_{-n}= T$ .

  3. If there is some $ 1\le i\le c$ such that $ Y_{-n}(i)\ge L(i)-m$ , then go to step 2; otherwise set $ \mathbf{D}=[\mathbf{D},n]$ and $ \mathbf{L}=\mathbf{Y}_{-n}$ .

  4. Independently sample $ J\sim \operatorname{Ber} (P_{\mathbf{0}} (T_{\mathbf{m}}<\infty)) $ .

  5. If $ J=1$ , simulate a new conditional path $ \{(\mathbf{y}_{-k},u_{-k},t_{-k}) \colon 1\le k\le T_{\mathbf{m}}\}$ with $ \mathbf{y}_{0}=\mathbf{0}$ , following the conditional distribution of $ \{\mathbf{Y}_{-k}\colon 0\le k\le T_{\mathbf{m}}\}$ given $ T_{\mathbf{m}}<\infty$ . Set $ \mathbf{Y}_{-(n+k)}=\mathbf{Y}_{-n}+\mathbf{y}_{-k}$ , $ U_{-(n+k)}= u_{-k}$ , $ T_{-(n+k)}= t_{-k}$ for $ 1\le k\le T_{\mathbf{m}}$ . Set $ n= n+T_{\mathbf{m}}$ , $ \boldsymbol{\Gamma}=[\boldsymbol{\Gamma},n]$ . Go to step 2.

  6. If $ J=0$ , set $ \Lambda=n$ , $ \boldsymbol{\Gamma}=[\boldsymbol{\Gamma},\infty]$ , and $ \mathbf{B}=\mathbf{L}+\mathbf{m}$ .

  7. Output $ \{(\mathbf{Y}_{-k},U_{-k},T_{-k}) \colon 1\le k\le\Lambda\} $ , $ \mathbf{D}$ , $ \boldsymbol{\Gamma}$ , and global maximum $ \mathbf{M}_{0}=\max_{0\le k\le\Lambda}\mathbf{Y}_{-k}$ .

Now we explain how to execute steps 4 and 5 in the above algorithm. The procedure is similar to the multi-dimensional procedure given in [Reference Blanchet and Chen3], so we describe it briefly here. As $ P_{\mathbf{0}}({\cdot})$ denotes the canonical probability, we let $ P^{*}_{\mathbf{0}}({\cdot})=P_{\mathbf{0}}(\cdot\mid T_{\mathbf{m}}<\infty)$ . Our goal is to simulate from the conditional law of $ \{\mathbf{Y}_{-k}\colon 0\le k\le T_{\mathbf{m}}\} $ given that $ T_{\mathbf{m}}<\infty$ and $ \mathbf{Y}_{0}=\mathbf{0}$ , i.e. to simulate from $ P_{\mathbf{0}}^{*}$ . We will use acceptance/rejection, letting $ P^{\prime}_{\mathbf{0}}({\cdot})$ denote the proposal distribution. A typical element $ \omega^{\prime}$ sampled under $ P_{\mathbf{0}}^{\prime}({\cdot})$ is of the form $ \omega^{\prime}=((\mathbf{Y}_{-k}\colon k\ge0),\operatorname{index})$ , where $ \operatorname{index}\in\{1,\ldots,c\}$ indicates the direction chosen for exponential tilting: it is the coordinate whose measure we change to increase the chance of hitting the upward ‘milestone’ defined in (28). Given the value of $ \operatorname{index}$ , the process $ (\mathbf{Y}_{-k}\colon k\ge0)$ remains a random walk. We now describe $ P_{\mathbf{0}}^{\prime}$ by explaining how to sample $ \omega^{\prime}$ . First,

(29) \begin{equation} P_{\mathbf{0}}^{\prime}(\operatorname{index}=i)=w_{i}\,:\!=\dfrac{1}{c}.\end{equation}

Then, conditioning on $ \operatorname{index}=i$ , for every set $ A\in\sigma (\{\mathbf{Y}_{-k}\colon 0\le k\le n\}) $ ,

(30) \begin{equation} P_{\mathbf{0}}^{\prime} (A\mid\operatorname{index}=i) =E_{\mathbf{0}} [ \exp (\theta^{*}Y_{-n}(i))I_{A}].\end{equation}

To obtain the induced distribution for U and T, we study the moment generating function induced by definition (30). Given $ {\boldsymbol{\eta}}\in\mathbb{R}^{c}$ in a neighborhood of the origin,

\begin{align*}&\dfrac{E_{\mathbf{0}} [\exp ({\boldsymbol{\eta}}^{T}(a\mathbf{U}-\mathbf{T}) +\theta^{*}e_{i}^{T}(a\mathbf{U}-\mathbf{T})) ]}{E_{\mathbf{0}} [\exp (\theta^{*}e_{i}^{T}(a\mathbf{U}-\mathbf{T})) ]} \\&\qquad =\dfrac{E_{\mathbf{0}} [\exp (({\boldsymbol{\eta}}+\theta^{*}e_{i})^{T} a\mathbf{U}) ]}{E_{\mathbf{0}} [\exp (\theta^{*}e_{i}^{T} a\mathbf{U}) ]}\cdot\dfrac{E_{\mathbf{0}} [\exp (-({\boldsymbol{\eta}}+\theta^{*}e_{i})^{T}\mathbf{T}) ]}{E_{\mathbf{0}} [\exp (-\theta^{*}e_{i}^{T}\mathbf{T}) ]}.\end{align*}

The above expression indicates that under $ P_{\mathbf{0}}^{\prime}({\cdot})$ , T and U are independent. Moreover, we have

\begin{equation*}E_{\mathbf{0}} [\exp (\theta^{*}e_{i}^{T}a\mathbf{U}) ] =\dfrac{\exp({\theta^{*}a})+c-1}{c}.\end{equation*}

Therefore,

(31) \begin{equation} P_{\mathbf{0}}^{\prime} (U=j\mid \operatorname{index}=i) =\begin{cases}\dfrac{\exp (\theta^{*}a)}{\exp (\theta^{*}a) +c-1}& \text{if } j =i{,}\\[9pt]\dfrac{1}{\exp (\theta^{*}a) +c-1} & \text{if } j\neq i{.}\end{cases}\end{equation}

On the other hand, conditional on $ \operatorname{index}=i$ , the distribution of a generic interarrival time T is obtained by exponential tilting such that

(32) \begin{equation} {\textrm{d}} P_{\mathbf{0}}^{\prime}(T\mid \operatorname{index}=i) ={\textrm{d}} P_{\mathbf{0}}(T)\cdot\dfrac{\exp(-\theta^{*}T)}{E_{\mathbf{0}} [\exp(-\theta^{*}T)]} ={\textrm{d}} P_{\mathbf{0}}(T)\cdot\dfrac{\exp(a\theta^{*})+c-1}{c\exp(\theta^{*}T)},\end{equation}

where the second equation follows from assumption (A1b).
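To make (31) and (32) concrete, here is a sketch of the tilted sampling step (the function names are ours, and server labels are 0-based). The routing label is drawn from the explicit two-point weights in (31); for the interarrival time we specialize, purely for illustration, to $ A=\operatorname{Exp}(\lambda)$ , for which the exponential tilt $ \exp(-\theta^{*}T)$ in (32) again yields an exponential law, now with rate $ \lambda+\theta^{*}$ :

```python
import math
import random

def sample_tilted_U(c, theta_star, a, idx, rng=random):
    """Sample U under the tilted law (31): the chosen coordinate idx
    receives probability e^{theta* a} / (e^{theta* a} + c - 1), and the
    remaining mass is split evenly over the other c - 1 labels."""
    w = math.exp(theta_star * a)
    p_idx = w / (w + c - 1)
    u = rng.random()
    if u < p_idx:
        return idx
    j = int((u - p_idx) / ((1 - p_idx) / (c - 1)))
    others = [k for k in range(c) if k != idx]
    return others[min(j, c - 2)]

def sample_tilted_T_exponential(lam, theta_star, rng=random):
    """Illustration only: when A is Exp(lam), the tilt e^{-theta* T} of
    (32) gives Exp(lam + theta*); for a general A one tilts A directly."""
    return rng.expovariate(lam + theta_star)
```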

Following assumption (A1b) and because $ \operatorname{Var}(aI(U_{-j}=i)-T_{-j})>0$ , by convexity,

\begin{equation*} E_{\mathbf{0}}^{\prime} [ Y_{-n}(\operatorname{index})] =\sum_{i=1}^{c}E_{\mathbf{0}} [ Y_{-n}(i)\exp (\theta^{*}Y_{-n}(i))] P_{\mathbf{0}}^{\prime} (\operatorname{index}=i) =\dfrac{n}{c}\sum_{i=1}^{c}\dfrac{{\textrm{d}}\phi_{i} (\theta^{*})}{{\textrm{d}}\theta} >0,\end{equation*}

so $ Y_{-n} (\operatorname{index}) \to\infty$ as $ n\to\infty$ almost surely under $ P_{\mathbf{0}}^{\prime}({\cdot})$ , hence $ T_{\mathbf{m}}<\infty$ with probability one under $ P_{\mathbf{0}}^{\prime}({\cdot})$ . Now, to verify that $ P_{\mathbf{0}}^{\prime}({\cdot})$ is a valid proposal for the acceptance/rejection method, we must verify that $ {\textrm{d}} P_{\mathbf{0}}^{*}/{\textrm{d}} P_{\mathbf{0}}^{\prime}$ is bounded by a constant, that is,

\begin{align*}& \dfrac{{\textrm{d}} P_{0}^{*}}{{\textrm{d}} P_{0}^{\prime}} (\mathbf{Y}_{-k}\colon 0\le k\le T_{\mathbf{m}}) \\& \qquad = \dfrac{1}{P_{0} (T_{\mathbf{m}}<\infty)}\cdot \dfrac{{\textrm{d}} P_{0}}{{\textrm{d}} P_{0}^{\prime}} (\mathbf{Y}_{-k}\colon0\le k\le T_{\mathbf{m}}) \\& \qquad = \dfrac{1}{P_{0} (T_{\mathbf{m}}<\infty)}\cdot \dfrac{1}{\sum_{i=1}^{c}w_{i}\exp (\theta^{*}Y_{-T_{\mathbf{m}}}(i))}\\& \qquad \le \dfrac{1}{P_{0} (T_{\mathbf{m}}<\infty)}\cdot \dfrac{c}{\exp (\theta^{*}m)}\\& \qquad < \dfrac{1}{P_{0} (T_{\mathbf{m}}<\infty)},\end{align*}

where the last inequality is guaranteed by (26). So, acceptance/rejection is valid.

Moreover, the overall probability of accepting the proposal is precisely $ P_{\mathbf{0}}(T_{\mathbf{m}}<\infty)$ . Thus, we execute not only step 5 but simultaneously also step 4. We use this acceptance/rejection method to replace steps 4 and 5 in Algorithm LTGM as follows.

  4′. Sample $ \{(\mathbf{y}_{-k},u_{-k},t_{-k}) \colon 0\le k\le T_{\mathbf{m}}\}$ with $ \mathbf{y}_{0}=\mathbf{0}$ from $ P_{0}^{\prime} (\cdot) $ as indicated via (29), (31), and (32). Sample a Bernoulli J with success probability

    \begin{equation*}\dfrac{c}{\sum_{i=1}^{c}\exp (\theta^{*}y_{-T_{\mathbf{m}}}(i))}.\end{equation*}
  5′. If $ J=1$ , set $ \mathbf{Y}_{-(n+k)}=\mathbf{Y}_{-n}+\mathbf{y}_{-k}$ , $ U_{-(n+k)}=u_{-k}$ , $ T_{-(n+k)}=t_{-k}$ for $ 1\le k\le T_{\mathbf{m}}$ . Set $ n=n+T_{\mathbf{m}}$ and $ \boldsymbol{\Gamma}=[\boldsymbol{\Gamma},n]$ . Go to step 2.

A.1.2. Simulate $ \{\mathbf{Y}_{-n}\colon n\ge0\}$ jointly with ‘milestone’ events

In this section we provide an algorithm to sequentially simulate the multi-dimensional random walk $ \{\mathbf{Y}_{-n}\colon n\ge0\}$ along with its downward and upward ‘milestone’ events as defined in (27) and (28). We first extend Lemma 3 in [Reference Blanchet and Sigman5] to a multi-dimensional version as follows.

Lemma 4. Let $ 0<\mathbf{a}<\mathbf{b}\le\infty\mathbf{f}$ (coordinate-wise) and consider any sequence of bounded positive measurable functions $ f_{k}\colon \mathbb{R}^{c\times(k+1)}\to[0,\infty)$ ,

\begin{align*} & E_{0} [ f_{T_{-\mathbf{a}}}(\mathbf{Y}_{0},\ldots, \mathbf{Y}_{-T_{-\mathbf{a}}})\mid T_{\mathbf{b}}=\infty] \\&\qquad = \dfrac{E_{0} [ f_{T_{-\mathbf{a}}}(\mathbf{Y}_{0},\ldots,\mathbf{Y}_{-T_{-\mathbf{a}}})\cdot I(Y_{-j}(i)\le b(i),0\le j< T_{-\mathbf{a}},1\le i\le c)] \cdot P_{\mathbf{Y}_{-T_{-\mathbf{a}}}}(T_{\mathbf{b}}=\infty)}{P_{0} (T_{\mathbf{b}}=\infty)}.\end{align*}

Therefore, if $ P_{0}^{**}({\cdot})\,:\!=P_{0}(\cdot\mid T_{\mathbf{b}}=\infty)$ , then

\begin{equation*} \dfrac{{\textrm{d}} P_{0}^{**}}{{\textrm{d}} P_{0}}=\dfrac{I (Y_{-j}(i){\le} b(i)\ \text{for all}\ j{<}T_{-\mathbf{a}},\ 1{\le} i{\le} c) \cdot P_{\mathbf{Y}_{-T_{-\mathbf{a}}}}(T_{\mathbf{b}}=\infty)}{P_{0}(T_{\mathbf{b}}=\infty)}{\le}\dfrac{1}{P_{0} (T_{\mathbf{b}}=\infty)}.\end{equation*}

Lemma 4 enables us to sample a downward patch by using the acceptance/rejection method with the nominal distribution $ {P}_{0}$ as proposal. Suppose our current position is $ \mathbf{Y}_{-D_{j}}$ (for some $ j\ge1$ ) and we know that the process will never go above the upper bound $ \mathbf{B}$ (coordinate-wise). Next we simulate the path up to time $ D_{j+1}$ . If we can propose a downward patch $ (\mathbf{y}_{-1},\ldots,\mathbf{y}_{-T_{-m}})\,:\!= (\mathbf{Y}_{-1},\ldots,\mathbf{Y}_{-T_{-\mathbf{m}}}) $ , under the unconditional probability given $ \mathbf{y}_{0}=\mathbf{0}$ and $ \mathbf{y}_{-k}\le\mathbf{m}$ for $ 1\le k\le T_{-\mathbf{m}}$ , then we accept it with probability $ P_{0} (T_{\boldsymbol{\sigma}}=\infty) $ , where $ {\boldsymbol{\sigma}}=\mathbf{B}-\mathbf{Y}_{-D_{j}}-\mathbf{y}_{-T_{-\mathbf{m}}}$ . A more efficient way to sample is to sequentially generate $ (\mathbf{y}_{-1},\ldots,\mathbf{y}_{-\Lambda}) $ with $ \mathbf{y}_{0}=\mathbf{0}$ as long as $ \mathbf{m}_{0}\,:\!=\textrm{max}_{0\le k\le\Lambda}\mathbf{y}_{-k}\le\mathbf{m}$ coordinate-wise, then concatenate the sequence to the previously sampled subpath. We give the efficient implementation procedure in the next algorithm.

Algorithm 2. (LTRW.) Continue to sample the process $ \{(\mathbf{Y}_{-k},U_{-k},T_{-k})\colon 0\le k\le n\}$ jointly with the partially sampled ‘milestone’ event lists $ \mathbf{D}$ and $ \boldsymbol{\Gamma}$ , until a stopping criterion is met.

Input: a, m, previously sampled partial process $ \{(\mathbf{Y}_{-j},U_{-j},T_{-j})\colon 0\le j\le l\}$ , partial ‘milestone’ sequences $ \mathbf{D}$ and $ \boldsymbol{\Gamma}$ , and stopping criterion $ \mathcal{H}$ . (Note that if there is no previous simulated random walk, we initialize $ l=0$ , $ \mathbf{D}=[0]$ , and $ \boldsymbol{\Gamma}=[\infty]$ .)

  1. Set $ n=l$ . If $ n=0$ , call Algorithm LTGM to get $ \Lambda$ , $ \{(\mathbf{Y}_{-k},U_{-k},T_{-k})\colon 0\le k\le\Lambda\}$ , $ \mathbf{D}$ and $ \boldsymbol{\Gamma}$ . Set $ n=\Lambda$ .

  2. While the stopping criterion $ \mathcal{H}$ is not satisfied:

    (a) Call Algorithm LTGM to get $ \tilde{\Lambda}$ , $ \{(\tilde{\mathbf{Y}}_{-j},\tilde{U}_{-j},\tilde{T}_{-j})\colon 0\le j\le\tilde{\Lambda}\}$ , $ \tilde{\mathbf{D}}$ , $ \tilde{\boldsymbol{\Gamma}}$ , and $ \tilde{\mathbf{M}}_{0}$ .

    (b) If $ \tilde{\mathbf{M}}_{0}\le\mathbf{m}$ , accept the proposed sequence and concatenate it to the previous subpath, that is, set $ \mathbf{Y}_{-(n+j)}=\mathbf{Y}_{-n}+\tilde{\mathbf{Y}}_{-j}$ , $ U_{-(n+j)}=\tilde{U}_{-j}$ , $ T_{-(n+j)}=\tilde{T}_{-j}$ for $ 1\le j\le\tilde{\Lambda}$ . Update the sequences of ‘milestone’ events to be $ \mathbf{D}=[\mathbf{D},n+\tilde{\mathbf{D}}(2\,:\,\text{end})]$ , $ \boldsymbol{\Gamma}=[\boldsymbol{\Gamma},n+\tilde{\boldsymbol{\Gamma}}(2\,:\,\text{end})]$ and set $ n=n+\tilde{\Lambda}$ .

  3. Output $ \{(\mathbf{Y}_{-k},U_{-k},T_{-k})\colon 0\le k\le n\}$ with updated ‘milestone’ event sequences $ \mathbf{D}$ and $ \boldsymbol{\Gamma}$ .

For $ n\ge0$ , define

(33) \begin{align}d_{1}(n) & =\inf\{D_{k}\ge n\colon \mathbf{Y}_{-D_{k}}\le\mathbf{Y}_{-n}\},\end{align}
(34) \begin{align} d_{2}(n) & =\inf\{D_{k}> d_{1}(n)\colon \Gamma_{k}=\infty\} ,\end{align}

and $ d_{2}(n)$ is an upper bound for $ N_{n}^{Y}$ defined in (24), because

\begin{equation*}\max\limits_{k\ge d_{2}(n)}\mathbf{Y}_{-k}\le\mathbf{Y}_{-d_{2}(n)}+\mathbf{m}<\mathbf{Y}_{-d_{1}(n)}\le\mathbf{Y}_{-n}.\end{equation*}

Remark 4. Since assumption (A1b) is a strengthening of assumption (A1), we can accommodate our algorithms under the general assumption (A1). The implementation details are the same as those mentioned in the remark section of [Reference Blanchet and Chen3, page 15].

A.2. Simulation algorithm for the process $ \{\mathbf{X}_{-n}\colon n\ge 0\}$

Recall from (21) that, for $ n\ge0$ ,

\begin{equation*}X_{-n}(i)=\sum_{j=1}^{n}(S_{-j}-a)I(U_{-j}=i)\quad \mbox{for } i=1,\ldots,c.\end{equation*}

Define

(35) \hspace*{-50pt}\begin{align} N_{k}(i)& =\sum_{j=1}^{k}I (U_{-j}=i),\hspace*{73pt}\end{align}
(36) \begin{align} L_{n}(i)& =\inf \{k\ge0\colon N_{k}(i)=n\} \ (L_{0} (i)=0),\end{align}
(37) \hspace*{-50pt}\begin{align} \hat{S}_{-n}^{(i)}& =S_{-L_{n}(i)},\hspace*{98pt}\end{align}

for $ k\ge0$ , $ n\ge0$ , and $ i=1,\ldots,c$ . Here $ N_{k}(i)$ denotes the total number of customers routed to server i among the first k arrivals, counting backwards in time; $ L_{n}(i)$ denotes the index of the nth customer routed to server i in the common arrival stream, counting backwards in time; and $ \hat{S}_{-n}^{(i)}$ denotes the service time of the nth customer routed to server i, counting backwards in time.

For each $ i=1,\ldots,c$ , let $ \{\hat{X}_{-n}^{(i)}\colon n\ge0\}$ with $ \hat{X}_{0}^{(i)}=0$ be an auxiliary process such that

(38) \begin{equation} \hat{X}_{-n}^{(i)}\,:\!=\sum_{j=1}^{n} (\hat{S}_{-j}^{(i)}-a) =X_{-L_{n}(i)}(i).\end{equation}

For $ n\ge0$ and $ 1\le i\le c$ , define

(39) \begin{equation} \hat{N}_{n}(i)=\inf \Big\{n^{\prime}\ge N_{n}(i)\colon \textrm{max}_{k\ge n^{\prime}}\hat{X}_{-k}^{(i)}\le\hat{X}_{-N_{n}(i)}^{(i)}\Big\} ,\end{equation}

and hence, by definition (23), we have

(40) \begin{equation} N_{n}^{X}=\textrm{max} \{L_{\hat{N}_{n}(1)}(1),\ldots,L_{\hat{N}_{n}(c)}(c)\} .\end{equation}

First we develop simulation algorithms for each of the c one-dimensional auxiliary processes $ \{(\hat{X}^{(i)}_{-n}\colon n\ge0)\colon 1\le i\le c\}$ . Next we use the common server allocation sequence $ \{U_{-n}\colon n\ge0\}$ (sampled jointly with the process $ \{\mathbf{Y}_{-n}\colon n\ge0\}$ in Section A.1) with (35), (36), and (37) to find $ N_{n}^{X}$ via (40) for each $ n\ge0$ .
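The bookkeeping in (35), (36), and (37) is elementary; a sketch (the names are ours, labels are 0-based, and `U[k-1]` stands for $ U_{-k}$ while `S[k-1]` stands for $ S_{-k}$ ):

```python
def routing_maps(U, S, c):
    """Build the counts N_k(i) of (35), the indices L_n(i) of (36), and
    the per-server service sequences hat{S}^{(i)} of (37) from the
    allocation labels and the common service-time stream."""
    N = [[0] * c]                     # N_0(i) = 0
    L = [[0] for _ in range(c)]       # L_0(i) = 0
    S_hat = [[] for _ in range(c)]    # hat{S}^{(i)}_{-1}, hat{S}^{(i)}_{-2}, ...
    for k, (u, s) in enumerate(zip(U, S), start=1):
        row = list(N[-1])
        row[u] += 1                   # one more customer routed to server u
        N.append(row)
        L[u].append(k)                # k is the index of the N_k(u)th visit to u
        S_hat[u].append(s)
    return N, L, S_hat
```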

A.2.1. ‘Milestone’ construction and global maximum simulation

For each one-dimensional auxiliary process $ \{\hat{X}^{(i)}_{-n}\colon n\ge0\}$ with $ i=1,\ldots,c$ , we adopt the algorithm developed in [Reference Blanchet and Wallwater6] by choosing any $ m^{\prime}>0$ and $ L^{\prime}\ge1$ properly and define the sequences of upward and downward ‘milestone’ events by letting $ D_{0}^{(i)}=0$ , $ \Gamma_{0}^{(i)}=\infty$ , and for $ j\ge1$ ,

(41) \begin{align}D^{(i)}_{j} & =\inf\Big\{n^{(i)}\ge\Gamma^{(i)}_{j-1}I(\Gamma_{j-1}^{(i)}<\infty)\vee D_{j-1}^{(i)}\colon \hat{X}^{(i)}_{-n^{(i)}}<\hat{X}^{(i)}_{-D_{j-1}^{(i)}}-L^{\prime}m^{\prime}\Big\},\end{align}
(42) \hspace*{-50pt}\begin{align} \Gamma^{(i)}_{j} & =\inf\Big\{n^{(i)}\ge D_{j}^{(i)}\colon \hat{X}^{(i)}_{-n^{(i)}}-\hat{X}^{(i)}_{-D_{j}^{(i)}}>m^{\prime}\Big\},\hspace*{97pt}\end{align}

with the convention that if $ \Gamma_{j}^{(i)}=\infty$ , then $ \Gamma_{j}^{(i)}I(\Gamma_{j}^{(i)}<\infty)=0$ for any $ j\ge0$ .

For each $ i=1,\ldots,c$ , define

\begin{equation*} \Lambda^{(i)}=\inf\{D^{(i)}_{k}\colon \Gamma^{(i)}_{k}=\infty,k\ge1\}.\end{equation*}

By the ‘milestone’ construction in (41) and (42), for all $ n\ge\Lambda^{(i)}$ ,

\begin{equation*}\hat{X}^{(i)}_{-n}\le\hat{X}^{(i)}_{-\Lambda^{(i)}}+m^{\prime}< 0=\hat{X}^{(i)}_{0}.\end{equation*}

Therefore we can evaluate the global maximum of the infinite-horizon process $ \{\hat{X}^{(i)}_{-n}\colon n\ge0\}$ in finite steps, that is,

\begin{equation*} M_{0}^{(i)}\,:\!=\max\limits_{k\ge0}\hat{X}^{(i)}_{-k}=\max\limits_{0\le k\le\Lambda^{(i)}}\hat{X}^{(i)}_{-k}.\end{equation*}

We summarize the simulation details in the following algorithm.

Algorithm 3. (GGM.) Simulate the global maximum of the one-dimensional process

\begin{equation*}\{(\hat{X}_{-n}^{(i)},\hat{S}_{-n}^{(i)})\colon n\ge0\}\end{equation*}

jointly with the subpath and the subsequence of ‘milestone’ events.

Input: a, $ m^{\prime}$ , $ L^{\prime}$ .

  1. (Initialization) Set $ n=0$ , $ \hat{X}^{(i)}_{0}=0$ , $ \mathbf{D}^{(i)}=[0]$ , $ \boldsymbol{\Gamma}^{(i)}=[\infty]$ , and $ L^{(i)}=0$ .

  2. Generate $ S\sim G$ . Set $ n=n+1$ , $ \hat{X}^{(i)}_{-n}=\hat{X}^{(i)}_{-(n-1)}+S-a$ , and $ \hat{S}^{(i)}_{-n}=S$ .

  3. If $ \hat{X}_{-n}^{(i)}\ge L^{(i)}-L^{\prime}m^{\prime}$ , go to step 2; otherwise set $ \mathbf{D}^{(i)}=[\mathbf{D}^{(i)},n]$ and $ L^{(i)}=\hat{X}_{-n}^{(i)}$ .

  4. Call Algorithm 1 of [Reference Blanchet and Wallwater6, page 10] and obtain $ (J,\omega)$ .

  5. If $ J=1$ , set

    \begin{equation*} \hat{X}^{(i)}_{-(n+l)}=L^{(i)}+\omega(l),\quad \hat{S}^{(i)}_{-(n+l)}=\hat{X}^{(i)}_{-(n+l)}-\hat{X}^{(i)}_{-(n+l-1)}+a \quad \textrm{for}\ l=1, \ldots, \textrm{length}(\omega).\end{equation*}
    Set $ n=n+\operatorname{length}(\omega)$ , $ \boldsymbol{\Gamma}^{(i)}=[\boldsymbol{\Gamma}^{(i)},n]$ and go to step 2.
  6. If $ J=0$ , set $ \Lambda^{(i)}=n$ and $ \boldsymbol{\Gamma}^{(i)}=[\boldsymbol{\Gamma}^{(i)},\infty]$ .

  7. Output

    \begin{equation*} \{(\hat{X}^{(i)}_{-k},\hat{S}^{(i)}_{-k})\colon 1\le k\le\Lambda^{(i)}\},\end{equation*}
    $ \mathbf{D}^{(i)}$ , $ \boldsymbol{\Gamma}^{(i)}$ , and global maximum $ M_{0}^{(i)}=\max_{0\le k\le\Lambda^{(i)}}\hat{X}^{(i)}_{-k}$ .

A.2.2. Simulate $ \{\mathbf{X}_{-n}\colon n\ge0\}$ jointly with ‘milestone’ events

In this section we first explain how to sample the auxiliary one-dimensional processes $ \{\hat{X}^{(i)}_{-n}\colon n\ge0\}$ along with the ‘milestone’ events defined in (41) and (42). Next we will need the service allocation information $ \{U_{-n}\colon n\ge0\}$ , from the simulation procedure of process $ \{\mathbf{Y}_{-n}\colon n\ge0\}$ , to recover the multi-dimensional process of interest $ \{\mathbf{X}_{-n}\colon n\ge0\}$ via (38).

The following algorithm gives the sampling procedure for each auxiliary one-dimensional process $ \{\hat{X}^{(i)}_{-n}\colon n\ge0\}$ for $ i=1,\ldots,c$ . The simulation steps are the same as the procedure given in Algorithm 3 of [Reference Blanchet and Wallwater6, page 16].

Algorithm 4. (GRW.) Continue to sample the process $ \{(\hat{X}^{(i)}_{-k},\hat{S}^{(i)}_{-k})\colon 0\le k\le n\}$ jointly with the partially sampled ‘milestone’ event lists $ \mathbf{D}^{(i)}$ and $ \boldsymbol{\Gamma}^{(i)}$ , until a stopping criterion is met.

Input: a, $ m^{\prime}$ , $ L^{\prime}$ , previously sampled partial process $ \{(\hat{X}^{(i)}_{-j},\hat{S}^{(i)}_{-j})\colon 0\le j\le l\}$ , partial ‘milestone’ sequences $ \mathbf{D}^{(i)}$ and $ \boldsymbol{\Gamma}^{(i)}$ , and stopping criterion $ \mathcal{H}^{(i)}$ . (Note that if there is no previously simulated random walk, we initialize $ l=0$ , $ \mathbf{D}^{(i)}=[0]$ , and $ \boldsymbol{\Gamma}^{(i)}=[\infty]$ .)

  1. Set $ n=l$ . If $ n=0$ , call Algorithm GGM to get $ \Lambda^{(i)}$ , $ \{(\hat{X}^{(i)}_{-k},\hat{S}^{(i)}_{-k})\colon 0\le k\le\Lambda^{(i)}\}$ , $ \mathbf{D}^{(i)}$ , and $ \boldsymbol{\Gamma}^{(i)}$ . Set $ n=\Lambda^{(i)}$ .

  2. While the stopping criterion $ \mathcal{H}^{(i)}$ is not satisfied:

    (a) Call Algorithm GGM to get $ \tilde{\Lambda}^{(i)}$ , $ \{(\tilde{X}^{(i)}_{-j},\tilde{S}^{(i)}_{-j})\colon 0\le j\le\tilde{\Lambda}^{(i)}\}$ , $ \tilde{\mathbf{D}}^{(i)}$ , $ \tilde{\boldsymbol{\Gamma}}^{(i)}$ , and $ \tilde{M}_{0}^{(i)}$ .

    (b) If $ \tilde{M}_{0}^{(i)}\le m^{\prime}$ , accept the proposed sequence and concatenate it to the previous subpath, that is, set $ \hat{X}_{-(n+j)}^{(i)} =\hat{X}^{(i)}_{-n}+\tilde{X}^{(i)}_{-j}$ , $ \hat{S}^{(i)}_{-(n+j)} =\tilde{S}_{-j}^{(i)}$ for $ 1\le j\le\tilde{\Lambda}^{(i)}$ . Update the sequences of ‘milestone’ events to be $ \mathbf{D}^{(i)}=[\mathbf{D}^{(i)},n +\tilde{\mathbf{D}}^{(i)}(2\colon \text{end})]$ , $ \boldsymbol{\Gamma}^{(i)} =[\boldsymbol{\Gamma}^{(i)},n+\tilde{\boldsymbol{\Gamma}}^{(i)}(2\colon \text{end})]$ and set $ n=n+\tilde{\Lambda}^{(i)}$ .

  3. Output $ \{(\hat{X}^{(i)}_{-k},\hat{S}^{(i)}_{-k})\colon 0\le k\le n\}$ with updated ‘milestone’ event sequences $ \mathbf{D}^{(i)}$ and $ \boldsymbol{\Gamma}^{(i)}$ .

With the service allocation information $ \{U_{-n}\colon n\ge0\}$ , we can construct the c-dimensional process $ \{\mathbf{X}_{-n}\colon n\ge0\}$ ( $ \mathbf{X}_{0}=\mathbf{0}$ ) from the auxiliary processes $ \{(\hat{X}^{(i)}_{-n},\hat{S}^{(i)}_{-n})\colon n\ge0\}$ , $ i=1,\ldots,c$ . For $ n\ge1$ ,

(43) \begin{align} S_{-n} & =\hat{S}^{(U_{-n})}_{-\sum_{j=1}^{n}I(U_{-j}=U_{-n})},\end{align}
(44) \begin{align} X_{-n}(i) & =\begin{cases}X_{-(n-1)}(i) & \text{if } i\neq U_{-n},\\X_{-(n-1)}(i)+S_{-n}-a & \text{if } i=U_{-n}.\end{cases}\end{align}
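Equations (43) and (44) amount to replaying the allocation sequence against the per-server service streams. A sketch (the names are ours, labels are 0-based, and `S_hat[i]` lists $ \hat{S}^{(i)}_{-1},\hat{S}^{(i)}_{-2},\ldots$ ):

```python
def recover_X(U, S_hat, a, c):
    """Rebuild the service times S_{-n} via (43) and the c-dimensional
    walk X_{-n} via (44) from the allocation labels U and the per-server
    service sequences S_hat."""
    counts = [0] * c                  # counts[i] tracks N_n(i)
    X = [[0.0] * c]                   # X_0 = 0
    S = []
    for u in U:
        counts[u] += 1
        s = S_hat[u][counts[u] - 1]   # the counts[u]-th service at server u
        S.append(s)
        row = list(X[-1])
        row[u] += s - a               # only coordinate u moves, by (44)
        X.append(row)
    return S, X
```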

By the definition of ‘milestone’ events in (41) and (42), for each $ n\ge0$ , let

(45) \begin{align}d_{1}^{(i)}(n) & =\inf\Big\{D_{k}^{(i)}\ge n\colon \hat{X}_{-D_{k}^{(i)}}^{(i)}\le\hat{X}^{(i)}_{-n}\Big\},\end{align}
(46) \begin{align} d_{2}^{(i)}(n) & =\inf\big\{D_{k}^{(i)}>d_{1}^{(i)}(n)\colon \Gamma_{k}^{(i)}=\infty\big\}.\end{align}

Since

\begin{equation*}\max_{k\ge d_{2}^{(i)}(N_{n}(i))} \hat{X}^{(i)}_{-k}\le\hat{X}^{(i)}_{-d_{2}^{(i)}(N_{n}(i))}+m^{\prime}<\hat{X}^{(i)}_{-d_{1}^{(i)}(N_{n}(i))}\le\hat{X}_{-N_{n}(i)}^{(i)},\end{equation*}

we conclude that $ \hat{N}_{n}(i)\le d_{2}^{(i)}(N_{n}(i))$ and hence

\begin{equation*}N_{n}^{X}\le\max\Big\{L_{d_{2}^{(1)}(N_{n}(1))}(1),\ldots,L_{d_{2}^{(c)}(N_{n}(c))}(c)\Big\}.\end{equation*}

A.3. Simulation algorithm for $ \{\mathbf{R}_{n}^{(r)}\colon n\ge0\}$ and coalescence detection

We shall combine the simulation algorithms in Sections A.1 and A.2 for processes

\begin{equation*}\{((\hat{X}^{(i)}_{-n},\hat{S}^{(i)}_{-n})\colon n\ge0),1\le i \le c\}\quad\text{and}\quad\{(\mathbf{Y}_{-n}, U_{-n},T_{-n})\colon n\ge0\}\end{equation*}

together to exactly simulate the multi-dimensional random walk $ \{\mathbf{R}^{(r)}_{n}\colon n\ge0\}$ until the coalescence time N defined in (10). To detect coalescence, we start from $ n=0$ and compute $ d_{2}(n)$ and $ d_{2}^{(i)}(N_{n}(i))$ (as defined in (34) and (46) respectively). If

(47) \begin{equation} \max_{n\le k\le d_{2}(n)} \mathbf{Y}_{-k}=\mathbf{Y}_{-n},\end{equation}

and

(48) \begin{equation} \max_{N_{n}(i)\le k\le d_{2}^{(i)}(N_{n}(i))} \hat{X}^{(i)}_{-k}=\hat{X}^{(i)}_{-N_{n}(i)}\end{equation}

for all $ i=1,\ldots,c$ , we set the coalescence time $ N\leftarrow n$ and stop. Otherwise we increase n by 1 and repeat the above procedure until the first time that (47) and (48) are satisfied.

In the following algorithm we give the simulation procedure to detect coalescence while sampling the time-reversed multi-dimensional process $ \{\mathbf{R}^{(r)}_{n}\colon n\ge0\}$ .

Algorithm 5. (CD.) Sample the coalescence time N jointly with the process $ \{\mathbf{R}^{(r)}_{n}\colon n\ge0\}$ .

Input: a, m, $ m^{\prime}$ , $ L^{\prime}$ .

  1. (Initialization) Set $ n=0$ . Set $ l=0$ , $ \mathbf{Y}_{0}=\mathbf{0}$ , $ \mathbf{D}=[0]$ , and $ \boldsymbol{\Gamma}=[\infty]$ . Set $ l_{i}=0$ , $ \hat{X}^{(i)}_{0}=0$ , $ \mathbf{D}^{(i)}=[0]$ , and $ \boldsymbol{\Gamma}^{(i)}=[\infty]$ for all $ i=1,\ldots,c$ .

  2. Call Algorithm LTRW to further sample $ \{(\mathbf{Y}_{-j}, U_{-j},T_{-j})\colon 0\le j\le l\}$ , $ \mathbf{D}$ , and $ \boldsymbol{\Gamma}$ with the stopping criterion $ \mathcal{H}$ being $ \sum_{j=1}^{l}I(U_{-j}=i)>l_{i}$ for all $ i=1,\ldots,c$ and $ \mathbf{Y}_{-\mathbf{D}(\text{end}-1)}\le\mathbf{Y}_{-n}$ .

  3. For each $ i=1,\ldots,c$ :

    (a) Set $ n_{i}=\sum_{j=1}^{n}I(U_{-j}=i)$ .

    (b) Call Algorithm GRW to further sample $ \{(\hat{X}^{(i)}_{-k}, \hat{S}^{(i)}_{-k})\colon 0\le k\le l_{i}\}$ , $ \mathbf{D}^{(i)}$ , and $ \boldsymbol{\Gamma}^{(i)}$ with the stopping criterion $ \mathcal{H}^{(i)}$ being $ \sum_{j=1}^{l}I(U_{-j}=i)\le l_{i}$ and $ \hat{X}^{(i)}_{-\mathbf{D}^{(i)}(\text{end}-1)}\le\hat{X}^{(i)}_{-n_{i}}$ .

  4. If $ \max_{n\le k\le\mathbf{D}(\text{end})}\mathbf{Y}_{-k}\le\mathbf{Y}_{-n}$ and $ \max_{n_{i}\le k\le\mathbf{D}^{(i)}(\text{end})}\hat{X}^{(i)}_{-k}\le\hat{X}^{(i)}_{-n_{i}}$ for all $ i=1,\ldots,c$ , go to the next step. Otherwise set $ n=n+1$ and go to step 2.

  5. For $ 1\le k\le n$ , recover $ S_{-k}$ and $ \mathbf{X}_{-k}$ from the auxiliary processes via (43) and (44).

  6. Output coalescence time $ N=n$ , the sequence $ \{(U_{-k},T_{-k},S_{-k})\colon 0\le k\le n\}$ , and the process $ \{\mathbf{R}^{(r)}_{k}\colon 0\le k\le n\}$ .
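The coalescence test in step 4 can be sketched as follows. For illustration we treat the dominating process as scalar (in Algorithm 5, $ \mathbf{Y}$ is vector-valued and the comparisons hold componentwise); all names are ours, not the paper's:

```python
def coalesced(n, Y, D_end, X_hats, n_is, D_i_ends):
    """Step-4 check of Algorithm 5 (sketch): the running maxima up to the
    last recorded milestones must not exceed the current values, both for
    the dominating process Y and for every per-server process X_hats[i].
    Here Y[k] plays the role of Y_{-k} and X_hats[i][k] of hat X^{(i)}_{-k}."""
    if max(Y[n:D_end + 1]) > Y[n]:
        return False                            # criterion (47) fails
    return all(max(Xh[ni:De + 1]) <= Xh[ni]     # criterion (48), per server
               for Xh, ni, De in zip(X_hats, n_is, D_i_ends))
```

If the check fails, the algorithm increments n and resumes sampling, exactly as in step 4.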

Appendix B. Proofs

B.1. Proof of Proposition 1

Proof. First, $ \mathbb{E}[N]<\infty$ holds under the assumptions $ \rho<c$ and $ \mathbb{P}(T>S)>0$ (proved in [18]). Next we show that the computational cost $ \tau$ also has finite expectation.

For $ n\ge0$ , we have $ N_{n}^{X}$ , $ N_{n}^{Y}$ , and $ N_{n}$ defined in (23)–(25) such that

\begin{equation*}\max_{k\ge N_{n}}\,\mathbf{R}^{(r)}_{k}\le\max_{k\ge N_{n}}\,\mathbf{X}_{-k}+\max_{k\ge N_{n}}\,\mathbf{Y}_{-k}\le\mathbf{X}_{-n}+\mathbf{Y}_{-n}=\mathbf{R}_{n}^{(r)}.\end{equation*}

Therefore, in order to evaluate the running maximum over the infinite horizon $ \max_{k\ge n}\mathbf{R}_{k}^{(r)}$ , we only need to sample backwards in time from n to $ N_{n}$ , that is,

\begin{equation*}\max_{k\ge n}\,\mathbf{R}_{k}^{(r)}=\max \bigg\{\max_{n\le k\le N_{n}}\,\mathbf{R}_{k}^{(r)},\ \max_{k\ge N_{n}} \mathbf{R}_{k}^{(r)}\bigg\} =\max_{n\le k\le N_{n}}\,\mathbf{R}_{k}^{(r)}.\end{equation*}

An easy upper bound for $ \tau$ is given by $ \tilde{\tau}=\sum_{n=0}^{N}N_{n}$ . By Wald’s identity, it suffices to show that $ \mathbb{E} [N_{n}] <\infty$ for any $ n\ge0$ .
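The Wald step can be checked numerically on a generic compound sum. This is a stylized stand-in for the bound $ \tilde{\tau}$ , not the queueing quantities themselves: if N is drawn independently of i.i.d. summands, then $ \mathbb{E}[\sum_{k=1}^{N}X_{k}]=\mathbb{E}[N]\,\mathbb{E}[X]$ .

```python
import random

rng = random.Random(7)

def wald_sample(p, mean_x):
    """One realization of sum_{k=1}^{N} X_k with N ~ Geometric(p)
    drawn independently of the i.i.d. X_k ~ Exp(1/mean_x)."""
    n = 1
    while rng.random() > p:   # N = number of trials until first success
        n += 1
    return sum(rng.expovariate(1.0 / mean_x) for _ in range(n))

p, mean_x, reps = 0.25, 2.0, 40000
est = sum(wald_sample(p, mean_x) for _ in range(reps)) / reps
# Wald's identity predicts E[N] * E[X] = (1/p) * mean_x = 8.0
```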

By the ‘milestone’ events construction for the multi-dimensional process $ \{\mathbf{Y}_{-n}\colon n\ge0\}$ in (27) and (28), and because $ d_{2}(n)$ is an upper bound for $ N_{n}^{Y}$ , $ \mathbb{E} [N_{n}^{Y}] \le \mathbb{E} [d_{2}(n)] <\infty$ follows directly from elementary properties of compound geometric random variables (see Theorem 1 of [3]).

For the other process $ \{\mathbf{X}_{-n}\colon n\ge0\}$ , we simulate each of its c entries separately, that is, $ \{\{\hat{X}^{(i)}_{-n}\colon n\ge0\}\colon 1\le i\le c\}$ in Section A.2. Equation (40) gives

\begin{equation*}N_{n}^{X}=\max \{L_{\hat{N}_{n}(1)}(1),\ldots,L_{\hat{N}_{n}(c)}(c)\} \le\sum_{i=1}^{c}L_{\hat{N}_{n}(i)}(i),\end{equation*}

where $ \hat{N}_{n}(i)$ is defined in (39). By Theorem 2.2 of [6], $ \mathbb{E} [\hat{N}_{n}(i)] <\infty$ . Because

\begin{equation*}L_{\hat{N}_{n}(i)}(i)=\inf\Bigg\{k\ge0\colon \sum_{j=1}^{k}I(U_{-j}=i)=\hat{N}_{n}(i)\Bigg\}\sim \operatorname{NegBinomial}\bigg(\hat{N}_{n}(i);\ 1-\dfrac{1}{c}\bigg) +\hat{N}_{n}(i),\end{equation*}

it follows that

\begin{equation*}\mathbb{E} [L_{\hat{N}_{n}(i)}(i)] =(c-1)\mathbb{E} [\hat{N}_{n}(i)] +\mathbb{E} [\hat{N}_{n}(i)] =c\mathbb{E} [\hat{N}_{n}(i)] <\infty,\end{equation*}

and

\begin{equation*}\mathbb{E} [N_{n}^{X}] \le\sum_{i=1}^{c}\mathbb{E} [L_{\hat{N}_{n}(i)}(i)] <\infty.\end{equation*}

Therefore

\begin{equation*}\mathbb{E} [N_{n}] \le \mathbb{E} [N_{n}^{X}] +\mathbb{E} [N_{n}^{Y}] < \infty.\end{equation*}
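The identity $ \mathbb{E} [L_{\hat{N}_{n}(i)}(i)] =c\,\mathbb{E} [\hat{N}_{n}(i)] $ used above reflects the fact that each backward step selects server i with probability $ 1/c$ . A quick Monte Carlo check (our own sketch, with $ \hat{N}_{n}(i)$ held fixed):

```python
import random

rng = random.Random(42)

def draws_until_hits(n_hat, c):
    """Number of uniform server draws needed for server 0 to be chosen
    n_hat times; each draw selects it with probability 1/c, so the count
    is n_hat plus a NegBinomial(n_hat, 1 - 1/c) number of misses."""
    hits = draws = 0
    while hits < n_hat:
        draws += 1
        if rng.randrange(c) == 0:
            hits += 1
    return draws

c, n_hat, reps = 4, 5, 20000
mean_L = sum(draws_until_hits(n_hat, c) for _ in range(reps)) / reps
# the identity predicts E[L] = c * n_hat = 20
```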

B.2. Proof of Proposition 2

Proof. Because $ \mathbb{E} [T] <\infty$ , by Wald's identity it suffices to show that $ \mathbb{E} [\kappa^{*}_{+}] <\infty$ . We provide only a proof outline here, since the argument is the same as in the proof of Proposition 3 of [7].

Firstly, we construct a sequence of events $ \{\Omega_{k}\colon k\ge1\}$ whose occurrence forces that of $ \kappa^{*}_{+}$ . Secondly, we split the process $ \{\mathbf{W}_{0}^{u}(t_{n})\colon n\ge0\}$ into cycles with bounded expected length, and we ensure that the probability of the event occurring during each cycle is bounded from below by a constant; this bounds the number of cycles that must be checked before finding $ \kappa^{*}_{+}$ by a geometric random variable. Finally, we can establish an upper bound for $ \mathbb{E} [\kappa^{*}_{+}] $ by applying Wald's identity again.
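The cycle argument above bounds the number of cycles inspected before $ \kappa^{*}_{+}$ occurs by a geometric random variable. A stylized illustration (our own, taking the per-cycle success probability to be exactly q, whereas in the proof q is only a lower bound, which yields an upper bound on the mean):

```python
import random

rng = random.Random(11)

def cycles_until_event(q):
    """Number of i.i.d. cycles inspected until the target event occurs,
    when each cycle contains the event with probability q; geometric
    with mean 1/q in this sketch."""
    k = 1
    while rng.random() > q:
        k += 1
    return k

q, reps = 0.2, 30000
mean_k = sum(cycles_until_event(q) for _ in range(reps)) / reps
# geometric distribution: E[#cycles] = 1/q = 5
```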

Acknowledgements

Support from the NSF via grants CMMI-1538217, DMS-1820942, and DMS-1838676 is gratefully acknowledged.

References

[1] Asmussen, S. (2008). Applied Probability and Queues (Applications of Mathematics: Stochastic Modelling and Applied Probability 51). Springer.
[2] Asmussen, S., Glynn, P. W. and Thorisson, H. (1992). Stationarity detection in the initial transient problem. ACM Trans. Model. Comput. Simul. 2, 130–157.
[3] Blanchet, J. and Chen, X. (2015). Steady-state simulation of reflected Brownian motion and related stochastic networks. Ann. Appl. Prob. 25, 3209–3250.
[4] Blanchet, J. and Dong, J. (2015). Perfect sampling for infinite server and loss systems. Adv. Appl. Prob. 47, 761–786.
[5] Blanchet, J. H. and Sigman, K. (2011). On exact sampling of stochastic perpetuities. J. Appl. Prob. 48, 165–182.
[6] Blanchet, J. and Wallwater, A. (2015). Exact sampling of stationary and time-reversed queues. ACM Trans. Model. Comput. Simul. 25, 26.
[7] Blanchet, J., Dong, J. and Pei, Y. (2018). Perfect sampling of GI/GI/c queues. Queueing Systems 90, 1–33.
[8] Connor, S. B. and Kendall, W. S. (2015). Perfect simulation of M/G/c queues. Adv. Appl. Prob. 47, 1039–1063.
[9] Dai, H. (2011). Exact Monte Carlo simulation for fork–join networks. Adv. Appl. Prob. 43, 484–503.
[10] Flatto, L. and Hahn, S. (1984). Two parallel queues created by arrivals with two demands, I. SIAM J. Appl. Math. 44, 1041–1053.
[11] Foss, S. (1980). Approximation of multichannel queueing systems. Siberian Math. J. 21, 851–857.
[12] Foss, S. G. and Chernova, N. I. (2001). On optimality of the FCFS discipline in multiserver queueing systems and networks. Siberian Math. J. 42, 372–385.
[13] Halfin, S. and Whitt, W. (1981). Heavy-traffic limits for queues with many exponential servers. Operat. Res. 29, 567–588.
[14] Hillier, F. S. and Lo, F. D. (1972). Tables for multiple-server queueing systems involving Erlang distributions. Technical report, Stanford University, CA.
[15] Kendall, W. S. and Møller, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. Appl. Prob. 32, 844–865.
[16] Kiefer, J. and Wolfowitz, J. (1955). On the theory of queues with many servers. Trans. Amer. Math. Soc. 78, 1–18.
[17] Propp, J. G. and Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9, 223–252.
[18] Sigman, K. (1988). Regeneration in tandem queues with multiserver stations. J. Appl. Prob. 25, 391–403.
[19] Sigman, K. (1995). Stationary Marked Point Processes: An Intuitive Approach (Stochastic Modeling Series 2). Taylor & Francis.
[20] Sigman, K. (2011). Exact simulation of the stationary distribution of the FIFO M/G/c queue. J. Appl. Prob. 48, 209–213.
[21] Sigman, K. (2012). Exact simulation of the stationary distribution of the FIFO M/G/c queue: the general case for $ \rho<c$ . Queueing Systems 70, 37–43.
[22] Wolff, R. W. (1987). Upper bounds on work in system for multichannel queues. J. Appl. Prob. 24, 547–551.
[23] Wolff, R. W. (1989). Stochastic Modeling and the Theory of Queues (Prentice Hall International Series in Industrial and Systems Engineering 14). Prentice Hall, Englewood Cliffs, NJ.
Figure 1: Number of customers for an M/M/c queue in stationarity when λ=3, μ=2, c=2.

Figure 2: Number of customers for an M/M/c queue in stationarity when λ=10, μ=2, c=10.

Figure 3: Number of customers for an $ \operatorname{Erlang}(k_{1},\lambda)/\operatorname{Erlang}(k_{2},\mu)/c$ queue in stationarity when $ k_{1}=2$ , $ \lambda=9$ , $ k_{2}=2$ , $ \mu=5$ , $ c=2$ , and $ \rho/c=0.9$ .

Figure 4: Computational efficiency comparison between two algorithms for an M/M/c queue.

Table 1. Simulation results for computational complexities with varying traffic intensities for an M/M/c queue with fixed $ \mu=5$ and $ c=2$ .