Hostname: page-component-745bb68f8f-kw2vx Total loading time: 0 Render date: 2025-02-11T09:20:12.094Z Has data issue: false hasContentIssue false

Steady-state analysis of load-balancing algorithms in the sub-Halfin–Whitt regime

Published online by Cambridge University Press:  16 July 2020

Xin Liu*
Affiliation:
Arizona State University
Lei Ying*
Affiliation:
Arizona State University
*
*Postal address: School of Electrical, Computer and Energy Engineering, 436 Goldwater Center for Science and Engineering, 650 E Tyler Mall, Arizona State University, Tempe, AZ 85287, USA.
*Postal address: School of Electrical, Computer and Energy Engineering, 436 Goldwater Center for Science and Engineering, 650 E Tyler Mall, Arizona State University, Tempe, AZ 85287, USA.
Rights & Permissions [Opens in a new window]

Abstract

We study a class of load-balancing algorithms for many-server systems (N servers). Each server has a buffer of size $b-1$ with $b=O(\sqrt{\log N})$, i.e. a server can have at most one job in service and $b-1$ jobs queued. We focus on the steady-state performance of load-balancing algorithms in the heavy traffic regime such that the load of the system is $\lambda = 1 - \gamma N^{-\alpha}$ for $0<\alpha<0.5$ and $\gamma > 0,$ which we call the sub-Halfin–Whitt regime ($\alpha=0.5$ is the so-called Halfin–Whitt regime). We establish a sufficient condition under which the probability that an incoming job is routed to an idle server is 1 asymptotically (as $N \to \infty$) at steady state. The class of load-balancing algorithms that satisfy the condition includes join-the-shortest-queue, idle-one-first, join-the-idle-queue, and power-of-d-choices with $d\geq \frac{r}{\gamma}N^\alpha\log N$ (r a positive integer). The proof of the main result is based on the framework of Stein’s method. A key contribution is to use a simple generator approximation based on state space collapse.

Type
Research Papers
Copyright
© Applied Probability Trust 2020

1. Introduction

This paper studies the steady-state performance of load-balancing algorithms in many-server systems. We consider a system with N identical servers with buffer size $b-1$ such that $b=O\left(\sqrt{\log N}\right)$; in other words, each server can hold at most b jobs, one job in service and $b-1$ jobs in a buffer. We assume jobs arrive according to a Poisson process with rate $\lambda N$, where $\lambda = 1 - \gamma N^{-\alpha}$ for $0<\alpha<0.5$ and $\gamma >0$, and have independent and identically distributed (i.i.d.) exponential service times with mean 1. When a job arrives, the load balancer immediately routes the job to one of the servers. If the server’s buffer is full, the job is discarded. We study a class of load-balancing algorithms, which includes join-the-shortest-queue (JSQ), idle-one-first (I1F) [Reference Gupta and Walton9], join-the-idle-queue (JIQ) [Reference Lu, Xie, Kliot, Geller, Larus and Greenberg11, Reference Stolyar14], and power-of-d-choices (Pod) with $d\geq \frac{r}{\gamma}N^\alpha\log N$ [Reference Mitzenmacher12, Reference Vvedenskaya, Dobrushin and Karpelevich17], and establish moment bounds on some function of the queue lengths. From the moment bounds, we show that under JSQ, I1F, JIQ, and Pod with $d\geq \frac{r}{\gamma}N^\alpha\log N$, both the probability that a job is routed to a non-idle server and the expected waiting time per job are $\smash{O\big(\frac{b}{N^{r(0.5-\alpha)}}\big)}$, where r is any positive integer such that $\smash{r \leq \frac{\log N}{72(b-1)^2}}$.

1.1. Related work and our contributions

Performance analysis of many-server systems is one of the most fundamental and widely studied problems in queueing theory. The stationary distribution of the classic M/M/N system (called the Erlang C model) was one of the earliest subjects studied. For systems with distributed queues where each server maintains a separate queue, it is well known that the JSQ algorithm is delay optimal [Reference Weber19, Reference Winston20] under fairly general conditions. However, the exact stationary distribution of many-server systems under JSQ remains an open problem. A recent breakthrough in this area is [Reference Eschenfeldt and Gamarnik6], which shows that in the Halfin–Whitt regime ($\alpha=0.5$) the diffusion-scaled process converges to a two-dimensional diffusion limit, from which it can be shown that most servers have one job in service and $\Theta\big(\sqrt{N}\big)$ servers have two jobs (one in service and one in a buffer). This seminal work has led to several significant developments: (i) [Reference Braverman3] proved that the stationary distribution indeed converges to the stationary distribution of the two-dimensional diffusion limit based on Stein’s method; (ii) via stochastic coupling, [Reference Mukherjee, Borst, van Leeuwaarden and Whiting13] showed that the diffusion limit of Pod converges to that of JSQ in the Halfin–Whitt regime at the process level (over finite time) when $d=\Theta\big(\sqrt{N}\log N\big)$; and (iii) when $\alpha<1/6$, [Reference Liu and Ying10] proved that the waiting probability of a job is asymptotically zero with $\smash{d=\Omega\left(\frac{\log N}{1-\lambda}\right)}$ at the steady state, based on Stein’s method. Interested readers can find a comprehensive survey of recent results in [Reference van der Boor, Borst, van Leeuwaarden and Mukherjee16].

Let $S_i$ denote the fraction of servers with at least i jobs, including the one in service, at steady state. In this paper we prove that if a load-balancing algorithm routes an incoming job to an idle server with probability at least $1-\frac{1}{\sqrt{N}\,}$ when the fraction of busy servers is no more than $\smash{\lambda+\frac{{\bar k}\log N}{\sqrt{N}}}$, then the following bound holds for any positive integer $\smash{r \leq \frac{\log N}{72(b-1)^2}}$:

\begin{align*}\mathbb{E}\Bigg[\Bigg(\max\Bigg\{\sum_{i=1}^b S_i-\lambda -\frac{{\bar k}\log N}{\sqrt{N}},0\Bigg\}\Bigg)^r\Bigg]\leq 10\left(\frac{5r(b-1)}{\sqrt{N}\log N}\right)^r, \qquad {\bar k}=1+\frac{1}{2(b-1)}.\end{align*}

This result implies that

\begin{align*}\mathbb{E}\Bigg[\sum_{i=1}^b {S}_i\Bigg]\leq \lambda +\frac{{11\lambda b+11}}{N^{r(0.5-\alpha)}}, \end{align*}

i.e. the expected queue length per server exceeds $\lambda$ by at most $\frac{{11\lambda b+11}}{N^{r(0.5-\alpha)}}$, and that under JSQ, I1F, JIQ, and Pod ($d\geq \frac{r}{\gamma}N^{\alpha}\log N$), the stationary probability that an incoming job is routed to a non-idle server is asymptotically zero (as $N \to \infty$), which will be proved in Corollary 2.1.

To the best of our knowledge, there are only a few papers that deal with the steady-state analysis of many-server systems with distributed queues [Reference Banerjee and Mukherjee1, Reference Braverman3, Reference Liu and Ying10]. [Reference Banerjee and Mukherjee1] and [Reference Braverman3] analyze the steady-state distribution of JSQ in the Halfin–Whitt regime, and [Reference Liu and Ying10] studies Pod with $\alpha<1/6$. This paper complements all three, as it applies to a class of load-balancing algorithms and to any sub-Halfin–Whitt regime. Table 1 summarizes the comparison between our results and existing ones in the literature. Since the existing works focused on steady-state queue length, we also present our results in terms of ${\textrm E}[S_i]$ for comparison purposes.

Table 1: Our contributions and related work.

Similar to [Reference Braverman3] and [Reference Liu and Ying10], the result of this paper is proved using the mean-field approximation (fluid-limit approximation) based on Stein’s method. The execution of Stein’s method here, however, is quite different from [Reference Braverman3, Reference Liu and Ying10].

In our proof we consider a simple fluid system with arrival rate $\lambda$ and departure rate $\lambda+\frac{\log N}{\sqrt{N}}$, so

(1)\begin{equation}\dot{x} = -\frac{\log N}{\sqrt{N}}.\end{equation}

x can be viewed as a fluid approximation of the normalized total queue length $\sum_{i=1}^b S_i$, and $\dot x$ is the derivative of x with respect to time t. The dynamic of this fluid system (1) is a good approximation of the generator of the stochastic system only when the normalized service rate of the stochastic system is close to $\lambda+\frac{\log N}{\sqrt{N}}$, i.e. when $S_1\approx \lambda+\frac{\log N}{\sqrt{N}}$. Our analysis consider three regimes of the state space:

  1. Regime 1: $S_1$ is close to $\lambda+\frac{\log N}{\sqrt{N}}$. In this regime, the simple fluid system can approximate the generator of the stochastic system. Via Stein’s method, we can quantify the approximation error.

  2. Regime 2: $\sum_{i=2}^b S_i \leq \frac{c\log N}{\sqrt{N}}$ for some $c>0$. Since $S_1\leq 1$, in this regime the normalized total queue length is close to one.

  3. Regime 3: The state is in neither regime 1 nor regime 2. In this case we apply the tail bound in [Reference Bertsimas, Gamarnik and Tsitsiklis2] to prove that the probability it occurs is small, and negligible as N increases. This is equivalent to the state-space-collapse argument, which shows that at steady state, the system ‘lives’ in a lower-dimensional space instead of in the full state space.

Pioneered in [Reference Stolyar15] (and called the drift-based fluid limits method) for fluid-limit analysis and in [Reference Braverman and Dai4, Reference Braverman, Dai and Feng5] for steady-state diffusion approximation, the power of Stein’s method for steady-state approximations has been recognized in a number of recent papers [Reference Braverman3, Reference Braverman and Dai4, Reference Braverman, Dai and Feng5, Reference Gast7, Reference Gast and Van Houdt8, Reference Stolyar15, Reference Ying21, Reference Ying22].

The surprising part of our analysis is that the simple fluid system, which only ‘partially’ approximates the generator of the stochastic system, is sufficient for executing Stein’s method when combining with the state-space-collapse approach. The advantage of using such a simple fluid system is that Stein’s equation can be solved easily, which is often the key difficulty of applying Stein’s method for queueing systems.

Finally, we would like to note that all the proofs in this paper are elementary. Therefore, this paper is another example that demonstrates the power of Stein’s method for analyzing complex queueing systems with elementary probability methods.

2. Model and main results

Consider a many-server system with N homogeneous servers, where job arrival follows a Poisson process with rate $\lambda N$ and service times are i.i.d. exponential random variables with rate 1. We consider the sub-Halfin–Whitt regime such that $\lambda=1-\gamma N^{-\alpha}$ for some $0<\alpha<0.5$ and $\gamma >0$. As shown in Figure 1, each server maintains a separate queue and we assume a buffer size of $b-1$ (i.e. each server can have one job in service and $b-1$ jobs in the queue).

Figure 1. Load balancing in many-server systems.

Let $S_i(t)$ denote the fraction of servers with at least i jobs at time $t \geq 0$. Under the finite buffer assumption with buffer size $b-1$, we define $S_i(t) = 0$ for all $i \geq b+1$ and all $t\geq 0$ for notational convenience. Furthermore, define the set ${\mathbb S}\subseteq {\mathbb R}^b$ such that

\begin{equation*}\mathbb S = \{ s\in{\mathbb R}^b \mid 1\geq s_1\geq \cdots \geq s_{b}\geq 0\hbox{ and } Ns_{i}\in \mathbb N\ \text{for all } i\}.\end{equation*}

We then have $S(t) = [S_1(t), S_2(t), \ldots ,S_b(t)]^\top \in \mathbb S$ for any $t\geq 0$. We consider a class of load-balancing algorithms which route each incoming job to a server upon its arrival based on S(t) so that ($S(t)\,{:}\,t\geq 0$) is a finite-state and irreducible continuous-time Markov chain (CTMC), which implies that ($S(t)\,{:}\,t\geq 0$) has a unique stationary distribution. Let $S \in \mathbb S$ be the random variables having the stationary distribution of ($S(t)\,{:}\,t\geq 0$). Note that $\lambda$, ($S(t)\,{:}\,t\geq 0$), and S all depend on N, the number of servers in the system. Let $A_1(s)$ denote the probability that an incoming job is routed to a busy server when the system is in state $s \in \mathbb S$, i.e.

\begin{equation*}A_1(s)= \mathbb{P}(\text{an incoming job is routed to a busy server} \mid {S(t)=s}).\end{equation*}

The main result of this paper is the following theorem.

Theorem 2.1. Assume $\lambda=1-\gamma N^{-\alpha}$ for $0<\alpha<0.5$ and $\gamma >0$, and $b \leq 1+\frac{\sqrt{\log N}}{{9}}$. If a load-balancing algorithm guarantees $A_1(s)\leq \frac{1}{\sqrt{N}\,}$ for any $s \in \mathbb S$ such that $s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}$, then for any integers N and r such that $\smash{N\geq \big(\frac{{4}{\bar k}\log N}{\gamma}\big)^{\frac{1}{^{0.5-\alpha} }}}$ and $1\leq r\leq {\frac{\log N}{72 (b-1)^2}}$, the following bound holds at steady state:

\begin{align*}\mathbb{E}\Bigg[\Bigg(\max\Bigg\{\sum_{i=1}^{b} S_i-\lambda -\frac{{\bar k}\log N}{\sqrt{N}},0\Bigg\}\Bigg)^r\Bigg]\leq 10\Bigg(\frac{5r(b-1)}{\sqrt{N}\log N}\Bigg)^r,\end{align*}

where ${\bar k = 1+\frac{1}{2(b-1)}}$.

Note that the expectation in Theorem 2.1 is with respect to the stationary distribution of the CTMC ($S(t)\,{:}\,t\geq 0$) according to the definition of S. The condition $A_1(s)\leq \frac{1}{\sqrt{N}\,}$ when $\smash{s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}}$ requires the following: for any given state s in which at least the fraction $\frac{\gamma}{N^{\alpha}}-\frac{{\bar k}\log N}{\sqrt{N}}$ of servers are idle, an incoming job should be routed to an idle server with probability at least $\smash{1-\frac{1}{\sqrt{N}\,}}$. Note that $\smash{N\geq \left(\frac{{4}{\bar k}\log N}{\gamma}\right)^{\frac{1}{^{0.5-\alpha} }}}$ implies $\smash{\frac{\gamma}{N^\alpha} \geq \frac{4{\bar k}\log N}{\sqrt{N}}}$, which guarantee that $\lambda+\frac{{\bar k}\log N}{\sqrt{N}} < 1$ and $\frac{\gamma}{N^{\alpha}} > \frac{{\bar k}\log N}{\sqrt{N}}$. There are several well-known policies that satisfy this condition:

  1. JSQ routes an incoming job to the least-loaded server in the system, so $A_1(s)=0$ when $\smash{s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}}$.

  2. I1F routes an incoming job to an idle server, if available, or to a server with one job, if available. Otherwise, the job is routed to a randomly selected server. Therefore, $A_1(s)=0$ when $s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}$.

  3. JIQ routes an incoming job to an idle server if possible; otherwise, it routes the job to a server chosen uniformly at random. Therefore, $A_1(s)=0$ when $\smash{s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}}$.

  4. Pod samples d servers uniformly at random and dispatches the job to the least-loaded server among the d servers. Ties are broken uniformly at random. Given $d \geq \frac{{r}}{\gamma}N^\alpha\log N,$$\smash{A_1(s)\leq \frac{1}{\sqrt{N}\,}}$ when $\smash{s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}}$.

A direct consequence of Theorem 2.1 is asymptotic zero waiting ($N \to \infty$) at steady state. Let $\mathcal W$ denote the event that an incoming job is routed to a busy server, and $p_{\mathcal W}$ denote the probability of this event at steady state. Let $\mathcal B$ denote the event that an incoming job is blocked (discarded) and $p_{\mathcal B}$ denote the probability of this event at steady state. Note that ${\mathcal B}\subseteq {\mathcal W}$, because an incoming job is blocked when being routed to a busy server with b jobs. Furthermore, let W denote the waiting time of jobs that are not blocked at steady state. We have the following results based on the main theorem.

Corollary 2.1. Assume $\lambda=1-\gamma N^{-\alpha}$ for $0<\alpha<0.5$ and $\gamma >0,$ and $b \leq 1+ \frac{\sqrt{\log N}}{{9}}$. If a load-balancing algorithm guarantees $A_1(s)\leq \frac{1}{N^{0.5r}}$ for any $s \in \mathbb S$ such that $\smash{s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}}$, then the following results hold for any integers N and r such that $\smash{N \geq \big(\frac{{4}{\bar k}\log N}{\gamma}\big)^{\frac{1}{0.5-\alpha}}}$ and $1\leq r\leq \frac{\log N}{72(b-1)^2}$ at steady state:

\begin{equation*}{Waiting\ time\ per\ job:}\quad \mathbb{E}\left[W\right]\leq \frac{11b}{{\gamma^r}N^{r(0.5-\alpha)}} ,\end{equation*}
\begin{equation*}{Waiting\ probability:}\quad p_{\mathcal {\mathcal W}}\leq \frac{11}{{\gamma^r}N^{r(0.5-\alpha)}} ,\end{equation*}
\begin{equation*}{Fraction\ of\ busy\ servers:}\quad \lambda - \frac{11}{{\gamma^r}N^{r(0.5-\alpha)}}\leq \mathbb{E}\left[S_1\right]\leq \lambda ,\end{equation*}
(2)\begin{equation}{Number\ of\ buffered\ jobs\ per\ server:}\quad \mathbb{E}\left[\sum_{i=2}^b S_i\right]\leq \frac{11\lambda b + 11}{{\gamma^r}N^{r(0.5-\alpha)}}. \end{equation}

The proof of this corollary is an application of the Markov inequality and Little’s law, and can be found in Section 4. We note that the corollary above requires $A_1(s)\leq \frac{1}{N^{0.5 r}}$ for any $s \in \mathbb S$ such that $s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}$, which is more restrictive than the assumption in the theorem that only requires $A_1(s)\leq \frac{1}{\sqrt{N}\,}$ for the same s. However, it is easy to verify that JSQ, I1F, JIQ, and Pod with $d\geq \frac{r}{\gamma}N^\alpha \log N$ also satisfy this condition. We further remark that the probability of waiting and the expected waiting time are both $O\big(\frac{{b}}{N^{r(0.5-\alpha)}}\big)$. Under the assumption that $b=O\big(\sqrt{\log N}\big)$ for any positive integer r, we can find a sufficiently large N such that r satisfies the condition in the corollary. The significance of this is that it implies that the waiting probability and the mean waiting time decay faster than any polynomial function of $1/N$ in the sub-Halfin–Whitt regime. Furthermore, from (2), we have

\begin{equation*}\mathbb{E}\Bigg[\sum_{i=2}^b NS_i\Bigg]\leq \frac{11\lambda b + 11}{{\gamma^r}N^{r(0.5-\alpha)-1}}.\end{equation*}

Note that $\sum_{i=2}^b NS_i$ is the total number of jobs in the buffers at steady state, so our result shows that, for sufficiently large N, not only is the expected number of buffered jobs per server almost zero, but so also is the total number of buffered jobs in all N servers.

3. Proof of Theorem 2.1

In this section, we present the proof, based on Stein’s method, of our main theorem. As modularized in [Reference Braverman and Dai4], this approach includes three key ingredients: generator approximation, gradient bounds, and state space collapse (SSC).

3.1. Generator approximation

Define $e_i \in \mathbb R^b$ to be a b-dimensional vector such that the ith entry is $1/N$ and all other $b-1$ entries are zero. Furthermore, define $A_i(s)$ to be the probability that an incoming job is routed to a server with at least i jobs when the system is in state s, i.e.

\begin{equation*}A_i(s)=\mathbb{P}(\hbox{an incoming job is routed to a server with at least \textit{i} jobs} \mid {S(t)}=s).\end{equation*}

From this definition, we have $A_0(s)=1$. Given the definition above, the CTMC transits from state s to $s+e_i$ with rate $\lambda N\left(A_{i-1}(s)-A_i(s)\right)$, which occurs when an arrival comes and is routed to a server with $i-1$ jobs, and transits from state s to $s-e_i$ with rate $N(s_i-s_{i+1})$, which occurs when a job leaves a server with i jobs.

Let G be the generator of the CTMC ($S(t)\,{:}\,t\geq 0$). Given a function $f\,{:}\, \mathbb S \to \mathbb R$, we have

\begin{align*}G f(s) = & \sum_{i=1}^{b} \left(\lambda N (A_{i-1}(s)-A_i(s)) (\,f(s + e_i) - f(s)) \right.\nonumber\\[5pt] & + N (s_i - s_{i+1}) (\,f(s - e_i)-f(s))). \end{align*}

For any bounded function $f\,{:}\, \mathbb S \to \mathbb R$ we have $\mathbb{E}[ G f(S) ] = 0$, which can be easily verified by using the global balance equations and the fact that S represents the steady state of the CTMC.

To understand the steady-state performance of a load-balancing algorithm, we will establish moment bounds on the following function:

\begin{equation*}\max\left\{\sum_{i=1}^b S_i-\lambda-\frac{k \log N}{\sqrt{N}},0\right\},\end{equation*}

where $\smash{\bar k-\frac{r}{\sqrt{N}\log N}\leq k \leq \bar k.}$ The moment bounds are used to bound the probability that the total number of jobs in the system $\big(N\sum_{i=1}^b S_i\big)$ exceeds $N\lambda+k\sqrt{N}\log N$ at steady state, and can also be used to bound the probability that an incoming job is routed to an idle server in Corollary 2.1.

We consider a simple fluid system with arrival rate $\lambda$ and departure rate $\lambda+\frac{\log N}{\sqrt{N}}$, i.e.

\begin{equation*}\dot{x}=-\frac{\log N}{\sqrt{N}},\end{equation*}

and a function g(x) which is the solution of the following Stein’s equation [Reference Ying21]:

(3)\begin{align}g'(x) \Bigg(-\frac{\log N}{\sqrt{N}}\Bigg) = \Bigg(\max\Bigg\{x-\lambda-\frac{k \log N}{\sqrt{N}},0\Bigg\}\Bigg)^r \quad \text{for all } x, \end{align}

where $g'(x) = \frac{{\textrm d} g(x)}{{\textrm d} x}$ and r is a positive integer. The left-hand side of (3) is applying the generator of the simple fluid system to the function g(x), i.e.

\begin{equation*}\frac{{\textrm d} g(x)}{{\textrm d} t}=g'(x)\dot{x}=g'(x) \left(-\frac{\log N}{\sqrt{N}}\right).\end{equation*}

It is easy to verify that the solution to (3) is

(4)\begin{equation}g(x)=-\frac{\sqrt{N}}{(r+1)\log N}\left(x-\lambda-\frac{k\log N}{\sqrt{N}}\right)^{r+1} \textbf{1}_{x\geq \lambda+\frac{k\log N}{\sqrt{N}}},\end{equation}

and

(5)\begin{equation}g'(x)=-\frac{\sqrt{N}}{\log N}\left(x-\lambda-\frac{k\log N}{\sqrt{N}}\right)^{r} \textbf{1}_{x\geq \lambda+\frac{k\log N}{\sqrt{N}}}.\end{equation}

We note that the simple fluid system is a one-dimensional system and the stochastic system is b dimensional. In order to couple these two systems, we define

(6)\begin{equation}f(s)=g\left(\sum_{i=1}^b s_i\right),\end{equation}

and use this f(s) in Stein’s method.

Since $\sum_{i=1}^b s_i\leq b$ for $s\in \mathbb{S},$f(s) is bounded for $s\in\mathbb{S}.$ So,

(7)\begin{equation}\mathbb{E}[Gf(S)]=E\left[Gg\left(\sum_{i=1}^b S_i\right)\right]=0.\end{equation}

Now define

\begin{equation*}{h_k}(x)=\max\left\{x-\lambda-\frac{k \log N}{\sqrt{N}},0\right\}.\end{equation*}

Based on (3) and (7), we obtain

(8)\begin{align}\mathbb{E}\Bigg[{h^r_k}\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg] &= \mathbb{E}\Bigg[ g'\Bigg(\sum_{i=1}^b S_i\Bigg) \Bigg(-\frac{\log N}{\sqrt{N}}\Bigg) - Gg\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg].\end{align}

Note that, according to the definitions of f(s) in (6) and $e_j$, we have

\begin{equation*}f(s+e_j)=g\Bigg(\sum_{i=1}^b s_i+\frac{1}{N}\Bigg)\end{equation*}

and

\begin{equation*}f(s-e_j)=g\Bigg(\sum_{i=1}^b s_i-\frac{1}{N}\Bigg)\end{equation*}

for any $1\leq j \leq b.$ Therefore,

\begin{align*}Gg\Bigg(\sum_{i=1}^b s_i\Bigg)&=N\lambda(1-A_b(s))\Bigg(g\Bigg(\sum_{i=1}^b s_i+\frac{1}{N}\Bigg)-g\Bigg(\sum_{i=1}^b s_i\Bigg)\Bigg)\\& \quad + Ns_1\Bigg(g\Bigg(\sum_{i=1}^b s_i-\frac{1}{N}\Bigg)-g\Bigg(\sum_{i=1}^b s_i\Bigg)\Bigg).\end{align*}

Substituting the equation above into (8), we have

(9)\begin{align}& \mathbb{E}\Bigg[h^r_k\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg]\nonumber\\ & \quad= \mathbb{E}\Bigg[g'\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg(-\frac{\log N}{\sqrt{N}}\Bigg)-N\lambda(1-A_b(S))\Bigg(g\Bigg(\sum_{i=1}^b S_i+\frac{1}{N}\Bigg)-g\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg) \nonumber\\ & \qquad\qquad\qquad\qquad\qquad\qquad\quad -NS_1\Bigg(g\Bigg(\sum_{i=1}^b S_i-\frac{1}{N}\Bigg)-g\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg)\Bigg]. \end{align}

Let $\eta = \lambda +\frac{k\log N}{\sqrt{N}}$ to simplify the notation. From the closed forms of g and g' in (4) and (5), note that, for any $x < \eta$, $g(x) = g'\left(x\right)=0$. Also note that when $x>\eta+\frac{1}{N}$,

\begin{equation*}g'(x)=-\frac{\sqrt{N}}{\log N}\left(x-\lambda-\frac{k\log N}{\sqrt{N}}\right)^{r},\end{equation*}

so for $x>\eta+\frac{1}{N},$

\begin{equation*}g''(x)=-\frac{r\sqrt{N}}{\log N}\left(x-\lambda-\frac{k\log N}{\sqrt{N}}\right)^{r-1}.\end{equation*}

By using the mean-value theorem in the region $\big[\eta-\frac{1}{N}, \eta+\frac{1}{N}\big]$ and Taylor’s theorem in the region $\big(\eta+\frac{1}{N}, \infty\big)$, we have

(10)\begin{align}g\Big(x+\frac{1}{N}\Big)-g(x) & = \Big(g\Big(x+\frac{1}{N}\Big)-g(x)\Big) \Big(\textbf{1}_{\eta-\frac{1}{N} \leq x \leq \eta+\frac{1}{N}} + \textbf{1}_{ x > \eta+\frac{1}{N}} \Big) \nonumber\\[5pt] & = \frac{g'(\xi)}{N}\textbf{1}_{\eta-\frac{1}{N} \leq x \leq \eta+\frac{1}{N}} + \left(\frac{g'(x)}{N} + \frac{g''(\zeta)}{2N^2}\right) \textbf{1}_{ x > \eta+\frac{1}{N}} , \end{align}
(11)\begin{align}g\Big(x-\frac{1}{N}\Big)-g(x) & = \Big(g\Big(x-\frac{1}{N}\Big)-g(x)\Big) \Big(\textbf{1}_{\eta-\frac{1}{N} \leq x \leq \eta+\frac{1}{N}} + \textbf{1}_{ x > \eta+\frac{1}{N}} \Big) \nonumber\\[5pt] & = -\frac{g'(\tilde{\xi})}{N}\textbf{1}_{\eta-\frac{1}{N} \leq x \leq \eta+\frac{1}{N}} + \Big(-\frac{g'(x)}{N} + \frac{g''(\tilde{\zeta})}{2N^2}\Big) \textbf{1}_{ x > \eta+\frac{1}{N}} , \end{align}

where $\xi, \zeta \in \big(x,x+\frac{1}{N}\big)$ and $\tilde{\xi}, \tilde{\zeta} \in \big(x-\frac{1}{N},x\big)$. Substituting (10) and (11) into the generator difference in (9), we have

(12)\begin{align}\hspace*{-90pt}\mathbb{E}&\Bigg[h^r_k\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg]\nonumber\\[5pt] & = \mathbb{E}\Bigg[g'\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg(\lambda A_b(S)- \lambda-\frac{\log N}{\sqrt{N}}+S_1\Bigg)\textbf{1}_{\sum_{i=1}^b S_i> \eta +\frac{1}{N}}\Bigg] \hspace*{90pt}\end{align}
(13)\begin{align}&\quad + \mathbb{E}\Bigg[\Bigg(g'\Bigg(\sum_{i=1}^b S_i\Bigg)\Bigg(-\frac{\log N}{\sqrt{N}}\Bigg)-\lambda(1-A_b(S))g'(\xi) + S_1g'(\tilde{\xi})\Bigg)\textbf{1}_{\eta-\frac{1}{N} \leq \sum_{i=1}^b S_i \leq \eta+\frac{1}{N}}\Bigg] \end{align}
(14)\begin{align}& \quad - \mathbb{E}\left[\frac{1}{2N}(\lambda(1-A_b(S))g''(\zeta) + S_1g''(\tilde{\zeta}))\textbf{1}_{\sum_{i=1}^b S_i > \eta+\frac{1}{N}}\right]. \end{align}

Note that in (12) and (14), $\xi,\zeta \in \Big(\sum_{i=1}^b S_i,\sum_{i=1}^b S_i+\frac{1}{N}\Big)$ and $\tilde{\xi},\tilde{\zeta}\in \Big(\sum_{i=1}^b S_i-\frac{1}{N},\break \sum_{i=1}^b S_i\Big)$ are random variables whose values depend on $\sum_{i=1}^b S_i.$ We do not include $\sum_{i=1}^b S_i$ in the notation for simplicity.

Next, we study g' and g'' to bound the terms in (13) and (14), and SSC to bound the term in (12).

3.2. Gradient bounds

We summarize the bounds on g' and g'' in the following two lemmas.

Lemma 3.1. For any $x\in\left[ \lambda +\frac{k\log N}{\sqrt{N}} -\frac{2}{N}, \lambda +\frac{k\log N}{\sqrt{N}} +\frac{2}{N}\right]$, we have

\begin{equation*}|g'(x)|\leq \frac{2^r}{N^{r-0.5}\log N}.\end{equation*}

Proof. For any $x\in\left[ \lambda +\frac{k\log N}{\sqrt{N}} -\frac{2}{N}, \lambda +\frac{k\log N}{\sqrt{N}} +\frac{2}{N}\right]$, from the closed-form expression of g' in (5) we have

\begin{equation*}\hspace*{50pt}|g'(x)| \leq \frac{\sqrt{N}}{\log N}\left|x-\lambda -\frac{k\log N}{\sqrt{N}}\right|^r\leq \frac{\sqrt{N}}{\log N} \left(\frac{2}{N}\right)^r=\frac{2^r}{N^{r-0.5}\log N}. \hspace*{50pt}{\square}\end{equation*}

Lemma 3.2. For $x>\lambda +\frac{k\log N}{\sqrt{N}}$, we have

\begin{align*}|g''(x)|\leq \frac{r\sqrt{N}}{\log N} h^{r-1}_k(x).\end{align*}

Proof. For $x> \lambda +\frac{k\log N}{\sqrt{N}},$ we have

\begin{equation*}g'(x)=\frac{\left(x-\lambda -\frac{k\log N}{\sqrt{N}}\right)^r}{-\frac{\log N}{\sqrt{N}}},\end{equation*}

which implies

\begin{eqnarray*}g''(x)= \frac{r\left(x-\lambda -\frac{k\log N}{\sqrt{N}}\right)^{r-1}}{-\frac{\log N}{\sqrt{N}}} \end{eqnarray*}

and

\begin{equation*}\hspace*{27pt}|g''(x)|=\left|\frac{r\left(x-\lambda -\frac{k\log N}{\sqrt{N}}\right)^{r-1}}{-\frac{\log N}{\sqrt{N}}}\right|\leq\frac{r\sqrt{N}}{\log{N}}\left(\max\left\{x-\lambda -\frac{k\log N}{\sqrt{N}}, 0\right\}\right)^{r-1}. \hspace*{27pt}{\square}\end{equation*}

Based on Lemma 3.1, we bound the term (13):

(15)\begin{align} & \mathbb{E}\Bigg[\Bigg(g'\left(\sum_{i=1}^b S_i\right)\left(-\frac{\log N}{\sqrt{N}}\right)-\lambda(1-A_b(S))g'(\xi) + S_1g'(\tilde{\xi})\Bigg)\textbf{1}_{\eta-\frac{1}{N} \leq \sum_{i=1}^b S_i \leq \eta+\frac{1}{N}}\Bigg]\nonumber\\[5pt] & \qquad\qquad\qquad\leq \left(\lambda+\frac{\log N}{\sqrt{N}}+1\right)\frac{2^r}{N^{r-0.5}\log N} \leq \frac{2^{r+1}}{N^{r-0.5}\log N}, \end{align}

where $\lambda+\frac{\log N}{\sqrt{N}} \leq 1$ in the first inequality according to the assumption $N \geq \left(\frac{4{\bar k}\log N}{\gamma}\right)^{\frac{1}{0.5-\alpha}}$ in Theorem 2.1.

Based on Lemma 3.2, we bound the term (14):

(16)\begin{align}& -\mathbb{E}\left[\frac{1}{2N}(\lambda(1-A_b(S))g''(\zeta) + S_1g''(\tilde{\zeta}))\textbf{1}_{ \sum_{i=1}^b S_i > \eta+\frac{1}{N}}\right]\nonumber\\[5pt] & \qquad\qquad\qquad \leq \mathbb{E}\left[\frac{1}{2N}(\lambda|g''(\zeta)| + S_1|g''(\tilde{\zeta})|)\textbf{1}_{ \sum_{i=1}^b S_i > \eta+\frac{1}{N}}\right] \leq \frac{r \mathbb{E}\left[{h^{r-1}_k}\left(\sum_{i=1}^b S_i +{\frac{1}{N}}\right)\right]}{\sqrt{N}\log N}. \end{align}

3.3. State space collapse

In this section we consider the term in (12):

(17)\begin{align}\mathbb{E}&\left[g'\left(\sum_{i=1}^{b} S_i\right)\left(\lambda A_b(S)-\lambda-\frac{\log N}{\sqrt{N}} +S_1\right) \textbf{1}_{\sum_{i=1}^{b} S_i \gt \eta + \frac{1}{N}}\right]\nonumber\\[8pt] & = \mathbb{E}\Bigg[-\frac{\sqrt{N}}{\log N}\left(\sum_{i=1}^{b} S_i-\lambda-\frac{k\log N}{\sqrt{N}} \right)^r\left(\lambda A_b(S)-\lambda- \frac{\log N}{\sqrt{N}} + S_1\right)\textbf{1}_{\sum_{i=1}^{b} S_i \gt \eta+ \frac{1}{N}}\Bigg] \nonumber\displaybreak\\ & = \mathbb{E}\left[\frac{\sqrt{N}}{\log N}\left(\sum_{i=1}^{b} S_i-\lambda-\frac{k\log N}{\sqrt{N}} \right)^r\left(\lambda+ \frac{\log N}{\sqrt{N}} - S_1-\lambda A_b(S)\right)\textbf{1}_{\sum_{i=1}^{b} S_i \gt \eta + \frac{1}{N}}\right] \nonumber\\[5pt] & \leq \mathbb{E}\left[\frac{\sqrt{N}}{\log N}\left(\sum_{i=1}^{b} S_i-\lambda-\frac{k\log N}{\sqrt{N}} \right)^r\left(\lambda+ \frac{\log N}{\sqrt{N}} - S_1\right)\textbf{1}_{\sum_{i=1}^{b} S_i \gt \eta+\frac{1}{N}}\right] , \end{align}

where the last inequality holds because

\begin{equation*}\left(\sum_{i=1}^{b} s_i-\lambda-\frac{k\log N}{\sqrt{N}}\right)^r\textbf{1}_{\sum_{i=1}^{b} s_i \gt \eta + \frac{1}{N}}\geq 0\end{equation*}

for any $s \in \mathbb S$.

We define a Lyapunov function $V\,{:}\,\mathbb S \to \mathbb R$ to be

(18)\begin{align}V(s)=\min\left\{\sum_{i=2}^{b} s_i, \lambda+\frac{k\log N}{\sqrt{N}} -s_1 \right\}. \end{align}

Lemma 3.3. Under any load-balancing algorithm such that $A_1(s)\leq \frac{1}{\sqrt{N}\,}$ when $s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}$, we have, for $N \geq \left(\frac{{4}{\bar k}\log N}{\gamma}\right)^{\frac{1}{0.5-\alpha}}$, that

\begin{align*}\triangledown V(s)\leq -\frac{1}{2(b-1)}\frac{\log N}{\sqrt{N}}+\frac{2}{\sqrt{N}\,} \end{align*}

for any state $s \in \mathbb S$ such that $V(s)\geq \frac{\log N}{\sqrt{N}}$.

Proof. For the Lyapunov function defined in (18), the Lyapunov drift is

\begin{align*}\triangledown V(s)&=\mathbb{E}\left[GV(S) \mid S=s\right]\\& = \sum_{i=1}^{b}\left(\lambda N (A_{i-1}(s)-A_i(s)) (V(s + e_i) - V(s)) + N (s_i - s_{i+1}) (V(s - e_i)-V(s))\right). \end{align*}

Given $V(s)\geq \frac{\log N}{\sqrt{N}}$, we consider the following two cases.

Case 1: Assume $\sum_{i=2}^{b} s_i\leq \lambda+\frac{k\log N}{\sqrt{N}}-s_1$. Note that

\begin{alignat*}{2}V(s+e_1) & \leq \sum_{i=2}^{b} s_i, & V(s-e_1) & = \sum_{i=2}^{b} s_i,\\V(s+e_j) & \leq \sum_{i=2}^{b} s_i+\frac{1}{N}, \qquad & V(s-e_j) & = \sum_{i=2}^{b} s_i - \frac{1}{N}, \quad \text{for all } 2 \leq j \leq b.\end{alignat*}

Furthermore, $V(s)=\sum_{i=2}^{b} s_i\geq \frac{\log N}{\sqrt{N}}$, which implies $s_2\geq \frac{1}{b-1}\frac{\log N}{\sqrt{N}}$ because $s_2\geq s_3\geq \cdots \geq s_b.$ Therefore, we have

\begin{align*}\triangledown V(s)\leq\lambda(A_1(s)- A_b(s))-s_2\leq - \frac{1}{b-1}\frac{\log N}{\sqrt{N}} + \frac{1}{\sqrt{N}\,},\end{align*}

where the last inequality holds because $\sum_{i=1}^{b} s_i\leq \lambda+\frac{k\log N}{\sqrt{N}}$ implies that $s_1\leq \lambda+\frac{{\bar k}\log N}{\sqrt{N}}$, which further implies that $A_1(s)\leq \frac{1}{\sqrt{N}\,}$.

Case 2: Assume $\sum_{i=2}^{b} s_i>\lambda+\frac{k\log N}{\sqrt{N}}-s_1$. Note that

\begin{alignat*}{2}V(s + e_1) & = \lambda+\frac{k\log N}{\sqrt{N}}-s_1- \frac{1}{N}, &\,\, V(s - e_1) & \leq \lambda+\frac{k\log N}{\sqrt{N}}-s_1+\frac{1}{N}, \\[8pt] V(s + e_j) & = \lambda+\frac{k\log N}{\sqrt{N}}-s_1, & V(s- e_j) & \leq \lambda+\frac{k\log N}{\sqrt{N}}-s_1, \text{ for all } 2 \leq j \leq b.\end{alignat*}

In this case $\sum_{i=2}^b s_i\geq V(s)=\lambda+\frac{k\log N}{\sqrt{N}}-s_1\geq \frac{\log N}{\sqrt{N}}$, which also implies $s_2\geq \frac{1}{b-1}\frac{\log N}{\sqrt{N}}$. Therefore, we have

\begin{align*}\triangledown V(s) & \leq -\lambda(1-A_1(s))+(s_1-s_2)\\[4pt] & = s_1-s_2-\lambda+\lambda A_1(s)\\[4pt] & \leq (k-1)\frac{\log N}{\sqrt{N}}-s_2+\lambda A_1(s)\\[4pt] & \leq \left(k-1-\frac{1}{b-1}\right)\frac{\log N}{\sqrt{N}}+ \frac{1}{\sqrt{N}\,} \\[4pt] & {\leq} -\frac{1}{2(b-1)}\frac{\log N}{\sqrt{N}}+ \frac{1}{\sqrt{N}\,} ,\end{align*}

where the second inequality holds because $s_1\leq \lambda+(k-1)\frac{\log N}{\sqrt{N}}$ and it implies $A_1(s) \leq \frac{1}{\sqrt{N}\,}$, the last inequality holds because $s_2\geq \frac{1}{b-1}\frac{\log N}{\sqrt{N}}$, and the last equality holds because $1+\frac{1}{2(b-1)}-\frac{r}{\sqrt{N}\log N} \leq k \leq 1+\frac{1}{2(b-1)}$.

Before moving forward, we present the following result from [Reference Bertsimas, Gamarnik and Tsitsiklis2]. The following version of the lemma is from [Reference Wang, Maguluri, Srikant and Ying18], but the result was proved in [Reference Bertsimas, Gamarnik and Tsitsiklis2].

Lemma 3.4. Let $(X (t)\,{:}\, t \geq 0)$ be a continuous-time Markov chain over a countable state space $\mathbb X$. Suppose that it is irreducible, nonexplosive, and positive recurrent, and X denotes the steady state of $(X (t)\,{:}\, t \geq 0)$. Consider a Lyapunov function $V\,{:}\, \mathbb X \to \mathbb{R}^{+}$ and define the drift of V at a state $i \in \mathbb X$ as

\begin{equation*}\Delta V(i) = \sum_{i' \in \mathcal X: i' \neq i} q_{i i'} (V(i') - V(i)),\end{equation*}

where $q_{ii'}$ is the transition rate from i to i'. Suppose that the drift satisfies the following conditions:

  1. (i) There exist constants $\gamma > 0$ and $B > 0$ such that $\Delta V (i) \leq -\gamma$ for any $i \in \mathbb X$ with $V(i) > B$.

  2. (ii) $\nu_{\max} := \sup\limits_{i,i'\in \mathbb X: q_{i i'} >0} |V(i') - V(i)|< \infty$.

  3. (iii) $\bar q := \sup\limits_{i \in \mathbb X} (-q_{ii}) < \infty$.

Then, for any non-negative integer j, we have

\begin{equation*}\mathbb{P}\left(V(X) > B + 2 \nu_{\max} j\right) \leq \left(\frac{q_{\max}\nu_{\max}}{q_{\max}\nu_{\max} + \gamma}\right)^{j+1},\end{equation*}

where

\begin{equation*}q_{\max}=\sup\limits_{i \in \mathbb X} \sum_{i' \in \mathbb X: V(i) < V(i')} q_{ii'}.\end{equation*}

Based on the drift analysis in Lemmas 3.3 and 3.4 (Lemma B.1 in [Reference Wang, Maguluri, Srikant and Ying18]), we have the following tail bound on V(S).

Lemma 3.5. Given the Lyapunov function defined in (18) and denoting $\tilde{k} = 1+\frac{1}{4(b-1)}$, we have

\begin{equation*}\mathbb{P}\left(V(S)\geq \frac{\tilde{k}\log N}{\sqrt{N}}\right)\leq \exp\Bigg\{-\frac{\log^2 N}{32(b-1)^2}+\frac{\log N}{{8}(b-1)}\Bigg\}.\end{equation*}

Proof. From Lemma 3.3, we have

\begin{align*}B=\frac{\log N}{\sqrt{N}} \quad \text{and}\quad \gamma=\frac{1}{2(b-1)}\frac{\log N}{\sqrt{N}}-\frac{1}{\sqrt{N}\,},\end{align*}

and it is easy to verify that $q_{\max}\leq N$ and $v_{\max}\leq \frac{1}{N}$.

Based on Lemma 3.4 with $j=\frac{\sqrt{N}\log N}{8(b-1)}$, we have

\begin{align*}\mathbb{P}\left(V(S)\geq \frac{\tilde{k}\log N}{\sqrt{N}}\right) & \leq \left(\frac{1}{1+\frac{1}{2(b-1)}\frac{\log N}{\sqrt{N}}-\frac{1}{\sqrt{N}\,}}\right)^{\frac{\sqrt{N}\log N}{8(b-1)}+1}\\& \leq \left(1 - \frac{1}{4(b-1)}\frac{\log N}{\sqrt{N}}+\frac{1}{2\sqrt{N}\,}\right)^{\frac{\sqrt{N}\log N}{8(b-1)}}\\ & \leq \exp\Bigg\{-\frac{\log^2 N}{32(b-1)^2}+\frac{\log N}{16(b-1)}\Bigg\} , \end{align*}

where the second inequality holds because $\frac{1}{2(b-1)}\frac{\log N}{\sqrt{N}}\leq1 + \frac{1}{\sqrt{N}\,}$ for large N.

Given the SSC result in Lemma 3.5, we now bound (12) (continuing from (17)) by considering two regimes, $V(s)\leq \frac{\tilde{k}\log N}{\sqrt{N}}$ and $V(s) > \frac{\tilde{k}\log N}{\sqrt{N}}$, as follows:

(19)\begin{align}\mathbb{E}&\left[\frac{\sqrt{N}}{\log N}\left(\sum_{i=1}^{b} S_i-\lambda -\frac{k\log N}{\sqrt{N}}\right)^r\left(\lambda+\frac{\log N}{\sqrt{N}}-S_1\right)\textbf{1}_{\sum_{i=1}^{b} S_i> \eta+\frac{1}{N}}\right]\nonumber\\& = \mathbb{E}\Bigg[\frac{\sqrt{N}}{\log N}\left(\sum_{i=1}^{b} S_i-\lambda -\frac{k\log N}{\sqrt{N}}\right)^r\left(\lambda+\frac{\log N}{\sqrt{N}}-S_1\right)\textbf{1}_{V(S)\leq \frac{\tilde{k}\log N}{\sqrt{N}}}\textbf{1}_{\sum_{i=1}^{b} S_i> \eta+\frac{1}{N}}\Bigg]\end{align}
(20)\begin{align}&\,\,\,\quad + \mathbb{E}\Bigg[\frac{\sqrt{N}}{\log N}\left(\sum_{i=1}^{b} S_i-\lambda -\frac{k\log N}{\sqrt{N}}\right)^r\left(\lambda+\frac{\log N}{\sqrt{N}}-S_1\right)\textbf{1}_{V(S)>\frac{\tilde{k}\log N}{\sqrt{N}}}\textbf{1}_{\sum_{i=1}^{b} S_i> \eta+\frac{1}{N}}\Bigg].\quad\end{align}

To bound (19), we consider state s such that $\textbf{1}_{V(s)\leq \frac{\tilde{k}\log N}{\sqrt{N}}}=1$ and $\textbf{1}_{\sum_{i=1}^{b} s_i> \eta+\frac{1}{N}} = 1$, because otherwise (19)$\,=0$. For any state s such that $\smash{\sum_{i=1}^{b} s_i> \eta +\frac{1}{N}= \lambda +\frac{k\log N}{\sqrt{N}}+\frac{1}{N}}$, we have

(21)\begin{align}V(s)=\lambda+\frac{k\log N}{\sqrt{N}}-s_1. \end{align}

Given (21), $V(s) \leq \frac{\tilde{k} \log N}{\sqrt{N}}$ means

\begin{align*}\lambda+\frac{\log N}{\sqrt{N}}-s_1 \leq (\tilde{k} - k + 1 ) \frac{\log N}{\sqrt{N}}\leq \left(1 - \frac{1}{5(b-1)} \right) \frac{\log N}{\sqrt{N}}.\end{align*}

Therefore, we have

(22)\begin{align}(\linkref{disp19}{19}) \leq \left(1 - \frac{1}{5(b-1)} \right)\mathbb{E}\left[\left(\max\left\{\sum_{i=1}^{b} S_i-\lambda -\frac{k\log N}{\sqrt{N}},0\right\}\right)^r\right]. \end{align}

To bound (20), we have

(23)\begin{align}(\linkref{disp20}{20}) & \leq \frac{{b^r}\sqrt{N}}{\log N} \mathbb{E}\left[\textbf{1}_{V(S)>\frac{\tilde{k}\log N}{\sqrt{N}}}\right] \nonumber\\[8pt] & \leq \frac{{b^r}\sqrt{N}}{\log N} \exp\Bigg\{-\frac{\log^2 N}{32(b-1)^2}+\frac{\log N}{16(b-1)}\Bigg\} , \end{align}

where the first inequality holds because $\sum_{i=1}^{b} s_i-\lambda -\frac{k\log N}{\sqrt{N}} \leq \sum_{i=1}^{b} s_i \leq b$ and $\lambda+\frac{\log N}{\sqrt{N}}-s_1 \leq 1$ for large N, and the second inequality holds due to Lemma 3.5.

Based on (22) and (23), we obtain the following upper bound on (12):

(24)\begin{align}& \mathbb{E}\left[g^{\prime}\left(\sum_{i=1}^{b} S_i\right)\left(\lambda B(S)-\lambda-\frac{\log N}{\sqrt{N}}+S_1\right)\textbf{1}_{\sum_{i=1}^{b} S_i \gt \eta}\right]\nonumber\\[2pt] & \qquad\qquad\leq \left(1 - \frac{1}{5(b-1)} \right)\mathbb{E}\left[\left(\max\left\{\sum_{i=1}^{b} S_i-\lambda -\frac{k\log N}{\sqrt{N}},0\right\}\right)^r\right]\nonumber\\[5pt] & \qquad\qquad\qquad+ \frac{{b^r}\sqrt{N}}{\log N} \exp\Bigg\{-\frac{\log^2 N}{32(b-1)^2}+\frac{\log N}{16(b-1)}\Bigg\}.\end{align}

3.4. Higher moment bounds

Given the results (24), (15), and (16), which bound (12), (13), and (14), respectively, we have

(25)\begin{align}\mathbb{E}\left[h^r_k\left(\sum_{i=1}^{b} S_i\right)\right]& \leq \left(1 - \frac{1}{5(b-1)}\right) \mathbb{E}\left[h^r_k\left(\sum_{i=1}^{b} S_i\right)\right] \nonumber \\[5pt] & \quad + \frac{{b^r}\sqrt{N}}{\log N} \exp\Bigg\{-\frac{\log^2 N}{32(b-1)^2}+\frac{\log N}{16(b-1)}\Bigg\}\nonumber\displaybreak\\ & \quad + \frac{2^{r+1}}{N^{r-0.5}\log N} + \frac{r}{\sqrt{N}\log N}\mathbb{E}\left[h^{r-1}_k\left(\sum_{i=1}^b S_i\right)\right] \nonumber\\[5pt] & \leq \left(1 - \frac{1}{5(b-1)}\right) \mathbb{E}\left[h^r_k\left(\sum_{i=1}^{b} S_i\right)\right] + \frac{1+2^{r+1}}{N^{r-0.5}\log N} \end{align}
(26)\begin{align}& \qquad\quad\,\, + \frac{r}{\sqrt{N}\log N}\mathbb{E}\left[h^{r-1}_k\left(\sum_{i=1}^b S_i+{\frac{1}{N}}\right)\right] ,\end{align}

where the second inequality holds because, given $0\leq b \leq 1 + \frac{\sqrt{\log N}}{9}$ and $1\leq r\leq \frac{\log N}{72(b-1)^2}$, we have

\begin{align*}\frac{{b^r}\sqrt{N}}{\log N} \exp\Bigg\{-\frac{\log^2 N}{32(b-1)^2}+\frac{\log N}{16(b-1)}\Bigg\}& \leq \frac{{b^r}\sqrt{N}}{\log N} \exp\Bigg\{-\frac{\log^2 N}{36(b-1)^2}\Bigg\} \\[8pt] \leq \frac{{b^r}\sqrt{N}}{\log N} {\textrm e}^{-2r\log N}& = \frac{{b^r}}{N^{2r-0.5} \log N} \leq \frac{1}{N^{r-0.5}\log N}.\end{align*}

Finally, by moving the first term in (25) to the left-hand side of the inequality and then multiplying by $4(b-1)$ on both sides of the inequality, we obtain

\begin{align*}\mathbb{E}\left[h^{r}_k\left(\sum_{i=1}^b S_i\right)\right]\leq \frac{4(1+2^{r+1})(b-1)}{N^{r-0.5}\log N}+\frac{4r(b-1)}{\sqrt{N}\log N} \mathbb{E}\left[h^{r-1}_k\left(\sum_{i=1}^b S_i + {\frac{1}{N}}\right)\right],\end{align*}

where $\bar k - \frac{r}{\sqrt{N}\log N} \leq k \leq \bar k$.

Define $w_r = \frac{5r(b-1)}{\sqrt{N}\log N}$ and $z_r = \frac{5(1+2^{r+1})(b-1)}{N^{r-0.5}\log N}$. The inequality above can be written as

\begin{align*}\mathbb{E}\left[h^{r}_k\left(\sum_{i=1}^b S_i\right)\right]\leq w_r \mathbb{E}\left[h^{r-1}_k\left(\sum_{i=1}^b S_i+ {\frac{1}{N}}\right)\right] + z_r.\end{align*}

By recursively applying this inequality, we obtain

\begin{align*}\mathbb{E}\left[\left(\max\left\{\sum_{i=1}^{b} S_i-\lambda -\frac{{\bar k}\log N}{\sqrt{N}},0\right\}\right)^r\right] &\leq \prod_{j=1}^r w_j + \sum_{i=1}^{r} z_i \left(\prod_{j=i+1}^r w_j\right),\end{align*}

where we define $\prod_{j=r+1}^r w_j=1$ for notational convenience. We note that $z_i \prod_{j=i+1}^r w_j$ is a decreasing sequence in i for $2 \leq i \leq r$, because

\begin{align*}\frac{z_i \prod_{j=i+1}^r w_j}{z_{i-1} \prod_{j=i}^r w_j} = \frac{z_{i}}{z_{i-1} w_{i}}= \frac{1+2^{i+1}}{1+2^{i}} \frac{1}{4i(b-1)}\frac{\log N}{\sqrt{N}}\leq \frac{1}{2i(b-1)}\frac{\log N}{\sqrt{N}} \leq 1,\end{align*}

and

\begin{equation*}\prod_{j=1}^r w_j \leq z_1 \prod_{j=2}^r w_j\end{equation*}

because $w_1=\frac{5(b-1)}{\sqrt{N}\log N} \leq z_1=\frac{25(b-1)}{\sqrt{N}\log N}$. Therefore,

(27)\begin{align}\mathbb{E}\left[h^{r}_k\left(\sum_{i=1}^{b} S_i\right)\right]& \leq (r+1) z_1 \prod_{j=2}^r w_j \nonumber\\[5pt] & \leq (r+1) z_1(w_r)^{r-1} \end{align}
(28)\begin{align}&\qquad\qquad\qquad\qquad \leq 10 \left(\frac{5r(b-1)}{\sqrt{N}\log N}\right)^r , \end{align}

where the inequality (27) holds because $w_i$ is increasing in i, and (28) holds by substituting $z_1$ and $w_r$. This concludes the proof.

4. Proof of Corollary 2.1

Based on the moment bound in Theorem 2.1, we study the waiting probability $p_{\mathcal W}$ and the waiting time $\mathbb{E}[W]$, $\mathbb{E}[S_1]$, and $\mathbb{E}[\sum_{i=2}^b S_i]$ for JSQ, I1F, and JIQ. The analysis for Pod is similar and will be provided later. We begin with the waiting probability $p_{\mathcal W}$:

(29)\begin{align}p_{\mathcal W}& = \mathbb{P}\left(S_1=1\right)\leq \mathbb{P}\left(\sum_{i=1}^bS_i\geq 1\right) \nonumber\\[5pt] & \leq \mathbb{P}\left(h^r_{\bar k}\left(\sum_{i=1}^bS_i\right)\geq \left(\frac{\gamma}{N^{\alpha}}-\frac{\bar k\log N}{\sqrt{N}}\right)^r\right)\nonumber\\[5pt] & \leq \frac{\mathbb{E}\left [h^r_{\bar k}\left(\sum_{i=1}^bS_i\right)\right]}{\left(\frac{\gamma}{N^{\alpha}}-\frac{\bar k\log N}{\sqrt{N}}\right)^r} \end{align}
(30)\begin{align}\hspace*{-32pt} \leq \frac{\mathbb{E}\left [h^r_{\bar k}\left(\sum_{i=1}^bS_i\right)\right]}{\left(\frac{\gamma}{2N^{\alpha}}\right)^r} \hspace*{32pt}\end{align}
(31)\begin{align}\hspace*{-25pt}\leq 10\left(\frac{10r(b-1)}{{\gamma} N^{0.5-\alpha}\log N}\right)^r \hspace*{25pt}\end{align}
(32)\begin{align}\hspace*{-50pt} \leq \frac{10}{{\gamma^r}N^{r(0.5-\alpha)}} , \hspace*{35pt}\end{align}

where (29) is a result of Markov’s inequality, (30) holds because $N \geq \left(\frac{{4}\bar k\log N}{\gamma}\right)^{\frac{1}{0.5-\alpha}}$ implies $\frac{\bar k\log N}{\sqrt{N}} \leq \frac{\gamma}{2N^{\alpha}}$, (31) holds by substituting (28), and (32) holds because $r\leq \frac{\log N}{72(b-1)^2}$ implies $\log N \geq 10r(b-1).$

From $p_{\mathcal W}$, we can obtain an upper bound on $\mathbb{E}[W]$:

\begin{equation*}\mathbb{E}[W] = \mathbb{E}[W \mid \text{a job routed to busy servers} ] {\times} p_{\mathcal W}\leq b p_{\mathcal W} ,\end{equation*}

where the last inequality holds because the expected waiting time for a job routed to a busy server is at most $b-1$.

Moreover, for jobs that are not discarded, the average queueing delay according to Little’s law is

\begin{equation*}\mathbb{E}[W]=\frac{\mathbb{E}\left[\sum_{i=1}^bS_i\right]}{\lambda(1-p_{\mathcal B})}-1.\end{equation*}

Therefore, we have

\begin{align*}\mathbb{E}\left[\sum_{i=1}^b S_i\right] & = \lambda\left(1-p_{\mathcal B}\right)\left( \mathbb{E}[W]+1\right) \\[5pt] & \leq \lambda \mathbb{E}[W]+\lambda \\[5pt] & \leq \lambda b \cdot p_{\mathcal W} + \lambda \\[5pt] & \leq \frac{10\lambda b}{{\gamma^r} N^{r(0.5-\alpha)}} + \lambda.\end{align*}

Further, according to the work conservation law, we have the following lower bound on $\mathbb{E}[S_1]$:

\begin{equation*}\mathbb{E}[S_1] = \lambda(1-p_{\mathcal B})\geq \lambda(1-p_{\mathcal W})\geq \lambda - \frac{10}{{\gamma^r} N^{r(0.5-\alpha)}} ,\end{equation*}

which yields an upper bound on $\mathbb{E}\left[\sum_{i=2}^b {S}_i\right]$:

\begin{equation*}\mathbb{E}\left[\sum_{i=2}^b {S}_i\right]\leq \frac{10\lambda b + 10}{{\gamma^r} N^{r(0.5-\alpha)}}.\end{equation*}

The analysis for Pod with ${d\geq \frac{r}{\gamma}N^\alpha \log N}$ is similar, except the waiting probability $p_{\mathcal W}$ in the first step becomes

(33)\begin{align}p_{\mathcal W} & = \mathbb{P}\left(\mathcal W \, \left| \, S_1\leq 1-\frac{\gamma}{2N^{\alpha}}\right.\right)\mathbb{P}\left(S_1\leq 1-\frac{\gamma}{2N^{\alpha}}\right)\nonumber\\[5pt] & \quad + \mathbb{P}\left(\mathcal W \, \left| \, S_1> 1-\frac{\gamma}{2N^{\alpha}}\right.\right)\mathbb{P}\left(S_1> 1-\frac{\gamma}{2N^{\alpha}}\right)\nonumber\\[5pt] & \leq \mathbb{P}\left(\mathcal W \, \left| \, S_1\leq 1-\frac{\gamma}{2N^{\alpha}}\right.\right)+\mathbb{P}\left(S_1> 1-\frac{\gamma}{2N^{\alpha}}\right)\nonumber\\[5pt] & \leq {\left(1-\frac{\gamma}{2N^{\alpha}}\right)^{\frac{r}{\gamma}N^\alpha \log N}}+\mathbb{P}\left(\sum_{i=1}^b S_i>1-\frac{\gamma}{2N^{\alpha}}\right)\nonumber\\[5pt] & \leq {N^{-\frac{r}{2}}} + \mathbb{P}\left(h^r_{\bar k}\left(\sum_{i=1}^bS_i\right)\geq \left(\frac{\gamma}{2N^{\alpha}}-\frac{\bar k\log N}{\sqrt{N}}\right)^r\right)\end{align}
(34)\begin{align}\hspace*{-34pt} \leq \frac{1}{N^{0.5r}} + \frac{\mathbb{E}\left [h^r_{\bar k}\left(\sum_{i=1}^bS_i\right)\right]}{\left(\frac{\gamma}{2N^{\alpha}}-\frac{\bar k\log N}{\sqrt{N}}\right)^r} \hspace*{34pt}\end{align}
(35)\begin{align}\hspace*{-30pt}\leq \frac{1}{N^{0.5r}} + \frac{\mathbb{E}\left [h^r_{\bar k}\left(\sum_{i=1}^bS_i\right)\right]}{\left(\frac{\gamma}{4N^{\alpha}}\right)^r}\hspace*{30pt}\end{align}
(36)\begin{align}\hspace*{-23pt}\leq \frac{1}{N^{0.5r}} + 10\left(\frac{20r(b-1)}{\gamma N^{0.5-\alpha}\log N}\right)^r \hspace*{23pt}\end{align}
(37)\begin{align}\hspace*{-120pt}& \leq \frac{1}{N^{0.5r}} + \frac{10}{\gamma^r N^{r(0.5-\alpha)}} \nonumber \\[4pt] & \leq \frac{11}{\gamma^r N^{r(0.5-\alpha)}} ,\hspace*{120pt}\end{align}

where (33) holds because $\big(1-\frac{1}{x}\big)^x\leq \frac{1}{{\textrm e}}$ for $x\geq 1$, (34) is a result of the Markov inequality, (35) holds because $N \geq \left(\frac{{4}\bar k\log N}{\gamma}\right)^{\frac{1}{0.5-\alpha}}$ implies $\frac{\gamma}{4N^{\alpha}}\geq \frac{\bar k\log N}{\sqrt{N}}$, (36) holds by substituting (28), and (37) holds because $r\leq \frac{\log N}{72 (b-1)^2}$ implies $\log N \geq 20r(b-1).$ The remaining analysis to obtain $\mathbb{E}[W]$, $\mathbb{E}[S_1]$, and $\mathbb{E}[\sum_{i=1}^b S_i]$ is the same as the analysis for JSQ.

5. Conclusion and discussion

In this paper we have studied the steady-state performance of a class of load-balancing algorithms for many-server (N servers) systems in the sub-Halfin–Whitt regime ($\alpha < 0.5$). We established an upper bound on the higher moment on a distance function of the queue length with Stein’s method, and studied the probability that an incoming job is routed to a busy server under JSQ, I1F, JIQ, and Pod.

We studied the sub-Halfin–Whitt regime ($\alpha < 0.5$); one interesting extension is to consider a ‘heavier’ traffic regimes where $0.5 \leq \alpha < 1$. In such a regime, the above state space collapse result does not hold. It would require a different fluid model and a different state space collapse analysis, which is an interesting problem to be studied in the future.

Acknowledgements

The authors would like to thank Anton Braverman and Weina Wang for stimulating discussions that led to this result. This work was supported in part by NSF CNS-1824393, ECCS-1547294, ECCS-1609202, ECCS-1739344, and the U.S. Office of Naval Research (ONR Grant No. N00014-15-1-2169).

References

Banerjee, S. and Mukherjee, D. (2018). Join-the-shortest queue diffusion limit in Halfin–Whitt regime: Tail asymptotics and scaling of extrema. arXiv:1803.03306.Google Scholar
Bertsimas, D., Gamarnik, D. and Tsitsiklis, J. N. (2001). Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Prob. 11, 13841428.CrossRefGoogle Scholar
Braverman, A. (2018). Steady-state analysis of the join the shortest queue model in the Halfin–Whitt regime. arXiv:1801.05121.Google Scholar
Braverman, A. and Dai, J. G. (2017). Stein’s method for steady-state diffusion approximations of $m/\mathit{Ph}/n+m$ systems. Ann. Appl. Prob. 27, 550581.10.1214/16-AAP1211CrossRefGoogle Scholar
Braverman, A., Dai, J. G. and Feng, J. (2016). Stein’s method for steady-state diffusion approximations: An introduction through the Erlang-A and Erlang-C models. Stoch. Syst. 6, 301366.10.1287/15-SSY212CrossRefGoogle Scholar
Eschenfeldt, P. and Gamarnik, D. (2018). Join the shortest queue with many servers. The heavy-traffic asymptotics. Math. Operat. Res. 43, 867886.Google Scholar
Gast, N. (2017). Expected values estimated via mean-field approximation are $1/n$-accurate. Proc. ACM Meas. Anal. Comput. Syst. 1, 17:117:26.Google Scholar
Gast, N. and Van Houdt, B. (2018). A refined mean field approximation. In Proc. Ann. ACM SIGMETRICS Conf., Irvine, CA.10.1145/3219617.3219663CrossRefGoogle Scholar
Gupta, V. and Walton, N. (2017). Load balancing in the non-degenerate slowdown regime. arXiv:1707.01969.Google Scholar
Liu, X. and Ying, L. (2018). On achieving zero delay with power-of-d-choices load balancing. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM), Honolulu, Hawaii.10.1109/INFOCOM.2018.8485827CrossRefGoogle Scholar
Lu, Y., Xie, Q., Kliot, G., Geller, A., Larus, J. R. and Greenberg, A. (2011). Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services. Performance Evaluation 68, 10561071.10.1016/j.peva.2011.07.015CrossRefGoogle Scholar
Mitzenmacher, M. (1996). The power of two choices in randomized load balancing. Ph.D. thesis, University of California at Berkeley.Google Scholar
Mukherjee, D., Borst, S. C., van Leeuwaarden, J. S. and Whiting, P. A. (2016). Universality of power-of-d load balancing in many-server systems. arXiv:1612.00723.Google Scholar
Stolyar, A. (2015). Pull-based load distribution in large-scale heterogeneous service systems. Queueing Syst. 80, 341361.CrossRefGoogle Scholar
Stolyar, A. (2015). Tightness of stationary distributions of a flexible-server system in the Halfin–Whitt asymptotic regime. Stoch. Syst. 5, 239267.10.1287/14-SSY139CrossRefGoogle Scholar
van der Boor, M., Borst, S. C., van Leeuwaarden, J. S. and Mukherjee, D. (2017). Scalable load balancing in networked systems: Universality properties and stochastic coupling methods. arXiv:1712.08555.Google Scholar
Vvedenskaya, N. D., Dobrushin, R. L. and Karpelevich, F. I. (1996). Queueing system with selection of the shortest of two queues: An asymptotic approach. Problemy Peredachi Informatsii 32, 2034.Google Scholar
Wang, W., Maguluri, S. T., Srikant, R. and Ying, L. (2017). Heavy-traffic delay insensitivity in connection-level models of data transfer with proportionally fair bandwidth sharing. ACM SIGMETRICS Performance 45, 232245.10.1145/3199524.3199565CrossRefGoogle Scholar
Weber, R. R. (1978). On the optimal assignment of customers to parallel servers. J. Appl. Prob. 15, 406413.10.2307/3213411CrossRefGoogle Scholar
Winston, W. (1977). Optimality of the shortest line discipline. J. Appl. Prob. 14, 181189.10.2307/3213271CrossRefGoogle Scholar
Ying, L. (2016). On the approximation error of mean-field models. In Proc. Ann. ACM SIGMETRICS Conf., Antibes Juan-les-Pins, France, pp. 285297.10.1145/2964791.2901463CrossRefGoogle Scholar
Ying, L. (2017). Stein’s method for mean field approximations in light and heavy traffic regimes. Proc. ACM Meas. Anal. Comput. Syst. 1, 12:112:27.Google Scholar
Figure 0

Table 1: Our contributions and related work.

Figure 1

Figure 1. Load balancing in many-server systems.