
ADMISSION CONTROL WITH INCOMPLETE INFORMATION TO A FINITE BUFFER QUEUE

Published online by Cambridge University Press:  15 December 2006

Dorothée Honhon
Affiliation:
Department of Information, Risk and Operations Management, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, E-mail: dorothee.honhon@mccombs.utexas.edu
Sridhar Seshadri
Affiliation:
Department of Information, Operations and Management Science, Leonard N. Stern School of Business, New York University, New York, NY 10012

Abstract

We consider the problem of admission control to a multiserver finite buffer queue under partial information. The controller cannot see the queue but is informed immediately if an admitted customer is lost due to buffer overflow. Turning away (i.e., blocking) customers is costly and so is losing an admitted customer. The latter cost is greater than that of blocking. The controller's objective is to minimize the average cost of blocking and rejection per incoming customer. Lin and Ross [11] studied this problem for multiserver loss systems. We extend their work by allowing a finite buffer and the arrival process to be of the renewal type. We propose a control policy based on a novel state aggregation approach that exploits the regenerative structure of the system, performs well, and gives a lower bound on the optimal cost. The control policy is inspired by a simulation technique that reduces the variance of the estimators by not simulating the customer service process. Numerical experiments show that our bound varies with the load offered to the system and is typically within 1% to 10% of the optimal cost. Also, our bound is tight in the important case when the cost of blocking is low compared to the cost of rejection and the load offered to the system is high. The quality of the bound degrades with the degree of state aggregation, but the computational effort is comparatively small. Moreover, the control policies that we obtain perform better compared to a heuristic suggested by Lin and Ross. The state aggregation technique developed in this article can be used more generally to solve problems in which the objective is to control the time to the end of a cycle and the quality of the information available to the controller degrades with the length of the cycle.

Type
Research Article
Copyright
© 2007 Cambridge University Press

1. INTRODUCTION

We consider the problem of admission control to a finite buffer queue under partial information. In this problem, a controller decides whether to admit a customer. The controller cannot see the queue but is informed immediately if an admitted customer is rejected due to buffer overflow. Turning away (i.e., blocking) customers is costly and so is losing an admitted customer due to buffer overflow. The latter cost is greater than that of blocking. The controller's objective is to minimize the average cost of blocking and rejection.

This problem was originally posed in the context of computer and telecommunications networks. Lin and Ross [11] established that the optimal policy is of a threshold type for single-server loss systems (i.e., a queue with no buffer). In this policy, arrivals are blocked for a fixed period of time following a rejection, after which all arrivals are admitted until a new rejection takes place. They showed that the optimal policy does not have a similar structure for multiserver systems but that it can be obtained by solving a dynamic program (DP). Unfortunately, the state space of the DP grows exponentially with the number of servers and the size of the buffer.

We consider an extension suggested in Lin and Ross [11] to the case when the queue has a finite buffer and the arrival process is of the renewal type. It is extremely difficult to solve this problem to optimality. For example, to determine the optimal policy to a reasonable accuracy for a system in which the combined size of the buffer and the number of servers is equal to 10, the number of states in the corresponding DP can exceed 30 million. We therefore propose to solve the DP using a new state aggregation technique. The novelty of our technique is that it yields both a control policy and a lower bound on the average cost. Moreover, the computational effort does not grow with the size of the buffer or the number of servers. We find in our experiments that the bound given by this technique is within 1%–10% of the optimal cost. In many instances, our method improves on the performance of the heuristic proposed by Lin and Ross. The bound is tight when the cost of blocking is low compared to the cost of rejection and the control policy performs especially well when the cost of blocking is high and the offered load is high. Finally, we also provide an efficient simulation technique to compute the average cost per customer without simulating the customer service process.

The key observation we use in this article is that the state space for the controlled Markov chain can be either the set of events since the last rejection or the probability distribution of the number of customers in the system, computed at arrival epochs of customers. The latter, although convenient to numerically determine the optimal control to any given accuracy, grows very quickly with the size of the buffer and the number of servers. The former grows with the number of events since the last rejection. However, if we retain only the information about the last few admitted customers, we can curtail the growth of the state space. If we then reconstruct the events based on this limited information (i.e., reconstruct the entire set of admission events since the last rejection) in a careful manner, taking into account the structure of the stochastic system, we show that we can achieve a lower bound on the minimum average cost per incoming customer, as well as a good control policy. The main contribution of this article is the idea of state reconstruction using limited information as a basis for state aggregation.

The method developed by us can be used to bound the average cost in systems in which the objective is to control the time to the end of a cycle and the quality of the information available to the controller degrades with the length of the cycle. Examples of such applications include equipment maintenance, cash management problems, and inventory management problems. In the case of equipment maintenance, one might have to schedule preventive repairs on a machine with imperfect information about the state of the equipment. The repairs have the effect of delaying failure and cost less than a breakdown repair. Regeneration occurs when the machine finally breaks down and has to be replaced. The problem is to minimize the average cost of maintenance and breakdown repair. In the cash management setting, one might have to decide to give loans without knowing the exact amount of resources available (i.e., without information about the actual cash inflows from interest and repayment of the principal). By refusing to grant certain loans, the controller can extend the time until the firm runs out of resources. If there are inadequate funds and a loan is sanctioned, the firm incurs a significant cost and has to add assets to the balance sheet. The problem is to determine the loans to sanction until resources run out. Finally, one can apply our solution technique to decide whether to fill an order without knowing the actual number of items in stock. By refusing to take an order, the controller is able to delay the time until a customer whose order was taken experiences a shortage. In this case, a fixed cost is incurred to replenish the inventory.

The rest of the article is organized as follows. A review of the relevant literature is given in Section 2. Section 3 presents the model. Section 4 is devoted to analyzing the optimal policy. Sections 5–8 present our main results. Numerical examples are given in Section 9.

2. LITERATURE REVIEW

There is a growing literature on control of queues with delayed or incomplete information on the state of the system. These articles consider either routing or admission (or both) control issues. In this article we focus on admission control.

Altman, Marquez, and Yechiali [4] considered the optimal routing and admission control in a multiserver loss system when the controller has delayed information or no information at all. With delayed information, they compare a “round-robin” policy, where customers are sent to servers before the information about their state becomes available, with a “wait” policy, where customers are first grouped in a finite buffer then sent to a server once information becomes available. They established the existence of a threshold on delays below which the wait policy gives a higher throughput. When the controller has no information at all, they proposed a timer mechanism where the interadmission time is set equal to the maximum of the interarrival time and some random variable and customers are assigned to servers in a cyclic fashion. For a given distribution of the random variable, they show how to choose the parameters of the timer to achieve the maximum throughput. Litvak and Yechiali [13] performed the same type of analysis in a system with limited buffer and the objective of minimizing the average queue length. In their system, the information on service completions is delayed by a random time. They showed that there exists a threshold value of the average information delay below which the wait policy performs better.

The problem of admission control with delayed information in a single-server discrete-time queue has been studied by Altman, Kofman, and Yechiali [2], Altman and Stidham [6], Altman and Nain [5], Altman and Basar [1], as well as Kuri and Kumar [9]. In all these papers the information on queue lengths is delayed by an arbitrary number of periods. Altman, Kofman, and Yechiali [2] focus on computing the distribution of the queue length in steady state. Altman and Stidham [6], Altman and Koole [3], and Altman and Nain [5] prove, in different contexts, that the optimal policy is characterized by a threshold on the queue length. Altman and Koole [3] consider the control of a random walk where the controller has noisy information on the current state of the queue but full information on previous states. They also establish the optimality of a threshold policy in this setting.

Kuri and Kumar [10] studied the control of arrivals to a queue with delayed information. They argued that the standard approach for solving a partially observable controlled Markov chain is to define the “state” of the system to be the conditional probability measure on the space of the underlying unobservable system. For example, in their problem (and in ours), the state is the distribution of the number of customers in the system. They suggested that an alternate and direct formulation is to appropriately define an enlarged state such that it captures all of the relevant information. They gave several references in which such an approach has been successfully used. For example, the pattern of arrivals to the queue along with the delay in information is sufficient to predict the state of the queue in their (and our) model. Therefore, this enlarged state space can serve to develop the optimal control. Their model differed from ours in several ways. They modeled a discrete time system with a single server, whereas we consider multiple servers. They modeled a system with delayed information, whereas we model one with partial information. Their main insight was also different. They show that even though there are many different arrival patterns, the control corresponding to subsets of patterns is the same. They sought to exploit this property; therefore, they explored state-space reduction, not state-space aggregation. Finally, they provided a lower and upper bound on the optimal cost based on a policy that never accepts customers. Our lower bound is based on changing the transition probability in each aggregate state and is obtained by solving a DP. Despite these differences, we believe that their insight that the direct formulation will give valuable results is the key to our results as well.

The work of Lin and Ross described in Section 1 is different from all of the work cited above because it deals with incomplete and not delayed information. In Lin and Ross [12], they extended their work and showed that a threshold policy, which is optimal in the case of a single server with exponential service time distribution, is not necessarily optimal with general service time distributions. In this latter work, they also provided a sufficient condition for the existence of an optimal policy that is of the threshold type.

3. PROBLEM SETTING

We consider a multiserver queue with finite buffer that has s identical servers. The size of the buffer is b, so that the system is full when there are b + s customers. The arrival process of customers to this queuing system is an ordinary renewal process such that interarrival times are independent and identically distributed (i.i.d.) with distribution F and mean 1/λ. The service time distribution is exponential with mean 1/μ.

A controller observes the arrival process. He knows the parameters of the arrival and service processes. However, he cannot observe the number of customers in the system and cannot observe departures from the system. A control is exercised at each arrival instant: The controller has to decide whether to admit or block an arriving customer. The cost of blocking a customer is c < 1 and that of admitting a customer is zero. If the customer is blocked, she leaves the system immediately without being served. If the customer is admitted and does not find the buffer full, she is accepted; that is, she joins the queue or starts service if a server is free. If the customer finds the system full, she is rejected at a cost of 1 and instantly leaves the system. The controller is informed immediately of the rejection. The objective of the controller is to minimize the average cost of blocking and rejection per incoming customer.

We consider systems in which it is not optimal to block all incoming customers and restrict our attention to control policies under which the expected time until a customer is rejected is finite. When the controller is informed of a rejection, he knows that at that instant there are exactly b + s customers in the system. In particular, all servers are busy, and because service times are exponential, their remaining processing times are exponentially distributed with mean 1/μ, independent of one another and of the past. Also, the next arrival takes place after one interarrival time, which is independent of the past as well. Given these observations, the controller's estimate of the distribution of the number of customers in the system does not depend on what happened before the rejection occurred. It follows that the only information the controller needs to consider is the set of events since the last rejection. Moreover, customers that are blocked do not impact the distribution of the number of customers in the system; therefore, the controller only needs to consider the admission epochs since the last rejection and the elapsed time since the last rejection. This information is sufficient to compute the probability distribution of the number of customers in the system, at every arrival epoch.

Therefore, we define a state to be the set of admission epochs since the last rejection along with the elapsed time since the last rejection, or, equivalently, the probability distribution of the number of customers in the system calculated from this information. These alternate definitions for the state space are examined in greater detail in Section 4. For now, let $\mathcal{X}$ denote the state space, under either definition.

Let Ψ be the set of control policies adapted to the processes of the number of arriving, blocked, and rejected customers since the last rejection. Let $A(i)$ denote the action space for state $i \in \mathcal{X}$; we have

$$A(i) = \{\text{admit},\,\text{block}\} \quad \text{for all } i \in \mathcal{X}.$$

Because the cost is computed per customer arrival, we define stage k of the DP to be the decision epoch when the kth customer arrives to the system. Let $i_k$ and $a_k$ respectively denote the state visited and the action taken in stage k. Let $g(i_k,a_k)$ denote the cost incurred at stage k, in state $i_k \in \mathcal{X}$, and under action $a_k \in A(i_k)$. It is equal to

$$g(i_k,a_k) = \begin{cases} c & \text{if } a_k = \text{block}, \\ f(i_k) & \text{if } a_k = \text{admit}, \end{cases}$$

where $f(i_k)$ is the probability that the buffer is full in state $i_k$.

Let $J(\pi)$ be the average cost per customer under control policy π ∈ Ψ:

$$J(\pi) = \limsup_{K \to \infty} \frac{1}{K}\, E\!\left[\sum_{k=1}^{K} g(i_k,a_k)\right].$$

Let δ be the minimum average cost per incoming customer:

$$\delta = \inf_{\pi \in \Psi} J(\pi).$$

Also, for 0 < β < 1, let $J_\beta(i)$ be the optimal total discounted cost for initial state $i \in \mathcal{X}$:

$$J_\beta(i) = \inf_{\pi \in \Psi} E\!\left[\sum_{k=1}^{\infty} \beta^{k-1} g(i_k,a_k)\,\Big|\, i_1 = i\right].$$

Let $V_\beta(i)$ be the relative discounted value function in state $i \in \mathcal{X}$, given by

$$V_\beta(i) = J_\beta(i) - \inf_{j \in \mathcal{X}} J_\beta(j).$$

In other words, $V_\beta(i)$ is equal to the difference in optimal total discounted cost between two systems: one that starts in state i and one that starts in the state for which the total discounted cost is minimum. This quantity is bounded by the sums of costs in the two systems, each up to the next time a rejection occurs, because from that point onward, the probabilistic evolutions of the two systems are the same.

Using this observation and the assumption that the expected time to a rejection is finite and because per-stage costs are bounded, we obtain that

$$\sup_{0 < \beta < 1} V_\beta(i) < \infty \quad \text{for all } i \in \mathcal{X}.$$

Therefore, the system satisfies the "boundedness" condition (B) of Schäl [17]. Moreover, since the set of actions available in each state is finite, the system also satisfies condition (S) of Schäl [17]. Hence, by Theorem 3.8 of Schäl, there exists an admissible stationary policy u*, which is average optimal, in the sense that

$$J(u^*) = \delta.$$
Therefore, we limit the search for an optimal policy to the set of admissible stationary policies, denoted as Ψs. In what follows, u denotes an admissible stationary policy.

4. TWO REPRESENTATIONS OF THE STATE SPACE

As mentioned in Section 3, there are two possible representations for the state space. We examine each below.

4.1. State Defined as Information Since Last Rejection

Let time 0 correspond to the epoch of the last rejection. Let t denote the time since the last rejection. Let n be the number of admitted customers since the last rejection. Let $t_j$ for j = 1,…,n be the admission epoch of the jth customer. A state is represented by the vector $(t,n,t_1,\ldots,t_n)$. The state space $\mathcal{X}$ is given by

$$\mathcal{X} = \{(t,n,t_1,\ldots,t_n) : n \ge 0,\; 0 \le t_1 \le \cdots \le t_n \le t\}. \tag{1}$$
The controller uses this vector to compute the probability distribution of the number of customers in the system. We propose the following method for computing the probability distribution. First, from (t,n,t1,…, tn) we compute the (b + s + 2)-dimensional vector of probabilities p(t,n,t1,…, tn) = (px(t,n,t1,…, tn); x = 0,…, b + s + 1), where px(t,n,t1,…, tn) for 0 ≤ xb + s is the probability that the system has x customers at time t given that customers were admitted (but not necessarily accepted) at times t1,…, tn. Also, pb+s+1(t,n,t1,…, tn) is the probability that the system has incurred a rejection since time 0 given that customers were admitted at times t1,…, tn. We refer to vector p as the vector of unconditional probabilities associated with state (t,n,t1,…, tn) because it does not factor in the condition that the arrivals were accepted. There is a nice algebraic method to determine p, which we describe next. This vector will then be used to derive the vector of conditional probabilities r, which factors in the condition that the arrivals were accepted.

Let $e_{b+s}$ be a (b + s + 2)-row vector with 1 in the (b + s)st column and zero otherwise. At the time a customer is rejected, the controller sets the probability distribution of the number of customers in the system equal to $e_{b+s}$ because he knows that there are exactly b + s customers in the system at this time. At each customer arrival epoch, the controller updates the distribution, using matrix M defined below, to account for the fact that customers may have left the system during the interarrival time. Finally, every time he admits a customer, he updates the distribution using matrix Z below. The vector of unconditional probabilities p associated with state $(t,n,t_1,\ldots,t_n)$ is computed as follows:

$$p(t,n,t_1,\ldots,t_n) = e_{b+s}\, M(t_1)\, Z\, M(t_2 - t_1)\, Z \cdots M(t_n - t_{n-1})\, Z\, M(t - t_n) \tag{2}$$

for all $(t,n,t_1,\ldots,t_n) \in \mathcal{X}$.

In this expression, M(τ) is the (b + s + 2) × (b + s + 2) transition probability matrix such that component $M_{x,y}(\tau)$ is the probability that the number of customers in the system changed from x to y during an interarrival time of τ units. It can be written as

$$M_{x,y}(\tau) = P\{X(\tau) = y \mid X(0) = x\}, \quad 0 \le x, y \le b+s, \tag{3}$$

where $X(\cdot)$ is a pure-death process with death rate $\min(x,s)\mu$ in state x; the rejection state b + s + 1 is absorbing, so $M_{b+s+1,b+s+1}(\tau) = 1$. In the case of a single server, it simplifies to

$$M_{x,y}(\tau) = \begin{cases} P\{D_\tau = x - y\} & \text{if } 1 \le y \le x \le b+s, \\ P\{D_\tau \ge x\} & \text{if } y = 0,\; 0 \le x \le b+s, \\ 1 & \text{if } x = y = b+s+1, \\ 0 & \text{otherwise}, \end{cases}$$

where $D_\tau$ is a Poisson random variable with mean μτ.

Z is the (b + s + 2) × (b + s + 2) transition probability matrix that represents the impact on the probability distribution due to the admission of a customer:

$$Z_{x,y} = \begin{cases} 1 & \text{if } y = x+1,\; 0 \le x < b+s, \\ 1 & \text{if } y = b+s+1,\; x \in \{b+s,\, b+s+1\}, \\ 0 & \text{otherwise}. \end{cases}$$

If x < b + s, then admitting a customer increases the number of customers in the system by one. If x = b + s, then the admitted customer is rejected.

Let the (b + s + 1)-dimensional vector $r(t,n,t_1,\ldots,t_n) = (r_x(t,n,t_1,\ldots,t_n);\ x = 0,\ldots,b+s)$ be the vector of conditional probabilities; it factors in the condition that the arrivals were accepted. $r_x(t,n,t_1,\ldots,t_n)$ is the probability that there are x customers in the system t units of time after the last rejection given that n customers were accepted at epochs $t_1,\ldots,t_n$.

From the vector $p(t,n,t_1,\ldots,t_n)$ the controller obtains the vector $r(t,n,t_1,\ldots,t_n)$ by dividing the first b + s + 1 components of p by the probability that the system has not incurred a rejection so far; that is, by $1 - p_{b+s+1}(t,n,t_1,\ldots,t_n)$:

$$r_x(t,n,t_1,\ldots,t_n) = \frac{p_x(t,n,t_1,\ldots,t_n)}{1 - p_{b+s+1}(t,n,t_1,\ldots,t_n)}, \quad x = 0,\ldots,b+s. \tag{4}$$
One reason that we introduced the unconditional vector p is for ease of computation of r. Another reason will become apparent in Section 6.
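To make the updating scheme concrete, the following is a minimal numerical sketch of these computations for the single-server case (s = 1), based on the reconstruction of M, Z, and Eqs. (2) and (4) above; the function names are ours.

import numpy as np
from scipy.stats import poisson

def M(tau, b, s, mu):
    # Transition matrix over an interval of length tau (single-server case):
    # customers can only depart; the rejection state b+s+1 is absorbing.
    dim = b + s + 2
    out = np.zeros((dim, dim))
    for x in range(b + s + 1):
        for y in range(1, x + 1):
            out[x, y] = poisson.pmf(x - y, mu * tau)  # exactly x - y departures
        out[x, 0] = 1 - poisson.cdf(x - 1, mu * tau)  # all x customers departed
    out[-1, -1] = 1.0
    return out

def Z(b, s):
    # Admission update: one more customer if the system is not full;
    # the mass at b+s (full) moves to the absorbing rejection state.
    dim = b + s + 2
    out = np.zeros((dim, dim))
    for x in range(b + s):
        out[x, x + 1] = 1.0
    out[b + s, -1] = 1.0
    out[-1, -1] = 1.0
    return out

def unconditional_p(t, epochs, b, s, mu):
    # Vector p of Eq. (2): start from e_{b+s} at the last rejection and
    # alternate departure evolution (M) with admission updates (Z).
    p = np.zeros(b + s + 2)
    p[b + s] = 1.0
    prev = 0.0
    for tj in epochs:
        p = p @ M(tj - prev, b, s, mu) @ Z(b, s)
        prev = tj
    return p @ M(t - prev, b, s, mu)

def conditional_r(p):
    # Vector r of Eq. (4): condition on no rejection having occurred.
    return p[:-1] / (1.0 - p[-1])

For example, with b = 3, s = 1, and μ = 1.2, conditional_r(unconditional_p(5.0, [1.0, 2.5, 4.0], 3, 1, 1.2)) returns the controller's estimate of the queue-length distribution 5 time units after the last rejection, given admissions at times 1.0, 2.5, and 4.0.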

The state space $\mathcal{X}$ defined in (1) is infinite because each component of the state vectors is continuous and unbounded and the size of the vector itself is not bounded from above. For this reason, Lin and Ross [11] used the second definition of a state for solving the problem.

4.2. State Defined as Probability Distribution

Let a state be the vector of conditional probabilities r, as defined in (4). The set of all possible values for this vector is continuous and infinite. In order to make this state space discrete (and finite), select a positive integer m, called the discretizing constant, such that the probabilities are rounded off to a multiple of 1/m (see details in Lin and Ross [11, p. 648]). Let $\mathcal{R}_m$ be the set of discretized vectors of conditional probabilities obtained using the discretizing constant m.

As explained in Section 3, the cost of policy u in state $r \in \mathcal{R}_m$ is given by

$$g(r,u(r)) = \begin{cases} c & \text{if } u(r) = \text{block}, \\ f(r) & \text{if } u(r) = \text{admit}, \end{cases}$$

where f(r) is the probability that the buffer is full in state r and is given by

$$f(r) = r_{b+s};$$

that is, f(r) is given by the (b + s)th component of the state vector.

Let $P_{r,r'}(u)$ be the transition probability, under control u, from state r to state $r'$, where $r, r' \in \mathcal{R}_m$. The components of the matrix P are functions of f(r) and λ.

Let $J_m(u)$ denote the average cost per customer under stationary control policy u defined on state space $\mathcal{R}_m$. Let $\delta_m$ be the minimum average cost per customer when the state space is $\mathcal{R}_m$:

$$\delta_m = \min_{u} J_m(u).$$

Let $u_m^*$ be the control that achieves this minimum, such that $J_m(u_m^*) = \delta_m$. The Bellman equation associated with this discrete-time problem is given by

$$\delta_m + V_m(r) = \min_{a \in \{\text{admit},\,\text{block}\}} \left[ g(r,a) + \sum_{r' \in \mathcal{R}_m} P_{r,r'}(a)\, V_m(r') \right], \tag{5}$$

where $V_m(r)$ is the relative value function in state r.

The minimum average cost $\delta_m$ and optimal control policy $u_m^*$ on state space $\mathcal{R}_m$ can be obtained using value iteration. We assume that as m → ∞, $\delta_m$ tends to δ and $u_m^*$ tends to u*. It follows that for large m, solving the problem with state space $\mathcal{R}_m$ gives a good approximation of u*. However, for a given precision m, the state space becomes very large as either the buffer size b or the number of servers s increases (see Section 1). In particular, the number of states in $\mathcal{R}_m$ is equal to

$$\binom{m+b+s}{b+s},$$

the number of (b + s + 1)-dimensional vectors of nonnegative multiples of 1/m that sum to 1.
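As an illustration of this growth, the following snippet evaluates the count above (our stars-and-bars reading of the discretization) for m = 50 and increasing system sizes:

from math import comb

m = 50
for size in (4, 7, 10):          # size = b + s
    print(size, comb(m + size, size))
# 4 316251
# 7 264385836
# 10 75394027566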

For this reason we provide an alternative method to obtain a good control policy for which the size of the state space is independent of b and s. Our method consists of aggregating the state space $\mathcal{X}$ defined in (1).

5. POLICY BASED ON STATE AGGREGATION

If we use the state space $\mathcal{X}$ defined in (1), the controller has to remember all of the admission epochs since the last rejection. We suggest a method in which the controller keeps in memory a finite number of the most recent admission epochs. Formally, let l be the size of the controller's memory (i.e., the maximum number of past admission epochs he remembers). For example, l = 1 means that states are of the type $(t,n,t_n)$ when n ≥ 1 and they are called aggregate states with memory size 1. In general, for l < ∞, aggregate states with memory size l are of the type $(t,n,t_{(n-l+1)\vee 1},\ldots,t_n)$ for n ≥ 1 (where $a \vee b \equiv \max(a,b)$) and (t,0) otherwise. To simplify the notation, we denote a generic aggregate state with memory size l as $(t,n,t_{n-l+1},\ldots,t_n)$, with the understanding that the size of each vector is $(n \wedge l) + 2$ (where $a \wedge b \equiv \min(a,b)$). The states described in Section 4.1, $(t,n,t_1,\ldots,t_n)$, are such that l = ∞ and are referred to as full memory states.
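In code, the aggregation is just a projection of the full-memory state (a small sketch; the function name is ours):

def aggregate(state, l):
    # Project a full-memory state (t, n, t_1, ..., t_n) onto the aggregate
    # state that keeps only the l most recent admission epochs.
    t, n, *epochs = state
    return (t, n, *epochs[-l:])

# Example: with l = 2, (10.0, 4, 1.0, 3.0, 6.0, 8.0) becomes (10.0, 4, 6.0, 8.0).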

The variables in this section are denoted with a superscript l to refer to the size of the controller's memory. Note that when l = ∞, the variables are equivalent to those defined in Section 4.1. Let $\mathcal{X}^l$ be the state space for memory size l:

$$\mathcal{X}^l = \{(t,n,t_{(n-l+1)\vee 1},\ldots,t_n) : n \ge 0,\; 0 \le t_{(n-l+1)\vee 1} \le \cdots \le t_n \le t\}. \tag{6}$$
Given the information that the system is currently in aggregate state $(t,n,t_{n-l+1},\ldots,t_n) \in \mathcal{X}^l$ such that n > l, the controller does not have all of the information necessary to compute the exact unconditional distribution of the number of customers in the system (i.e., p). In particular, he does not remember when the first n − l customers were admitted (i.e., $t_1,\ldots,t_{n-l}$). However, he can obtain an estimate of p, denoted $p^l$, by taking a weighted average of the vectors of unconditional probabilities associated with all possible values of $t_1,\ldots,t_{n-l}$:

$$p^l(t,n,t_{n-l+1},\ldots,t_n) = \sum_{0 \le t_1 \le \cdots \le t_{n-l} \le t_{n-l+1}} w(t,n,t_1,\ldots,t_n)\, p(t,n,t_1,\ldots,t_n), \tag{7}$$

where p is defined in (2) and

$$w(t,n,t_1,\ldots,t_n) \ge 0, \qquad \sum_{0 \le t_1 \le \cdots \le t_{n-l} \le t_{n-l+1}} w(t,n,t_1,\ldots,t_n) = 1.$$

The constants w should be interpreted as conditional probabilities or weights on each full memory state in which the values of $t_{n-l+1},\ldots,t_n$ are equal to those in the aggregate state.

We define the probability that the buffer is full in the aggregate state $(t,n,t_{n-l+1},\ldots,t_n)$ as

$$f^l(t,n,t_{n-l+1},\ldots,t_n) = \frac{p^l_{b+s+1}(t,n,t_{n-l+1},\ldots,t_n) - p^{l,-}_{b+s+1}}{1 - p^{l,-}_{b+s+1}},$$

where $p^{l,-}_{b+s+1}$ denotes the value of $p^l_{b+s+1}$ just before the admission of the nth customer. The numerator corresponds to the probability that the system has incurred a rejection when admitting the nth customer since time 0. By dividing this value by the probability that the system has not incurred any rejection before the arrival of the nth customer, we obtain the probability that the buffer is full at time t given that there has been no rejection since time 0.

The set $\mathcal{X}^l$ defined in (6) is infinite. To make it finite, we do two things. First, we discretize the distribution of the interarrival time F in steps of Δ; that is, the probability that the time until the next arrival is k for k = Δ, 2Δ,… is given by $q_k$, where

$$q_k = F(k) - F(k - \Delta).$$

Second, we bound the maximum number of admissions per cycle by T. We define the set of aggregate states $\mathcal{X}^l_{\Delta,T}$:

$$\mathcal{X}^l_{\Delta,T} = \{(t,n,t_{(n-l+1)\vee 1},\ldots,t_n) \in \mathcal{X}^l : t \le \Delta T,\; n \le T,\; t \text{ and } t_1,\ldots,t_n \text{ multiples of } \Delta\}.$$

To avoid end effects due to the choice of T, we formulate the problem as a discrete-time semi-Markov decision process (SMDP). We describe the formulation briefly. Notice that aggregate states outside of $\mathcal{X}^l_{\Delta,T}$ can occur once time ΔT is exceeded. To deal with this, we assume that the controller admits every arriving customer in aggregate states outside $\mathcal{X}^l_{\Delta,T}$ until a rejection occurs. Let $\tau^l(t,n,t_{n-l+1},\ldots,t_n)$ be equal to 1 if the next arrival takes place within ΔT with probability 1, or to 1 plus the expected number of transitions until the next rejection otherwise. The cost incurred in state $i \in \mathcal{X}^l_{\Delta,T}$ using control policy u is given by

$$g^l(i,u(i)) = \begin{cases} c & \text{if } u(i) = \text{block}, \\ f^l(i) & \text{if } u(i) = \text{admit}, \end{cases}$$

augmented by the expected cost of blocking and rejection incurred during the admit-all continuation whenever the next arrival can fall outside $\mathcal{X}^l_{\Delta,T}$.
Also, let $P^l_{i,i'}(u)$ be the probability of going from aggregate state i to $i'$, where $i, i' \in \mathcal{X}^l_{\Delta,T}$, under control u.

Let $\delta^l_{\Delta,T}$ and $u^{l*}_{\Delta,T}$ respectively denote the minimum average cost per customer and the optimal stationary policy when $\mathcal{X}^l_{\Delta,T}$ is the state space and $P^l$ is the transition probability matrix. The Bellman equation associated with the SMDP is the following (see Ross [14, p. 162]):

$$V^l(i) = \min_{a \in \{\text{admit},\,\text{block}\}} \left[ g^l(i,a) - \delta^l_{\Delta,T}\, \tau^l(i) + \sum_{i' \in \mathcal{X}^l_{\Delta,T}} P^l_{i,i'}(a)\, V^l(i') \right],$$

where $V^l(i)$ is the relative value function in aggregate state $i \in \mathcal{X}^l_{\Delta,T}$.

The problem can be transformed into a Markov decision process by allowing self-transitions. The corresponding Bellman equation is

$$V^l(i) = \min_{a \in \{\text{admit},\,\text{block}\}} \left[ \frac{g^l(i,a)}{\tau^l(i)} - \delta^l_{\Delta,T} + \rho^l(i)\, V^l(i) + \big(1 - \rho^l(i)\big) \sum_{i' \in \mathcal{X}^l_{\Delta,T}} P^l_{i,i'}(a)\, V^l(i') \right], \tag{8}$$

where $\rho^l(i) = 1 - 1/\tau^l(i)$ is the probability of staying in aggregate state $(t,n,t_{n-l+1},\ldots,t_n)$ and is such that the expected number of transitions in aggregate state $(t,n,t_{n-l+1},\ldots,t_n)$ is equal to $\tau^l(t,n,t_{n-l+1},\ldots,t_n)$.
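For completeness, this is the standard calculation behind the self-transition construction (we write $\rho^l(i)$ for the staying probability introduced above): the number of transitions spent in state i is geometric, so

$$E[\text{number of transitions in } i] = \sum_{j \ge 1} j\, \big(\rho^l(i)\big)^{j-1}\big(1 - \rho^l(i)\big) = \frac{1}{1 - \rho^l(i)} = \tau^l(i)$$

when $\rho^l(i) = 1 - 1/\tau^l(i)$, which is exactly the sojourn time that the SMDP formulation assigns to state i.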

The minimum average cost $\delta^l_{\Delta,T}$ and optimal control policy $u^{l*}_{\Delta,T}$ on state space $\mathcal{X}^l_{\Delta,T}$ can be obtained using value iteration. Also, as T → ∞ and Δ → 0, $\delta^l_{\Delta,T}$ tends to $\delta^l$, where

$$\delta^l = \min_{u \in \Psi_s} J^l(u)$$

and $J^l(u)$ denotes the average cost per customer under stationary control policy u ∈ Ψs defined on state space $\mathcal{X}^l$. Also, $u^{l*}_{\Delta,T}$ tends to $u^{l*}$, which is the control that minimizes the average cost for state space $\mathcal{X}^l$ [i.e., such that $J^l(u^{l*}) = \delta^l$].

The advantage of using this method is that the size of the state space does not depend on the size of the system (i.e., b and s). The disadvantage is the problem of defining the weights in (7). The conventional method for doing this is described below, but it does not have any guarantee of convergence or any result with regard to the performance of the resulting policy. An alternate method that has both of these properties is suggested in Section 7.

The conventional way to determine the weights is to use the following recursive procedure. First, fix a control $u_k$, where k = 1 at the first step. By simulation, estimate the steady-state probabilities $\nu^{u_k}(t,n,t_1,\ldots,t_n)$ for every full memory state $(t,n,t_1,\ldots,t_n)$. Then compute the weights $w^k$ at step k using

$$w^k(t,n,t_1,\ldots,t_n) = \frac{\nu^{u_k}(t,n,t_1,\ldots,t_n)}{\displaystyle\sum_{0 \le t'_1 \le \cdots \le t'_{n-l} \le t_{n-l+1}} \nu^{u_k}(t,n,t'_1,\ldots,t'_{n-l},t_{n-l+1},\ldots,t_n)}.$$

Next, solve the DP in (8) to obtain the control $u_{k+1}$ and repeat this procedure. It is not easy to provide a proof of convergence for this method (see Bertsekas [7, p. 44] for a discussion). Also, this method does not use knowledge of the system to determine the weights. Therefore, we propose an alternate method for selecting w that is inspired by the simulation algorithm given in the next section.

6. SIMULATION TO DETERMINE THE AVERAGE COST

In this section we describe an efficient simulation technique for estimating the average cost per incoming customer under policy u, for a given size of the controller's memory l. We refer to the time between two rejections as one cycle. Successive cycle lengths are i.i.d. because state transitions are Markovian and the control is a stationary policy. Also, cycles are assumed to have finite expected length.

In the algorithm below, we simulate the arrival process; that is, we generate random numbers according to the interarrival time distribution F. Note that we do not simulate the departure process. In other words, we do not determine whether each admitted arrival was accepted or not. However, we can compute the probability that a customer was accepted based on the knowledge of past admission epochs. The simulation of a cycle ends when the probability of no rejection in the cycle falls below a predetermined small value.

We keep track of two quantities in cycle k, where $i_j$ denotes the state reached at the jth transition of the cycle (see the algorithm below) and $1 - p^l_{b+s+1}(i_j)$ is the probability that this state is actually visited:

  • A running estimate of the expected cycle length (in number of transitions): $\hat{C}_k = \sum_{j} \big(1 - p^l_{b+s+1}(i_j)\big)$.
  • A running estimate of the expected cost of blocking and rejection incurred in the cycle: $\hat{G}_k = \sum_{j:\,\text{blocked}} c\,\big(1 - p^l_{b+s+1}(i_j)\big) + \sum_{j:\,\text{admitted}} f^l(i_j)\,\big(1 - p^l_{b+s+1}(i_j)\big)$.

The two quantities are updated at every customer arrival until the simulation of the cycle ends. At this time, $\hat{C}_k$ is an estimate of the expected length of cycle k, where the expectation is taken with respect to the departure process, for a given sample path of the arrival process.

Simulation algorithm

For a given stationary policy u ∈ Ψs:

Step 0. Set k = 1.

Step 1. Start a new cycle: set $\hat{C}_k = 0$ and $\hat{G}_k = 0$ and set the vector of unconditional probabilities to $e_{b+s}$.

Step 1.1. Generate the next interarrival time from F and update the vector of unconditional probabilities using the matrix M. Let $i_j$ denote the resulting state. Add $1 - p^l_{b+s+1}(i_j)$ to $\hat{C}_k$. If u blocks the arrival, add $c\,(1 - p^l_{b+s+1}(i_j))$ to $\hat{G}_k$; if u admits the arrival, add $f^l(i_j)\,(1 - p^l_{b+s+1}(i_j))$ to $\hat{G}_k$ and update the vector of unconditional probabilities using the matrix Z. Repeat Step 1.1 until $1 - p^l_{b+s+1} < \varepsilon$.

Step 2. If k < K, increase k by 1 and return to Step 1; otherwise stop.

The algorithm requires the specification of K, the number of cycles generated, and ε, the small probability value that determines the end of a cycle.
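The following is a runnable sketch of this algorithm for the full-memory case (l = ∞) with a single server, reusing M and Z from the sketch in Section 4.1; the stand-in threshold policy and the parameter values are ours, not the paper's.

import numpy as np

def simulate_cost(policy, draw_interarrival, b, s, mu, c,
                  K=2000, eps=1e-6, seed=0):
    # Estimate J(u) = sum_k G_k / sum_k C_k without simulating departures.
    # `policy(r)` returns True to admit, given the conditional vector r;
    # `draw_interarrival(rng)` draws one interarrival time from F.
    # Assumes u admits infinitely often, so cycles end almost surely.
    rng = np.random.default_rng(seed)
    C_hat = np.zeros(K)
    G_hat = np.zeros(K)
    for k in range(K):
        p = np.zeros(b + s + 2)
        p[b + s] = 1.0                      # just after a rejection
        while True:
            p = p @ M(draw_interarrival(rng), b, s, mu)
            visit = 1.0 - p[-1]             # prob. this state is reached
            C_hat[k] += visit
            r = p[:-1] / visit              # conditional vector, Eq. (4)
            if policy(r):
                G_hat[k] += p[b + s]        # = f * visit: expected rejection cost
                p = p @ Z(b, s)             # admission update
            else:
                G_hat[k] += c * visit       # expected blocking cost
            if 1.0 - p[-1] < eps:           # cycle ends, per the stopping rule
                break
    return G_hat.sum() / C_hat.sum()

# Example: admit whenever the conditional probability of a full system
# is below 0.5 (an arbitrary stand-in policy), with b = 3, s = 1.
cost = simulate_cost(policy=lambda r: r[-1] < 0.5,
                     draw_interarrival=lambda rng: rng.exponential(1.0),
                     b=3, s=1, mu=1.2, c=0.25)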

Lemma 1: For a given control policy u ∈ Ψs, we have

$$\lim_{\varepsilon \to 0}\, \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \hat{C}_k = E[C_1] \quad \text{a.s.} \tag{9}$$

and

$$\lim_{\varepsilon \to 0}\, \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \hat{G}_k = E[G_1] \quad \text{a.s.,} \tag{10}$$

where $\hat{C}_k$ and $\hat{G}_k$ are obtained using the simulation algorithm.

Proof: Let $C_1$ denote the length of the first cycle (i.e., the number of transitions until a rejection) given that the system was full at time 0. Let $G_1$ denote the total cost incurred in the first cycle:

$$G_1 = \sum_{k=1}^{C_1} g(i_k,a_k).$$

Because of the assumption that the cycle length has finite expectation and given that per-stage costs are bounded, we get that $E[G_1] < \infty$. Since the control is a stationary policy, total costs in each cycle are i.i.d. and, therefore, we have

$$J^l(u) = \frac{E[G_1]}{E[C_1]}.$$

Let $C_k$ and $G_k$ be the length and total cost in cycle k, respectively. Given that $E[C_1] < \infty$ and $E[G_1] < \infty$, the strong law of large numbers (see Chung [8, Thm. 5.4.2]) implies that

$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} C_k = E[C_1] \quad \text{a.s.} \qquad \text{and} \qquad \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} G_k = E[G_1] \quad \text{a.s.}$$
We now prove (9). For this, simulate one cycle with the simulation algorithm. Let ω denote the sample path of interarrival times generated in this cycle. Let Φ be the set of states that are reached during the repetitions of Step 1.1 for this cycle, using policy u. Let $i_j \in \Phi$ denote the state reached in the jth transition (i.e., the jth repetition of Step 1.1). State $i_j$ is visited if the system does not incur a rejection before the jth transition, which happens with probability $1 - p^l_{b+s+1}(i_j)$. It follows that the expected length of the cycle, given sample path ω, is greater than or equal to

$$\sum_{i_j \in \Phi} \big(1 - p^l_{b+s+1}(i_j)\big) = \hat{C}_1.$$

It follows that if ε → 0, the expected length of the cycle, given ω, tends to $\hat{C}_1$. Using the strong law of large numbers, we obtain (9).

The expected cost of blocking an arrival in state $i_j \in \Phi$ is given by

$$c\,\big(1 - p^l_{b+s+1}(i_j)\big),$$

which is the cost of blocking multiplied by the probability that $i_j$ is visited.

The expected cost of accepting an arrival in state $i_j \in \Phi$ is equal to the probability that the system is full multiplied by the probability of visiting the state; that is,

$$f^l(i_j)\,\big(1 - p^l_{b+s+1}(i_j)\big).$$

The expected total cost incurred in the first cycle, given a sample path ω, is greater than or equal to

$$\sum_{i_j \in \Phi:\,\text{blocked}} c\,\big(1 - p^l_{b+s+1}(i_j)\big) + \sum_{i_j \in \Phi:\,\text{admitted}} f^l(i_j)\,\big(1 - p^l_{b+s+1}(i_j)\big) = \hat{G}_1.$$

It follows that if ε → 0, the expected total cost incurred in the first cycle, given ω, tends to $\hat{G}_1$. Finally, using the strong law of large numbers, we obtain (10). █

It follows that for sufficiently high K and small ε, we have

$$J^l(u) \approx \frac{\sum_{k=1}^{K} \hat{G}_k}{\sum_{k=1}^{K} \hat{C}_k};$$

that is, the average cost per incoming customer under policy u can be approximated using the simulation algorithm. Note that this simulation technique has a lower variance of the estimators $\hat{G}_k$ and $\hat{C}_k$ because we avoid introducing the additional variance due to the service process (see similar arguments in Ross and Seshadri [16] and Ross [15]).

7. DEFINING THE WEIGHTS TO OBTAIN A LOWER BOUND

Based on the proposed simulation method, we suggest choosing the weights $w(t,n,t_1,\ldots,t_n)$ in (7) in the following way:

$$w(t,n,t_1,\ldots,t_n) = \begin{cases} 1 & \text{if } (t_1,\ldots,t_{n-l}) = (\bar t_1,\ldots,\bar t_{n-l}), \\ 0 & \text{otherwise}, \end{cases} \tag{11}$$

where

$$(\bar t_1,\ldots,\bar t_{n-l}) \in \operatorname*{arg\,min}_{0 \le t_1 \le \cdots \le t_{n-l} \le t_{n-l+1}} p_{b+s+1}(t,n,t_1,\ldots,t_n)$$

and p is given by (2). In the event that more than one set of values achieves the minimum, pick one of them randomly.

It follows that the vector of unconditional probabilities associated with aggregate state $(t,n,t_{n-l+1},\ldots,t_n)$ is the vector of unconditional probabilities associated with the full memory state $(t,n,\bar t_1,\ldots,\bar t_{n-l},t_{n-l+1},\ldots,t_n)$, where $\bar t_1,\ldots,\bar t_{n-l}$ are chosen such that they minimize the unconditional probability that the system has incurred a rejection since time 0. In other words, we have

$$p^l(t,n,t_{n-l+1},\ldots,t_n) = p(t,n,\bar t_1,\ldots,\bar t_{n-l},t_{n-l+1},\ldots,t_n) \tag{12}$$

and, therefore,

$$r^l(t,n,t_{n-l+1},\ldots,t_n) = r(t,n,\bar t_1,\ldots,\bar t_{n-l},t_{n-l+1},\ldots,t_n),$$

with r defined in (4).
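A brute-force sketch of this minimization for small n − l, reusing unconditional_p from the Section 4.1 sketch (the discretization step and the function name are ours); Section 8 derives structural properties (Corollary 2, Lemma 3) that shrink this search dramatically:

import numpy as np
from itertools import combinations_with_replacement

def best_variable_epochs(t, fixed, n_minus_l, b, s, mu, delta=0.5):
    # Search nondecreasing (t_1, ..., t_{n-l}) in [0, t_{n-l+1}] minimizing
    # the unconditional rejection probability p_{b+s+1}, per Eq. (11).
    grid = np.arange(0.0, fixed[0] + 1e-9, delta)
    best, best_p = None, np.inf
    for prefix in combinations_with_replacement(grid, n_minus_l):
        p = unconditional_p(t, list(prefix) + list(fixed), b, s, mu)
        if p[-1] < best_p:
            best, best_p = prefix, p[-1]
    return best, best_p

The one-hot weights of Eq. (11) then put probability 1 on the returned epochs.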

The reason for the choice of $(t,n,\bar t_1,\ldots,\bar t_{n-l},t_{n-l+1},\ldots,t_n)$ is established in the following proposition.

Proposition 1: With w given by (11), we have $\delta^{l+1} \ge \delta^l$ for l > 0.

Proof: By (12), we have, for every $(t,n,t_1,\ldots,t_n) \in \mathcal{X}$,

$$p^l_{b+s+1}(t,n,t_{n-l+1},\ldots,t_n) \le p^{l+1}_{b+s+1}(t,n,t_{n-l},\ldots,t_n), \tag{13}$$

because the minimization defining $p^l$ is over a larger set of admission epochs.

Fix a policy u and use the simulation algorithm of the previous section to estimate $J^{l+1}(u)$ and $J^l(u)$. We use the subscripts l + 1 and l for the simulation of $J^{l+1}(u)$ and $J^l(u)$, respectively.

Using the same sample path of arrivals for each simulation, we get from (13) that at any given transition of cycle k,

$$1 - p^l_{b+s+1}(i_{j,l}) \ge 1 - p^{l+1}_{b+s+1}(i_{j,l+1}). \tag{14}$$

It follows that

$$\hat{C}_{k,l} \ge \hat{C}_{k,l+1} \tag{15}$$

and, therefore, the simulation of a cycle, which ends when $1 - p_{b+s+1} < \varepsilon$, runs at least as long under memory size l as under memory size l + 1.
Now consider control policy $u^{(l+1)*}$, which is optimal when the memory size is l + 1. Again, use the simulation algorithm of Section 6 in two different systems: one with memory size l and one with l + 1. We refer to these two cases as systems l and l + 1, respectively. Let $t_\varepsilon^{l+1}$ be the time at which the simulation of system l + 1 stops. By (14), the simulation of system l has not ended by then. After $t_\varepsilon^{l+1}$, start admitting all customers in system l. The two simulations result in the same number of blocked and rejected customers. However, (15) implies that the cycle length is longer for system l. By Lemma 1, this implies that the cost of this policy for system l is less than that for system l + 1; that is, denoting by u′ the policy just constructed for system l,

$$J^l(u') \le J^{l+1}(u^{(l+1)*}) = \delta^{l+1}.$$

Finally, by the optimality of $u^{l*}$,

$$\delta^l = J^l(u^{l*}) \le J^l(u') \le \delta^{l+1}. \qquad █$$
By applying Proposition 1 to the case of full memory (l = ∞), we get the following corollary.

Corollary 1: $\delta^l \le \delta$ for l < ∞.

We have obtained lower bounds on the optimum average cost δ that do not require the computation of the optimal control u*. The quality of the lower bound improves as l increases; however, so does the complexity of the DP needed to obtain the control $u^{l*}$. The numerical results of Section 9 show that the policies obtained with weights (11) also perform quite well. The question of how to find the original state $(t,n,\bar t_1,\ldots,\bar t_{n-l},t_{n-l+1},\ldots,t_n)$ for each aggregate state $(t,n,t_{n-l+1},\ldots,t_n)$ is considered in the following section.

8. DETERMINING THE STATE $(t, n, \bar t_1,\ldots, \bar t_{n-l}, t_{n-l+1},\ldots, t_n)$

In this section we derive properties that can be used to determine $\bar t_1,\ldots,\bar t_{n-l}$ given $(t,n,t_{n-l+1},\ldots,t_n)$ with n > l. We refer to $\bar t_1,\ldots,\bar t_{n-l}$ as the variable admission epochs (versus $t_{n-l+1},\ldots,t_n$, which are fixed). Also, let $t_k^-$ denote the instant just before the admission of the kth customer.

Lemma 2: $p_{b+s+1}(t,n,t_1,\ldots,t_{n-l},t_{n-l+1},\ldots,t_n)$ is convex in $t_k$ for k = 1,…,n − l.

Proof: Choose 1 ≤ k ≤ n − l (assume that $t_{k-1} = 0$ if k = 1 and $t_{k+1} = t_{n-l+1}$ if k = n − l). From (2), we get that the probability that there is a rejection before time t can be written as

$$p_{b+s+1}(t,n,t_1,\ldots,t_n) = \big[\tilde p\, M(t_k - t_{k-1})\, Z\, M(t_{k+1} - t_k)\big]_{b+s+1} + \sum_{y=0}^{b+s} \big[\tilde p\, M(t_k - t_{k-1})\, Z\, M(t_{k+1} - t_k)\big]_y\, Y_{y,b+s+1}, \tag{16}$$

where

$$\tilde p = p(t_{k-1},\, k-1,\, t_1,\ldots,t_{k-1}).$$

In words, $\tilde p$ is the vector of unconditional probabilities estimated at time $t_{k-1}$ and $Y_{y,b+s+1}$ with y ≤ b + s is the probability that the system experiences a rejection in the interval $[t_{k+1},t]$, given there are y in the system at time $t_{k+1}$. From the definition of $Y_{y,b+s+1}$, we get

$$Y_{y+1,b+s+1} \ge Y_{y,b+s+1}, \quad y = 0,\ldots,b+s-1. \tag{17}$$
Let x denote the number of customers that are in the system at time $t_{k-1}$; we refer to them as original customers. Let $i \le x$ denote the number of original customers that are still in the system at time $t_{k+1}$. We consider six cases. For each one, we show that the probability of rejection before time t is either independent of $t_k$ or is a convex function of $t_k$. Let customer k be the customer that is admitted at time $t_k$.

Case 1: x < s. The probability of a rejection before t is given by

$$\tilde p_x \sum_{i=0}^{x} M_{x,i}(t_{k+1} - t_{k-1}) \left[ e^{-\mu(t_{k+1}-t_k)}\, Y_{i+1,b+s+1} + \left(1 - e^{-\mu(t_{k+1}-t_k)}\right) Y_{i,b+s+1} \right].$$

In this expression, $\tilde p_x$ is the probability that there are x customers in the system at time $t_{k-1}$ and $M_{x,i}(t_{k+1}-t_{k-1})$ is the probability that out of these x customers, i are still in the system at time $t_{k+1}$. Because the number of original customers is less than the number of servers, there is at least one free server at time $t_k$, so that customer k starts service right away. With probability $e^{-\mu(t_{k+1}-t_k)}$, customer k is still being served at time $t_{k+1}$ and there is a total of i + 1 customers in the system at that time, so that the probability of a rejection after $t_{k+1}$ is $Y_{i+1,b+s+1}$. With probability $1 - e^{-\mu(t_{k+1}-t_k)}$, customer k has left the system and there are i customers at $t_{k+1}$, so that the probability of a rejection after $t_{k+1}$ is $Y_{i,b+s+1}$. By (17), this expression is convex in $t_k$.

Case 2: s ≤ x < b + s and s ≤ i ≤ x. At time $t_{k+1}$, some original customers are still in the queue, waiting to be served. It follows that customer k does not start service before time $t_{k+1}$ and, hence, there are i + 1 customers in the system at time $t_{k+1}$. By a logic similar to that in Case 1, the probability of a rejection before t is given by

$$\tilde p_x \sum_{i=s}^{x} M_{x,i}(t_{k+1} - t_{k-1})\, Y_{i+1,b+s+1},$$

which does not depend on $t_k$.

Case 3: s ≤ x < b + s and i < s. At time $t_{k+1}$, there is at least one free server. Therefore, customer k starts service in the interval $[t_k,t_{k+1})$. Let $\tau \in [t_{k-1},t_{k+1})$ denote the time when the number of original customers becomes s − 1. Customer k starts service at time $\max\{t_k,\tau\} \equiv t_k \vee \tau$. The probability of a rejection before t in Case 3 is obtained by conditioning on τ and averaging the Case 1 expression with $t_k$ replaced by $t_k \vee \tau$ (so that customer k is still in the system at time $t_{k+1}$ with probability $e^{-\mu(t_{k+1} - t_k \vee \tau)}$). This is a convex function of $t_k$.

Case 4: x = b + s and i = b + s. The system is full at time $t_{k-1}$ and all of the original customers are still in the system at time $t_{k+1}$. Therefore, customer k is rejected. The probability of a rejection before t is given by

$$\tilde p_{b+s}\, M_{b+s,b+s}(t_{k+1} - t_{k-1}),$$

which does not depend on $t_k$.

Case 5: x = b + s and s ≤ i < b + s. The system is full at time $t_{k-1}$ and some original customers are still in the queue at time $t_{k+1}$. If at least one original customer leaves the system before time $t_k$, then this case is equivalent to Case 2. If none of the original customers leaves the system by time $t_k$, which happens with probability $M_{b+s,b+s}(t_k - t_{k-1})\, M_{b+s,i}(t_{k+1} - t_k)$, then customer k is rejected. The probability of a rejection before t is the sum of these:

$$\tilde p_{b+s} \sum_{i=s}^{b+s-1} \Big[ \big( M_{b+s,i}(t_{k+1} - t_{k-1}) - M_{b+s,b+s}(t_k - t_{k-1})\, M_{b+s,i}(t_{k+1} - t_k) \big)\, Y_{i+1,b+s+1} + M_{b+s,b+s}(t_k - t_{k-1})\, M_{b+s,i}(t_{k+1} - t_k) \Big].$$

It can be shown using (3) that this is a convex function of $t_k$.

Case 6: x = b + s and i < s. The system is full at time $t_{k-1}$ and there is at least one free server at time $t_{k+1}$. Again, let $\tau \in [t_{k-1},t_{k+1})$ denote the time when the number of original customers becomes s − 1. If at least one original customer left before time $t_k$, then this case is similar to Case 3. However, if no original customer leaves the system before time $t_k$, which happens with probability $M_{b+s,b+s}(t_k - t_{k-1})\, M_{b+s,s-1}(\tau - t_k)\, M_{s-1,i}(t_{k+1} - \tau)$ for a given value of τ, then customer k is rejected. The probability of a rejection before t is the sum of these two contributions, the second averaged over the distribution of τ. It can be shown using (3) that this is a convex function of $t_k$. █

It follows from Lemma 2 that a local optimum in the search for the minimum value of $p_{b+s+1}(t,n,t_1,\ldots,t_n)$ can be found by a line search. We also observe from the proof that the optimal value of $t_k$ is the result of a trade-off between the risk of losing that customer because the system was full at the time he was admitted and the benefit of serving that customer as soon as possible in order to free the servers for future admissions.
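Exploiting this convexity, each variable epoch can be optimized by a one-dimensional ternary search inside a coordinate-descent sweep; below is a sketch (the function names and number of sweeps are ours), again reusing unconditional_p from the Section 4.1 sketch.

def ternary_min(g, lo, hi, tol=1e-4):
    # Minimize a convex one-dimensional function g on [lo, hi].
    while hi - lo > tol:
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def refine_variable_epochs(t, epochs, n_minus_l, b, s, mu, sweeps=3):
    # Coordinate-wise line search over t_1, ..., t_{n-l}; each coordinate
    # problem is convex by Lemma 2. `epochs` holds all of t_1, ..., t_n,
    # so ts[k + 1] always exists for k < n_minus_l (requires n > l).
    ts = list(epochs)
    for _ in range(sweeps):
        for k in range(n_minus_l):
            lo = ts[k - 1] if k > 0 else 0.0
            hi = ts[k + 1]
            def g(x, ts=ts, k=k):
                return unconditional_p(t, ts[:k] + [x] + ts[k + 1:], b, s, mu)[-1]
            ts[k] = ternary_min(g, lo, hi)
    return ts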

The following corollary establishes that it is optimal to set the most recent b − l variable admission epochs (i.e., $\bar t_{(n-b)\vee 1},\ldots,\bar t_{n-l}$) equal to the earliest admission epoch the controller remembers (i.e., $t_{n-l+1}$).

Corollary 2: The state $(t,n,\bar t_1,\ldots,\bar t_{n-l},t_{n-l+1},\ldots,t_n)$ is such that, for n ≥ 2,

$$\bar t_k = t_{n-l+1} \quad \text{for } k = (n-b) \vee 1,\ldots,n-l.$$

Proof: The proof is by induction. Let $(n-b) \vee 1 \le k \le n-l$ and assume that $\bar t_j = t_{n-l+1}$ for j = k + 1,…,n − l; that is, n − l − k + 1 customers are admitted at time $t_{n-l+1} = t_{k+1}$. After that, l − 1 customers are admitted at times $t_{n-l+2},\ldots,t_n$. The probability of a rejection before time t is given by (16). In this expression, $Y_{y,b+s+1}$ is the probability that the system incurs a rejection in the interval $[t_{k+1},t]$, given that there are y in the system at time $t_{k+1}$. We claim that

$$Y_{y,b+s+1} = 1 \quad \text{for } y > b+s-(n-l-k+1), \tag{18}$$
$$Y_{y,b+s+1} < 1 \quad \text{for } y \le b+s-(n-l-k+1), \tag{19}$$
$$Y_{y,b+s+1} = 0 \quad \text{for } y \le b+s-(n-k). \tag{20}$$

Since there are n − l − k + 1 admissions at time $t_{k+1}$, there is a rejection at this time if and only if there are more than b + s − (n − l − k + 1) customers at time $t_{k+1}$, which gives (18). The l − 1 subsequent customers are rejected if they find the system full at the time of their arrival, which can only happen if there are more than b + s − (l − 1) − (n − l − k + 1) = b + s − (n − k) customers at time $t_{k+1}$; therefore, we get (19) and (20).

As shown in the proof of Lemma 2, the probability of a rejection before time t can be analyzed in six cases. In Cases 2 and 4, the probability does not depend on $t_k$. We can eliminate Cases 1 and 3 and the first term of Case 6, as i < s ≤ b + s − (n − k) and (20) implies $Y_{i,b+s+1} = Y_{i+1,b+s+1} = 0$ in these expressions. However, the second term of Case 6 is a function of $t_k$. In Case 5, the first term does not depend on $t_k$. As for the second term, it is positive only if $Y_{i+1,b+s+1} < 1$, which from (19) is true for s ≤ i ≤ b + s − (n − k − l) − 2; therefore, only the second term depends on $t_k$ in this range. Summing the $t_k$-dependent terms of Case 6 over i = 0,…,s − 1, summing the $t_k$-dependent terms of Case 5 over s ≤ i ≤ b + s − (n − k − l) − 2, and adding the two sums, we obtain an expression with two terms. The first term is the probability that the system remains full in the interval $[t_{k-1},t_k]$. This probability decreases as $t_k$ increases, since more customers have time to finish service. The second term is the probability that there is no rejection in the interval $[t_{k+1},t]$, given that the system has b + s customers at time $t_k$. The second term can be made equal to zero when $t_k = t_{k+1}$ and, therefore, it is optimal to do so. █

The intuition behind this result is the following. There can be a rejection either at time $t_k$ or in the interval $[t_{k+1},t]$. The former happens if the system is full at time $t_k$, and the probability of this happening decreases with $t_k$. The latter happens if there are more than b + s − (n − k) customers at $t_{k+1}$, which is more than the number of servers because k ≥ n − b. The probability of this happening is independent of $t_k$ because customer k does not start service in this case and therefore has no chance of leaving the system before time $t_{k+1}$. It follows that it is optimal to set $t_k = t_{k+1}$ to minimize the probability of a rejection at time $t_k$.

There is a remarkable special case in which we can establish a symmetry property of the variable admission epochs. The result is given in the following lemma.

Lemma 3: When s = 1, the state $(t,n,\bar t_1,\ldots,\bar t_{n-1},t_n)$ corresponding to aggregate state $(t,n,t_n)$ is such that, for n ≥ b + 1,

$$\bar t_k = t_n - \bar t_{n-b-k} \quad \text{for } k = 1,\ldots,n-b-1;$$

that is, the variable admission epochs are placed symmetrically in $[0,t_n]$.

Proof: Let us assume for ease of presentation that n − b > 1 (the proof when n − b = 1 is similar). By Corollary 2, we have $\bar t_{n-b} = \cdots = \bar t_{n-1} = t_n$; that is, b + 1 customers are accepted at time $t_n$. This will result in a rejection unless the system is empty at time $t_n$. So we wish to determine $\bar t_1,\ldots,\bar t_{n-b-1}$ so as to maximize the probability that the system is empty at time $t_n$. Let $t_0 = 0$ denote the epoch of the last rejection.

Let $s_k$ denote the number of departures in the interval $(t_k,t_{k+1}]$ for k = 0,…,n − b − 1 (with $t_{n-b} \equiv t_n$). Given that the buffer is full at time 0, the number of departures should satisfy the following conditions in order to have b + 1 customers at time $t_n$ and no rejection in the interval $[0,t_n]$:

$$\sum_{j=0}^{k-1} s_j \ge k, \quad k = 1,\ldots,n-b-1, \tag{22}$$

$$\sum_{j=k}^{n-b-1} s_j \ge n-b-k, \quad k = 1,\ldots,n-b-1, \qquad \text{and} \qquad \sum_{j=0}^{n-b-1} s_j = n. \tag{23}$$

The first condition guarantees that the system is not full at each admission epoch. The second condition makes sure that the customers that have been admitted after time 0 have left the system before time $t_n$. The third condition states that all of the customers that were ever in the system in the time interval $[0,t_n]$ should have left the system before time $t_n$. This includes the b + 1 customers that were in the system at time 0 plus the n − b − 1 customers who arrived in $(0,t_n)$.

Now, suppose that we let time run backward so that we start with a full system at time $t_n$ and have customers arrive at times $t_{n-b-1},\ldots,t_1$. Denote this as the reversed system. Due to the symmetry of (22) and (23), the same conditions would apply for the reversed system to be empty at time 0 without any rejection in the "interval" $[t_n,0]$.

It follows that every realization of the forward system in which there is no loss corresponds to a realization of the reversed system in which there is no loss. Similarly, every realization of the reversed system in which there is no loss corresponds to a realization of the forward system where there is no loss. Thus, the sample paths over which there is no loss are in 1–1 correspondence in the two systems.

If the epochs $\bar t_1,\ldots,\bar t_{n-b-1}$ maximize the probability that the system is empty in the forward system, then the reversed epochs maximize the same probability in the reversed system. Therefore, we conclude that the arrival epochs $\bar t_1,\ldots,\bar t_{n-b-1}$ must be symmetric in $[0,t_n]$. █

9. NUMERICAL RESULTS

In this section we compute the optimal average cost, the lower bounds, and the performance of the policies on the aggregate state space. We use the MDP in (5) to obtain the optimal policy and we use the MDP in (8) with weights defined by (11) to obtain the policies on the aggregate state spaces $\mathcal{X}^l_{\Delta,T}$ with l = 1, 2. Average costs are estimated using the simulation algorithm described in Section 6.

We choose the following discretizing parameters: m = 50, Δ = 1, and T = 75. Also, we set λ = 0.9 and take the interarrival time X to be distributed according to

$$P\{X = k\} = e^{-\lambda(k-1)}\left(1 - e^{-\lambda}\right), \quad k = 1,2,\ldots,$$

such that the mean interarrival time is $(1 - e^{-\lambda})^{-1}$.
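A quick numerical check of this geometric reconstruction of the interarrival distribution (the sampling call is standard NumPy):

import numpy as np

lam = 0.9
rng = np.random.default_rng(0)
# Geometric on {1, 2, ...} with success probability 1 - e^{-lam}:
x = rng.geometric(1.0 - np.exp(-lam), size=200_000)
print(x.mean(), 1.0 / (1.0 - np.exp(-lam)))  # both approximately 1.685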

Table 1 shows the different sets of parameter values we use in each scenario, including ρ = (1 − e−λ)(sμ)−1, the load offered to the system.

Table 1: Parameters Used in Experiments

For each set of parameters, we vary the cost of blocking c in steps of 0.05 for a range of values within which it is optimal to block and also to admit some customers.

Figure 1 shows that computing the bound with l = 2 (i.e., δ2) significantly improves on the bound obtained with l = 1 (i.e., δ1). δ1 is, on average, 11.14% lower than the optimal average cost, whereas δ2 is, on average, 8.03% lower. The quality of the two bounds increases as c decreases. In contrast, Figure 2 shows that the policies u1* and u2* improve as c increases. They both perform better than the Reject–Gapping policy of Lin and Ross (depicted as RG) with an average optimality gap of 2.30% and 1.72%, respectively, for l = 1 and l = 2 versus 3.15% for the Reject–Gapping policy.

Figure 1: Scenario 1: bounds.

Figure 2: Scenario 1: performance of policies.

Figure 3 shows that the performance of the policy with l = 1 and the quality of the bound improve as the offered load increases. The bound is, on average, 1.1% lower than the optimal average cost, and the policy u1* performs, on average, only 1.01% worse than the optimal policy, as opposed to 4.66% for the Reject–Gapping policy.

Figure 3: Scenario 2: performance and bound.

In scenario 3, we cannot obtain the optimal policy because of the large state space; however, we can compute δ1 and δ2. Figure 4 shows that the difference between these bounds increases with c and is, on average, equal to 2.01%. Figure 5 shows that the performances of the policies are very similar. However, our policies tend to perform better than the RG policy for small values of c. The average percentage difference between the cost given by our policy, J(u2*), and δ2 is 5.03%, so that even without knowing the optimal policy, we have an estimate of the optimality gap.

Figure 4: Scenario 3: bounds.

Figure 5: Scenario 3: performance of policies.

A good feature of our lower bound is that it is tight when the cost of blocking is low because, in this case, the optimality gap is likely to be larger given that it is optimal for the controller to frequently block arriving customers. In contrast, when c is high, all policies tend to converge quickly to one that admits most customers and, therefore, the case when c is high is not as interesting.

10. CONCLUSION

We have provided a method for finding a control policy based on aggregation of the state space by discarding some information about the admissions to the system. By remembering only a finite number l of the most recent admission epochs along with the time and number of customers admitted since the last rejection, the controller can construct a control policy on a state space that does not grow with either the size of the buffer or the number of servers.

For each aggregate state, we set the unconditional probability of a rejection equal to the minimum of the rejection probability over all of the full memory states with the same values of the l most recent admission epochs. The full memory state that realizes this minimum is given a weight of one, whereas every other state is given a weight of zero. Our approach uses knowledge about the system to define and efficiently identify this state. It is interesting to note that the weights we obtain differ from the conditional probabilities induced by the control. The conventional approach of state aggregation uses these conditional probabilities but has no guarantee of convergence. In contrast, our method for defining the weights allows us to obtain a lower bound on the performance of the optimal policy. This is particularly useful when the buffer size and the number of servers are large and computing the optimal policy is very computationally intensive. We show numerically that the bound is tight in the important case when the cost of blocking is low and the load offered to the system is high. Moreover, the policies that we obtain perform generally better than the Reject–Gapping policy of Lin and Ross [11] and their performance naturally improves with the number of admission epochs the controller remembers. Finally, they perform best when the cost of blocking is high and the offered load is high. An opportunity for future research is to determine a technique for policy improvement that at the same time maintains the lower bound.

Acknowledgments

The authors wish to thank participants in the seminar at New York University and the 2004 Informs meetings in Denver for their comments and suggestions.

REFERENCES

Altman, E. & Basar, T. (1995). Optimal rate control for high speed telecommunication networks: The case of delayed information. In First Workshop on ATM Traffic Management, WATM'95, IFIP, WG.6.2 Broadband Communication, Paris.
Altman, E., Kofman, D., & Yechiali, U. (1995). Discrete time queues with delayed information. Queueing Systems 19: 361–376.
Altman, E. & Koole, G. (1995). Control of a random walk with noisy delayed information. Systems and Control Letters 24: 207–213.
Altman, E., Marquez, R., & Yechiali, U. (2003). Admission and routing control with partial information and limited buffers. International Journal of Systems Science 34(10–11): 615–626.
Altman, E. & Nain, P. (1995). Closed-loop control with delayed information. In Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pp. 193–204.
Altman, E. & Stidham, S. (1995). Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. QUESTA 21: 267–291.
Bertsekas, D. (1995). Dynamic programming and optimal control, Vol. 2. Belmont, MA: Athena Scientific.
Chung, K. (1974). A course in probability theory, 2nd ed. New York: Academic Press.
Kuri, J. & Kumar, A. (1995). Optimal control of arrivals to queues with delayed queue length information. IEEE Transactions on Automatic Control 40(8): 1444–1450.
Kuri, J. & Kumar, A. (1997). On the optimal control of arrivals to a single queue with arbitrary feedback delay. Queueing Systems 27(1–2): 1–16.
Lin, K. & Ross, S. (2003). Admission control with incomplete information of a queueing system. Operations Research 51(4): 645–654.
Lin, K. & Ross, S. (2004). Optimal admission control for a single-server loss queue. Journal of Applied Probability 41: 535–546.
Litvak, N. & Yechiali, U. (2003). Routing in queues with delayed information. Queueing Systems 43: 147–165.
Ross, S. (1970). Applied probability models with optimization applications. New York: Dover.
Ross, S. (1997). Simulation, 2nd ed. New York: Academic Press.
Ross, S. & Seshadri, S. (2002). Hitting times in an Erlang loss system. Probability in the Engineering and Informational Sciences 16(2): 167–184.
Schäl, M. (1993). Average optimality in dynamic programming with general state space. Mathematics of Operations Research 18(1): 163–172.