The problem of controlling transmission rate over a randomly varying channel is cast as a Markov decision process wherein the channel is modeled as a Markov chain. The objective is to minimize a cost that penalizes both buffer occupancy (equivalently, delay) and power. The nature of the optimal policy is characterized using techniques adapted from classical inventory control.
In a recent article, Goyal, Kumar, and Sharma [4] considered the problem of optimal transmission over a fading channel to minimize mean delays subject to a power constraint. By casting the problem as a constrained Markov decision problem, they obtained structural results for the optimal policy, improving upon the earlier results of Berry and Gallager [1]. They considered an additive Gaussian noise model for the channel and explicitly used the Shannon formula for its information-theoretic capacity in their analysis. Our aim here is to view the problem in a more abstract framework, with the effect of the randomly varying channel on the power required being modeled as a Markov chain. Our key observation is the similarity of this problem to classical inventory control [2,3]. Based on this observation, we are able to recover some structural results for the optimal policy through a fairly simple analysis.
The precise model is described below. The finite horizon control problem is analyzed in the next section, and the infinite horizon discounted problem is analyzed in Section 3.
Consider buffered packet traffic with the evolution of buffer occupancy denoted by

x_{k+1} = x_k − w_k + u_k,   k ≥ 0.   (1)

Here, uk denotes the number of arriving packets and wk ∈ [0,K], K > 0, is the number of packets transmitted in time slot k, with the constraint that wk ≤ xk ∀k. (Note that we treat xk and wk as continuous-valued variables, a fairly common approximation.) Under the reasonable condition that x0 ≥ 0, this then ensures that {xk} is an ℝ₊-valued process. The arrivals {uk} are assumed to be independent and identically distributed (i.i.d.) with law ζ supported on [0,M] for a large finite M > 0; more generally, ζ supported on ℝ₊ can be allowed under mild technical conditions.
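For concreteness, here is a minimal simulation sketch of the recursion (1); the uniform arrival law, the parameter values, and the greedy transmission rule wk = xk ∧ K are illustrative assumptions only, not part of the model.

    import numpy as np

    rng = np.random.default_rng(0)
    K, M, n_slots = 3.0, 5.0, 10_000   # per-slot transmission cap, arrival bound, number of slots
    x = 0.0                            # initial buffer occupancy x0 >= 0
    occupancy = []
    for k in range(n_slots):
        w = min(x, K)                  # an admissible control: 0 <= wk <= xk and wk <= K
        u = rng.uniform(0.0, M)        # i.i.d. arrivals with law ζ; uniform on [0, M] is an illustrative choice
        x = x - w + u                  # recursion (1): x_{k+1} = x_k - w_k + u_k
        occupancy.append(x)
    print("long-run mean occupancy:", np.mean(occupancy))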
The cost at time k will be

h(x_k) + c(k) w_k − λ w_k.   (2)

Here, h(·) is a continuously differentiable convex increasing function and {c(k)} is an ergodic Markov chain on [0,C] for some C > 0. The first term penalizes high buffer occupancy and, therefore, mean delay. As an example, consider an h(·) that takes small values for 0 ≤ x ≤ B − δ and increases rapidly thereafter, where B > 0 is the intended buffer capacity and δ > 0 is a small number. This is a “soft” constraint on buffer capacity that approximates the “hard” constraint xk ≤ B.
The second term in the cost is proportional to the number of packets transmitted in the kth time slot and captures the power cost. The random coefficient c(k) captures the dependence of this cost on the randomly varying channel conditions. Let p(x,dy) denote the transition kernel of the Markov chain {c(k)}.
The third term in (2) is the negative of the “reward” for successful transmission, with λ > 0 the per-packet reward and λ > C. The latter requirement ensures that, at least for sufficiently high buffer occupancy, there will be incentive to transmit. We will simplify (2) by combining the second and the third terms and replace (2) by

h(x_k) + c(k) w_k,   (3)

with c(k) now taking values in [−λ, C − λ], instead of [0,C], and p(x,dy) redefined accordingly. Note that c(k) < 0.
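Purely for illustration (and continuing the simulation sketch above), one possible choice of such an h(·) and of the ingredients entering (3) is sketched below; the specific functional form of h, the two channel states, the transition matrix P, and the value of λ are assumptions made here for concreteness.

    import numpy as np

    B, delta, lam = 20.0, 1.0, 2.0            # intended buffer capacity, softness parameter, per-packet reward λ

    def h(x):
        # convex, increasing, continuously differentiable; small on [0, B - delta], growing rapidly beyond
        return 0.01 * x + np.exp((x - (B - delta)) / delta)

    # a two-state channel: power-cost coefficients in [0, C] with C = 1.5 < λ, shifted by -λ as in (3)
    c_states = np.array([0.5, 1.5]) - lam     # values in [-λ, C - λ]; note they are negative
    P = np.array([[0.9, 0.1],                 # transition kernel of {c(k)} (an assumed example)
                  [0.3, 0.7]])

    def stage_cost(x, w, c):
        # per-stage cost (3): h(xk) + c(k) wk, with the redefined (negative) c(k)
        return h(x) + c * w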
Let α ∈ (0,1] denote the “discount factor” and N ≥ 1. Our objectives will be to minimize the finite horizon cost

E[ Σ_{k=j}^{N−1} α^{k−j} ( h(x_k) + c(k) w_k ) ]   (4)

for j = 0 or the infinite horizon discounted cost (for α ∈ (0,1)),

E[ Σ_{k=0}^{∞} α^k ( h(x_k) + c(k) w_k ) ],

over the admissible controls {wk}. We correspondingly define the “value functions”

J_j(x,c) = inf E[ Σ_{k=j}^{N−1} α^{k−j} ( h(x_k) + c(k) w_k ) | x_j = x, c(j) = c ],   0 ≤ j ≤ N,

J*(x,c) = inf E[ Σ_{k=0}^{∞} α^k ( h(x_k) + c(k) w_k ) | x_0 = x, c(0) = c ],

where the infima are over admissible controls.
Consider the cost (3). For this cost, the dynamic programming equation is

J_k(x,c) = min_{w ∈ [0, x∧K]} [ h(x) + cw + α ∫∫ J_{k+1}(x − w + u, c′) ζ(du) p(c,dc′) ],   0 ≤ k < N.   (5)

Let y ≜ x − w. Then the above can be rewritten as

J_k(x,c) = h(x) + cx + min_{0∨(x−K) ≤ y ≤ x} [ −cy + α ∫∫ J_{k+1}(y + u, c′) ζ(du) p(c,dc′) ].   (6)
We claim that the Jk(·,c)'s are convex for k < N, a fact that will be proved later. Assuming this, we now derive the structural properties of the optimal policy. Let Sk = Sk(c) be the least element of the connected set of unconstrained minimizers of

−cy + α ∫∫ J_{k+1}(y + u, c′) ζ(du) p(c,dc′)

over y ∈ [−∞,∞], with +∞ and −∞ being admissible values. Since 0 ∨ (x − K) ≤ y ≤ x, we have the following:

for Sk ≥ 0, the minimum in (6) is attained at

y = (x − K) ∨ (x ∧ Sk);

for Sk < 0, it is attained at

y = 0 ∨ (x − K).

An optimal policy can therefore be determined by the sequence of extended real-valued numbers (S0,S1,…,SN−1) and is of the form:

for Sk ≥ 0,

μk*(x,c) = 0 for x ≤ Sk,
         = x − Sk for Sk < x ≤ Sk + K,
         = K for x > Sk + K;

for Sk < 0,

μk*(x,c) = x ∧ K.
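In code, the policy just described can be written compactly; the sketch below (with Sk(c) assumed given, e.g., computed numerically as in the sketch following Theorem 1) is only an illustration of its structure.

    def mu_star(x, S, K):
        # Threshold transmission rule for buffer level x, with threshold S = Sk(c):
        #   S >= 0: transmit nothing if x <= S, transmit down to S if S < x <= S + K, else transmit K;
        #   S <  0: transmit as much as possible, x ∧ K.
        # Both cases reduce to the single expression below.
        return min(K, max(0.0, x - max(S, 0.0)))

In inventory-control terms, Sk(c) plays the role of a target level down to which the buffer is brought whenever possible.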
We now establish the convexity claim.
Lemma 1: Jk(·,c), k < N, are convex.
Proof: The proof is by backward induction. Set JN(·,·) ≡ 0 in what follows. Consider the policy at k = N − 1. Note that SN−1 = −∞; thus,

μ_{N−1}*(x,c) = x ∧ K

and

J_{N−1}(x,c) = h(x) + c(x ∧ K).
Thus, since c < 0, we have that JN−1(·,c) is convex. Now, suppose that Jk+1(·,c) is convex. Then we have the following:
for Sk ≥ 0,

J_k(x,c) = h(x) + α ∫∫ J_{k+1}(x + u, c′) ζ(du) p(c,dc′),   0 ≤ x ≤ Sk,
         = h(x) + c(x − Sk) + α ∫∫ J_{k+1}(Sk + u, c′) ζ(du) p(c,dc′),   Sk ≤ x ≤ Sk + K,
         = h(x) + cK + α ∫∫ J_{k+1}(x − K + u, c′) ζ(du) p(c,dc′),   x ≥ Sk + K;

for Sk < 0,

J_k(x,c) = h(x) + cx + α ∫∫ J_{k+1}(u, c′) ζ(du) p(c,dc′),   0 ≤ x ≤ K,
         = h(x) + cK + α ∫∫ J_{k+1}(x − K + u, c′) ζ(du) p(c,dc′),   x ≥ K,

where Sk = Sk(c) minimizes

−cy + α ∫∫ J_{k+1}(y + u, c′) ζ(du) p(c,dc′)

over y ∈ [−∞,∞]. Jk(·,c) is clearly continuous and separately convex on [0,Sk], [Sk,Sk + K], and [Sk + K,∞) when Sk ≥ 0, and on [0,K] and [K,∞) when Sk < 0. Taking the left derivative of the above expression at its minimum Sk and using the property of its slope at Sk,

−c + α ∫∫ J_{k+1}−(Sk + u, c′) ζ(du) p(c,dc′) ≤ 0,

where Ji−(·,c) denotes the left derivative of Ji(·,c), 1 ≤ i ≤ N. Thus,

α ∫∫ J_{k+1}−(Sk + u, c′) ζ(du) p(c,dc′) ≤ c.   (7)

Similarly, taking the right derivative at Sk, we get

−c + α ∫∫ J_{k+1}+(Sk + u, c′) ζ(du) p(c,dc′) ≥ 0,

where Ji+(·,c) denotes the right derivative of Ji(·,c), 1 ≤ i ≤ N. This leads to

α ∫∫ J_{k+1}+(Sk + u, c′) ζ(du) p(c,dc′) ≥ c.   (8)

For Sk < 0, we can consider the right derivative at y = 0 to conclude

−c + α ∫∫ J_{k+1}+(u, c′) ζ(du) p(c,dc′) ≥ 0,

implying

α ∫∫ J_{k+1}+(u, c′) ζ(du) p(c,dc′) ≥ c.   (9)
From the expression for Jk(x,c), we get the following:

for Sk ≥ 0,

J_k−(Sk, c) = h′(Sk) + α ∫∫ J_{k+1}−(Sk + u, c′) ζ(du) p(c,dc′),
J_k+(Sk, c) = h′(Sk) + c,   (10)

J_k−(Sk + K, c) = h′(Sk + K) + c,
J_k+(Sk + K, c) = h′(Sk + K) + α ∫∫ J_{k+1}+(Sk + u, c′) ζ(du) p(c,dc′);   (11)

for Sk < 0,

J_k−(K, c) = h′(K) + c,
J_k+(K, c) = h′(K) + α ∫∫ J_{k+1}+(u, c′) ζ(du) p(c,dc′).   (12)

Recall that h is an increasing convex function. Hence, from (7) and (8) and the above expressions, we have, for Sk ≥ 0,

J_k−(Sk, c) ≤ J_k+(Sk, c)

and

J_k−(Sk + K, c) ≤ J_k+(Sk + K, c).

Similarly, for Sk < 0, using (9),

J_k−(K, c) ≤ J_k+(K, c).
This implies that Jk(·) is convex, which completes the induction step. █
We summarize the results in the following theorem.
Theorem 1: The optimal policy is given by {μk*(·,·)}.
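The thresholds Sk(c) are not available in closed form, but the backward recursion (5)-(6) can be carried out numerically. The sketch below does so on a grid, reusing the illustrative h, stage_cost, c_states, and P from the earlier sketch; the discretization, the truncation of the buffer at the top of the grid, and the particular parameter values are additional assumptions of the sketch, not part of the analysis.

    import numpy as np

    alpha, K, N = 0.95, 3.0, 30                      # discount factor, per-slot cap, horizon
    x_grid = np.linspace(0.0, 25.0, 251)             # truncated, discretized buffer levels
    u_grid = np.linspace(0.0, 5.0, 11)               # discretized arrivals (ζ uniform on [0, 5])
    u_prob = np.full(len(u_grid), 1.0 / len(u_grid))

    def expected_next(J_next, y, c_idx):
        # α ∫∫ J_{k+1}(y + u, c') ζ(du) p(c, dc'), with the buffer clipped to the grid
        val = 0.0
        for cp in range(len(c_states)):
            xs = np.clip(y + u_grid, x_grid[0], x_grid[-1])
            val += P[c_idx, cp] * np.dot(u_prob, np.interp(xs, x_grid, J_next[:, cp]))
        return alpha * val

    J = np.zeros((len(x_grid), len(c_states)))       # J_N ≡ 0
    S = np.zeros((N, len(c_states)))                 # thresholds S_k(c)
    for k in reversed(range(N)):
        for ci, c in enumerate(c_states):
            # S_k(c): minimizer of  -c y + α ∫∫ J_{k+1}(y + u, c') ζ(du) p(c, dc')  over the grid
            # (a grid minimizer at 0 corresponds to S_k(c) <= 0, i.e., "transmit x ∧ K")
            G = np.array([-c * y + expected_next(J, y, ci) for y in x_grid])
            S[k, ci] = x_grid[np.argmin(G)]
        J_new = np.empty_like(J)
        for ci, c in enumerate(c_states):
            for xi, x in enumerate(x_grid):
                w = min(K, max(0.0, x - max(S[k, ci], 0.0)))   # the threshold policy μ_k*
                J_new[xi, ci] = stage_cost(x, w, c) + expected_next(J, x - w, ci)
        J = J_new

    print("thresholds at time 0, S_0(c):", S[0])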
This leaves the issue of the nature of dependence of Sk(c) on c. Unfortunately, not much can be said in general, but we make a few comments here that may help give some handle on this in specific cases. Note that Sk(c) was the global minimizer of a function of the form −cy + g(c,y) over y ∈ [−∞,∞]. For simplicity, suppose that it is unique and finite for each c. Consider the cross-derivative

∂²/∂c∂y [ −cy + g(c,y) ] = −1 + ∂²g(c,y)/∂c∂y,   (13)

assumed to exist and be continuous. If this is greater than 0 (resp., less than 0), it implies “increasing differences” (resp., “decreasing differences”) in the sense of [6, Chap.10]. Note that

g(c,y) = α ∫∫ J_{k+1}(y + u, c′) ζ(du) p(c,dc′)

for the problem at hand. Using this and by mimicking the arguments of Theorem 10.7 of [6, p.259], one then has that Sk(c) is increasing (resp., decreasing) in c in a neighborhood of a point where the right-hand side of (13) is < 0 (resp., > 0).
We now analyze the infinite horizon discounted cost problem by treating it as a limiting case of the above as N → ∞. We will assume the following:
(*) There is at least one control policy under which the cost is finite for any initial condition.
As an example, consider the case when K > ∫ u ζ(du) (i.e., the maximum permissible transmission rate exceeds the mean arrival rate). Then, it is easy to verify that the foregoing will hold for the control choice wk = xk ∧ K, k ≥ 0. Under (*), one has [5] the dynamic programming equation

J*(x,c) = min_{w ∈ [0, x∧K]} [ h(x) + cw + α ∫∫ J*(x − w + u, c′) ζ(du) p(c,dc′) ].   (14)
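As a quick empirical check of (*) for the illustrative ingredients used in the earlier sketches (where K = 3 indeed exceeds the mean arrival rate 2.5), the discounted cost of the policy wk = xk ∧ K can be estimated by simulation; the run lengths and the seed below are arbitrary choices.

    def discounted_cost_greedy(x0=0.0, n_steps=2000, n_runs=200, seed=1):
        # Monte Carlo estimate of E[ Σ_k α^k (h(x_k) + c(k) w_k) ] under w_k = x_k ∧ K,
        # reusing h, stage_cost, c_states, P, alpha, and K from the sketches above.
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_runs):
            x, ci, cost, disc = x0, 0, 0.0, 1.0
            for _ in range(n_steps):
                w = min(x, K)
                cost += disc * stage_cost(x, w, c_states[ci])
                x = x - w + rng.uniform(0.0, 5.0)
                ci = rng.choice(len(c_states), p=P[ci])
                disc *= alpha
            total += cost
        return total / n_runs

    print("estimated discounted cost under wk = xk ∧ K:", discounted_cost_greedy())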
In addition, we assume the following:
(**) The kernel p(x,dy) is strong Feller; that is, for any bounded measurable f, ∫ f(y) p(x,dy) is continuous in x.
We will also assume that h(·) is Lipschitz continuous with Lipschitz constant L > 0. (This could be relaxed at the expense of greatly enhanced technicalities.) Let J^N ≜ J_0 and J_i^N ≜ J_i, so as to make the dependence on the time horizon N explicit. Define also

J̄_i^N(x,c) ≜ ∫∫ J_i^N(x + u, c′) ζ(du) p(c,dc′)   (16)

for 0 ≤ i ≤ N. Then, {J_j^N} satisfies the dynamic programming equation

J_j^N(x,c) = min_{w ∈ [0, x∧K]} [ h(x) + cw + α J̄_{j+1}^N(x − w, c) ],   0 ≤ j < N,

with J_N^N(·,·) ≡ 0.
Clearly, the {J^N} are monotone increasing in N. Under (*), they are also pointwise bounded. Thus,

J(x,c) ≜ lim_{N→∞} J^N(x,c)

is well defined.
Lemma 2: The {J_i^N(·,c), J̄_i^N(·,c)} satisfy the Lipschitz condition in the first argument with a common Lipschitz constant that does not depend on N, i, or c.
Proof: We will prove that for each i, J_i^N(·,c) and J̄_i^N(·,c) satisfy the Lipschitz condition with a common Lipschitz constant of the above type, which, in turn, implies the lemma. This is trivially true for i = N. The claim for J_i^N, i < N, follows by a simple backward induction, using (6) and the fact that the minimum of a Lipschitz function of y over an interval whose endpoints are Lipschitz in x is itself Lipschitz in x, the last step invoking the induction hypothesis. The claim for {J̄_i^N} is immediate from (16). █
Corollary 1: The convergence J^N → J is uniform on compacts. Furthermore, J satisfies

J(x,c) = min_{w ∈ [0, x∧K]} [ h(x) + cw + α ∫∫ J(x − w + u, c′) ζ(du) p(c,dc′) ].   (17)
Proof: As an increasing limit of continuous functions, J is at least lower semicontinuous. (In fact, it is convex, being the pointwise limit of convex functions and hence continuous on the interior of its domain.) Lemma 2 implies that the J^N(·,c) will be Lipschitz with a common Lipschitz constant, as earlier. In particular, they are equicontinuous and, hence, for each fixed c, the convergence J^N(·,c) → J(·,c) is uniform on compacts. The monotone convergence theorem leads to

h(x) + cw + α ∫∫ J_{j+1}^N(x − w + u, c′) ζ(du) p(c,dc′)  →  h(x) + cw + α ∫∫ J(x − w + u, c′) ζ(du) p(c,dc′)   as N → ∞,

where the convergence in the x and w variables is uniform on compacts for fixed c in view of the foregoing remarks. In particular, this implies that the corresponding minima over w ∈ [0,x ∧ K] converge. Thus, we can let N → ∞ in (5) to conclude that J satisfies (17). By the uniform Lipschitz continuity of J(·,c) and (**), the expression in square brackets on the right-hand side of (17) is uniformly continuous w.r.t. (x,c) on compacts. Thus, the claim follows from Dini's theorem. █
Corollary 2: J* = J. It is convex in x, uniformly continuous in (x,c), and the least ℝ₊-valued continuous solution to (14).
Proof: Uniform continuity of J and the fact that it satisfies (17) are proved in Corollary 1. Furthermore, Corollary 1 also implies that

∫∫ J^N(x + u, c′) ζ(du) p(c,dc′)  →  ∫∫ J(x + u, c′) ζ(du) p(c,dc′)

uniformly on compacts for fixed c. Then, J is a uniformly continuous solution to (14), with J = lim_{N→∞} J^N, the limit being uniform on compacts. Since the J^N's are convex, so is J. Suppose some ℝ₊-valued continuous J′ satisfies (14). Then, J′ satisfies (17). Iterating (17) N times and using the nonnegativity of J′, we get J′ ≥ J_0^N. Letting N → ∞, we have J′ ≥ J; hence, J* ≥ J. This completes the proof. █
Let S = S(c) be the least element of the connected set of unconstrained minimizers of

−cy + α ∫∫ J(y + u, c′) ζ(du) p(c,dc′)

over [−∞,∞]. One argues as in the preceding section to conclude the following theorem.

Theorem 2: The optimal policy is given by the stationary policy μ*(·,·), where, for S(c) ≥ 0,

μ*(x,c) = 0 for x ≤ S(c),
        = x − S(c) for S(c) < x ≤ S(c) + K,
        = K for x > S(c) + K,

and, for S(c) < 0,

μ*(x,c) = x ∧ K.
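Numerically, J and S(c) can be approximated by iterating the dynamic programming operator until the value function stops changing, in the spirit of Corollary 1; the sketch below reuses the grid, discount factor, and illustrative model ingredients of the finite-horizon sketch above, and the stopping tolerance and iteration cap are arbitrary choices.

    # Value iteration for (17): at each pass, compute the threshold S(c) from the current J,
    # apply the corresponding policy, and repeat until (approximate) convergence.
    J = np.zeros((len(x_grid), len(c_states)))
    for _ in range(500):
        S_inf = [x_grid[np.argmin([-c * y + expected_next(J, y, ci) for y in x_grid])]
                 for ci, c in enumerate(c_states)]
        J_new = np.empty_like(J)
        for ci, c in enumerate(c_states):
            for xi, x in enumerate(x_grid):
                w = min(K, max(0.0, x - max(S_inf[ci], 0.0)))   # stationary threshold policy
                J_new[xi, ci] = stage_cost(x, w, c) + expected_next(J, x - w, ci)
        if np.max(np.abs(J_new - J)) < 1e-3:
            J = J_new
            break
        J = J_new

    print("stationary thresholds S(c):", S_inf)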
Regarding the dependence on c, remarks analogous to those at the end of the preceding section apply.
This work was done while one of the authors (G. R.) was visiting the School of Technology and Computer Science, Tata Institute of Fundamental Research during the summer of 2003.
Research by one of the authors (V. S. B.) was supported in part by grant 2900-IT-1 from the Centre Franco-Indien pour la Promotion de la Recherche Avancee (CEFIPRA).