Published online by Cambridge University Press: 01 January 2005
Consider a single-commodity inventory system in which the demand is modeled by a sequence of independent and identically distributed random variables that can take negative values. Such problems have been studied in the literature under the name cash management and relate to the variations of the on-hand cash balances of financial institutions. The possibility of a negative demand also models product returns in inventory systems. This article studies a model in which, in addition to standard ordering and scrapping decisions seen in the cash management models, the decision-maker can borrow and store some inventory for one period of time. For problems with back orders, zero setup costs, and linear ordering, scrapping, borrowing, and storage costs, we show that an optimal policy has a simple four-threshold structure. These thresholds, in a nondecreasing order, are order-up-to, borrow-up-to, store-down-to, and scrap-down-to levels; that is, if the inventory position is too low, an optimal policy is to order up to a certain level and then borrow up to a higher level. Analogously, if the inventory position is too high, the optimal decision is to reduce the inventory to a certain point, after which one should store some of the inventory down to a lower threshold. This structure holds for the finite and infinite horizon discounted expected cost criteria and for the average cost per unit time criterion. We also provide sufficient conditions when the borrowing and storage options should not be used. In order to prove our results for average costs per unit time, we establish sufficient conditions when the optimality equations hold for a Markov decision process with an uncountable state space, noncompact action sets, and unbounded costs.
Consider a single-commodity inventory system for which we do not assume that the demand is nonnegative. Such problems have been studied in the literature under the name cash management and relate to the variations of the on-hand cash flows of financial institutions. For the cash management problem without fixed ordering costs, an optimal policy is defined by two thresholds; if the inventory level is above the higher threshold, it should be reduced to this threshold, and, similarly, if the inventory level is below the lower threshold, it should be increased up to this threshold (see Eppen and Fama [16] or Heyman and Sobel [27, Sect.8.4]). This is an extension of S-policies (also called “order-up-to” or “basestock” policies) for models with nonnegative demand.
In cash management problems, ordering and scrapping the inventory is associated with financial transactions. In reality, a number of financial instruments are available to a company. In particular, a firm may be able to borrow money for a short period of time. A natural question to investigate is how should short-term and long-term transactions be linked. A similar dilemma also takes place for production/inventory operations; is it better to buy or lease even when leasing is more expensive per unit time? In fact, in inventory systems with product returns, a company can benefit from temporary increases to its inventory (leasing) for a fixed period of time since, in view of possible returns, after this period, ordering may not be needed. In this article, we model short-term transactions by one-step borrowing and storage decisions.
We consider the problem when the manager can borrow or store the inventory at any period of time. The borrowed inventory should be returned at the beginning of the next period and the stored inventory is automatically added to the main inventory in the beginning of the next period. We consider the model with back orders. For example, it is possible that the borrowing option is executed but the inventory level becomes negative in the beginning of the next period. In this case, the borrowed amount is returned anyway and the back order increases. The manager, though, has an option to borrow again. In the model considered in this article, the purchasing, scrapping, borrowing, and storing costs are linear. There are no fixed costs associated with ordering for any of these operations. All capacities are unconstrained and all lead times are zero.
We show that when the option to borrow or store for one period is available to the decision-maker, an optimal policy is defined by four thresholds. If the inventory position is too high, the optimal decision is to reduce the inventory via scrapping, but only to a certain point, after which one should also store some of the commodity to meet future demand. Analogously, if the inventory position is too low, the decision-maker should order up to a certain level and then borrow from the secondary source to meet potential demand. We also provide sufficient conditions when optimal policies do not use the borrowing and storage options; that is, the production threshold is equal to the borrowing threshold and the similar equality holds for scrapping and storage. In particular, we show that this holds if borrowing is more expensive than backordering in problems with linear holding and backordering costs or if the demand is nonnegative and borrowing is not less expensive than ordering.
The study of inventory control dates back at least to the work of Arrow, Harris, and Marshcak [2] and some would say (in the deterministic case) to the work of Harris [23]. With this in mind, we will not attempt a complete review here except to point the reader to the excellent survey article of Porteus [32] that touches many of the high points in the area until 1990. Cash management has also received a considerable amount of attention, although much less in the operations research literature than the inventory models with nonnegative demand. Eppen and Fama [16] considered a model with independent and identically distributed (i.i.d.) discrete demands with finite support and showed the existence of order-up-to and down-to levels in the finite horizon case for models without setup costs. Girgis [21] and Neave [30] considered both fixed and variable costs for each transaction. The former shows that when there are fixed costs for increasing or decreasing demand (but not both), an optimal policy analogous to (s,S)-policies in inventory control results. One important difference is that between the order-up-to level (S) and the lower limit (s), where no action is usually taken, the optimal policy in the cash management model may order. Furthermore, it is not immediately obvious what the ordering decision in this case might be. The latter article shows that when both transactions have fixed costs, this analogy need not hold in both directions, and it provides conditions under which it does hold. All of these results were collected and simplified in [15]. Other generalizations of the cash management problem appeared in [12,28]. Other models of product returns were studied in [20,26,41]. The most recent results on the cash management problem with fixed costs can be found in Chen and Simchi-Levi [9].
Early models that allow for demand to be met by emergency orders, possibly at the expense of higher costs, include those of Daniel [13], Neuts [31], and Barankin [4]. Aneji and Noori [1] discussed a problem in which unmet demand may be met by a secondary source and showed that the ordering policy is an (s,S)-policy. Tagaras and Vlachos [40] discussed a periodic review system with the possibility of emergency replenishments with various lead times. Recently, Huggins and Olsen [29] showed that when overtime production is available, an (s,S)-policy is still optimal for regular production and various policies are optimal for overtime. Other related models include those of Chiang and Gutierrez [10,11] and Arslan, Ayhan, and Olsen [3].
Section 2 of this article provides a formal description of the problem and the optimality criteria considered. Section 3 discusses optimal order-up-to and down-to policies in the finite horizon case and provides conditions under which we need not consider the borrowing or storage options. This is continued in Sections 4 and 5 for the infinite horizon discounted and average cost cases, respectively. Section 6 proves the existence of a solution to the average cost optimality equations (ACOEs) for our problem. Since our search of the literature did not yield sufficient conditions for the validity of the ACOEs for our problem, we provide sufficient conditions for a more general Markov decision process (MDP) and show that they hold in our case. We conclude by discussing avenues of future research in Section 7. For the remainder of this section, we explain what sufficient conditions are available in the literature and what is required for our problem.
For MDPs with Borel state spaces and bounded rewards, the ACOEs were introduced by Ross [35] and studied by Gubenko and Statland [22], Dynkin and Yushkevich [14], and Fernández-Gaucherand [18]. For problems with unbounded rewards, according to Schäl [37, Prop. 1.3] a stationary policy that satisfies the average cost optimality inequalities is optimal. Of course, if the optimality equations hold, a stationary policy that satisfies them is optimal as well. Schäl [37] described two groups of conditions, (W) and (S), and proved that each of these conditions imply the validity of the optimality inequalities. Conditions (W) require weak continuity of transition probabilities. Conditions (S) require setwise continuity of transition probabilities; discussed further later. We remark that if the action sets are finite, as was assumed in Ross [35] and Ritt and Sennott [34], then the setwise convergence conditions (S) from Schäl [37] hold and no specific continuity assumption is needed on transition probabilities.
An important feature of many inventory control models considered in the literature is that the control sets may not be compact and are, in fact, assumed to be unbounded. However, Schäl [37] assumed that the action sets are compact. This assumption prevents direct applications of the results from [37] to classic problems with unlimited ordering/scrapping capacities.
We remark that for inventory control problems, conditions (W) are natural, whereas conditions (S) are too strong. Consider a typical inventory control equation
where xn is the inventory at the end of period n, an is the decision on how much should be ordered, and Dn is the demand during period n. Let q(dy|x,a) be the transition probability for the control problem (1.1); that is, q(B|x,a) represents the probability that a subset B of the state space is visited at the next step, given that action a is chosen in state x. Weak continuity of q in Schäl's [37] condition (W) means that Exnk,ank f (xn+1) → Exn,an f (xn+1) for any sequence {(xnk,ank),k ≥ 0} of state–action pairs such that (xnk,ank) → (xn,an) for each bounded, continuous function f. This is true in light of (1.1) and Lebesgue's dominated convergence theorem. On the other hand, recall that setwise continuity in Schäl's [37] condition (S) means that q(B|xn,ank) → q(B|xn,an) as ank → an for any Borel subset B of the state space. Suppose that the demand is deterministic, Dn = 1, and ank = an + (1/k), xnk = xn. Then, q(B|xn,an) = 1 for B = (−∞,xn + an − 1] and q(B|xn,ank) = 0 for all k = 1,2,… .
As discussed earlier, Schäl [37] established the optimality inequalities under both weak and setwise continuity conditions but assumed the compactness of action sets. Hernández-Lerma and Lasserre [25, Chap.5], and Fernández-Gaucherand, Arapostathis, and Marcus [19] presented results for noncompact action sets but assumed setwise convergence. Moreover, Section 5.7 in Hernández-Lerma and Lasserre [25] provides conditions for the existence of stationary optimal policies for an MDP with weakly continuous transition probabilities, but the derivation is done directly—without deriving the optimality equations or inequalities. In this article, we need not only the existence of optimal policies but also the validity of the ACOEs. Hartley [24] established the validity of the ACOEs for the class of problems whose dynamics are described by equations that include (1.1). However, he assumed that the demand is a uniformly bounded random variable (i.e., |Dn| ≤ C for a finite constant C). Thus, although several sufficient conditions for the validity of the ACOEs have been described in the literature, none of them covers our problem. In Section 6, we provide the sufficient conditions that do this.
Recall that we consider a single-commodity system with positive or negative demand. Assume that items that are returned are immediately available for resale. All lead times are assumed to be zero, the production/scrapping and borrowing/storage costs are assumed to be linear, and the unmet demand is backlogged. The cost of inventory held or backlogged (negative inventory) is modeled as a convex function. The sequence of events is as follows. At the beginning of the period, a decision-maker decides how much of the product to order or to scrap (at a loss) to meet demand. In addition, the decision-maker simultaneously decides how much inventory to borrow from or store to a (potentially external) secondary source. During the period, demand is realized and holding costs are accrued on the surplus or backlogged inventory. At the end of the period, borrowed or stored material is returned and the process continues. Note that since borrowed or stored inventory is returned the next period, holding costs are accrued but the inventory level of the next period is not affected. The objective is to minimize the total expected discounted cost over a finite horizon or the discounted or average cost over an infinite horizon. Let the following hold:
For
, let a+ and a− be the positive and negative parts of a, respectively; a+ = max{a,0} and a− = max{−a,0}. Let Xn be the inventory position at period n and let the ordered pair (Yn,Zn) be the amount ordered/scrapped and the amount borrowed/stored in this period. Denote the one-step cost function by C(x,(a,b)):
We model the decision scenario as a Markov decision process (see, e.g., Bertsekas [5,6], Dynkin and Yushkevich [14], Feinberg and Shwartz [17], Puterman [33], or Ross [36]). A general policy can be randomized and depend on the history of the inventory levels and decisions. A policy is called stationary if decisions are nonrandomized and depend only on the current inventory level; that is, a stationary policy is defined by a measurable function that maps the inventory position (the state space,
) to the set of potential actions (the action space). A Markov policy is defined as a sequence d0,d1,…, where dn(x) is the decision that should be selected if the inventory level is x at step n. In our model, if an action (a,b) is chosen in state x, the cost C(x,(a,b)) is accrued, the system moves to state x + a − D, and this process continues. As was alluded to earlier, since the amount that is borrowed or stored is returned the next period, it has no effect on the subsequent inventory position. For a policy π and for an initial inventory level x, we define
Equations (2.2), (2.3), and (2.4) define the N-stage expected discounted cost, the infinite horizon expected discounted cost, and the long-run average expected cost, respectively. In the finite horizon problem, only the portion of the policy required for the time horizon is used. In each case, we define the optimal values
where Π is the set of all policies. A policy φ is called optimal for the respective criterion if its value vN,βφ(x), vβφ(x), or wφ(x) corresponds to the value on the right-hand side of (2.5), (2.6), or (2.7), respectively, for all
.
We remark that our assumptions also imply that vβ(x) < ∞ for all
. Indeed, the assumptions on the holding cost h imply that there is a point x* such that
. Without loss of generality, we assume that x* = 0 and h(0) = 0. Consider a policy φ that never borrows/stores and always orders/scraps in a way that the inventory level before the demand is known is zero. Then
In this section, we study the finite horizon problem. Since it is fixed throughout the section, we suppress β whenever possible. It is well known that if a solution to the following finite horizon optimality equations (FHOEs) exist, that solution is equal to vn (componentwise) as defined in (2.5). Let v0 ≡ 0 and for n = 1,2,…,
where
We observe that an equivalent system is
where
This observation leads to an algorithm for solving the proposed problem. First, solve (3.4) for g*. Using this function, find the optimal ordering/scrapping policy by solving (3.3). Finally, using g* evaluated at y + a, find the optimal borrowing/storage decision b. The following lemma provides preliminary results on vn and g*.
Lemma 3.1:
(i) The functions g*(y) and vn(y) are convex in y for all n ≥ 0.
(ii) g*(y) → ∞ and vn(y) → ∞ as |y| → ∞ for all n ≥ 1.
(iii) The inf can be replaced with min in (3.3) and (3.4).
Proof: For any convex function k(y) and random variable X such that
is also convex in y (cf. Bertsekas [6, Lemma4.2.1]). Since h is convex, the function inside the infimum in (3.4) is jointly convex in y and b. Thus, applying Proposition B-4 of Heyman and Sobel [27], we have that g*(y) is convex in y. The fact that g*(y) → ∞ as |y| → ∞ follows from the assumption that h(y) → ∞ as |y| → ∞.
We prove parts (i) and (ii) by induction. By assumption, v0 is convex. Assume that convexity holds for n − 1. Since we have just shown that g* is convex, the function inside the infimum in (3.3) is jointly convex in y and a. Again applying Proposition B-4 of Heyman and Sobel [27] yields that vn(y) is convex in y. Moreover, note that vn(y) → ∞ as |y| → ∞ is implied by the fact that g*(y) → ∞ as |y| → ∞ and the result is proven.
To verify (iii), we observe that for each y the function of b on the right-hand side of (3.4) is convex in
and, therefore, continuous. In addition, this function tends to ∞ as |b| → ∞. This is also true for (3.3) with the variable a instead of b. █
For a function f (y), let d−f (y)/dy denote the left-hand derivative of f. This derivative exists if f is convex. In light of the results of Lemma 3.1, the infimums in (3.3) and (3.4) can be replaced by minimums. The minimum can be computed by finding the place at which the left-hand derivative changes sign. We, thus, can define the following quantities for n = 1,2,…:
where the supremum (infimum) of the empty set is taken to be −∞ (∞). Although the values defined in (3.5)–(3.8) depend on β, to keep the notation simple, we continue to suppress this dependence in the current section. For example, the full notation for Lnp is Ln,βp.
We remark that Eqs. (8-58a) and (8-58b) of Heyman and Sobel [27] are similar to our (3.5)–(3.12), but only one point where the appropriate convex functions achieve their minimums was considered in [27]. Here, we consider the intervals of all possible solutions. Obviously, Lnp ≤ Lnp, Unp ≤ Unp, Lb ≤ Lb, and Ub ≤ Ub, where equalities are possible in each case. We say that Lnp is a lower actual inventory threshold if
Unp is an upper actual inventory threshold if
Lb is a lower total (actual plus borrowed) inventory threshold if
and Ub is an upper total inventory threshold if
The following lemma clarifies these definitions.
Lemma 3.2: For any n = 1,2,…, consider four threshold levels Lnp, Unp, Lb, and Ub satisfying (3.13)–(3.16). Define the following decisions: order up/scrap down to the level
where y is the current inventory level, and borrow up/store down to the level
Then the actions an(y) = tn′ − y and bn(y) = zn′ − an(y) minimize the right-hand sides of the optimality equations (3.3) and (3.4), respectively. Therefore, for any N = 1,2,…, the policy φ = {dN,dN−1,…,d1} is optimal for the N-step problem, where dn(x) = (an(x),bn(x)), n = 1,…,N, and −∞ < x < ∞.
Proof: In light of Lemma 3.1, the problem of finding optimal policies via (3.3) and (3.4) is simply the single-period cash balance models discussed in [15] and [27, Sect.8-4]. Thus, similar to [27, Sect.8-4], we have optimal order-up-to and order-down-to levels defined by the minimums of the convex functions defined in (3.3) and (3.4) when the changes of variables z = y + a and z = y + b are applied. For any Lnp ∈ [Lnp,Lnp] , an optimal action is to order up to level Lnp if the current inventory y < Lnp. If the inventory level is above any level Unp ∈ [Unp,Unp] , an optimal action is to reduce the inventory level to Unp. If the inventory level is between Lnp and Unp, an optimal action is not to use the ordering/scrapping option. Similarly, the optimal decision obtained from (3.4) is to increase y to Lb ∈ [Lb,Lb] if y < Lb and to decrease y to Ub ∈ [Ub,Ub] if y > Ub. No borrowing/storage action is required if Lb < y < Ub. █
The following lemma simplifies the structure of the optimal policy in the sense that it displays that we need not produce (scrap) and store (borrow) in the same period.
Lemma 3.3: The following inequalities hold: Lnp ≤ Ub and Lb ≤ Unp, for n = 1,2,….
Proof: We prove the first inequality. The proof of the second inequality is similar. Suppose Lnp > Ub and set Lnp = Lnp and Ub = Ub. For n = 1 and y < L1p, (3.1) and Lemma 3.2 imply that for a = L1p − y > 0 and b = Lb − L1p < 0,
where the last expression corresponds to the policy that orders (a+ − b−) units and borrows nothing. This violation of the optimality equation implies the contradiction. Thus, the case n = 1 is proven.
For n ≥ 2, the inequality Lnp > Ub is not possible by similar arguments, but we must consider two-step decisions. Again, suppose that this inequality holds. According to Lemma 3.2, when the initial state y < Lnp, there is an optimal policy φ that prescribes to order up to the level Lnp (a = Lnp − y) and then to store down to the level Ub (b = Ub − Lnp < 0). Consider now a policy ψ (when the initial state is y) that prescribes to order up to the level Ub at the first step and borrow nothing. At the second step, it orders/scraps the amount that the policy φ would order/scrap at the second step if the inventory level were Lnp − D1 plus it orders −b = Lnp − Ub to make up the difference. It also borrows the same amount as the policy φ given that their actual inventory levels are the same. At the following steps, the policies coincide so that the inventory position seen by ψ coincides with φ; the processes couple.
Denote by c(y,d1(y)) the ordering/scrapping costs for policy φ at the second step. The total ordering costs at the first two steps plus the borrowing/storage cost at stage 2 for policy φ are
Similarly for the policy ψ, we have
All other costs for these two policies coincide. Since Cψ(y) < Cφ(y) (almost surely), we have vnψ(y) < vnψ(y). Thus, φ is not an optimal policy. This contradiction completes the proof. █
For Lnp, Unp, Lb, and Ub satisfying (3.13)–(3.16), define
and
Lemma 3.3 implies
Combining Lemmas 3.2 and 3.3 and (3.24), we arrive at the major result of this section.
Theorem 3.4: Consider four threshold levels Lnp, Unp, Lb, and Ub satisfying (3.13)–(3.16) and consider Lnb and Unb defined in (3.22) and (3.23). Moreover, for the current inventory level y, let tn′ be the order up/scrap down to action defined in (3.17) and let zn′ be the borrow up/store down to action defined by
Then, (3.24) holds and the actions an(y) = tn′ − y and bn(y) = zn′ − an(y) minimize the right-hand sides of the optimality equations (3.3) and (3.4), respectively. Therefore, for any N = 1,2,…, the policy φ = {dN,dN−1,…,d1} is optimal for the N-step problem, where dn(x) = (an(x),bn(x)), n = 1,…,N, and −∞ < x < ∞.
Theorem 3.4 implies the existence of an optimal policy that in each period either produces (scraps) up (down) to the level Lnp (Unp) and then potentially borrows (scraps) up (down) to the level Lb (Ub). These ideas are illustrated in the following example.
Example 3.5: Consider a finite horizon discounted cost problem with the following parameters:
The holding cost is assumed quadratic, and when x > 0 units are held in inventory, the holding cost is 0.002x2, and when x ≤ 0, the backordering cost is 0.008x2. The probability mass function p(·) of demand per period is
After a finite state approximation, the optimal policy can be defined by tn′ in (3.17), where the optimal production/scrapping levels are
and Lnb = 0, Unb = 2 for all n = 1,2,… in (3.25).
Aside from the observation that there exists several order-up-to or down-to levels, one should also note that the borrowing and storage options are used as secondary options to meet demand. Thus, the borrow-up-to level is higher than the order-up-to level, and the store-down-to level is lower than the scrap-down-to level.
Example 3.5 illustrates that it is possible that an optimal policy uses all four options: producing, scrapping, borrowing, and storing. The following two propositions give sufficient conditions when only ordering/scrapping options should be used and managers should not borrow or store. Proposition 3.6 states that borrowing and storage should not be used when they are relatively expensive. Proposition 3.7 indicates that borrowing and storage should not be used when the demand is either nonnegative or nonpositive.
Proposition 3.6: Suppose the holding and backordering costs are linear,
If e+ > h−, then the optimal policy presented in Theorem 3.4 does not borrow. Similarly, if h+ < e−, this optimal policy does not store. In addition, if e+ = h−, then Lb = −∞ and the optimal policy defined in Theorem 3.4 with Lb = −∞ does not borrow. Similarly, if e− = h+, then Ub = ∞ and the optimal policy defined in Theorem 3.4 with Ub = ∞ does not store.
The case when e+ = h− (e− = h+) has the potential that one could borrow (store) and still be optimal but that this option need not be exercised. This is a consequence of the possibility of multiple optimal borrowing (storage) levels.
Proof: Note that
Thus, (3.9) implies that Lb = −∞ and yields the first result. Now, note that
so that the second result follows by assumption since h+ < e− implies Ub = ∞. The cases e+ = h− and h+ = e− follow from similar considerations. █
Theorem 3.4 implies that
Similarly, for n ≥ 1,
For standard inventory problems with nonnegative demands and zero setup costs, so-called S-policies (also called “order-up-to” or “basestock” policies) are optimal: Always order up to the level S when the inventory level is smaller than S. For finite horizon problems, these order-up-to levels may depend on the stage number. The following statement demonstrates that for inventory problems with nonnegative demands, borrowing should not be used unless borrowing costs per unit are less expensive than ordering costs. Unlike Proposition 3.6, we do not assume that the holding costs are linear.
Proposition 3.7: Suppose c+ ≤ e+ and P(D ≥ 0) = 1. Then Lb ≤ Lnp, and by selecting Lb = Lb in Theorem 3.4, we have that the optimal policy defined by (3.17) and (3.25) never borrows.
Proof: If Lb ≤ Lnp and Lb = Lb, then Lnb = Lnp and the policy defined by (3.17) and (3.25) never borrows. So, we need only prove that Lb ≤ Lnp. According to (3.5), this inequality is equivalent to
for all z < Lb. We fix z < Lb. Note that from (3.27)
and (3.29) holds for n = 0 as a nonstrict inequality. We consider any n = 1,2,… and make the induction assumption that the left-hand side of (3.29) is nonpositive for n − 1. Differentiating (3.28) yields (recall Lb ≤ Lnb ≤ Unp)
Adding c+ + (d−g*(z)/dz) = c+ − e+ to both sides of (3.30) and applying the inductive hypothesis (since D ≥ 0 almost surely) yields that (3.29) holds and the result is proven. █
Since the cases when the demand is nonnegative and nonpositive are symmetric, Proposition 3.7 implies the following corollary.
Corollary 3.8: Suppose c− ≤ e− and P(D ≤ 0) = 1. Then, Ub ≥ Unp, and by selecting Ub = Ub in Theorem 3.4, we have the optimal policy defined by (3.17) and (3.25) never stores.
Finally, we remark that the assumption that v0 = 0 is simply for convenience; the results of this section hold when v0 is an arbitrary nonnegative convex function.
In order to obtain results analogous to Theorem 3.4 for the infinite horizon discounted problem, it is sufficient to justify taking limits as n approaches infinity on each side of the finite horizon optimality equations (3.3). This result is alluded to for the cash balance problem in [21] and [27], but apparently not shown.
Assume β < 1. Unlike the previous section, we do not suppress β. Since the optimality equations hold for problems with nonnegative costs (see, e.g., Theorem 8.2 in Strauch [39]), we may write the discounted cost optimality equations (DCOEs),
where g(y,b) is defined in (3.2). The system (4.1) is equivalent to
where g* is as defined in (3.4).
The model satisfies the following two conditions: All costs are nonnegative, and for all
and all n = 1,2…, the sets
are compact. Therefore, in view of [5, Prop.1.7, p.148] or [7, Prop. 9.17], vn,β ↑ vβ as n → ∞. Thus, vβ is a convex function and Lemma 3.1 holds for the objective function vβ and for each of the optimality equations (4.1) and (4.2). Similar to the finite horizon case, we can rewrite the optimality equation (4.2) in the form
We define the numbers Lβp, Lβp, Uβp, and Uβp defined by (3.5)–(3.8) with vn−1 replaced by vβ. As in the finite horizon case, note that Lβp ≤ Lβp and Uβp ≤ Uβp and define lower and upper actual inventory thresholds Lβp and Lβp satisfying the inequalities
The following lemma is similar to Lemma 3.2 and has a virtually identical proof, with the only difference being that (4.2) should be considered instead of (3.3).
Lemma 4.1: Consider four threshold levels Lβp, Uβp, Lb, and Ub satisfying (4.5), (4.6), (3.15), and (3.16), respectively. The stationary policy that orders up/scraps down to the level
where y is the current inventory level, and borrows up/stores down to the level
is optimal.
Similar to Lemma 3.3, we have that
The proofs of these inequalities coincide with the proof of Lemma 3.3 for n ≥ 2. We also define
. Thus, similar to Theorem 3.4, we have the main result for infinite horizon discounted cost problems.
Theorem 4.2: Consider four threshold levels Lβp, Uβp, Lb, and Ub satisfying (4.5), (4.6), (3.15), and (3.16), respectively. The stationary policy defined by the ordering/scrapping decision (4.7) for a current inventory level y and by the borrowing/storage decision to borrow up/store down to the level
is optimal.
Example 4.3: Consider the infinite horizon analog of Example 3.5. An optimal policy is defined by
. This is simply the four-threshold policy from Example 3.5 for n ≥ 6.
For infinite horizon problems, Propositions 3.6 and 3.7 hold for the stationary policy defined in Theorem 4.2. The proofs are virtually unchanged. However, since Theorem 4.2 describes a stationary policy, a stronger version of Proposition 3.7 holds.
Proposition 4.4: Suppose c+ ≤ e+ and P(D ≥ 0) = 1. Then, Lb ≤ Lβp, and by selecting Lb = Lb in Theorem 4.2, we have that the Lβp policy that prescribes to order at each step up to the level Lβp, never borrows, never scraps, and never stores is optimal when the initial inventory level y ≤ Lβp.
Similar to Corollary 3.8, we have the following statement.
Corollary 4.5: Suppose c− ≤ e− and P(D ≤ 0) = 1. Then, Ub ≥ Uβp, and by selecting Ub = Ub in Theorem 4.2, we have that the policy that prescribes to scrap at each step down to the level Lβp, never borrows, never orders, and never stores is optimal when the initial inventory level y ≥ Uβp.
We remark that in addition to the convergence of the values vn,β ↑ vβ, the convergence of the optimal policies takes place. Indeed, if
are limit points of sequences of the optimal ordering/scrapping thresholds for the objective function vn,β, then for any Lb and Ub satisfying (3.15) and (3.16), we have that the four-threshold policy with
defined in Theorem 4.2 is optimal for the infinite horizon problem. This follows, for example, from the remark in [5, p.149].
In this section, we extend the previous results for the average cost case. We define the constant
where
.
As is shown in the next section, for our problem, w < ∞ and there exists a function u(x) with nonnegative finite values such that
where g(y,b) is as defined in (3.2). In addition, there exists a sequence βn ↑ 1 such that limβn↑1(1 − βn)mβn = w and u(x) = limn→∞{vβn(x) − mβn} ≥ 0 for all
. Thus, the function u is nonnegative and convex and u(x) → ∞ as |x| → ∞.
Analogous to the discounted case, an equivalent system is
where g* is defined as in (3.4).
Similar to (3.5)–(3.8), define
and consider Lb, Lb, Ub, and Ub defined by (3.9), (3.10), (3.11), and (3.12), respectively. Consider Lb and Ub satisfying (3.15) and (3.16), respectively. As analogs to (3.13) and (3.14), consider Lp and Up satisfying the inequalities
Lemma 5.1: Consider four threshold levels Lp, Up, Lb, and Ub satisfying (5.8), (5.9), (3.15), and (3.16), respectively. The stationary policy that prescribes to order up/scrap down to
where y is the current inventory level, and to borrow up/store down to the level
defines the values a = t′ − y and b = z′ − t′ that minimize the right-hand side of (5.3) and (3.4), respectively. In addition, the infimum of the average costs per unit time, w(y) defined in (2.7), equals the constant w defined in (5.1), and the thresholds t′ and z′ define a stationary optimal policy.
Proof: Similar to Lemma 3.2, the convexity of u and h implies that the policy described minimizes the right-hand sides of (5.3) and (3.4). We recall that a policy for an MDP is called stationary if it is defined by a measurable mapping of the state space into an action space. The interpretation of this mapping is that if the system is at some state, the value of this mapping is the selected action (independent of time). For our problem, a stationary policy φ is a measurable mapping from
to
, where the first coordinate is how much to order/scrap and the second coordinate is how much to borrow/store. We remark that t′ = t′(y) in (5.10) and z′ = z′(t′) = z′(t′(y)) in (5.11) for an inventory level y. We consider the mapping φ(y) = (a(y),b(y)) = (t′(y),z′(t′(y))). Since the functions t′ and z′ have simple threshold forms, φ is measurable. Thus, we have
According to Schäl [37, Prop. 1.3], if the left-hand side of (5.12) is greater than or equal to the right-hand side, then wφ(y) = w(y) = w for all
. █
Lemma 5.2: Lp ≤ Ub and Lb ≤ Up.
Proof: Let φ be the (stationary) policy defined by (5.10) and (5.11). Since φ defines actions that minimize the right-hand sides of (5.3) and (3.4), the policy φ is canonical; see [25, Sect.5.2]; that is, for any n ≥ 1, φ minimizes the criterion
. The rest of the proof follows from the coupling arguments in the proof of Lemma 3.3 for n ≥ 2. █
Corresponding to (3.22) and (3.23), define
Lemma 5.2 implies
, which together with Lemma 5.1 imply our main result for undiscounted costs.
Theorem 5.3: For four thresholds Lp, Up, Lb, and Ub satisfying (5.8), (5.9), (3.15), and (3.16), respectively, consider the thresholds
defined in (5.13) and (5.14). The stationary policy that prescribes for a current inventory level y to order up/scrap down to the level t′ defined in (5.10) and to borrow up/store down to the level
where y is the current inventory level and
have been defined in (5.13) and (5.14), is optimal.
Example 5.4: Suppose we use the same parameters as Example 3.5, but consider the average cost criterion. An optimal policy is defined by
.
The following proposition is similar to Proposition 3.6. Its proof is identical to that of Proposition 3.6, with the major difference being that (5.3) should be considered instead of (3.3).
Proposition 5.5: Suppose the holding costs are linear (i.e., (3.26) holds). If e+ > h−, then the optimal policy presented in Theorem 5.3 does not borrow. Similarly, if h+ < e−, this policy does not store. In addition, if e+ = h−, then Lb = −∞ and the optimal policy defined in Theorem 5.3 with Lb = −∞ does not borrow. Similarly, if e− = h+, then Ub = ∞ and the optimal policy defined in Theorem 5.3 with Ub = ∞ does not store.
Analogous to Proposition 4.4 and Corollary 4.5, we have the following two results.
Proposition 5.6: Suppose c+ ≤ e+ and P(D ≥ 0) = 1. Then Lb ≤ Lp, and by selecting Lb = Lb in Theorem 5.3 we have that the Lp policy that prescribes to order at each step up to the level Lp, never borrows, never scraps, and never stores, is optimal when the initial inventory level y ≤ Lp.
Corollary 5.7: Suppose c− ≤ e− and P(D ≤ 0) = 1. Then Ub ≥ Up, and by selecting Ub = Ub in Theorem 5.3 we have that the policy that prescribes to scrap at each step down to the level Up, never borrows, never orders, and never stores, is optimal when the initial inventory level y ≥ Up.
At the end of Section 4, we established that discounted cost finite horizon optimal thresholds converge to discounted cost infinite horizon optimal thresholds. Similarly, discounted cost infinite horizon optimal thresholds converge in some sense to optimal thresholds for the average cost criterion. Indeed, for ordering and scrapping decisions, let
be limit points of a sequence of optimal ordering/scrapping thresholds Lβnp and Uβnp as n → ∞, where the sequence {βn,n ≥ 0} is chosen as discussed in the text following 5.2. Appealing to the results in the the next section implies that
are respectively optimal ordering and scrapping thresholds for the average cost per unit time criterion.
In this section, we prove the existence of a solution to the ACOEs (5.2) and (3.2). Instead of restricting attention solely to the problem at hand, we provide sufficient conditions for the existence of a solution to the ACOEs for a more general MDP, and then show that these conditions are satisfied for our problem.
Consider an MDP with a standard Borel state space
. Suppose ρ is a metric on
such that ρ(x,y) < ∞ for all
is complete and separable. Assume that the one-step costs c(x,a) are nonnegative and that the MDP satisfies the standard Borel measurability conditions (see, e.g., [37]). The optimal values in the infinite horizon discounted cost case, vβ(x), are defined for discount factors β ∈ [0,1). Let A(x) be the set of available actions at state x and q be the transition probability. Consider the following set of assumptions
Assumptions C:
1. There exists a nondecreasing, continuous function η on R+ = [0,∞) such that η(0) = 0, η(r) < ∞ for all
,
for all
, and
for all
and a ∈ A(x).
2. There exists a finite constant w such that
where
.
3. For
, there exists a compact subset
such that c(x) ≥ N for all
, where c(x) = infa∈A(x) c(x,a).
Note that Assumption C1 implies that vβ(x) is a continuous function in x for any β ∈ [0,1). Since vβ(x) ≥ c(x), there exist xβ such that vβ(xβ) = mβ. Let M(β) = {x ∈ X|vβ(x) = mβ}. In particular, M(β) ⊆ KN for N > mβ. The following lemma is similar to Lemma 4.6 in [37] and to Lemma 4 in [8].
Lemma 6.1: Let Assumptions C2 and C3 hold. Suppose {β(n),n ≥ 1} is a subsequence such that β(n) ↑ 1 and w = limn→∞(1 − β(n))mβ(n). Then, there exists a compact subset
and a finite integer [ell ] such that M(β(n)) ⊆ K for n ≥ [ell ].
Proof: Consider N > w + 1. Then, N > limn→∞(1 − β(n))mβ(n) + 1 and there exists an integer [ell ] such that for n ≥ [ell ]
Set K = KN as defined in Assumption C3. We prove by contradiction that K is the set whose existence is postulated by the lemma. Let β satisfy (6.2) and suppose that this is not the case. Then, there exists
; that is, c(x) ≥ N and vβ(x) = mβ. Let T be the first hitting time of K; T = inf{n ≥ 0|Xn ∈ K}. Then, for any stationary policy π,
where the strict inequality follows from (6.2) and the last inequality follows from T ≥ 1 when x ∉ K. Since π is arbitrary, vβ(x) > mβ and the inclusion x ∈ M(β) is not possible. █
We recall that a real-valued function f defined on a metric space Y is called inf-compact if the set
is compact for any
. Note that since compact sets are closed, inf-compactness implies lower-semicontinuity. Consider the following two additional conditions:
Assumptions C (continued):
4. For each fixed
, the function c(x,a) is inf-compact in a.
5. Transition probabilities q(·|x,a) are weakly continuous in a for each
.
Theorem 6.2: Suppose C hold. Then the following hold:
1. There exists a continuous nonnegative function u on
such that for all
,
2. There exists a sequence β(n) → 1 such that u(x) = limn→∞ uβ(n)(x), where uβ(x) = vβ(x) − mβ,
.
3. If
is a linear space and the functions vβ are convex for all β ∈ [0,1), then u is convex.
4. For fixed
, let a* be a limit point of a sequence {an,n ≥ 1}, where vβ(n)(x) = c(x,an) + β(n)∫vβ(n)(y)q(dy|x,an) (the point a* exists in view of Lemma 6.1). Then
Proof: Consider the discount cost optimality equations (DCOEs),
In particular, the minimum is achieved in the right-hand side of (6.4) since the expression minimized is lower semicontinuous (the sum of two lower semicontinuous functions). The first function is lower semicontinuous because of Assumption C4 and the second function is lower semicontinuous because of Assumptions C1 and C5. The former implies that vβ(x) is continuous. In addition, vβ(x) ≥ 0.
A little algebra in (6.4) reveals
Utilizing Assumption C2, there exists a sequence β(n) ↑ 1 as n → ∞ such that limn→∞ β(n)mβ(n) = w. Lemma 6.1 implies that we can select this sequence such that M(β(n)) ⊆ K, where K is a compact subset of
. Let diam(K) denote the maximal distance between two points in K (the diameter of K). Since K is compact, diam(K) exists and is finite. We fix any point z0 ∈ K. Then, according to Assumption C1,
where z ∈ K is such that vβ(n)(z) = mβ(n).
Since |uβ(x) − uβ(y)| ≤ ε when η(ρ(x,y)) < ε, the family of functions uβ(n) is equicontinuous. By the Ascoli theorem [25, p.96], there exists a subsequence {β(nk),k ≥ 1} of the sequence {β(n),n ≥ 1} such that uβ(nk) converges pointwise to a continuous function u and this convergence is uniform on each compact subset of
. In particular, for each
,
Fix
. We first show that
where the minimum in (6.8) is attained by the same reasons as in (6.3). For each β(nk), consider a(nk) such that
Without loss of generality, select the sequence nk such that |(1 − β(nk))mβ(nk) + uβ(nk)(x) − w − u(x)| ≤ 1 for all nk. Assumption C4 implies that all a(nk) belong to the compact set KN(x), where N = w + u(x) + 1. Thus, there exists a* ∈ A(x) and a subsequence {mk} of {nk} such that a(mk) → a*. The result obtained by Serfozo's [38] extension of Fatou's lemma (see also Lemma 2.3 in [37]) implies that
Since c is lower semicontinuous,
Thus, (6.8) is proved.
From (6.5), we have that for any a ∈ A(x),
Consider (6.11) for β = β(n) defined above and apply sequentially the Ascoli theorem [25, p.96] and Lebesgue's dominated convergence theorem. The latter is applicable in view of Assumption C1 and (6.6). Indeed, (6.6) yields
and (6.1) implies that
We, thus, have for any a ∈ A(x),
which is equivalent to
The next result shows that the ACOEs (5.2) and (3.2) hold for the inventory problem considered. We remark that Theorem 6.2 also implies the validity of the ACOEs for problems without borrowing and storage (let
in (5.2)).
Proposition 6.3: In the inventory problem considered, there exists a solution to the ACOEs (5.2) and (3.2), (w,u(y)), such that the relative value function u(y) is nonnegative, convex, and equal to the limit of functions uβ(n) described in Theorem 6.2.
Proof: We verify that Assumptions C1–C5 hold so that Theorem 6.2 applies. First, in our case
, ρ(x,y) = |x − y|, and η(r) = r max{c+,c−}. Consider two inventory levels x and z. In state x, suppose the manager orders or scraps inventory to bring the level to z, then implements an optimal policy thereafter. Since this policy may not be optimal, vβ(x) ≤ vβ(z) + max{c+,c−}|z − x|. In other words,
Since
for any
, h is convex, and h(x) → ∞ as x → ∞, we have
and Assumption C1 is verified.
Fix
. Consider the policy φ that always orders up to level x. The renewal reward theorem implies that wφ(x) = limn→∞(1/n)vn,1φ(x) < ∞. Thus, lim infβ↑1(1 − β)mβ ≤ lim infβ↑1(1 − β)vβ(x) ≤ lim infβ↑1(1 − β)vβφ(x) = wφ(x) < ∞ and Assumption C2 holds.
To verify Assumption C3, we must prove that c(x) → ∞ as |x| → ∞. Let x → ∞ and let d = min{c+,c−,e+,e−}. Note that d > 0 and for
,
where the first inequality follows by setting y = a + b and applying Jensen's inequality. To verify the second inequality, consider the two possibilities (a)
and (b)
. The case x → −∞ is similar.
Assumption C4 holds since for
, {(a,b)|C(x,(a,b)) ≤ λ} is closed and bounded. The fact that it is bounded follows from the fact that C(x,(a,b)) → ∞, as either a or b → ∞. It is closed since the cost function is convex and, therefore, continuous. Assumption C5 is equivalent to
as an → a, for any bounded, continuous f, which follows from Lebesgue's dominated convergence theorem. █
We have studied an extension of the classic inventory control/cash management models to include one-period borrowing and storage. Instead of an optimal policy requiring two thresholds, as has been shown for the cash management problem, we have four thresholds. We expect that in the discounted models (both finite and infinite horizon), a fixed cost could be added for either ordering or scrapping (but not both) and our results could be extended without difficulties to results analogous to those shown in [16,21]. This follows from the fact that the (convex) holding cost in each model would be replaced with the function g*. On the other hand, we do not believe that allowing for fixed costs to be associated with borrowing/storage and ordering/scrapping would lead to such simple policies. Although the borrowing and storage policy would most likely be analogous to (s,S)-policies, it is not immediately clear that the K-convexity would carry through. This is left as a potential future research direction. Other potential research directions are to study problems with lost sales, lead times, and problems with the borrowing/storage time intervals longer than one period.
The authors thank Emmanuel Fernandez for the comments regarding the existence of stationary optimal policies for the average cost criterion, and Woonghee Tim Huh and Ganesh Janakiraman, who brought to our attention that condition η(0) = 0 was accidentally missing in Assumption C1 of a preliminary version of this paper. We would also like to thank the anonymous referees for several comments that improved the readability of this article. Research by the first author was partially supported by NSF grant DMI-0300121.