Note on discounted continuous-time Markov decision processes with a lower bounding function

Published online by Cambridge University Press: 30 November 2017

Xin Guo*
Affiliation: University of Liverpool
Alexey Piunovskiy*
Affiliation: University of Liverpool
Yi Zhang*
Affiliation: University of Liverpool

* Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK.
Rights & Permissions [Opens in a new window]

Abstract

We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problems out of the more general class of policies. Both constrained and unconstrained problems are considered. Our investigation is based on the continuous-time version of the Veinott transformation. This technique has not been widely employed in the earlier literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.
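
For orientation, the following is a minimal sketch of the objects the abstract refers to. The notation (the transition kernel $q$, the discount rate $\alpha$, and the constants $M$ and $\rho$) is assumed for illustration and is not taken verbatim from the paper; likewise, the displayed transformation is one standard continuous-time form, not necessarily the paper's exact statement. The $\alpha$-discounted criterion under a policy $\pi$ reads
\[
V(x,\pi) \;=\; \mathbb{E}_x^{\pi}\!\left[\int_0^{\infty} e^{-\alpha t}\, c(X_t, A_t)\,\mathrm{d}t\right], \qquad \alpha > 0,
\]
and a lower bounding (drift) function is a measurable $w \ge 1$ on the state space $X$ such that, for some constants $M \ge 0$ and $\rho < \alpha$,
\[
c^{-}(x,a) \;\le\; M\, w(x), \qquad \int_X w(y)\, q(\mathrm{d}y \mid x,a) \;\le\; \rho\, w(x) \quad \text{for all } (x,a).
\]
In one common form, the Veinott transformation rescales the model by $w$, replacing the transition rates and the cost rate with
\[
\widetilde{q}(\mathrm{d}y \mid x,a) \;=\; \frac{w(y)}{w(x)}\, q(\mathrm{d}y \mid x,a), \qquad \widetilde{c}(x,a) \;=\; \frac{c(x,a)}{w(x)},
\]
so that the transformed cost rate has bounded negative part ($\widetilde{c}^{\,-} \le M$) while the drift condition controls the transformed rates; the original problem, whose cost is only bounded below by a multiple of $w$, is thereby reduced to one with a cost rate bounded from below.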

Type: Research Papers
Copyright: © Applied Probability Trust 2017
