
Computable approximations for average Markov decision processes in continuous time

Published online by Cambridge University Press:  26 July 2018

Jonatha Anselmi*
Affiliation:
INRIA
François Dufour*
Affiliation:
INRIA and Université de Bordeaux
Tomás Prieto-Rumeau*
Affiliation:
UNED
* Postal address: INRIA Bordeaux Sud Ouest, Bureau B430, 200 av. de la Vieille Tour, 33405 Talence cedex, France. Email address: jonatha.anselmi@inria.fr
** Postal address: Institut de Mathématiques de Bordeaux, Université de Bordeaux, 351 cours de la Libération, F33405 Talence, France. Email address: francois.dufour@math.u-bordeaux.fr
*** Postal address: Statistics Department, UNED, calle Senda del Rey 9, 28040 Madrid, Spain. Email address: tprieto@ccia.uned.es

Abstract


In this paper we study the numerical approximation of the optimal long-run average cost of a continuous-time Markov decision process with Borel state and action spaces and with bounded transition and reward rates. Our approach uses a suitable discretization of the state and action spaces to approximate the original control model. The approximation error for the optimal average reward is then bounded by a linear combination of coefficients related to the discretization of the state and action spaces, namely, the Wasserstein distance between an underlying probability measure μ and a measure with finite support, and the Hausdorff distance between the original and the discretized action sets. When approximating μ with its empirical probability measure, we obtain convergence in probability at an exponential rate. An application to a queueing system is presented.
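The two discretization coefficients in the abstract can be illustrated numerically on the real line. The sketch below (not the paper's algorithm; the helper names `wasserstein1` and `hausdorff`, the uniform measure, and the 10-point grid are illustrative assumptions) computes the 1-Wasserstein distance between an empirical measure and its nearest-neighbour quantization onto a finite-support grid, and the Hausdorff distance between a finely sampled action set [0, 1] and that grid. For equal numbers of atoms on ℝ, the 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples.

```python
import random

def wasserstein1(xs, ys):
    """1-Wasserstein distance between two empirical measures on R with
    equally many atoms: mean absolute difference of sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets on R."""
    def one_sided(S, T):
        return max(min(abs(s - t) for t in T) for s in S)
    return max(one_sided(A, B), one_sided(B, A))

random.seed(0)
n = 2000
mu_samples = [random.uniform(0, 1) for _ in range(n)]  # atoms of the empirical measure
grid = [i / 10 for i in range(11)]                     # finite-support discretization

# Project each atom onto the grid (nearest-neighbour quantization);
# each atom moves by at most half the grid mesh, i.e. 0.05.
quantized = [min(grid, key=lambda g: abs(g - x)) for x in mu_samples]
print(wasserstein1(mu_samples, quantized))  # bounded by 0.05

# Hausdorff distance between the (finely sampled) action set [0, 1] and the grid.
actions = [i / 1000 for i in range(1001)]
print(hausdorff(actions, grid))  # 0.05: attained at grid midpoints
```

Both quantities shrink as the grid is refined, which is the mechanism behind the linear error bound described in the abstract: the optimal average reward of the discretized model approaches that of the original model as these two distances go to 0.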

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2018 
