
A note on the existence of optimal stationary policies for average Markov decision processes with countable states. (English) Zbl 1522.90250

Summary: In many practical stochastic dynamic optimization problems with countable states, the optimal policy possesses certain structural properties, for example, the \((s, S)\) policy in inventory control, the well-known \(c \mu\)-rule, and the recently discovered \(c / \mu\)-rule [L. Xia et al., “A \(c/\mu\)-rule for job assignment in heterogeneous group-server queues”, Prod. Oper. Manag. 31, No. 3, 1191-1215 (2022; doi:10.1111/poms.13605)] in the scheduling of queues. A presumption of such results is that an optimal stationary policy exists. Many research works address the existence of optimal stationary policies for Markov decision processes with countable state spaces (see, e.g., D. P. Bertsekas [Dynamic programming and optimal control. Vol. 2. Belmont, MA: Athena Scientific (2012; Zbl 1298.90001)]; O. Hernández-Lerma and J. B. Lasserre [Discrete-time Markov control processes. Basic optimality criteria. New York, NY: Springer-Verlag (1995; Zbl 0840.93001)]; M. L. Puterman [Markov decision processes: discrete stochastic dynamic programming. New York, NY: John Wiley & Sons, Inc. (1994; Zbl 0829.90134)]; L. I. Sennott [Stochastic dynamic programming and the control of queueing systems. New York, NY: Wiley (1999; Zbl 0997.93503)]). However, the conditions in these works are usually not easy to verify in such optimization problems. In this paper, we study the optimization of the long-run average cost of continuous-time Markov decision processes with countable state spaces. We provide an intuitive approach to proving the existence of an optimal stationary policy. The approach is based simply on the compactness of the policy space, under a specially designed metric, and the continuity of the long-run average cost on this space. Our method can handle cost functions that are unbounded from both above and below, which makes a complementary contribution to the literature, where the cost function is unbounded from only one side.
Examples are provided to illustrate the application of our main results.
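The existence argument sketched in the summary follows the classical compactness-continuity pattern: a continuous function on a compact metric space attains its infimum. A schematic of this argument in the setting described (the notation below is illustrative and assumed, not taken from the paper; the actual metric and conditions there may differ):

\[
\eta(\pi) \;=\; \limsup_{T \to \infty} \frac{1}{T}\,
\mathbb{E}_x^{\pi}\!\left[\int_0^T c\bigl(X_t, \pi(X_t)\bigr)\,dt\right]
\qquad \text{(long-run average cost of stationary policy } \pi\text{)}
\]

Writing \(\Pi_s\) for the set of stationary policies \(\pi = (\pi(i))_{i \in S}\) with \(\pi(i) \in A(i)\), a product-type metric such as
\[
d(\pi, \pi') \;=\; \sum_{i \in S} 2^{-i}\,
\frac{\rho_i\bigl(\pi(i), \pi'(i)\bigr)}{1 + \rho_i\bigl(\pi(i), \pi'(i)\bigr)}
\]
makes \((\Pi_s, d)\) compact whenever each action set \(A(i)\) is compact (a Tychonoff-type argument). If \(\eta\) is continuous on \((\Pi_s, d)\), then
\[
\exists\, \pi^* \in \Pi_s: \quad \eta(\pi^*) \;=\; \inf_{\pi \in \Pi_s} \eta(\pi),
\]
i.e., an optimal stationary policy exists. Verifying the continuity of \(\eta\) with unbounded costs is the technically delicate step, and is where the paper's contribution lies.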

MSC:

90C40 Markov and semi-Markov decision processes

References:

[1] Altman, E., Constrained Markov decision processes (1999), CRC Press · Zbl 0963.90068
[2] Bertsekas, D. P., Dynamic programming and optimal control. Vol. 2 (2012), Athena Scientific: Athena Scientific Belmont, MA · Zbl 1298.90001
[3] Borkar, V. S., Control of Markov chains with long-run average cost criterion: The dynamic programming equations, SIAM Journal on Control and Optimization, 27, 642-657 (1989) · Zbl 0668.60059
[4] Borkar, V. S., Ergodic control of Markov chains with constraints-the general case, SIAM Journal on Control and Optimization, 32, 176-186 (1994) · Zbl 0795.93100
[5] Cao, P.; Xie, J., A new condition for the existence of optimal stationary policies in denumerable state average cost continuous time Markov decision processes with unbounded cost and transition rates (2015), arXiv:1504.05674
[6] Cavazos-Cadena, R., Weak conditions for the existence of optimal stationary policies in average Markov decision chains with unbounded costs, Kybernetika, 25, 145-156 (1989) · Zbl 0673.90092
[7] Cavazos-Cadena, R., Recent results on conditions for the existence of average optimal stationary policies, Annals of Operations Research, 28, 3-27 (1991) · Zbl 0744.90097
[8] Cavazos-Cadena, R.; Sennott, L. I., Comparing recent assumptions for the existence of optimal stationary policies, Operations Research Letters, 11, 33-37 (1992) · Zbl 0763.90092
[9] Chao, X.; Pinedo, M., On generalized networks of queues with positive and negative arrivals, Probability in the Engineering and Informational Sciences, 7, 301-334 (1993)
[10] Feinberg, E. A., Optimality conditions for inventory control, INFORMS Tutorials in Operations Research, 1, 4-44 (2016)
[11] Feinberg, E. A.; Jaśkiewicz, A.; Nowak, A. S., Constrained discounted Markov decision processes with Borel state spaces, Automatica, 111, Article 108582 pp. (2020) · Zbl 1434.90212
[12] Feinberg, E. A.; Kasyanov, P. O.; Zadoianchuk, N. V., Average-cost Markov decision processes with weakly continuous transition probabilities, Mathematics of Operations Research, 37, 591-607 (2012) · Zbl 1297.90173
[13] Feinberg, E. A.; Lewis, M. E., Optimality inequalities for average cost Markov decision processes and the stochastic cash balance problem, Mathematics of Operations Research, 32, 769-785 (2007) · Zbl 1341.90142
[14] Feinberg, E. A.; Liang, Y., On the optimality equation for average cost Markov decision processes and its validity for inventory control, Annals of Operations Research (2017)
[15] Feinberg, E. A.; Shwartz, A., Handbook of Markov decision processes: Methods and applications (2002), Kluwer · Zbl 0979.90001
[16] Guo, X.; Hernández-Lerma, O., Continuous-time Markov decision processes (2009), Springer · Zbl 1209.90002
[17] Guo, X.; Rieder, U., Average optimality for continuous-time Markov decision processes in Polish spaces, Annals of Applied Probability, 16, 730-756 (2006) · Zbl 1160.90010
[18] Hernández-Lerma, O., Average optimality in dynamic programming on Borel spaces - Unbounded costs and controls, Systems & Control Letters, 17, 237-242 (1991) · Zbl 0771.90098
[19] Hernández-Lerma, O.; Lasserre, J. B., Discrete-time Markov control processes: Basic optimality criteria (1996), Springer: Springer New York
[20] Hernández-Lerma, O.; Lasserre, J. B., Further topics on discrete-time Markov control processes (1999), Springer: Springer New York · Zbl 0928.93002
[21] Jaśkiewicz, A.; Nowak, A. S., On the optimality equation for average cost Markov control processes with Feller transition probabilities, Journal of Mathematical Analysis and Applications, 316, 495-509 (2006) · Zbl 1148.90015
[22] Jaśkiewicz, A.; Nowak, A. S., Constrained Markov decision processes with expected total reward criteria, SIAM Journal on Control and Optimization, 57, 5, 3118-3136 (2019) · Zbl 1421.90161
[23] Lasserre, J. B., Conditions for existence of average and Blackwell optimal stationary policies in denumerable Markov decision processes, Journal of Mathematical Analysis and Applications, 136, 479-490 (1988) · Zbl 0659.90094
[24] Liu, K.; Filar, J. A., Weighted Markov decision processes with perturbation, Mathematical Methods of Operations Research, 53, 3, 465-480 (2001) · Zbl 1031.90063
[25] Meyn, S., The policy iteration algorithm for average reward Markov decision processes with general state space, IEEE Transactions on Automatic Control, 42, 1663-1680 (1997) · Zbl 0906.93063
[26] Meyn, S., Algorithms for optimization and stabilization of controlled Markov chains, Sadhana, 24, 339-367 (1999) · Zbl 1075.90558
[27] Prieto-Rumeau, T.; Hernández-Lerma, O., Selected topics on continuous-time controlled Markov chains and Markov games (2012), Imperial College Press · Zbl 1269.60004
[28] Puterman, M. L., Markov decision processes: Discrete stochastic dynamic programming (1994), John Wiley & Sons: John Wiley & Sons New York · Zbl 0829.90134
[29] Sennott, L. I., A new condition for the existence of optimum stationary policies in average cost Markov decision processes, Operations Research Letters, 5, 17-23 (1986) · Zbl 0593.90083
[30] Sennott, L. I., Average cost optimal stationary policies in infinite state Markov decision processes with unbounded costs, Operations Research, 37, 626-633 (1989) · Zbl 0675.90091
[31] Sennott, L. I., Stochastic dynamic programming and the control of queueing systems (1999), John Wiley & Sons: John Wiley & Sons New York · Zbl 0997.93503
[32] Smith, W. E., Various optimizers for single-stage production, Naval Research Logistics Quarterly, 3, 59-66 (1956)
[33] Xia, L.; Zhang, Z. G.; Li, Q. L., A \(c/\mu\)-rule for job assignment in heterogeneous group-server queues, Production and Operations Management, 31, 3, 1191-1215 (2022)