Optimal scheduling for reference tracking or state regulation using reinforcement learning. (English) Zbl 1395.93275

Summary: The problem of optimal control of autonomous nonlinear switching systems with infinite-horizon cost functions, for the purpose of tracking a family of reference signals or regulating the states, is investigated. A reinforcement learning scheme is presented which learns the solution and provides scheduling between the modes in feedback form, without enforcing a mode sequence or a number of switchings. This is done through a value-iteration-based approach. The convergence of the iterative learning scheme to the optimal solution is proved. The learning algorithm is presented after several analytical questions about the solution are answered. Finally, numerical analyses are provided to evaluate the performance of the developed technique in practice.
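
The value-iteration idea in the summary can be illustrated with a small sketch. It is not the paper's algorithm or code: the two scalar modes, the quadratic stage cost, the state grid, and the names `MODES`, `stage_cost`, `value_iteration`, and `schedule` are all assumptions chosen for illustration, and the value function is stored on a grid with linear interpolation rather than in a trained function approximator.

```python
import numpy as np

# Hypothetical two-mode autonomous switching system x_{k+1} = f_v(x_k).
# Both modes map [-1, 1] into itself and keep the origin as an equilibrium.
MODES = (
    lambda x: 0.9 * x,  # mode 0: slow linear decay everywhere
    lambda x: x ** 3,   # mode 1: fast near the origin, stalls near |x| = 1
)


def stage_cost(x):
    """Assumed quadratic state penalty for regulation to the origin."""
    return x ** 2


def value_iteration(grid, tol=1e-9, max_iters=2000):
    """Iterate V_{j+1}(x) = min_v [cost(x) + V_j(f_v(x))] starting from V_0 = 0."""
    V = np.zeros_like(grid)
    for _ in range(max_iters):
        # One row per mode: stage cost plus interpolated value at the successor state.
        candidates = np.stack(
            [stage_cost(grid) + np.interp(f(grid), grid, V) for f in MODES]
        )
        V_new = candidates.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def schedule(x, grid, V):
    """Feedback scheduler: pick the mode that minimizes cost plus learned value."""
    costs = [stage_cost(x) + np.interp(f(x), grid, V) for f in MODES]
    return int(np.argmin(costs))


grid = np.linspace(-1.0, 1.0, 201)
V = value_iteration(grid)
```

Once `V` is available, `schedule(x, grid, V)` selects the active mode from the current state alone, so no mode sequence or number of switchings is fixed in advance. In this toy setup the scheduler should pick mode 0 at `x = 1.0`, where mode 1 would leave the state stuck, and mode 1 at `x = 0.5`, where it contracts faster.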

MSC:

93C30 Control/observation systems governed by functional relations other than differential equations (such as hybrid and switching systems)
93C10 Nonlinear systems in control theory
93B52 Feedback control
93B40 Computational methods in systems theory (MSC2010)
93-04 Software, source code, etc. for problems pertaining to systems and control theory
93B15 Realizations from input-output data
Full Text: DOI

References:

[1] Xu, X.; Antsaklis, P. J., Optimal control of switched systems via non-linear optimization based on direct differentiations of value functions, Int. J. Control, 75, 16-17, 1406-1426, (2002) · Zbl 1039.93005
[2] Xu, X.; Antsaklis, P., Optimal control of switched systems based on parameterization of the switching instants, IEEE Trans. Autom. Control, 49, January, 2-16, (2004) · Zbl 1365.93308
[3] Axelsson, H.; Boccadoro, M.; Egerstedt, M.; Valigi, P.; Wardi, Y., Optimal mode-switching for hybrid systems with varying initial states, Nonlinear Anal.: Hybrid Syst., 2, 3, 765-772, (2008) · Zbl 1215.49033
[4] X. Ding, A. Schild, M. Egerstedt, L. Jan, Real-time optimal feedback control of switched autonomous systems, in: IFAC Proceedings Volumes (IFAC-PapersOnline), vol. 3, 2009, pp. 108-113.
[5] H. Axelsson, M. Egerstedt, Y. Wardi, G. Vachtsevanos, Algorithm for switching-time optimization in hybrid dynamical systems, in: Proceedings of the IEEE International Symposium on Intelligent Control, June 2005, pp. 256-261.
[6] Y. Wardi, M. Egerstedt, Algorithm for optimal mode scheduling in switched systems, in: Proceedings of the American Control Conference, 2012. · Zbl 1243.93052
[7] Kamgarpour, M.; Tomlin, C., On optimal control of non-autonomous switched systems with a fixed mode sequence, Automatica, 48, 6, 1177-1181, (2012) · Zbl 1244.49070
[8] Rungger, M.; Stursberg, O., A numerical method for hybrid optimal control based on dynamic programming, Nonlinear Anal.: Hybrid Syst., 5, 2, 254-274, (2011) · Zbl 1225.49028
[9] M. Sakly, A. Sakly, N. Majdoub, M. Benrejeb, Optimization of switching instants for optimal control of linear switched systems based on genetic algorithms, in: IFAC Proceedings Volumes (IFAC-PapersOnline), vol. 2, 2009. · Zbl 1037.93519
[10] Lien, C.-H.; Yu, K.-W.; Chang, H.-C.; Chung, L.-Y.; Chen, J.-D., Switching signal design for exponential stability of discrete switched systems with interval time-varying delay, J. Frankl. Inst., 349, 6, 2182-2192, (2012) · Zbl 1300.93145
[11] Zhai, S.; Yang, X.-S., Exponential stability of time-delay feedback switched systems in the presence of asynchronous switching, J. Frankl. Inst., 350, 1, 34-49, (2013) · Zbl 1282.93088
[12] Heydari, A.; Balakrishnan, S., Optimal multi-therapeutic HIV treatment using a global optimal switching scheme, Appl. Math. Comput., 219, 14, 7872-7881, (2013) · Zbl 1288.92012
[13] C. Qin, H. Zhang, Y. Luo, B. Wang, Finite horizon optimal control of non-linear discrete-time switched systems using adaptive dynamic programming with epsilon-error bound, Int. J. Syst. Sci. (2013), http://dx.doi.org/10.1080/00207721.2012.748945. · Zbl 1291.49021
[14] W. Lu, S. Ferrari, An approximate dynamic programming approach for model-free control of switched systems, in: Proceedings of the IEEE Conference on Decision and Control, 2013, pp. 3837-3844.
[15] Rinehart, M.; Dahleh, M.; Reed, D.; Kolmanovsky, I., Suboptimal control of switched systems with an application to the DISC engine, IEEE Trans. Control Syst. Technol., 16, 2, 189-201, (2008)
[16] A. Heydari, S.N. Balakrishnan, Optimal orbit transfer with on-off actuators using a closed form optimal switching scheme, in: AIAA Guidance, Navigation, and Control Conference, 2013.
[17] Benmansour, K.; Benalia, A.; Djemaï, M.; de Leon, J., Hybrid control of a multicellular converter, Nonlinear Anal.: Hybrid Syst., 1, 1, 16-29, (2007) · Zbl 1117.93304
[18] Liu, C.; Gong, Z., Modelling and optimal control of a time-delayed switched system in fed-batch process, J. Frankl. Inst., 351, 2, 840-856, (2014) · Zbl 1293.93085
[19] Hernandez-Vargas, E.; Colaneri, P.; Middleton, R.; Blanchini, F., Discrete-time control for switched positive systems with application to mitigating viral escape, Int. J. Robust and Nonlinear Control, 1093-1111, (2011) · Zbl 1225.93072
[20] Zhai, J.; Shen, B.; Gao, J.; Feng, E.; Yin, H., Optimal control of switched systems and its parallel optimization algorithm, J. Comput. Appl. Math., 261, 287-298, (2014) · Zbl 1278.93180
[21] Lincoln, B.; Rantzer, A., Relaxing dynamic programming, IEEE Trans. Autom. Control, 51, August, 1249-1260, (2006) · Zbl 1366.90208
[22] Rinehart, M.; Dahleh, M.; Kolmanovsky, I., Value iteration for (switched) homogeneous systems, IEEE Trans. Autom. Control, 54, 6, 1290-1294, (2009) · Zbl 1367.93342
[23] Sutton, R. S.; Barto, A. G., Reinforcement learning: an introduction, (2012), MIT Press Cambridge, MA
[24] Werbos, P. J., Approximate dynamic programming for real-time control and neural modeling, (White, D. A.; Sofge, D. A., Handbook of Intelligent Control, (1992), Multiscience Press New York, NY)
[25] Balakrishnan, S. N.; Biega, V., Adaptive-critic based neural networks for aircraft optimal control, J. Guid. Control Dyn., 19, 893-898, (1996) · Zbl 0875.93396
[26] Prokhorov, D.; Wunsch, D., Adaptive critic designs, IEEE Trans. Neural Netw., 8, 997-1007, (1997)
[27] Al-Tamimi, A.; Lewis, F.; Abu-Khalaf, M., Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Trans. Syst., Man, Cybern., Part B: Cybern., 38, August, 943-949, (2008)
[28] Venayagamoorthy, G.; Harley, R.; Wunsch, D., Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator, IEEE Trans. Neural Netw., 13, May, 764-773, (2002)
[29] He, P.; Jagannathan, S., Reinforcement learning-based output feedback control of nonlinear systems with input constraints, IEEE Trans. Syst., Man, Cybern., Part B: Cybern., 35, 1, 150-154, (2005)
[30] Zhang, H.; Wei, Q.; Luo, Y., A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm, IEEE Trans. Syst., Man, Cybern., Part B: Cybern., 38, 4, 937-942, (2008)
[31] Dierks, T.; Thumati, B. T.; Jagannathan, S., Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence, Neural Netw., 22, 5-6, 851-860, (2009) · Zbl 1338.49074
[32] Wang, D.; Liu, D.; Wei, Q.; Zhao, D.; Jin, N., Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming, Automatica, 48, 8, 1825-1832, (2012) · Zbl 1269.49042
[33] Lewis, F.; Vrabie, D.; Vamvoudakis, K., Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers, IEEE Control Syst., 32, December, 76-105, (2012) · Zbl 1395.93584
[34] Fairbank, M.; Alonso, E.; Prokhorov, D., An equivalence between adaptive dynamic programming with a critic and backpropagation through time, IEEE Trans. Neural Netw. Learn. Syst., 24, 12, 2088-2100, (2013)
[35] Chen, X.; Gao, Y.; Wang, R., Online selective kernel-based temporal difference learning, IEEE Trans. Neural Netw. Learn. Syst., 24, 12, 1944-1956, (2013)
[36] Heydari, A.; Balakrishnan, S. N., Fixed-final-time optimal control of nonlinear systems with terminal constraints, Neural Netw., 48, 61-71, (2013) · Zbl 1297.93109
[37] Q. Zhao, H. Xu, S. Jagannathan, Optimal control of uncertain quantized linear discrete-time systems, Int. J. Adapt. Control Signal Process. (2014), http://dx.doi.org/10.1002/acs.2473. · Zbl 1330.93150
[38] Heydari, A.; Balakrishnan, S., Optimal switching and control of nonlinear switching systems using approximate dynamic programming, IEEE Trans. Neural Netw. Learn. Syst., 25, 1106-1174, (2014)
[39] Heydari, A.; Balakrishnan, S., Optimal switching between autonomous subsystems, J. Frankl. Inst., 351, (2014) · Zbl 1372.93115
[40] Heydari, A.; Balakrishnan, S., Optimal switching between controlled subsystems with free mode sequence, Neurocomputing, 149, 1620-1630, (2015)
[41] Qin, C.; Zhang, H.; Luo, Y., Optimal tracking control of a class of nonlinear discrete-time switched systems using adaptive dynamic programming, Neural Comput. Appl., 24, 3-4, 531-538, (2014)
[42] Heydari, A., Revisiting approximate dynamic programming and its convergence, IEEE Trans. Cybern., 44, 12, 2733-2743, (2014)
[43] Heydari, A.; Balakrishnan, S. N., Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Trans. Neural Netw. Learn. Syst., 24, 1, 145-157, (2013)
[44] Kirk, D. E., Optimal control theory: an introduction, pp. 53-94, (1970), Prentice-Hall Mineola, NY
[45] W.F. Trench, Introduction to Real Analysis, Available online at: 〈http://ramanujan.math.trinity.edu/wtrench/texts/trench_real_analysis.pdf〉, 2012, pp. 204-250.
[46] Rudin, W., Principles of mathematical analysis, (1976), McGraw-Hill New York, NY, pp. 55, 60 · Zbl 0346.26002
[47] Hornik, K.; Stinchcombe, M.; White, H., Multilayer feedforward networks are universal approximators, Neural Netw., 2, 5, 359-366, (1989) · Zbl 1383.92015
[48] Jeffreys, H.; Jeffreys, B. S., Weierstrass's theorem on approximation by polynomials, in: Methods of Mathematical Physics, (1988), Cambridge University Press Cambridge, England, pp. 446-448
[49] Available online at 〈http://webpages.sdsmt.edu/ aheydari/Research/SourceCodes〉.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.