A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
Affiliation: 1. School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea; 2. Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, 291, Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
Abstract: The Hamilton–Jacobi–Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. Since it is rarely possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods that build approximate solutions through simulation-based learning have been studied under various names, such as neuro-dynamic programming (NDP) and approximate dynamic programming (ADP). This learning aspect connects these methods to reinforcement learning (RL), which also seeks to learn optimal decision policies through trial and error. This study develops a model-based RL method that iteratively learns the solution to the HJB equation and its associated equations. We focus in particular on control-affine systems with a quadratic objective function and on the finite-horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in a high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that the use of DNNs, compared with shallow neural networks (SNNs), can significantly improve the performance of a learned policy in the presence of an uncertain initial state and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate these and other key aspects of the method.
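For reference, the finite-horizon HJB equation for the problem class the abstract describes can be written in a standard form for control-affine dynamics \( \dot{x} = f(x) + g(x)u \) with a quadratic stage cost; the symbols \( Q \), \( R \), \( \phi \), and \( x_{\mathrm{ref}} \) below are generic placeholders, not notation taken from the paper:

\[
-\frac{\partial V}{\partial t}(t,x) \;=\; \min_{u}\Big\{ \big(x - x_{\mathrm{ref}}(t)\big)^{\top} Q \,\big(x - x_{\mathrm{ref}}(t)\big) \;+\; u^{\top} R\, u \;+\; \nabla_x V(t,x)^{\top}\big(f(x) + g(x)\,u\big) \Big\}, \qquad V(t_f, x) = \phi(x).
\]

Because the dynamics are affine in \( u \) and the control penalty is quadratic, the minimization has the closed-form solution \( u^{*}(t,x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla_x V(t,x) \), and the costate mentioned in the abstract corresponds to the gradient \( \lambda(t,x) = \nabla_x V(t,x) \); the time-varying value, costate, and policy functions over \( (t,x) \) are the quantities the DNNs are trained to represent.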
This article is indexed in ScienceDirect and other databases.