Robust reinforcement learning algorithm based on pigeon-inspired optimization

Authors:	Mingying ZHANG (张明英), Bing HUA (华冰), Yuguang ZHANG (张宇光), Haidong LI (李海东), Mohong ZHENG (郑墨泓)

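Affiliation:	1. China Electronics Standardization Institute, Beijing 100007, China; 2. College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China; 3. The 7th Research Institute of China Electronics Technology Group Corporation, Guangzhou, Guangdong 510000, China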

Funding:	Science and Technology Innovation 2030 Major Project (2020AAA0107804)

Abstract:	Reinforcement learning (RL) is an artificial intelligence approach with clear computational logic and models that are easy to extend. By interacting with the environment and maximizing a value function with little or no prior information, RL can optimize policy performance and effectively reduce the complexity introduced by physical models. Policy-gradient RL algorithms have been successfully applied in many fields, such as intelligent image recognition, robot control, and path planning for autonomous driving. However, because RL depends heavily on sampling, training needs a large number of samples to converge, and decision accuracy is easily degraded by small disturbances that are absent from the simulation environment. In control applications in particular, the convergence of RL cannot be guaranteed, which makes the stability of the algorithm difficult to prove. Swarm intelligence algorithms solve complex problems through group cooperation and are self-organizing and highly stable, which makes them an effective means of improving the stability of RL models. In this work, the pigeon-inspired optimization (PIO) algorithm from swarm intelligence was combined with policy-gradient RL, and a PIO-based RL algorithm was proposed that solves for the policy gradient so as to maximize long-term future rewards. The fitness function of PIO was combined with RL to evaluate the quality of candidate policies, keep the solution process from falling into an endless loop, and improve the stability of the algorithm. A nonlinear two-wheeled inverted pendulum robot control system was used for simulation verification. The simulation results show that the PIO-based RL algorithm improves the robustness of the system, reduces computational cost, and reduces the algorithm's dependence on the sample database. © 2022, Beijing Xintong Media Co., Ltd. All rights reserved.
Keywords:	pigeon-inspired optimization algorithm; reinforcement learning; policy gradient; robustness
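The abstract does not give the PIO update rules or say exactly how the PIO fitness function is coupled with the policy gradient. As a rough illustration only, the sketch below runs the standard two-stage PIO (map-and-compass operator, then landmark operator) as a derivative-free search over the two gains of a linear state-feedback policy, using a fitness derived from the episodic return to rank candidate policies. It does not estimate a policy gradient, so it should not be read as the authors' method. The plant is a hypothetical pendulum linearized about the upright position, not the paper's two-wheeled robot model. The names `episode_return`, `fitness`, and `pio`, the cost weights, the fall threshold, and all hyperparameters (`r`, `t1`, `t2`, population size) are assumptions chosen for illustration.

```python
import math
import random

# Illustrative sketch; all names, weights and hyperparameters are hypothetical.
# Toy plant (assumption, not the paper's robot model): a pendulum linearized
# about the upright position and discretized with forward Euler.
DT, G, L = 0.02, 9.81, 0.5  # time step [s], gravity [m/s^2], pendulum length [m]


def episode_return(gains, steps=200):
    """Return of the linear policy u = -(k1 * angle + k2 * rate) over one episode."""
    angle, rate = 0.1, 0.0  # start tilted by 0.1 rad
    total = 0.0
    for _ in range(steps):
        u = -(gains[0] * angle + gains[1] * rate)
        u = max(-20.0, min(20.0, u))  # actuator saturation
        total -= angle**2 + 0.1 * rate**2 + 0.001 * u**2  # reward = -quadratic cost
        angle, rate = angle + DT * rate, rate + DT * (G / L * angle + u)
        if abs(angle) > 1.0:  # treat a large tilt as a fall
            return total - 100.0
    return total


def fitness(gains):
    """Positive PIO fitness: a higher return gives a higher fitness."""
    cost = -episode_return(gains)  # the return is always negative here
    return 1.0 / (cost + 1e-9)


def pio(fit, dim, bounds, n_pigeons=20, t1=60, t2=20, r=0.2, seed=0):
    """Two-stage pigeon-inspired optimization that maximizes `fit`."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return max(lo, min(hi, v))

    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_pigeons)]
    vs = [[0.0] * dim for _ in range(n_pigeons)]
    best = list(max(xs, key=fit))
    best_fit = fit(best)

    # Stage 1, map-and-compass operator:
    # V_i(t) = V_i(t-1) * exp(-R t) + rand * (X_best - X_i(t-1)); X_i(t) = X_i(t-1) + V_i(t)
    for t in range(1, t1 + 1):
        decay = math.exp(-r * t)
        for i in range(n_pigeons):
            vs[i] = [v * decay + rng.random() * (b - x) for v, b, x in zip(vs[i], best, xs[i])]
            xs[i] = [clip(x + v) for x, v in zip(xs[i], vs[i])]
            f = fit(xs[i])
            if f > best_fit:  # keep the best position found so far
                best, best_fit = list(xs[i]), f

    # Stage 2, landmark operator: keep the fitter half of the flock each
    # iteration and move the survivors toward their fitness-weighted centre.
    for _ in range(t2):
        xs.sort(key=fit, reverse=True)
        xs = xs[: max(2, len(xs) // 2)]
        weights = [fit(x) for x in xs]
        total = sum(weights)
        centre = [sum(w * x[d] for w, x in zip(weights, xs)) / total for d in range(dim)]
        xs = [[clip(x[d] + rng.random() * (centre[d] - x[d])) for d in range(dim)] for x in xs]
        for x in xs:
            f = fit(x)
            if f > best_fit:
                best, best_fit = list(x), f
    return best, best_fit


if __name__ == "__main__":
    gains, best_fit = pio(fitness, dim=2, bounds=(0.0, 60.0))
    print("gains:", gains, "return:", episode_return(gains))
```

Because the landmark stage weights positions by fitness, the sketch maps the non-positive return to a positive fitness of 1/(cost + ε). Some PIO formulations also divide the landmark centre by the population size; this sketch uses the plain fitness-weighted mean.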