
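UAV Reactive Interfered Fluid Path Planning
Cite this article: Wu Jianfa, Wang Honglun, Wang Yanxiang, Liu Yiheng. UAV Reactive Interfered Fluid Path Planning[J]. Acta Automatica Sinica, 2023, 49(2): 272-287.
Authors: Wu Jianfa  Wang Honglun  Wang Yanxiang  Liu Yiheng
Affiliation: 1. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191
Funding: Supported by the National Natural Science Foundation of China (62173022, 61673042, 61175084)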
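Abstract: For complex three-dimensional obstacle environments, a reactive interfered fluid path planning framework for unmanned aerial vehicles (UAVs) based on deep reinforcement learning is proposed. The framework uses a constrained interfered fluid dynamical system algorithm as its basic path planning method. Based on the relative states between the UAV and each obstacle and on the obstacle type, actor networks trained with the deep deterministic policy gradient algorithm generate the reaction coefficient and direction coefficient of each obstacle online. The total modulation matrix is then computed from these coefficients and used to modify the UAV's flight path, achieving reactive obstacle avoidance. In addition, a normative modeling method for deep reinforcement learning training environments suited to the proposed path planning method is studied. Simulation results show that, at roughly the same path quality, the method is clearly better in real-time performance than an online path planning method based on predictive control.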

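Keywords: unmanned aerial vehicle (UAV); reactive path planning; constrained interfered fluid dynamical system; deep reinforcement learning; training environment
Received: 2021-03-29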

UAV Reactive Interfered Fluid Path Planning
Affiliation: 1. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191; 2. Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100094; 3. Science and Technology on Aircraft Control Laboratory, Beihang University, Beijing 100191
Abstract: This paper proposes a reactive interfered fluid path planning framework for unmanned aerial vehicles (UAVs) in complex 3D obstacle environments, based on deep reinforcement learning. The framework uses the constrained interfered fluid dynamical system algorithm as its fundamental path planning method. From the relative states between the UAV and each obstacle, together with the obstacle category, actor networks trained by deep deterministic policy gradient generate the reaction and direction coefficients of the corresponding obstacle online. The total modulation matrices of the constrained interfered fluid dynamical system are then computed from these coefficients, and the flight path is modified accordingly to achieve reactive obstacle avoidance. In addition, a normative modeling method for deep reinforcement learning training environments, matched to the proposed path planning method, is studied. Simulation results show that, when path quality is approximately the same, the proposed method clearly outperforms an online path planning method based on predictive control in real-time performance.
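The abstract names the pipeline but gives no formulas, so the following is only a minimal sketch of how reaction and direction coefficients might turn into a modified velocity. It assumes the modulation form commonly given in the interfered fluid dynamical system literature, spherical obstacles, and a simple inverse-distance weighting. The function names `modulate` and `steer`, the tangent-basis construction, and the weighting rule are illustrative assumptions, not the paper's definitions. The constraint handling of the constrained variant is omitted, and fixed coefficient tuples stand in for the outputs of the trained actor networks.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def modulate(u, p, center, radius, rho, sigma, theta):
    """Apply one obstacle's modulation to velocity u; return (M_k u, gamma_k).

    Assumed form: M_k = I - n n^T / gamma^(1/rho) + t n^T / gamma^(1/sigma),
    with unit normal n and unit tangent t. rho and sigma play the role of
    reaction coefficients, theta that of the direction coefficient.
    The position p is assumed to lie on or outside the obstacle surface.
    """
    d = tuple(pi - ci for pi, ci in zip(p, center))
    gamma = _dot(d, d) / radius ** 2   # 1 on the surface, > 1 outside
    n = _unit(d)                       # outward unit normal
    # Orthonormal basis (e1, e2) of the tangent plane (illustrative choice).
    helper = (0.0, 0.0, 1.0) if abs(n[2]) < 0.99 else (1.0, 0.0, 0.0)
    e1 = _unit(_cross(n, helper))
    e2 = _cross(n, e1)
    t = tuple(math.cos(theta) * a + math.sin(theta) * b for a, b in zip(e1, e2))
    nu = _dot(n, u)
    k_rep = nu / gamma ** (1.0 / rho)    # repulsive term strength
    k_tan = nu / gamma ** (1.0 / sigma)  # tangential term strength
    v = tuple(ui - k_rep * ni + k_tan * ti for ui, ni, ti in zip(u, n, t))
    return v, gamma

def steer(p, goal, obstacles, coeffs, speed=1.0):
    """Blend per-obstacle modulations of the straight-to-goal velocity.

    obstacles: list of (center, radius); coeffs: list of (rho, sigma, theta).
    Weights favor nearby obstacles (hypothetical inverse-distance rule).
    Summing w_k * (M_k u) equals applying the weighted-sum matrix to u.
    """
    to_goal = tuple(g - q for g, q in zip(goal, p))
    u = tuple(speed * x for x in _unit(to_goal))
    results = [modulate(u, p, c, r, *k) for (c, r), k in zip(obstacles, coeffs)]
    raw = [1.0 / (gamma - 1.0 + 1e-9) for _, gamma in results]
    total = sum(raw)
    return tuple(sum(w / total * v[i] for w, (v, _) in zip(raw, results))
                 for i in range(3))
```

Under the assumed form, the velocity on an obstacle surface has no normal component, so the path cannot penetrate the obstacle, while far from every obstacle the velocity stays close to the straight-to-goal direction. In the proposed framework the tuples in `coeffs` would be produced online by the actor networks from the relative state and obstacle type.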