Research on obstacle avoidance methods for distributed heterogeneous multi-robot systems in dynamic environments
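Cite this article: OUYANG Yongping1, WEI Changyun1, CAI Boliang1,2. Research on obstacle avoidance methods for distributed heterogeneous multi-robot systems in dynamic environments[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2022, 17(4): 752-763. DOI: 10.11992/tis.202106044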
Authors: OUYANG Yongping1  WEI Changyun1  CAI Boliang1,2
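Affiliations: 1. College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, Jiangsu, China; 2. School of Engineering, Cardiff University, Cardiff CF10 3AT, Wales, UK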
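Abstract: Multi-robot systems are increasingly used in joint search and rescue, smart workshops, intelligent transportation, and other fields. At present, path planning and navigation obstacle avoidance among multiple robots, and between robots and a dynamic environment, still depend on accurate environment maps, which makes the coordination and cooperation of multi-robot systems in unstructured environments challenging. To address this problem, this paper proposes a distributed navigation and obstacle avoidance method for heterogeneous multi-robot systems that does not depend on accurate maps. A multi-feature policy gradient optimization algorithm based on deep reinforcement learning is established, and the social norms of human-robot collaborative environments are taken into account, so that distributed robots can learn the optimal navigation and obstacle avoidance policy through trial-and-error interaction with the environment. The optimal policy is trained in the Gazebo simulation environment, and the model is then transferred to several heterogeneous physical robots, where the robot control signals are decoded for testing in real environments. The experimental results show that the proposed multi-feature policy gradient optimization algorithm can obtain the optimal navigation and obstacle avoidance policy through self-learning, providing a technical reference for applying distributed heterogeneous multi-robot systems in dynamic environments.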

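Keywords: heterogeneous multi-robot systems   deep reinforcement learning   unstructured environment   multi-feature policy gradient   dynamic obstacle avoidance   self-learning   distributed control   control policy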

Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments
OUYANG Yongping1, WEI Changyun1, CAI Boliang1,2. Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 752-763. DOI: 10.11992/tis.202106044
Authors: OUYANG Yongping1  WEI Changyun1  CAI Boliang1,2
Affiliation: 1. College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China; 2. School of Engineering, Cardiff University, Cardiff CF10 3AT, UK
Abstract: Multirobot systems are widely used in cooperative search and rescue missions, intelligent warehouses, intelligent transportation, and other fields. At present, path planning and collision avoidance among multiple robots, and between robots and a dynamic environment, still rely on accurate maps, which makes the coordination and cooperation of multirobot systems in unstructured environments challenging. To address this problem, this paper presents a navigation and collision avoidance approach that does not require accurate maps and is based on the deep reinforcement learning framework. A multifeature policy gradients algorithm is proposed, and social norms are integrated, so that the learning agent can obtain the optimal control policy via trial-and-error interactions with the environment. The optimal policy is trained in the Gazebo simulation environment and then transferred to several heterogeneous real robots by decoding the control signals. The experimental results show that the proposed multifeature policy gradients algorithm can obtain the optimal navigation and collision avoidance policy through self-learning, providing a technical reference for applying distributed heterogeneous multirobot systems in dynamic environments.
Keywords: heterogeneous multirobot systems   deep reinforcement learning   unstructured environment   multifeature policy gradients   dynamic collision avoidance   self-learning   distributed control   control policy