动态环境下分布式异构多机器人避障方法研究 (Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments)

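Cite this article:	欧阳勇平¹, 魏长赟¹, 蔡帛良¹,². 动态环境下分布式异构多机器人避障方法研究[J]. 智能系统学报, 2022, 17(4): 752-763. DOI: 10.11992/tis.202106044

Authors:	欧阳勇平¹, 魏长赟¹, 蔡帛良¹,²

Affiliations:	1. College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, Jiangsu, China; 2. School of Engineering, Cardiff University, Cardiff CF10 3AT, Wales, UK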

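Abstract:	Multirobot systems are increasingly used in joint search and rescue, smart workshops, intelligent transportation, and other fields. At present, path planning and navigation collision avoidance among multiple robots, and between robots and a dynamic environment, still depend on accurate environment maps, which makes coordination and cooperation of multirobot systems in unstructured environments challenging. To address this problem, this paper proposes a navigation and collision avoidance method for distributed heterogeneous multirobot systems that does not rely on accurate maps. A multifeature policy gradient optimization algorithm based on deep reinforcement learning is established, and social norms in human-robot collaborative environments are taken into account, so that distributed robots can learn the optimal navigation and collision avoidance policy through trial-and-error interaction with the environment. The optimal policy is trained in the Gazebo simulation environment, and the model is then transferred to several heterogeneous physical robots, where the robot control signals are decoded for testing in real environments. The experimental results show that the proposed multifeature policy gradient optimization algorithm can obtain the optimal navigation and collision avoidance policy through self-learning, providing a technical reference for the application of distributed heterogeneous multirobot systems in dynamic environments.
Keywords:	heterogeneous multirobot systems; deep reinforcement learning; unstructured environment; multifeature policy gradients; dynamic collision avoidance; self-learning; distributed control; control policy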
Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments

OUYANG Yongping¹, WEI Changyun¹, CAI Boliang¹,². Collision avoidance approach for distributed heterogeneous multirobot systems in dynamic environments[J]. CAAI Transactions on Intelligent Systems, 2022, 17(4): 752-763. DOI: 10.11992/tis.202106044

Authors:	OUYANG Yongping¹, WEI Changyun¹, CAI Boliang¹,²

Affiliation:	1. College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China; 2. School of Engineering, Cardiff University, Cardiff CF10 3AT, UK

Abstract:	Multirobot systems are widely used in cooperative search and rescue missions, intelligent warehouses, intelligent transportation, and other fields. At present, path planning and collision avoidance among multiple robots, and between robots and a dynamic environment, still rely on accurate maps, which makes coordination and cooperation of multirobot systems in unstructured environments challenging. To address this problem, this paper presents a navigation and collision avoidance approach that does not require accurate maps and is based on the deep reinforcement learning framework. A multifeature policy gradients algorithm is proposed, and social norms are integrated, so that the learning agent can obtain the optimal control policy via trial-and-error interactions with the environment. The optimal policy is trained in the Gazebo simulation environment and then transferred to several heterogeneous real robots by decoding the control signals. The experimental results show that the proposed multifeature policy gradients algorithm can obtain the optimal navigation and collision avoidance policy through self-learning, providing a technical reference for the application of distributed heterogeneous multirobot systems in dynamic environments.

Keywords:	heterogeneous multi-robot systems; deep reinforcement learning; non-structural environment; multi-feature policy gradients; dynamic collision avoidance; self-learning; distributed control; control policy
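
The abstract describes a policy-gradient learner that improves its avoidance behaviour through trial-and-error interaction with the environment. It does not give the multifeature policy gradients algorithm itself, so the following is only a minimal sketch of the general policy-gradient idea, using plain REINFORCE on a hypothetical two-state toy task. The function `train_avoidance_policy`, the states, the actions, and the reward values are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_avoidance_policy(episodes=2000, lr=0.1, seed=0):
    """Train a tabular softmax policy with REINFORCE on a toy avoidance task.

    Hypothetical setup (not from the paper):
      state  0 = obstacle on the left, 1 = obstacle on the right
      action 0 = steer left,           1 = steer right
    """
    rng = random.Random(seed)
    # theta[s][a] is the logit of action a in state s.
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        state = rng.randrange(2)
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        # Toy reward: +1 for steering away from the obstacle, -1 for a collision.
        reward = 1.0 if action != state else -1.0
        # REINFORCE: the gradient of log pi(a|s) w.r.t. the logits
        # is onehot(a) - probs, scaled here by the one-step reward.
        for a in range(2):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += lr * reward * grad_log_pi
    return [softmax(row) for row in theta]


if __name__ == "__main__":
    policy = train_avoidance_policy()
    print("P(action | obstacle left): ", policy[0])
    print("P(action | obstacle right):", policy[1])
```

In this toy task the learned policy should put most of its probability on steering away from the obstacle in each state. A real mapless navigation policy would replace the table with a neural network over sensor features and a multi-step return, which is beyond what the abstract specifies.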
