
Multi-robot Reinforcement Learning Navigation Incorporating Two Levels of Attention
Citation: ZHANG Yao-Dan, KUANG Li-Qun, JIAO Shi-Chao, HAN Hui-Yan, XUE Hong-Xin. Multi-robot reinforcement learning navigation incorporating two levels of attention[J]. Computer Systems & Applications, 2023, 32(12): 43-51.
Authors: ZHANG Yao-Dan  KUANG Li-Qun  JIAO Shi-Chao  HAN Hui-Yan  XUE Hong-Xin
Affiliations: School of Computer Science and Technology, North University of China, Taiyuan 030051; Shanxi Key Laboratory of Machine Vision and Virtual Reality (North University of China), Taiyuan 030051; Shanxi Engineering Research Center of Visual Information Processing and Intelligent Robotics, Taiyuan 030051
Funding: National Natural Science Foundation of China (62272426, 62106238); Science and Technology Major Program of Shanxi Province (202201150401021); Special Project for Guiding the Transformation of Scientific and Technological Achievements of Shanxi Province (202104021301055); Scientific Research Foundation for Returned Overseas Scholars of Shanxi Province (2020-113); Fundamental Research Program of Shanxi Province (202203021222027)
Abstract: To address the low learning efficiency and slow convergence caused by the complex relationships among agents in multi-agent reinforcement learning, this study proposes MADDPG-Attention, a method based on a two-level attention mechanism. It adds hard and soft attention to the Critic network of the MADDPG algorithm, using attention to identify the experience agents can borrow from one another and thereby improve the efficiency of mutual learning. Because a single soft attention layer assigns learning weights even to completely unrelated agents, hard attention is first used to judge whether learning between two agents is necessary at all, pruning agents that carry irrelevant information; soft attention then judges how important learning from each remaining agent is and distributes learning weights according to that importance, so each agent learns from those with useful experience. Tests on the multi-agent particle cooperative navigation environment show that MADDPG-Attention understands these complex relationships more clearly, achieves a navigation success rate above 90% in all three environments, and effectively improves learning efficiency and accelerates convergence.
Keywords: multi-agent reinforcement learning  navigation  MADDPG  hard attention  soft attention
Received: May 25, 2023
Revised: June 26, 2023
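The two-level mechanism described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the hard-level relevance scores are passed in as plain numbers here, whereas in the actual method they would come from a learned hard-attention gate, and all names are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def two_level_attention(query, keys, values, hard_scores, threshold=0.0):
    """Aggregate other agents' features for one agent's critic input.

    Hard level: keep only agents whose relevance score exceeds `threshold`,
    pruning irrelevant agents entirely so they receive zero weight.
    Soft level: scaled dot-product attention over the surviving agents.
    """
    keep = hard_scores > threshold
    if not keep.any():                        # no relevant agents: contribute nothing
        return np.zeros(values.shape[1])
    d = query.shape[0]
    scores = keys[keep] @ query / np.sqrt(d)  # importance of each surviving agent
    weights = softmax(scores)                 # soft weights sum to 1 over kept agents
    return weights @ values[keep]             # weighted sum of useful agents' features

# Hypothetical example: three other agents with 2-D encoded features.
query = np.array([1.0, 0.0])
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
hard = np.array([1.0, -1.0, 0.5])             # agent 1 is gated out entirely
context = two_level_attention(query, keys, values, hard)  # blends agents 0 and 2 only
```

Unlike a single soft attention layer, the pruned agent here contributes exactly nothing to the context vector, which is the point of adding the hard level.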
