Multi-Goal Multi-Agent Deep Reinforcement Learning Method Based on Value Decomposition
Cite this article: SONG Jian, WANG Zilei. Multi-Goal Multi-Agent Deep Reinforcement Learning Method Based on Value Decomposition[J]. Computer Engineering, 2023, 49(1): 31-40. DOI: 10.19678/j.issn.1000-3428.0063303
Authors: SONG Jian, WANG Zilei
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230027, China
Funding: Key Program of the National Natural Science Foundation of China, "Spatio-Temporal Structural Knowledge-Guided Video Semantic Analysis Models and Learning Methods" (No. 62176246).
Abstract: Multi-agent deep reinforcement learning can be applied to real-world scenarios that require cooperation among multiple parties and is a research hotspot in reinforcement learning. In multi-goal multi-agent cooperation scenarios, agents have complex mixed relationships in which cooperation and competition coexist. When multi-agent reinforcement learning methods are applied in such scenarios, their performance depends on whether they can adequately measure the relationships among agents and distinguish cooperative from competitive actions; practical difficulties such as high-dimensional data processing and algorithmic efficiency must also be addressed. For multi-goal multi-agent cooperation scenarios, this paper proposes a goal-based value-decomposition deep reinforcement learning method built on the QMIX model: an attention mechanism measures the group influence among agents, and the agents' goal information is used to realize a two-stage value decomposition, strengthening the characterization of complex inter-agent relationships and thereby improving performance in these scenarios. Experimental results show that, compared with QMIX, the method scores on par on the StarCraft II micromanagement platform, 4.9 points higher on average in the checkerboard game, and 25 and 280.4 points higher on average in the merge and cross maps of the multi-particle environment, respectively; it also achieves higher scores and better performance than other mainstream deep reinforcement learning methods.
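The record contains no source code; as a purely illustrative aid, the following PyTorch sketch shows one way the architecture the abstract describes could look: a QMIX-style monotonic mixing network extended with an attention layer that weighs inter-agent influence, plus a two-stage, goal-conditioned value decomposition. Every name, dimension, and design detail below is an assumption, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): QMIX-style mixing extended with
# attention over agents and a two-stage, goal-conditioned decomposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalAttentionMixer(nn.Module):
    def __init__(self, n_agents, n_goals, state_dim, embed_dim=32):
        super().__init__()
        # Attention projections over per-agent features (own Q-value + goal id).
        self.key = nn.Linear(1 + n_goals, embed_dim)
        self.query = nn.Linear(1 + n_goals, embed_dim)
        # Stage 1: state-conditioned weights mixing agents within each goal.
        self.agent_w = nn.Linear(state_dim, n_agents)
        # Stage 2: state-conditioned weights mixing the per-goal values.
        self.goal_w = nn.Linear(state_dim, n_goals)

    def forward(self, agent_qs, goals_onehot, state):
        # agent_qs: (B, A); goals_onehot: (B, A, G); state: (B, state_dim)
        x = torch.cat([agent_qs.unsqueeze(-1), goals_onehot], dim=-1)
        q, k = self.query(x), self.key(x)
        # Scaled dot-product attention approximates each agent's influence
        # on the others ("group influence" in the abstract's terms).
        attn = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        influenced_qs = (attn @ agent_qs.unsqueeze(-1)).squeeze(-1)  # (B, A)
        # Stage 1: aggregate agent values into one value per goal.
        w1 = torch.abs(self.agent_w(state))  # abs keeps Q_tot monotonic (QMIX)
        per_goal = torch.einsum('ba,bag->bg', w1 * influenced_qs, goals_onehot)
        # Stage 2: aggregate the per-goal values into the joint Q_tot.
        w2 = torch.abs(self.goal_w(state))
        return (w2 * per_goal).sum(dim=-1, keepdim=True)  # (B, 1)
```

The `torch.abs` on both weight sets follows the standard QMIX monotonicity trick, so that increasing any single agent's Q-value never decreases Q_tot; the two weighted sums realize the agent-to-goal and goal-to-team stages of the decomposition.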

Keywords: deep reinforcement learning  multi-agent  multi-goal  value decomposition  attention mechanism
Received: 2021-11-21
Revised: 2022-02-12

Multi-Goal Multi-Agent Deep Reinforcement Learning Method Based on Value Decomposition
SONG Jian, WANG Zilei. Multi-Goal Multi-Agent Deep Reinforcement Learning Method Based on Value Decomposition[J]. Computer Engineering, 2023, 49(1): 31-40. DOI: 10.19678/j.issn.1000-3428.0063303
Authors: SONG Jian, WANG Zilei
Affiliation: Department of Automation, University of Science and Technology of China, Hefei 230027, China
Abstract: Multi-agent deep reinforcement learning can be applied to real-world scenarios that require multi-party cooperation and is an active research topic in reinforcement learning. In these scenarios, agents usually have complex relationships with one another, involving both cooperation and competition. The performance of a multi-agent reinforcement learning method therefore depends on whether it can correctly assess the relationships among agents and distinguish cooperative from competitive actions; it also faces efficiency problems arising from high-dimensional data processing. Focusing on multi-goal multi-agent cooperation scenarios, this paper proposes a deep multi-goal multi-agent reinforcement learning method that uses value-function factorization based on QMIX. In this method, the agents' goals and an attention mechanism are used to measure the social influence among agents, enabling a two-stage value decomposition and an improved characterization of their complex relationships. The experimental results show that, relative to QMIX, the reward scores are almost the same on the StarCraft II micromanagement platform, 4.9 points higher on average in the checkerboard game, and 25 and 280.4 points higher on average in the merge and cross maps of the multi-particle environment, respectively. The method also achieves higher scores and better performance than other mainstream deep reinforcement learning methods in these representative scenarios.
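For concreteness, a hypothetical forward pass through the `GoalAttentionMixer` sketch above could be checked as follows; the batch size, agent count, goal count, and state dimension are all illustrative.

```python
# Illustrative shape check for the GoalAttentionMixer sketch above.
import torch
import torch.nn.functional as F

mixer = GoalAttentionMixer(n_agents=6, n_goals=2, state_dim=48)
agent_qs = torch.randn(4, 6)                    # per-agent Q-values
goal_ids = torch.randint(0, 2, (4, 6))          # each agent's assigned goal
goals_onehot = F.one_hot(goal_ids, 2).float()   # (4, 6, 2)
state = torch.randn(4, 48)                      # global state
q_tot = mixer(agent_qs, goals_onehot, state)
print(q_tot.shape)                              # torch.Size([4, 1])
```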
Keywords:deep reinforcement learning  multi-agent  multi-goal  value decomposition  attention mechanism  