Improved QMIX algorithm from communication and exploration for multi-agent reinforcement learning

Citation: Huiyi DENG, Yongzhen LI, Qiyue YIN. Improved QMIX algorithm from communication and exploration for multi-agent reinforcement learning[J]. Journal of Computer Applications, 2023, 43(1): 202-208.
Authors: Huiyi DENG, Yongzhen LI, Qiyue YIN
Affiliations: School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
Department of Automation, Xiamen University, Xiamen, Fujian 361002, China
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Funding: Beijing Municipal Universities High-Level Talent Cross-Training "Shipei Plan" Program; 2022 Scientific Research Capability Improvement Program for Young Teachers of Beijing University of Civil Engineering and Architecture (X22022)
Abstract: Non-stationarity is one of the main challenges faced by deep reinforcement learning in multi-agent environments. It breaks the Markov assumption on which most single-agent reinforcement learning algorithms rely, so that during learning each agent may become trapped in a never-ending cycle induced by the environment that the other agents create. To address this problem, the implementation of the Centralized Training with Decentralized Execution (CTDE) architecture in reinforcement learning was studied, and the QMIX algorithm was improved from two angles, inter-agent communication and agent exploration, by adopting the Variance Based Control (VBC) communication method and introducing a curiosity mechanism. The proposed algorithm was validated on micromanagement scenarios in the StarCraft II Learning Environment (SC2LE). Experimental results show that, compared with QMIX, the proposed algorithm improves performance and yields a training model that converges faster.

Keywords: multi-agent environment; deep reinforcement learning; Centralized Training with Decentralized Execution (CTDE) architecture; curiosity mechanism; agent communication
Received: 2021-11-08
Revised: 2022-05-26
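
The record above names three technical ingredients: the QMIX value-mixing architecture under CTDE, variance-based communication (VBC), and a curiosity-driven exploration bonus. As a reading aid, the sketches below illustrate each one. First, a minimal PyTorch sketch of QMIX's monotonic mixing network; layer sizes and identifier names are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of QMIX's monotonic mixing network (PyTorch).
# Hypernetworks conditioned on the global state generate the mixing weights;
# abs() keeps the weights non-negative so that dQ_tot/dQ_a >= 0, which makes
# the greedy joint action consistent with each agent's greedy local action.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) per-agent chosen-action values
        # state:    (batch, state_dim) global state, used during training only
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(agent_qs.view(b, 1, self.n_agents) @ w1 + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (hidden @ w2 + b2).view(b, 1)  # Q_tot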

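Second, the curiosity mechanism. The abstract does not specify its exact form; the sketch below shows a common forward-model variant (in the spirit of ICM) in which the prediction error of a learned dynamics model is paid out as an intrinsic reward, so poorly predicted, novel states attract exploration. The class name, feature size, and scale factor eta are assumptions.

```python
# A hedged sketch of a curiosity-style intrinsic reward via a forward model.
# The paper's exact curiosity module may differ; eta and feat_dim are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardCuriosity(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, feat_dim: int = 64, eta: float = 0.1):
        super().__init__()
        self.eta = eta
        self.encode = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Predict next-observation features from current features and the action taken.
        self.dynamics = nn.Sequential(nn.Linear(feat_dim + act_dim, feat_dim), nn.ReLU(),
                                      nn.Linear(feat_dim, feat_dim))

    def forward(self, obs, act_onehot, next_obs):
        phi_next = self.encode(next_obs)
        phi_pred = self.dynamics(torch.cat([self.encode(obs), act_onehot], dim=-1))
        # Per-sample prediction error doubles as model loss and novelty signal.
        err = F.mse_loss(phi_pred, phi_next.detach(), reduction='none').mean(dim=-1)
        return self.eta * err.detach(), err.mean()  # (intrinsic reward, training loss)
```

The intrinsic reward would simply be added to the environment reward before the QMIX TD target is computed, nudging agents toward under-explored states.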
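Third, communication. VBC limits message traffic by acting on the variance of local action-value estimates: an agent whose local values are nearly indistinguishable cannot decide on its own and requests teammates' messages. The sketch below is a strong simplification of VBC's execution-time gating; the threshold delta and the additive message format are illustrative assumptions.

```python
# A simplified sketch of variance-gated communication in the spirit of VBC.
# delta and the additive message combination are illustrative assumptions.
import torch

def should_request(local_q: torch.Tensor, delta: float = 0.05) -> bool:
    # local_q: (n_actions,) action values from the agent's own observation.
    # Low variance means no action clearly dominates, so outside
    # information is worth the communication cost.
    return local_q.var().item() < delta

def act(local_q: torch.Tensor, teammate_msgs: list, delta: float = 0.05) -> int:
    if should_request(local_q, delta) and teammate_msgs:
        # Teammates' value contributions refine the ambiguous local estimate.
        local_q = local_q + torch.stack(teammate_msgs).sum(dim=0)
    return int(local_q.argmax())
```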