Gradient-based multi-agent meta deep reinforcement learning algorithm
Cite this article: ZHAO Chunyu, LAI Jun, CHEN Xiliang, ZHANG Renwen. Gradient-based multi-agent meta deep reinforcement learning algorithm[J]. Application Research of Computers, 2024, 41(5).
Authors: ZHAO Chunyu, LAI Jun, CHEN Xiliang, ZHANG Renwen
Affiliation: College of Command Information System, Army Engineering University (all four authors)
Funding: National Natural Science Foundation of China (61806221)
Received: 2023/9/20
Revised: 2024/4/10

Abstract: Multi-agent systems are widely used in fields such as autonomous driving, intelligent logistics, and medical collaboration. However, as technology advances and system requirements grow, these systems face challenges of large scale and high complexity, and often suffer from inefficient training and poor adaptability. To address these problems, this paper extends gradient-based meta-learning to multi-agent deep reinforcement learning and proposes a multi-agent first-order meta proximal policy optimization (MAMPPO) method. The method learns the initial model parameters of a multi-agent system, offering a new perspective on improving the performance of multi-agent deep reinforcement learning. It makes full use of the experience collected during multi-agent reinforcement learning: through repeated adaptation, it finds the parameters that are most sensitive along the gradient descent direction and learns initial parameters so that model training begins from the optimal starting point. This improves the decision-making efficiency of the joint policy and speeds up policy change, so the agents adapt faster to new situations. Experimental results on StarCraft II show that MAMPPO significantly improves training speed and adaptability, providing a new approach for further improving the training efficiency and adaptability of multi-agent reinforcement learning.

Keywords: meta-learning; deep reinforcement learning; gradient descent; multi-agent deep reinforcement learning
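
The abstract describes learning initial parameters through repeated adaptation with a first-order gradient-based meta-learning update. The sketch below illustrates that general idea only, on a toy one-dimensional problem; it is not the authors' MAMPPO implementation and omits the PPO objective and the multi-agent setting entirely. The quadratic task loss and the names `task_grad`, `adapt`, `post_adapt_loss`, and `first_order_meta_train` are all assumptions made for illustration.

```python
def task_grad(theta, target):
    """Gradient of the toy task loss (theta - target) ** 2 (hypothetical task)."""
    return 2.0 * (theta - target)


def adapt(theta, target, inner_lr=0.1, inner_steps=3):
    """Inner loop: a few gradient steps on one task, starting from theta."""
    phi = theta
    for _ in range(inner_steps):
        phi -= inner_lr * task_grad(phi, target)
    return phi


def post_adapt_loss(theta, targets):
    """Average task loss measured after adapting from theta on each task."""
    return sum((adapt(theta, t) - t) ** 2 for t in targets) / len(targets)


def first_order_meta_train(targets, meta_lr=0.05, iterations=500):
    """Outer loop: learn an initialization theta with a first-order update.

    The meta-gradient is approximated by the task gradient evaluated at the
    adapted parameters; the derivative of the adaptation itself is ignored.
    """
    theta = 0.0
    for _ in range(iterations):
        meta_grad = sum(task_grad(adapt(theta, t), t) for t in targets)
        theta -= meta_lr * meta_grad / len(targets)
    return theta


# Four toy tasks, each defined by a different optimum.
targets = [1.0, 2.0, 3.0, 4.0]
theta0 = first_order_meta_train(targets)
```

In this toy setting the learned initialization settles between the task optima, so a few inner steps from `theta0` reach a lower average loss than the same number of steps from the untrained starting value 0.0.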