A survey on scalability and transferability of multi-agent deep reinforcement learning

Cite this article:	YAN Chao, XIANG Xiao-jia, XU Xin, WANG Chang, ZHOU Han, SHEN Lin-cheng. A survey on scalability and transferability of multi-agent deep reinforcement learning[J]. Control and Decision, 2022, 37(12): 3083-3102.

Authors:	YAN Chao, XIANG Xiao-jia, XU Xin, WANG Chang, ZHOU Han, SHEN Lin-cheng

Affiliation:	College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

Funding:	Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0108200); National Natural Science Foundation of China (61825305, 61906203, 61803377); Hunan Provincial Postgraduate Research and Innovation Project (CX20210001).

Abstract:	Thanks to the powerful feature representation capability of deep learning and the effective policy learning capability of reinforcement learning (RL), deep reinforcement learning (DRL) has achieved remarkable results in a series of complex sequential decision-making problems. Following the success of DRL in many single-agent tasks, research on its application to multi-agent systems is growing rapidly. In recent years, multi-agent deep reinforcement learning (MADRL) has attracted increasing attention in the field of artificial intelligence, and scalability and transferability have become one of its core research topics. This paper first describes the development and typical algorithms of DRL. It then introduces three learning paradigms of MADRL and analyzes two typical classes of cooperative MADRL algorithms, namely the value function decomposition approach and the centralized value function approach. Next, it summarizes six types of scalable MADRL models, such as those based on attention mechanisms and graph neural networks, and reviews research progress on transfer learning and curriculum learning for the transferability of MADRL. Finally, it discusses the application prospects and research directions of MADRL, offering a reference for its further development.

Keywords:	deep reinforcement learning; multi-agent systems; transfer learning; curriculum learning; scalability; transferability