
Research on Multi-aircraft Cooperative Air Combat Method Based on Deep Reinforcement Learning

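Cite this article:	Shi Wei, Feng Yanghe, Cheng Guangquan, Huang Honglan, Huang Jincai, Liu Zhong, He Wei. Research on Multi-aircraft Cooperative Air Combat Method Based on Deep Reinforcement Learning [J]. Acta Automatica Sinica, 2021, 47(7): 1610-1623.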

Authors:	Shi Wei, Feng Yanghe, Cheng Guangquan, Huang Honglan, Huang Jincai, Liu Zhong, He Wei

Affiliation:	1. College of Systems Engineering, National University of Defense Technology, Changsha 410073

Funding:	Supported by the National Natural Science Foundation of China (71701205, 62073333)

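Abstract:	Multi-aircraft cooperation is a key element of air combat. How to handle the complex cooperative relationships among multiple entities and achieve intelligent decision-making in multi-aircraft cooperative air combat is a problem that urgently needs to be solved. To this end, a deep-reinforcement-learning-based multi-aircraft cooperative air combat decision framework (DRL-MACACDF) is proposed, and four algorithm enhancement mechanisms are designed for the proximal policy optimization (PPO) algorithm to improve the degree of cooperation among agents in multi-aircraft cooperative confrontation scenarios. Simulation experiments on a wargame platform verify the feasibility and practicality of the method. The confrontation process data are reviewed and analyzed for interpretability, and the interdisciplinary research direction that combines reinforcement learning with traditional wargaming is discussed.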
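Keywords:	multi-aircraft cooperative air combat; intelligent decision-making; deep reinforcement learning; PPO algorithm; enhancement mechanism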
Received:	2020-12-24
Research on Multi-aircraft Cooperative Air Combat Method Based on Deep Reinforcement Learning

Affiliation:	1. College of Systems Engineering, National University of Defense Technology, Changsha 410073; 2. Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083; 3. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083

Abstract:	Multi-aircraft cooperation is a key part of air combat, and handling the complex cooperative relationships among multiple entities is a problem that urgently needs to be solved. To address intelligent decision-making in multi-aircraft cooperative air combat, this paper proposes a deep-reinforcement-learning-based multi-aircraft cooperative air combat decision framework (DRL-MACACDF). Building on proximal policy optimization (PPO), four algorithm enhancement mechanisms are designed to improve the degree of cooperation among agents in multi-aircraft cooperative confrontation scenarios. Simulation on a wargame platform verifies the feasibility and practicality of the method. An interpretability-oriented review of the confrontation process data is carried out, and the interdisciplinary research direction that combines reinforcement learning with traditional wargaming is discussed.

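Keywords:	multi-aircraft cooperative air combat; intelligent decision-making; deep reinforcement learning; PPO algorithm; enhancement mechanism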
