Repair crew scheduling for damaged road network with enormous demand points using deep Q-learning

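Cite this article:	ZHANG Guo-fu, CHANG Jia-yuan, SU Zhao-pin, SHEN Yu-feng. Repair crew scheduling for damaged road network with enormous demand points using deep Q-learning[J]. Control and Decision, 2022, 37(12): 3267-3277.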

Authors:	ZHANG Guo-fu (张国富), CHANG Jia-yuan (常加远), SU Zhao-pin (苏兆品), SHEN Yu-feng (沈宇锋)

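Affiliations:	School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education, Hefei University of Technology, Hefei 230601; Intelligent Interconnected Systems Laboratory of Anhui Province, Hefei University of Technology, Hefei 230009; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230601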

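Funding:	Anhui Provincial Key Research and Development Program (202004d07020011, 202104d07020001); Strategic Consulting Key Project of the Chinese Academy of Engineering (2020-XZ-3); Humanities and Social Sciences Youth Foundation of the Ministry of Education (19YJC870021); Open Project of the Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation (GBL202117); Fundamental Research Funds for the Central Universities (PA2020GDKC0015, PA2021GDSK0073, PA2021GDSK0074).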

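Abstract:	Repairing a damaged road network is a basic prerequisite for emergency response and rescue after a major natural disaster. The core question is how to schedule road repair crews so that the network is reopened quickly and rescue teams and emergency supplies can be delivered from the supply point to every demand point in time. Because existing approaches often struggle to produce an effective scheduling strategy when the number of demand points is large, this paper first analyzes the key factors in a repair crew's restoration of the damaged network, based on a road network model and a Markov decision process, and designs a double-feedback reward function. Deep Q-learning is then used to solve for the optimal scheduling strategy of the repair crew. Finally, comparative experiments show that, with a large number of demand points, the proposed method is stable and reliable, balances repair efficiency against transportation efficiency, and makes all demand points reachable at a lower repair cost, offering a useful attempt at damaged road network repair in complex post-disaster emergency scenarios.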
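Keywords:	emergency disposal and rescue; road network repair; enormous demand points; repair crew scheduling; double-feedback reward function; deep Q-learning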
Repair crew scheduling for damaged road network with enormous demand points using deep Q-learning

ZHANG Guo-fu,CHANG Jia-yuan,SU Zhao-pin,SHEN Yu-feng.Repair crew scheduling for damaged road network with enormous demand points using deep Q-learning[J].Control and Decision,2022,37(12):3267-3277.

Authors:	ZHANG Guo-fu, CHANG Jia-yuan, SU Zhao-pin, SHEN Yu-feng

Affiliation:	School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education, Hefei University of Technology, Hefei 230601, China

Abstract:	Repairing the damaged road network is a basic premise for emergency disposal and rescue after an extraordinarily serious natural disaster. The problem is how to schedule the repair crew so that the road network is unblocked quickly and rescue teams and emergency resources at the source node can be delivered to the demand nodes in time. However, existing methods have difficulty finding a feasible scheduling strategy when the number of demand nodes is enormous. Therefore, the key factors of repairing the damaged road network are first analyzed according to the road network model and the Markov decision process, and a double-feedback reward function is designed on that basis. Then, deep Q-learning is used to solve for the optimal scheduling strategy of the repair crew. Finally, comparative experiments show that, for a damaged road network with enormous demand points, the proposed method has high stability and reliability, achieves a good balance between repair efficiency and transportation efficiency, and makes all demand points reachable at a lower repair cost, which may provide a useful attempt at repairing damaged road networks in complex post-disaster emergency scenarios.
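
To make the pipeline in the abstract concrete (road network model, Markov decision process, two-part reward, Q-learning), here is a minimal sketch in Python. It is not the authors' method: the abstract does not give their state, action, or reward definitions, so everything below is an assumption for illustration. The toy network, the repair costs, the `bonus` weight, and the function names are hypothetical; crew travel time is ignored; and a plain Q-table stands in for the deep Q-network, which only works because the toy state space is tiny. The reward combines two signals, a penalty for repair cost and a bonus for each demand point that becomes reachable, as one possible reading of "double-feedback".

```python
import random
from collections import defaultdict

# Hypothetical toy network: node 0 is the source, DEMAND holds the demand points.
INTACT = [(0, 1)]                      # edges that survived the disaster
DAMAGED = {(1, 2): 4.0, (1, 3): 1.0,   # damaged edge -> assumed repair cost
           (3, 2): 1.0, (3, 4): 2.0, (0, 4): 6.0}
DEMAND = {2, 4}
EDGES = sorted(DAMAGED)                # an action is an index into this list


def reachable_demand(repaired):
    """Demand nodes reachable from the source over intact and repaired edges."""
    adj = defaultdict(set)
    for u, v in INTACT + [EDGES[i] for i in repaired]:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return DEMAND & seen


def step(repaired, action, bonus=10.0):
    """Repair one edge. Reward = -repair cost + bonus per newly reachable demand node."""
    before = reachable_demand(repaired)
    nxt = repaired | {action}
    after = reachable_demand(nxt)
    reward = -DAMAGED[EDGES[action]] + bonus * len(after - before)
    return nxt, reward, after == DEMAND


def train(episodes=5000, alpha=0.5, gamma=0.95, seed=0):
    """Tabular Q-learning with epsilon decayed linearly from 1.0 to 0.1."""
    rng = random.Random(seed)
    q = defaultdict(float)             # (state, action) -> estimated return
    for ep in range(episodes):
        eps = max(0.1, 1.0 - ep / episodes)
        state, done = frozenset(), False
        while not done:
            actions = [a for a in range(len(EDGES)) if a not in state]
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(state, x)])
            nxt, r, done = step(state, a)
            future = 0.0 if done else max(
                q[(nxt, b)] for b in range(len(EDGES)) if b not in nxt)
            q[(state, a)] += alpha * (r + gamma * future - q[(state, a)])
            state = nxt
    return q


def greedy_plan(q):
    """Follow the learned Q-values greedily until every demand node is reachable."""
    state, done, plan, cost = frozenset(), False, [], 0.0
    while not done:
        a = max((a for a in range(len(EDGES)) if a not in state),
                key=lambda x: q[(state, x)])
        plan.append(EDGES[a])
        cost += DAMAGED[EDGES[a]]
        state, _, done = step(state, a)
    return plan, cost


if __name__ == "__main__":
    plan, cost = greedy_plan(train())
    print(plan, cost)
```

On this toy network the cheapest set of repairs that reconnects both demand points is (1, 3), (3, 2), (3, 4), with a total cost of 4.0, so a converged policy should recover those three edges in some order. With many demand points the set of repaired edges can no longer be enumerated in a table, which is the usual motivation for replacing the table with a neural network as in deep Q-learning.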
