
A transferable deep reinforcement learning high-speed railway rescheduling method based on prioritized experience replay
Cite this article: DAI Xue-wu, WU Yue, SHI Qi, CUI Dong-liang, YU Sheng-ping. A transferable deep reinforcement learning high-speed railway rescheduling method based on prioritized experience replay[J]. Control and Decision, 2023, 38(8): 2375-2388.
Authors: DAI Xue-wu  WU Yue  SHI Qi  CUI Dong-liang  YU Sheng-ping
Affiliation: State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
Funding: National Natural Science Foundation of China (61790574).
Abstract: High-speed railway train operation rescheduling is a complex multi-stage sequential decision problem: it must account for trains, line equipment, and other operating conditions, and its decision space grows exponentially with problem scale. Deep reinforcement learning, exemplified by the deep Q-network (DQN), combines powerful search and learning capabilities and offers a new approach to high-speed railway rescheduling, but it suffers from inefficient use of stored experience and poor transferability. This paper proposes a transferable deep reinforcement learning rescheduling method based on prioritized experience replay. The rescheduling problem, including constraints such as the track utilization plan, is formulated as a multi-stage sequential decision process. To improve transferability, a new state vector and action space shared between the source and target domains are proposed. To improve experience utilization efficiency and convergence speed, a DQN training method incorporating prioritized experience replay is designed. Experience-learning experiments with small-scale cases of the Xulan (Xuzhou-Lanzhou) line as source-domain problems show that the proposed algorithm surpasses the traditional DQN in experience utilization efficiency and convergence speed, and that convergence can be further improved by appropriately increasing the priority exponent and adjusting the weight parameters. On delay cases from a busy section of the Jinghu (Beijing-Shanghai) line as target-domain problems, the proposed online decision-making algorithm reduces decision time by about 75% on average compared with a classical mixed-integer programming algorithm, while keeping the loss in total delay time within 15% in nearly 77% of the cases.

Keywords: high-speed railway; rescheduling algorithm; deep reinforcement learning; state vector; action space; prioritized experience replay
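
To make the replay mechanism named in the abstract and keywords concrete, the following is a minimal sketch of a proportional prioritized experience replay buffer in the style of Schaul et al. (2016). It is an illustrative assumption rather than the authors' implementation; here alpha plays the role of the priority exponent and beta that of the weight parameter which the abstract reports tuning to improve convergence.

import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (sketch)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # priority exponent: 0 = uniform sampling, 1 = fully greedy
        self.beta = beta    # importance-sampling weight exponent
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # every experience is replayed at least once.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        scaled = self.priorities[:len(self.data)] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling; normalizing by the maximum keeps them <= 1.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority is proportional to the absolute TD error.
        self.priorities[idx] = np.abs(td_errors) + self.eps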

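A companion sketch of a single training step shows how such a buffer plugs into DQN: the sampled importance-sampling weights scale the TD loss, and the resulting absolute TD errors are written back as new priorities. PyTorch is assumed here, and q_net, target_net and the transition layout (state, action, reward, next_state, done) are hypothetical placeholders, not the paper's rescheduling networks.

import numpy as np
import torch

def per_dqn_update(q_net, target_net, buffer, optimizer,
                   batch_size=64, gamma=0.99):
    idx, batch, weights = buffer.sample(batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    s  = torch.as_tensor(np.array(states), dtype=torch.float32)
    a  = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    r  = torch.as_tensor(rewards, dtype=torch.float32)
    s2 = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    d  = torch.as_tensor(dones, dtype=torch.float32)
    w  = torch.as_tensor(weights, dtype=torch.float32)

    q = q_net(s).gather(1, a).squeeze(1)             # Q(s, a)
    with torch.no_grad():
        q_next = target_net(s2).max(dim=1).values    # max_a' Q_target(s', a')
        target = r + gamma * (1.0 - d) * q_next

    td_error = q - target
    loss = (w * td_error.pow(2)).mean()  # weighted MSE corrects sampling bias

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Feed |TD error| back so poorly predicted transitions are replayed more.
    buffer.update_priorities(idx, td_error.detach().abs().numpy())
    return loss.item()

Raising alpha concentrates replay on high-error transitions, while moving beta toward 1 removes the residual sampling bias, which matches the convergence tuning of the priority exponent and weight parameters described in the abstract.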