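An Integrated Scheduling Method for Automated Container Terminals Based on Deep Reinforcement Learning
Cite this article: Yin Xing, Zhang Yu, Zheng Qianqian, Tang Kexin. An Integrated Scheduling Method for Automated Container Terminals Based on Deep Reinforcement Learning[J]. 交通信息与安全 (Journal of Transport Information and Safety), 2022, 40(6): 81-91.
Authors: Yin Xing (尹星), Zhang Yu (张煜), Zheng Qianqian (郑倩倩), Tang Kexin (唐可心)
Affiliation: 1. School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063
Funding: National Natural Science Foundation of China, project 72174160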
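Abstract: During unloading at an automated container terminal, quay cranes, intelligent transport robots, and yard cranes operate interactively, and the actual scheduling environment is complex and changeable. To address this, a three-stage integrated scheduling model of the container terminal is built on the hybrid flow shop, with the objective of minimizing the maximum completion time (makespan). Because the scheduling environment of an automated terminal is highly dynamic, a deep reinforcement learning algorithm, the double deep Q-network (DDQN), is used to solve the model. Following actual terminal scheduling practice, a neural network fits the action-value function in real time, the equipment status data of each stage are fed to the model, and the model is trained with an experience replay mechanism. Single heuristic rules together with compound heuristic rules serve as the candidate actions for the equipment. Through the action selection and action evaluation mechanisms of reinforcement learning, the optimal container-equipment combination strategy is obtained and compared with an exact algorithm and several commonly used metaheuristic strategies. The results show that, on larger-scale cases, the total operation time of the proposed method is 7.84% lower on average than that of particle swarm optimization, a relatively advanced current algorithm, and the gaps to the theoretical lower bound are 6.0%, 5.6%, and 4.6%, respectively. The equipment load across the three stages is relatively balanced, and the average equipment utilization is 89%, which meets practical requirements. On small-scale cases, the average error in total completion time relative to the Gurobi solver is 1.99%. As the case size grows, the proposed algorithm shows an advantage in solving time, with an improvement of up to 59%, which supports the feasibility and efficiency of the proposed method for improving the operating efficiency of automated container terminals.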
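To make the makespan objective of a three-stage hybrid flow shop concrete, the following is a minimal sketch, not the authors' model. It assumes identical parallel machines at each stage, no buffer limits or travel-time dependencies, and a simple first-come-first-served rule with earliest-free-machine assignment. The function name `makespan` and the sample processing times are hypothetical.

```python
from typing import Sequence


def makespan(proc: Sequence[Sequence[float]], machines: Sequence[int]) -> float:
    """Makespan of a hybrid flow shop under a simple dispatching rule.

    proc[j][s]  : processing time of container j at stage s (assumed data)
    machines[s] : number of identical machines at stage s
    Rule (an assumption, not the paper's policy): at each stage, containers
    are served in order of arrival and sent to the earliest-free machine.
    """
    ready = [0.0] * len(proc)  # time each container is released to the next stage
    for s, count in enumerate(machines):
        free = [0.0] * count  # time each machine of this stage becomes idle
        # First come, first served; ties broken by container index.
        for j in sorted(range(len(proc)), key=lambda i: (ready[i], i)):
            m = min(range(count), key=lambda k: free[k])
            start = max(ready[j], free[m])
            free[m] = start + proc[j][s]
            ready[j] = free[m]
    return max(ready)


# Hypothetical instance: 4 containers, stages = quay crane, transport robot, yard crane.
times = [[2, 3, 2]] * 4
print(makespan(times, machines=[1, 2, 1]))
```

A learned policy would replace the fixed first-come-first-served rule with a choice among candidate rules at each decision point; the makespan it returns is the quantity being minimized.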

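Keywords: intelligent transportation; automated container terminal; three-stage integrated scheduling; deep reinforcement learning; hybrid flow shop
Received: 2022-06-13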

A Study of Integrated Scheduling of Automated Container Terminal Based on DDQN
Affiliation: 1. School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China; 2. Inland Port and Shipping Industry Research Co. Ltd. of Guangdong Province, Shaoguan 512000, Guangdong, China
Abstract: This paper studies the interactive operation of quay cranes, artificial intelligence robots of transportation (ARTs), and yard cranes during unloading at an automated container terminal. A three-stage integrated scheduling model based on the hybrid flow shop scheduling problem is proposed, with the objective of minimizing the makespan. Because the scheduling environment is highly dynamic and demands real-time response, a deep reinforcement learning algorithm, the double deep Q-network (DDQN), is used to solve the model. The model takes the real-time status data of the equipment at each stage as input, a neural network fits the action-value function, and training uses an experience replay mechanism. Single heuristic rules together with compound heuristic rules serve as the candidate actions for the equipment. Through the action selection and action evaluation mechanisms of reinforcement learning, the optimal container-equipment combination strategy is obtained. Cases of different scales are designed from field survey data of the Tianjin Port automated terminal for experimental comparison and analysis. The results show that the total operation time of the proposed method is 7.84% lower on average than that of the particle swarm optimization algorithm, a relatively advanced current method, and that the gaps to the theoretical lower bound are 6.0%, 5.6%, and 4.6%, respectively. The equipment load across the three stages is relatively balanced, and the average equipment utilization is 89%, which meets practical requirements. On small-scale cases, the total completion time obtained by DDQN differs from the Gurobi result by 1.99% on average. As the case size grows, the proposed method shows an advantage in solving time, with an improvement of up to 59%, which supports the feasibility and efficiency of the proposed method for improving the operating efficiency of automated container terminals.
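The "action selection and action evaluation" split mentioned in the abstract is the defining feature of double DQN in general: the online network picks the next action and the target network scores it. The sketch below shows only that standard target computation in plain Python. It is not the authors' implementation; the function name `ddqn_targets`, the discount factor, and the sample Q-values are assumptions for illustration. In the setting described here, each action index would stand for one candidate heuristic rule.

```python
from typing import List, Sequence


def ddqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_online_next: Sequence[Sequence[float]],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Standard double DQN training targets for a batch of transitions.

    q_online_next[i][a] : online-network Q-value of action a in next state i
    q_target_next[i][a] : target-network Q-value of action a in next state i
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        # Action selection: greedy action according to the online network.
        best = max(range(len(q_on)), key=lambda a: q_on[a])
        # Action evaluation: that action's value according to the target network.
        bootstrap = 0.0 if done else gamma * q_tg[best]
        targets.append(r + bootstrap)
    return targets


# Hypothetical batch of two transitions with two candidate rules each.
print(ddqn_targets(
    rewards=[1.0, 0.0],
    dones=[False, True],
    q_online_next=[[1.0, 5.0], [2.0, 0.0]],
    q_target_next=[[10.0, 2.0], [3.0, 4.0]],
    gamma=0.5,
))
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) rather than its own maximum (10.0); decoupling the two is what reduces the overestimation seen in plain DQN.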