Related Parallel Machine Online Scheduling Method Based on LSTM-PPO Reinforcement Learning
HE Junjie,ZHANG Jie,ZHANG Peng,WANG Junliang,ZHENG Peng,WANG Ming. Related Parallel Machine Online Scheduling Method Based on LSTM-PPO Reinforcement Learning[J]. China Mechanical Engineering, 2022, 33(3): 329-338. DOI: 10.3969/j.issn.1004-132X.2022.03.009
Authors:HE Junjie  ZHANG Jie  ZHANG Peng  WANG Junliang  ZHENG Peng  WANG Ming
Affiliation:1. School of Mechanical Engineering, Donghua University, Shanghai, 201620; 2. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240
Fund:National Key R&D Program of China (2019YFB1706300); Research Start-up Fund for Young Teachers of Donghua University
Abstract:To solve the related parallel machine online scheduling problem with the objective of minimizing the total weighted completion time, an online scheduling method based on long short-term memory proximal policy optimization (LSTM-PPO) reinforcement learning was proposed. An LSTM-integrated agent was designed to record the historical variations of the workshop states and the corresponding scheduling policies, and online scheduling decisions were then made according to the state information. Meanwhile, a workshop state matrix was designed to describe the problem constraints and the optimization objective, an additional machine-waiting action was introduced into the scheduling action space to expand the solution space, and a reward function was designed to decompose the optimization objective into step-by-step rewards for evaluating scheduling decisions. Finally, model updating and global parameter optimization were performed with the PPO algorithm. Experimental results show that the proposed method outperforms several existing heuristic rules, and applying the algorithm to production scheduling in an actual workshop effectively reduces the total weighted completion time.
Keywords:related parallel machine   online scheduling   reinforcement learning   proximal policy optimization with long short-term memory (LSTM-PPO)
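
To make the approach described in the abstract concrete, the following is a minimal Python/PyTorch sketch of the LSTM-PPO idea, not the authors' implementation: an LSTM actor-critic reads a sequence of flattened workshop-state matrices, the action space has one slot per candidate job plus an extra "wait" action, step rewards follow one common decomposition of the total weighted completion time (penalizing the summed weights of unfinished jobs over each elapsed interval), and training uses PPO's clipped surrogate loss. All class names, dimensions, coefficients, and the exact reward form are illustrative assumptions.

import torch
import torch.nn as nn
from torch.distributions import Categorical

def step_reward(unfinished_weights, dt):
    # The total weighted completion time equals the time-integral of the summed
    # weights of unfinished jobs, so each decision interval of length dt can be
    # penalized by that sum (one possible step-wise decomposition; the paper's
    # exact reward design may differ).
    return -dt * sum(unfinished_weights)

class LSTMPolicy(nn.Module):
    # LSTM actor-critic: the recurrent state lets the agent remember the
    # history of workshop states, as described in the abstract.
    def __init__(self, state_dim, n_jobs, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_jobs + 1)  # n_jobs job choices + 1 "wait" action
        self.critic = nn.Linear(hidden, 1)

    def forward(self, states, hc=None):
        # states: (batch, time, state_dim) flattened workshop-state matrices
        out, hc = self.lstm(states, hc)
        return self.actor(out), self.critic(out), hc

def ppo_update(policy, optimizer, states, actions, old_logp, returns, clip=0.2):
    # One PPO step with the clipped surrogate objective over collected trajectories.
    logits, values, _ = policy(states)
    dist = Categorical(logits=logits)
    logp = dist.log_prob(actions)
    advantage = (returns - values.squeeze(-1)).detach()
    ratio = torch.exp(logp - old_logp)
    surrogate = torch.min(ratio * advantage,
                          torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantage)
    value_loss = (returns - values.squeeze(-1)).pow(2).mean()
    loss = -surrogate.mean() + 0.5 * value_loss - 0.01 * dist.entropy().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In use, each scheduling episode would be rolled out by feeding the current state matrix to the policy whenever a machine becomes free, masking infeasible actions, storing (state, action, log-probability, reward) tuples, and calling ppo_update on the collected batch; the paper's comparison against existing heuristic dispatching rules is outside the scope of this sketch.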