Related Parallel Machine Online Scheduling Method Based on LSTM-PPO Reinforcement Learning
HE Junjie,ZHANG Jie,ZHANG Peng,WANG Junliang,ZHENG Peng,WANG Ming. Related Parallel Machine Online Scheduling Method Based on LSTM-PPO Reinforcement Learning[J]. China Mechanical Engineering, 2022, 33(3): 329-338. DOI: 10.3969/j.issn.1004-132X.2022.03.009
Authors:HE Junjie  ZHANG Jie  ZHANG Peng  WANG Junliang  ZHENG Peng  WANG Ming
Affiliation:1. School of Mechanical Engineering, Donghua University, Shanghai, 201620; 2. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240
Fund:National Key R&D Program of China (2019YFB1706300); Research Start-up Fund for Young Teachers of Donghua University
Abstract:To solve the related parallel machine online scheduling problem with the objective of minimizing the total weighted completion time, an online scheduling method based on long short-term memory proximal policy optimization (LSTM-PPO) reinforcement learning was proposed. An LSTM-integrated agent was designed to record the historical variations of the workshop states and the corresponding scheduling policies, and online scheduling decisions were then made according to the state information. Meanwhile, a workshop state matrix was designed to describe the problem constraints and the optimization objective, an additional machine-waiting action was introduced into the scheduling action space to expand the solution space, and a reward function was designed to decompose the optimization objective into step-by-step rewards for evaluating scheduling decisions. Finally, model updating and global parameter optimization were performed with the PPO algorithm. Experimental results show that the proposed method outperforms several existing heuristic rules, and applying the algorithm to production scheduling in an actual workshop effectively reduces the total weighted completion time.
Keywords:related parallel machine   online scheduling   reinforcement learning   proximal policy optimization with long short-term memory (LSTM-PPO)
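
To make the approach described in the abstract concrete, the following is a minimal Python/PyTorch sketch of the LSTM-PPO idea, not the authors' implementation: an LSTM actor-critic reads a sequence of flattened workshop-state matrices, the action space has one slot per candidate job plus an extra "wait" action, step rewards follow one common decomposition of the total weighted completion time (penalizing the summed weights of unfinished jobs over each elapsed interval), and training uses PPO's clipped surrogate loss. All class names, dimensions, coefficients, and the exact reward form are illustrative assumptions.

import torch
import torch.nn as nn
from torch.distributions import Categorical

def step_reward(unfinished_weights, dt):
    # The total weighted completion time equals the time-integral of the summed
    # weights of unfinished jobs, so each decision interval of length dt can be
    # penalized by that sum (one possible step-wise decomposition; the paper's
    # exact reward design may differ).
    return -dt * sum(unfinished_weights)

class LSTMPolicy(nn.Module):
    # LSTM actor-critic: the recurrent state lets the agent remember the
    # history of workshop states, as described in the abstract.
    def __init__(self, state_dim, n_jobs, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_jobs + 1)  # n_jobs job choices + 1 "wait" action
        self.critic = nn.Linear(hidden, 1)

    def forward(self, states, hc=None):
        # states: (batch, time, state_dim) flattened workshop-state matrices
        out, hc = self.lstm(states, hc)
        return self.actor(out), self.critic(out), hc

def ppo_update(policy, optimizer, states, actions, old_logp, returns, clip=0.2):
    # One PPO step with the clipped surrogate objective over collected trajectories.
    logits, values, _ = policy(states)
    dist = Categorical(logits=logits)
    logp = dist.log_prob(actions)
    advantage = (returns - values.squeeze(-1)).detach()
    ratio = torch.exp(logp - old_logp)
    surrogate = torch.min(ratio * advantage,
                          torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantage)
    value_loss = (returns - values.squeeze(-1)).pow(2).mean()
    loss = -surrogate.mean() + 0.5 * value_loss - 0.01 * dist.entropy().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In use, each scheduling episode would be rolled out by feeding the current state matrix to the policy whenever a machine becomes free, masking infeasible actions, storing (state, action, log-probability, reward) tuples, and calling ppo_update on the collected batch; the paper's comparison against existing heuristic dispatching rules is outside the scope of this sketch.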