
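Deep reinforcement learning algorithm for the rectangle layout optimization problem with balance constraints
Cite this article: Xu Yichun, Wan Shuzhen, Dong Fangmin. Deep reinforcement learning algorithm for the rectangle layout optimization problem with balance constraints[J]. Application Research of Computers (计算机应用研究), 2022, 39(1): 146-150.
Authors: Xu Yichun, Wan Shuzhen, Dong Fangmin
Affiliation: College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
Funding: Joint Fund of the National Natural Science Foundation of China and Xinjiang (U1703261).
Abstract: The rectangle layout problem with balance constraints originates from the layout design of equipment in satellite modules and is a combinatorial optimization problem. Deep reinforcement learning uses a reward mechanism to achieve high-performance decision optimization through training on data. For this layout optimization problem, the paper proposes DAR, a new algorithm based on deep reinforcement learning, and its extension IDAR. DAR uses a pointer network to output the placement order and then applies a positioning mechanism to produce the layout; its time complexity is O(n^3). IDAR adds an iteration mechanism to DAR; its time complexity is O(n^4), but it gives better results. Tests show that DAR learns well: a model trained on small layout problems can be applied effectively to large ones. In comparison experiments on two typical large-scale instances, the proposed algorithms respectively exceed and come close to the current best-known solutions, showing advantages in both running time and solution quality.

Keywords: layout optimization problem; pointer network; reinforcement learning; deep learning
Received: 2021/5/19
Revised: 2021/12/19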

Deep reinforcement learning algorithm for the rectangle layout optimization with equilibrium constraint
Xu Yichun, Wan Shuzhen, Dong Fangmin. Deep reinforcement learning algorithm for the rectangle layout optimization with equilibrium constraint[J]. Application Research of Computers, 2022, 39(1): 146-150.
Authors: Xu Yichun, Wan Shuzhen, Dong Fangmin
Affiliation: College of Computer & Information Technology, China Three Gorges University, Yichang, Hubei 443002, China
Abstract: The rectangular layout optimization problem with balance constraints arises from the layout design of satellite module equipment and is a combinatorial optimization problem. Deep reinforcement learning uses a reward mechanism to achieve high-performance decision optimization through training on data. This paper proposes a new deep reinforcement learning algorithm, DAR, and its extension, IDAR. DAR outputs an optimized placement sequence with a pointer network and then uses a positioning mechanism to produce the layout; the pointer network is trained by deep reinforcement learning. The time complexity of DAR is O(n^3). IDAR is an iterative version of DAR that gives better results at a time complexity of O(n^4). Test results show that DAR has good learning ability: a model trained on small layout problems can be applied effectively to large-scale problems. Results on two typical large-scale instances show that the proposed algorithms achieve or approach the current best results, giving them advantages in both running time and solution quality.
Keywords: layout optimization problem; pointer network; reinforcement learning; deep learning
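
The two-stage structure described in the abstract (first choose a placement order, then position the rectangles one by one) can be illustrated with a small sketch. The abstract does not specify the positioning mechanism or the objective, so everything below is an assumption made for illustration: the order is passed in by hand instead of coming from a pointer network, the candidate positions are those where the new rectangle touches one side of an already placed one, and the hypothetical cost is the radius of the origin-centred circle enclosing the layout plus `lam` times the offset of the mass centroid from the origin. The names `place_in_order`, `candidates`, `cost`, and `lam` are not from the paper.

```python
import math

EPS = 1e-9  # tolerance so that rectangles that merely touch do not count as overlapping


def overlaps(a, b):
    """True if two axis-aligned rectangles (cx, cy, w, h) share interior area."""
    return (abs(a[0] - b[0]) < (a[2] + b[2]) / 2 - EPS and
            abs(a[1] - b[1]) < (a[3] + b[3]) / 2 - EPS)


def cost(placed, masses, lam):
    """Assumed objective: enclosing radius about the origin plus lam * centroid offset."""
    radius = max(math.hypot(abs(cx) + w / 2, abs(cy) + h / 2)
                 for cx, cy, w, h in placed)
    total = sum(masses)
    gx = sum(m * p[0] for m, p in zip(masses, placed)) / total
    gy = sum(m * p[1] for m, p in zip(masses, placed)) / total
    return radius + lam * math.hypot(gx, gy)


def candidates(placed, w, h):
    """Centres at which a w-by-h rectangle touches one side of a placed rectangle."""
    out = []
    for px, py, pw, ph in placed:
        dx, dy = (pw + w) / 2, (ph + h) / 2
        for t in (-1, 0, 1):  # align lower edges, centres, or upper edges
            sx, sy = t * (pw - w) / 2, t * (ph - h) / 2
            out += [(px + dx, py + sy), (px - dx, py + sy),
                    (px + sx, py + dy), (px + sx, py - dy)]
    return out


def place_in_order(rects, order, lam=1.0):
    """Greedily place rects (w, h, mass) in the given order; returns (cx, cy, w, h) tuples."""
    placed, masses = [], []
    for i in order:
        w, h, m = rects[i]
        best = None
        # The first rectangle goes at the origin; later ones try every candidate.
        for cx, cy in (candidates(placed, w, h) if placed else [(0.0, 0.0)]):
            rect = (cx, cy, w, h)
            if any(overlaps(rect, p) for p in placed):
                continue
            c = cost(placed + [rect], masses + [m], lam)
            if best is None or c < best[0]:
                best = (c, rect)
        placed.append(best[1])
        masses.append(m)
    return placed


rects = [(4, 2, 1.0), (2, 2, 2.0), (3, 1, 1.5), (1, 1, 0.5)]
layout = place_in_order(rects, order=[0, 1, 2, 3])
```

In this sketch each of the n placement steps examines O(n) candidates and checks each against O(n) placed rectangles, so the sketch itself runs in O(n^3) time; that is a property of this illustration and should not be read as an account of how the paper's O(n^3) bound arises. The iteration mechanism of IDAR is not described in the abstract and is not sketched here.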
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).