基于强化学习的多阶段网络分组路由方法 Packet Routing Method for Multi-Stage Networks Based on Reinforcement Learning期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

基于强化学习的多阶段网络分组路由方法

引用本文：	高远翔,罗龙,孙罡.基于强化学习的多阶段网络分组路由方法[J].电子科技大学学报（自然科学版）,2022,51(2):200-206.

作者姓名：	高远翔罗龙孙罡

作者单位：	1.电子科技大学光纤传感与通信教育部重点实验室　成都　611731

基金项目：	国家重点研发计划(2019YFB1802800)；

摘要：	多阶段网络被广泛应用于机器学习集群，由于多阶段网络中可用路径多，分组的路由是一个组合优化难题。现有基于启发式的路由算法由于缺乏性能保证，严重影响分组传输延迟。提出了基于强化学习的多阶段网络分组路由方法，使用一个新颖的策略迭代算法，通过学习的方式计算出最佳路由策略。算法通过在策略评估步骤中使用价值函数的最大似然估计器，克服了强化学习方法中蒙泰卡罗(MC)或时间差分(TD)价值估计器样本效率低的问题。为了应对组合优化时计算复杂度高的问题，算法在策略改进步骤中将组合动作空间上的优化分解为各组成动作的序列优化，以提高求解效率。基于NS-3网络模拟器的仿真实验结果表明，相较于现有最优的启发式路由策略，该算法学习到的路由策略降低了13.9%的平均分组延迟。
关键词：	集群网络策略迭代分组路由强化学习
收稿时间：	2021-09-14
Packet Routing Method for Multi-Stage Networks Based on Reinforcement Learning

GAO Yuanxiang,LUO Long,SUN Gang.Packet Routing Method for Multi-Stage Networks Based on Reinforcement Learning[J].Journal of University of Electronic Science and Technology of China,2022,51(2):200-206.

Authors:	GAO Yuanxiang LUO Long SUN Gang

Affiliation:	1.Key Lab of Optical Fiber Sensing and Communications, University of Electronic Science and Technology of China　Chengdu　611731

Abstract:	Multi-stage networks are widely used in machine learning clusters. Due to the large number of available paths in a multi-stage network, packet routing is a combinatorial optimization problem. Existing routing algorithms based on heuristics lack performance guarantee, which seriously affects the packet transmission delay. This paper proposes a packet routing method based on reinforcement learning for multi-stage networks, using a novel policy iteration algorithm to compute an optimal routing policy by learning. In the policy evaluation step, this algorithm uses the maximum likelihood estimator of the value function, which overcomes the low sample efficiency problem of Monte Carlo (MC) or Temporal-Difference (TD) value function estimators in reinforcement learning. To deal with the high computational complexity of the combinatorial optimization problem in the policy improvement step, this algorithm decomposes the optimization over a combinatorial action space into a sequential optimization of each action. Experiments based on NS-3 network simulator show that the routing policy learnt by the algorithm reduces 13.9% of the average packet transmission delay compared to existing best routing heuristics.

Keywords:
本文献已被万方数据等数据库收录！
	点击此处可从《电子科技大学学报（自然科学版）》浏览原始摘要信息
	点击此处可从《电子科技大学学报（自然科学版）》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏