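Resource Allocation Algorithm of Service Function Chain Based on Asynchronous Advantage Actor-Critic Learning
Cite this article: TANG Lun, HE Xiaoyu, WANG Xiao, TAN Qi, HU Yanjuan, CHEN Qianbin. Resource Allocation Algorithm of Service Function Chain Based on Asynchronous Advantage Actor-Critic Learning[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1733-1741.
Authors: TANG Lun  HE Xiaoyu  WANG Xiao  TAN Qi  HU Yanjuan  CHEN Qianbin
Affiliation: 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065; 2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065
Funding: Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M20180601); Chongqing Major Theme Special Project (cstc2019jscx-zdztzxX0006)
Abstract: Because global network information is difficult to obtain in practice, and because User Equipment (UE) mobility and dynamic packet arrival complicate resource allocation in radio access network slicing, this paper proposes a Service Function Chain (SFC) resource allocation algorithm based on Asynchronous Advantage Actor-Critic (A3C) learning. First, the algorithm establishes a blockchain-based resource management mechanism that uses blockchain to share and update global network information in a trusted manner and to supervise and record the SFC resource allocation process. Next, a delay minimization model is built for the joint allocation of radio, computing, and bandwidth resources under UE mobility and time-varying packet arrival, and the model is then transformed into a Markov Decision Process (MDP). Finally, A3C learning is applied to the MDP to solve for the resource allocation policy. Simulation results show that the algorithm uses resources more rationally and efficiently, optimizes system delay, and satisfies UE requirements.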

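Keywords: Network slicing; Service Function Chain (SFC) resource allocation; Markov Decision Process (MDP); Asynchronous Advantage Actor-Critic (A3C) learning; Blockchain
Received: 2020-04-21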

Resource Allocation Algorithm of Service Function Chain Based on Asynchronous Advantage Actor-Critic Learning
Lun TANG, Xiaoyu HE, Xiao WANG, Qi TAN, Yanjuan HU, Qianbin CHEN. Resource Allocation Algorithm of Service Function Chain Based on Asynchronous Advantage Actor-Critic Learning[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1733-1741.
Authors:Lun TANG  Xiaoyu HE  Xiao WANG  Qi TAN  Yanjuan HU  Qianbin CHEN
Affiliation: 1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: Global network information is hard to obtain in practice, and the mobility of User Equipment (UE) and the dynamics of packet arrival make resource allocation in radio access network slices difficult to optimize. To address this, a Service Function Chain (SFC) resource allocation algorithm based on Asynchronous Advantage Actor-Critic (A3C) learning is proposed. First, a resource management mechanism based on blockchain technology is established, which credibly shares and updates the global network information and also supervises and records the SFC resource allocation process. Then, a delay minimization model based on the joint allocation of radio, computing, and bandwidth resources is built under UE mobility and time-varying packet arrival, and is further transformed into a Markov Decision Process (MDP). Finally, the A3C learning method is adopted to obtain the resource allocation policy in this MDP. Simulation results show that the proposed algorithm utilizes resources more efficiently, optimizing the system delay while guaranteeing the requirements of each UE.
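Keywords: Network slicing; Service Function Chain (SFC) resource allocation; Markov Decision Process (MDP); Asynchronous Advantage Actor-Critic (A3C) learning; Blockchain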
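
The abstract does not spell out how A3C is applied to the MDP, so the following is only a minimal sketch of the advantage estimate that generic A3C workers compute, not the authors' implementation. It assumes the per-step reward is the negative of the observed delay (a natural choice for a delay minimization objective, but not stated in the abstract); the function name `n_step_advantages`, the discount factor `gamma`, and the sample numbers are hypothetical.

```python
from typing import List


def n_step_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Compute A_t = R_t - V(s_t) for one worker rollout.

    rewards[t]      : reward at step t (assumed here: negative delay)
    values[t]       : critic estimate V(s_t) for the same step
    bootstrap_value : critic estimate for the state after the last step
                      (0.0 if the rollout ended in a terminal state)
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    discounted_return = bootstrap_value
    # Walk the rollout backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        discounted_return = rewards[t] + gamma * discounted_return
        advantages[t] = discounted_return - values[t]
    return advantages


# Hypothetical rollout: two steps with delays of 1.0 and 2.0 time units.
example = n_step_advantages(
    rewards=[-1.0, -2.0],
    values=[-1.5, -1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

In an actor-critic update, a positive advantage raises the probability of the action taken (here, an allocation that produced less delay than the critic expected), and a negative one lowers it.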