A Novel Beam Hopping Resource Allocation Scheme of Low Earth Orbit Satellite Based on Transfer Deep Reinforcement Learning
Citation: CHEN Qianbin, MA Shiqing, DUAN Ruiji, TANG Lun, LIANG Chengchao. A Novel Beam Hopping Resource Allocation Scheme of Low Earth Orbit Satellite Based on Transfer Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2023, 45(2): 407-417.
Authors: CHEN Qianbin  MA Shiqing  DUAN Ruiji  TANG Lun  LIANG Chengchao
Affiliation: School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funding: National Natural Science Foundation of China (62071078, 62001076); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJZD-M201800601, KJQN-201900645)

Abstract: In the Low Earth Orbit (LEO) satellite scenario, traditional resource allocation schemes can leave the resources allocated to specific cells unable to meet their demand. A beam hopping resource allocation scheme for LEO satellites based on Transfer Deep Reinforcement Learning (TDRL) is therefore proposed in this paper. First, jointly considering on-board buffer information, traffic arrivals and channel state, an LEO satellite resource allocation optimization model supporting beam hopping is established with the goal of minimizing the average delay of data packets on the satellite. Second, in view of the dynamic variability of the LEO network, the dynamically and randomly varying communication resources and communication demands are taken into account, and the Deep Q-Network (DQN) algorithm is adopted, with a neural network serving as a nonlinear function approximator. Further, to achieve and accelerate convergence of the Deep Reinforcement Learning (DRL) algorithm on other target tasks, the concept of Transfer Learning (TL) is introduced: the scheduling task learned on the source satellite is reused to quickly find the beam scheduling and power allocation strategy of the target satellite. Simulation results demonstrate that the proposed algorithm optimizes time slot allocation during satellite service, reduces the average transmission delay of data packets, and effectively improves system throughput and resource utilization.
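A rough formulation of the kind of objective described above, added for illustration only (the symbols x_{n,t}, q_{n,t}, K and P_{\max} are our own notation, not taken from the paper): with N cells, binary illumination variables x_{n,t}, per-beam powers p_{n,t}, queue lengths q_{n,t}, packet arrivals a_{n,t} and achievable rates r_{n,t} depending on the channel state h_{n,t}, a delay-minimizing beam hopping schedule could be sketched as

\min_{\{x_{n,t},\,p_{n,t}\}} \; \frac{1}{NT}\sum_{t=1}^{T}\sum_{n=1}^{N} q_{n,t}
\quad\text{s.t.}\quad
q_{n,t+1}=\bigl[q_{n,t}-x_{n,t}\,r_{n,t}(p_{n,t},h_{n,t})\bigr]^{+}+a_{n,t},
\qquad \sum_{n=1}^{N} x_{n,t}\le K,
\qquad \sum_{n=1}^{N} x_{n,t}\,p_{n,t}\le P_{\max},
\qquad x_{n,t}\in\{0,1\},

where at most K beams are illuminated per slot and P_{\max} is the on-board power budget; by Little's law, minimizing the time-averaged backlog is equivalent to minimizing the average packet delay.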
Keywords: LEO satellite network; Beam hopping; Resource allocation; Deep reinforcement learning; Transfer learning
Received: 2021-12-08
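As a reading aid only, the following is a minimal sketch (ours, not the authors' implementation) of the DQN-with-transfer idea summarized in the abstract: a Q-network scores candidate beam-illumination patterns from per-cell buffer, traffic and channel features, and the Q-network of a target satellite is warm-started from the weights learned on a source satellite before fine-tuning. The state/action encodings, network sizes and the env_step interface are illustrative assumptions.

# Minimal sketch of DQN-based beam-hopping scheduling with transfer (illustrative, not the paper's code).
import random
from collections import deque
import torch
import torch.nn as nn

N_CELLS, N_PATTERNS = 8, 16          # assumed: cells per satellite, size of the beam-pattern codebook
STATE_DIM = 3 * N_CELLS              # assumed state: per-cell [queue length, arrival rate, channel gain]

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, N_PATTERNS))  # one Q-value per candidate pattern
    def forward(self, s):
        return self.net(s)

def train_dqn(env_step, q_net, episodes=100, slots=50, gamma=0.99, eps=0.1, lr=1e-3):
    """env_step(state, action) -> (next_state, reward); reward = -average packet delay (assumed)."""
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=10_000)
    state = torch.zeros(STATE_DIM)
    for _ in range(episodes):
        for _ in range(slots):
            # epsilon-greedy choice of a beam-illumination pattern
            if random.random() < eps:
                action = random.randrange(N_PATTERNS)
            else:
                action = int(q_net(state).argmax())
            next_state, reward = env_step(state, action)
            replay.append((state, action, reward, next_state))
            state = next_state
            if len(replay) >= 32:
                # one SGD step on a sampled minibatch (TD target with the same network, for brevity)
                batch = random.sample(replay, 32)
                s = torch.stack([b[0] for b in batch])
                a = torch.tensor([b[1] for b in batch])
                r = torch.tensor([b[2] for b in batch])
                s2 = torch.stack([b[3] for b in batch])
                with torch.no_grad():
                    target = r + gamma * q_net(s2).max(dim=1).values
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, target)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return q_net

# Transfer step: reuse the source satellite's learned weights to initialize the target task,
# then fine-tune on the target satellite's traffic instead of training from scratch.
source_q = QNet()                                   # would be trained on the source satellite
target_q = QNet()
target_q.load_state_dict(source_q.state_dict())     # warm start for the target satellite

A full implementation would additionally use a separate target network and an explicit power-allocation action, which are omitted here for brevity.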