
Wireless Backscatter Data Offloading Optimization Based on Deep Deterministic Policy Gradient Learning

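Cite this article:	GENG Tianli, GAO Ang, WANG Qi, DUAN Weijun, HU Yansu. Wireless Backscatter Data Offloading Optimization Based on Deep Deterministic Policy Gradient Learning[J]. Acta Armamentarii (兵工学报), 2021, 42(12): 2655-2663.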

Authors:	GENG Tianli, GAO Ang, WANG Qi, DUAN Weijun, HU Yansu

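Affiliations:	School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; State-Province Joint Engineering Laboratory of IoT Technology and Application, Xi'an 710072, Shaanxi, China; School of Electronics and Control, Chang'an University, Xi'an 710072, Shaanxi, China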

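Funding:	China Postdoctoral Science Foundation (2017M623243); Shaanxi Provincial Postdoctoral Foundation (2018BSHYDZZ26); Key Research and Development Program of Shaanxi Province (2019ZDLGY13-02-02); Key Research and Development Program of Guangxi Zhuang Autonomous Region (AB19110036); Key Research and Development Program of Taicang City (TC2018SF03, TC2019SF03); Xi'an Science and Technology Plan Project (201805042YD20CG26(4), GXYD21.2); Seed Foundation of Northwestern Polytechnical University (CX2020159); Natural Science Foundation of Shaanxi Province (2021JM-186)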

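Abstract:	In a wireless powered communication network, a wireless device (WD) can offload data in two ways: wireless backscattering and active radio frequency (RF) transmission. Properly assigning each WD's working mode (active transmission or backscatter transmission) and the corresponding working time is therefore essential for reducing transmission delay and improving transmission efficiency. Taking into account the amount of data to be offloaded, the channel conditions, and fairness among WDs, a data offloading method based on deep deterministic policy gradient (DDPG) is proposed, which searches a continuous action space for the optimal time allocation across multiple WDs. Simulation results show that DDPG converges within a finite number of time steps; that, owing to the introduction of the Jain fairness index, multiple WDs can complete data offloading at the same time; and that, compared with the traditional equal-split and greedy algorithms, DDPG reduces the average transmission delay by 77.4% and 24.2%, respectively, and effectively improves the energy efficiency of WDs, especially for WDs with a small amount of data to offload.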
Keywords:	backscatter; data offloading; deep deterministic policy gradient; reinforcement learning
A Deep Deterministic Policy Gradient Optimization Approach for Multi-users Data Offloading in Wireless Powered Communication Network

GENG Tianli, GAO Ang, WANG Qi, DUAN Weijun, HU Yansu. A Deep Deterministic Policy Gradient Optimization Approach for Multi-users Data Offloading in Wireless Powered Communication Network[J]. Acta Armamentarii, 2021, 42(12): 2655-2663.

Authors:	GENG Tianli, GAO Ang, WANG Qi, DUAN Weijun, HU Yansu

Affiliation:	(1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; 2. State-Province Joint Engineering Laboratory of IoT Technology and Application, Xi'an 710072, Shaanxi, China; 3. School of Electronic Control, Chang'an University, Xi'an 710072, Shaanxi, China)

Abstract:	In a wireless powered communication network (WPCN), wireless devices can offload data through wireless backscattering and active radio frequency (RF) transmission. Choosing the working mode and allocating time between ambient backscattering and active RF transmission properly is a major challenge in reducing the system transmission delay and improving the transmission efficiency. A deep deterministic policy gradient (DDPG) algorithm is proposed to search for the best time allocation in a continuous domain, taking into account the data size, the channel conditions, and the fairness between wireless devices. The experimental results show that the DDPG algorithm converges within a finite number of time steps, and that, by introducing the Jain fairness index, all the wireless devices can complete data offloading at the same time. Compared with the traditional Round-Robin and Greedy algorithms, the DDPG algorithm reduces the average transmission delay by 77.7% and 24.2%, respectively, and the energy efficiency is largely improved, especially for wireless devices with a small amount of data to offload.

Keywords:	backscattering; data offloading; deep deterministic policy gradient; reinforcement learning
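
The abstract credits the Jain fairness index with letting all wireless devices finish offloading together. As a minimal sketch, the standard definition of that index, J = (Σx)² / (n·Σx²), can be computed as below. The abstract does not say which per-device quantity the authors feed into the index, so the input lists here are hypothetical values chosen for illustration, and the all-zero convention is an assumption of this sketch.

```python
from typing import Sequence


def jain_fairness_index(values: Sequence[float]) -> float:
    """Return Jain's fairness index (sum x)^2 / (n * sum x^2).

    For non-negative inputs the result lies in [1/n, 1]; it equals 1 when
    every value is the same and 1/n when a single device gets everything.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0:
        # Assumption of this sketch: an all-zero allocation counts as even.
        return 1.0
    return (total * total) / (n * sum_of_squares)


# Hypothetical per-device values (not taken from the paper).
even_allocation = [0.5, 0.5, 0.5, 0.5]
skewed_allocation = [1.0, 0.0, 0.0, 0.0]

print(jain_fairness_index(even_allocation))
print(jain_fairness_index(skewed_allocation))
```

An index close to 1 means the devices are treated almost equally, which is why a term of this kind is a common choice for discouraging a learned policy from starving devices that have little data to offload.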
This article is indexed in Wanfang Data and other databases.