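Application for Improved TD3 Algorithm in Obstacle Avoidance of Quad-Rotor UAV (改进TD3算法在四旋翼无人机避障中的应用)

Cite this article:	TANG Lei (唐蕾), LIU Guangzhong (刘广钟). Application for Improved TD3 Algorithm in Obstacle Avoidance of Quad-Rotor UAV [J]. Computer Engineering and Applications (计算机工程与应用), 2021, 57(11): 254-259.

Authors:	TANG Lei (唐蕾), LIU Guangzhong (刘广钟)

Affiliation:	College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China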

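Abstract:	To improve the intelligent obstacle avoidance performance of unmanned aerial vehicle (UAV) systems, this paper proposes an improved algorithm, Improved Twin Delayed Deep Deterministic Policy Gradient (I-TD3), based on Twin Delayed Deep Deterministic Policy Gradient (TD3). The algorithm uses two experience buffer pools to separate successful flight experience from failed flight experience. According to the different purposes of the two pools, it combines them with Prioritized Experience Replay and Experience Replay, respectively, which improves the sampling efficiency of effective experience and alleviates the low training efficiency caused by an excessive share of invalid experience. The reward function is also improved to address poor training results caused by unreasonable reward settings. Simulation experiments on the AirSim platform show that, for quad-rotor UAV obstacle avoidance, I-TD3 outperforms both TD3 and Deep Deterministic Policy Gradient (DDPG).
Keywords:	Twin Delayed Deep Deterministic Policy Gradient (TD3); prioritized experience replay; obstacle avoidance; quad-rotor UAV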
Application for Improved TD3 Algorithm in Obstacle Avoidance of Quad-Rotor UAV

TANG Lei, LIU Guangzhong. Application for Improved TD3 Algorithm in Obstacle Avoidance of Quad-Rotor UAV [J]. Computer Engineering and Applications, 2021, 57(11): 254-259.

Authors:	TANG Lei, LIU Guangzhong

Affiliation:	College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

Abstract:	To improve the intelligent obstacle avoidance performance of unmanned aerial vehicle (UAV) systems, an improved algorithm based on Twin Delayed Deep Deterministic Policy Gradient (TD3), called Improved Twin Delayed Deep Deterministic Policy Gradient (I-TD3), is proposed. The algorithm sets up two experience buffer pools to separate successful flight experience from failed flight experience and, according to the different purposes of the two pools, combines them with Prioritized Experience Replay and Experience Replay, respectively. This raises the sampling efficiency of effective experience and alleviates the low training efficiency caused by too much invalid experience. In addition, the reward function is improved to address the poor training results caused by an unreasonable reward design. Simulation experiments with a quad-rotor UAV on the AirSim platform indicate that I-TD3 avoids obstacles better than both the TD3 algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm.

Keywords:	Twin Delayed Deep Deterministic Policy Gradient (TD3); prioritized experience replay; obstacle avoidance; quad-rotor unmanned aerial vehicle
This article is indexed in Wanfang Data and other databases.
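
The abstract describes the two-pool replay scheme only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that the successful-flight pool uses proportional prioritized replay and the failed-flight pool uses uniform replay (the abstract lists the two methods in that order but does not state the pairing explicitly), that whole episodes are routed to a pool by outcome, and that each batch mixes the pools at a fixed ratio. The class names, `success_fraction`, `alpha`, and `eps` are hypothetical, and importance-sampling weights are omitted for brevity.

```python
import random
from collections import deque


class UniformBuffer:
    """Plain experience replay: every stored transition is equally likely."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k, rng):
        k = min(k, len(self.data))
        return rng.sample(list(self.data), k)

    def __len__(self):
        return len(self.data)


class PrioritizedBuffer:
    """Proportional prioritized replay: P(i) is proportional to priority_i ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # assumed prioritization exponent
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k, rng):
        if not self.data:
            return [], []
        weights = [p ** self.alpha for p in self.priorities]
        # Sampling is with replacement, weighted by priority.
        idx = rng.choices(range(len(self.data)), weights=weights, k=k)
        return idx, [self.data[i] for i in idx]

    def update(self, indices, td_errors):
        # Priority follows the magnitude of the latest TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps

    def __len__(self):
        return len(self.data)


class DualReplay:
    """Routes episodes to a success pool or a failure pool and mixes batches."""

    def __init__(self, capacity=10000, success_fraction=0.5, seed=0):
        self.success = PrioritizedBuffer(capacity)   # assumption: PER here
        self.failure = UniformBuffer(capacity)       # assumption: uniform here
        self.success_fraction = success_fraction     # assumed mixing ratio
        self.rng = random.Random(seed)

    def add_episode(self, transitions, succeeded):
        target = self.success if succeeded else self.failure
        for t in transitions:
            target.add(t)

    def sample(self, batch_size):
        n_success = round(batch_size * self.success_fraction) if len(self.success) else 0
        idx, good = self.success.sample(n_success, self.rng)
        # Fill the rest of the batch from the failure pool (may be short if it is small).
        bad = self.failure.sample(batch_size - len(good), self.rng)
        return idx, good, bad


replay = DualReplay(capacity=100, success_fraction=0.5, seed=0)
replay.add_episode([("s", i) for i in range(10)], succeeded=True)
replay.add_episode([("f", i) for i in range(10)], succeeded=False)
idx, good, bad = replay.sample(8)
replay.success.update(idx, [2.0] * len(idx))  # placeholder TD errors
```

In a TD3-style training loop, the returned indices would be passed back to `update` with the critic's TD errors after each gradient step, so that only the prioritized pool is re-weighted. The fixed mixing ratio is one simple choice; the paper may balance the two pools differently.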