
A Deep Reinforcement Learning Method for Autonomous Driving Incorporating Human-Like Driving Behavior
Cite this article: LV Di, XU Kun, LI Huiyun, PAN Zhongming. Human-like driving strategy based on deep reinforcement learning for autonomous vehicles [J]. Journal of Integration Technology (集成技术), 2020, 9(5): 34-47.
Authors: LV Di, XU Kun, LI Huiyun, PAN Zhongming
Affiliations: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; Shenzhen Key Laboratory of Electric Vehicle Powertrain Platform and Safety Technology, Shenzhen 518055; Harbin University of Science and Technology, Harbin 150000; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; Shenzhen Key Laboratory of Electric Vehicle Powertrain Platform and Safety Technology, Shenzhen 518055
Funding: National Key R&D Program of China (2016YFD0700602); National Natural Science Foundation of China (61603377)
Abstract: Existing autonomous-vehicle driving strategies rely too heavily on the "correctness" of the perception-control mapping while neglecting the driving logic that human drivers follow. Based on the deep deterministic policy gradient (DDPG) algorithm, this study proposes an end-to-end autonomous driving control strategy with human-like driving behavior. By applying rule constraints to the agent's continuous actions, a human-like end-to-end control network is established that outputs continuous, ordered actions consistent with human driving, and a posterior-feedback scheme is applied to the policy output to lower the rate at which the control strategy outputs dangerous behaviors. To address the sparse catastrophic events that arise during training, a continuous reward function better matched to the optimization expectation of the control strategy is proposed, improving training stability. Experimental results in different simulation environments show that, when evaluating sparse catastrophic events, the improved reward-shaping scheme approximates the optimization expectation of the objective function 85.57% more closely; compared with the conventional DDPG algorithm, training efficiency improves by 21%, task success rate by 19%, and task execution efficiency by 15.45%. These results verify that the method offers clear advantages in control efficiency and smoothness and markedly reduces collision accidents.

Keywords: deep reinforcement learning; end-to-end control; autonomous driving; human-like driving; reward shaping
Received: 2020-05-15
Revised: 2020-06-04

Human-Like Driving Strategy Based on Deep Reinforcement Learning for Autonomous Vehicles
Authors:LV Di  XU Kun  LI Huiyun  PAN Zhongming
Abstract: The driving decisions of human drivers exhibit social intelligence for handling complex conditions, in addition to driving correctness. However, existing autonomous driving strategies mainly focus on the correctness of the perception-control mapping, which deviates from the driving logic that human drivers follow. To solve this problem, this paper proposes a human-like autonomous driving strategy in an end-to-end control framework based on the deep deterministic policy gradient (DDPG) algorithm. By applying rule constraints to the continuous behavior of the agent, an end-to-end control strategy was established that outputs continuous, reasonable driving behavior consistent with human driving logic. To enhance the driving safety of the end-to-end decision-making scheme, posterior feedback on the policy output is used to reduce the output rate of dangerous behaviors. To deal with catastrophic events in the training process, a continuous reward function is proposed that improves the stability of the training algorithm. Results validated in different simulation environments showed that the proposed human-like autonomous driving strategy achieves better control performance than the traditional DDPG algorithm, and that the improved reward-shaping method models sparse catastrophic events in a way that better matches the optimization expectation of the control strategy, increasing the optimization expectation of the objective function by 85.57%. The proposed human-like DDPG autonomous driving strategy improves the training efficiency of the traditional DDPG algorithm by 21%, the task success rate by 19%, and the task execution efficiency by 15.45%, significantly reducing collision accidents.
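The contrast the abstract draws between sparse catastrophic penalties and a continuous reward can be sketched as follows. This is an illustrative example only, not the paper's actual reward function: the function names, the -200 collision penalty, and the 10 m safety threshold are all assumptions chosen for the sketch.

```python
def sparse_reward(collided: bool, speed: float) -> float:
    """Sparse scheme: a large penalty fires only at the rare
    catastrophic event, giving almost no gradient signal otherwise."""
    if collided:
        return -200.0
    return speed  # reward forward progress


def continuous_reward(dist_to_obstacle: float, speed: float,
                      safe_dist: float = 10.0) -> float:
    """Continuous scheme (illustrative): the penalty ramps up smoothly
    as the vehicle approaches an obstacle, so the policy receives a
    dense signal long before a collision actually occurs."""
    if dist_to_obstacle >= safe_dist:
        proximity_penalty = 0.0
    else:
        # quadratic ramp from 0 (at safe_dist) to -200 (at contact)
        ratio = 1.0 - dist_to_obstacle / safe_dist
        proximity_penalty = -200.0 * ratio ** 2
    return speed + proximity_penalty
```

Under the sparse scheme the agent sees identical rewards at 9 m and 1 m from an obstacle; under the continuous scheme the penalty already differs, which is the kind of denser optimization signal the abstract credits for the improved training stability.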
Keywords:deep reinforcement learning  end-to-end control  autonomous driving  human-like driving  reward shaping
This article is indexed in Wanfang Data and other databases.
