
A Deep Reinforcement Learning Method for Autonomous Driving Incorporating Human-Like Driving Behavior
Cite this article: LV Di, XU Kun, LI Huiyun, PAN Zhongming. Human-like driving strategy based on deep reinforcement learning for autonomous vehicles [J]. Journal of Integration Technology (集成技术), 2020, 9(5): 34-47.
Authors: LV Di, XU Kun, LI Huiyun, PAN Zhongming
Affiliations: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; Shenzhen Key Laboratory of Electric Vehicle Powertrain Platform and Safety Technology, Shenzhen 518055; Harbin University of Science and Technology, Harbin 150000; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; Shenzhen Key Laboratory of Electric Vehicle Powertrain Platform and Safety Technology, Shenzhen 518055
Funding: National Key R&D Program of China (2016YFD0700602); National Natural Science Foundation of China (61603377)
Abstract: Existing autonomous-vehicle driving strategies rely too heavily on the "correctness" of the perception-control mapping while neglecting the driving logic that human drivers follow. Based on the deep deterministic policy gradient (DDPG) algorithm, this study proposes an end-to-end autonomous driving control strategy with human-like driving behavior. By applying rule constraints to the agent's continuous actions, a human-like end-to-end control network is established that outputs continuous, ordered actions consistent with human driving, and a posterior-feedback scheme is applied to the policy output to lower the rate at which the control strategy outputs dangerous behaviors. To address the sparse catastrophic events that arise during training, a continuous reward function better matched to the optimization expectation of the control strategy is proposed, improving training stability. Experimental results in different simulation environments show that, when evaluating sparse catastrophic events, the improved reward-shaping scheme approximates the optimization expectation of the objective function 85.57% more closely; compared with the conventional DDPG algorithm, training efficiency improves by 21%, task success rate by 19%, and task execution efficiency by 15.45%. These results verify that the method offers clear advantages in control efficiency and smoothness and markedly reduces collision accidents.

Keywords: deep reinforcement learning; end-to-end control; autonomous driving; human-like driving; reward shaping
Received: 2020-05-15
Revised: 2020-06-04

Human-Like Driving Strategy Based on Deep Reinforcement Learning for Autonomous Vehicles
Authors:LV Di  XU Kun  LI Huiyun  PAN Zhongming
Abstract: The driving decisions of human drivers exhibit social intelligence for handling complex conditions, in addition to driving correctness. However, existing autonomous driving strategies mainly focus on the correctness of the perception-control mapping, which deviates from the driving logic that human drivers follow. To solve this problem, this paper proposes a human-like autonomous driving strategy in an end-to-end control framework based on the deep deterministic policy gradient (DDPG) algorithm. By applying rule constraints to the continuous behavior of the agent, an end-to-end control strategy was established that outputs continuous, reasonable driving behavior consistent with human driving logic. To enhance the driving safety of the end-to-end decision-making scheme, posterior feedback on the policy output is used to reduce the output rate of dangerous behaviors. To deal with catastrophic events in the training process, a continuous reward function is proposed that improves the stability of the training algorithm. Results validated in different simulation environments showed that the proposed human-like autonomous driving strategy achieves better control performance than the traditional DDPG algorithm, and that the improved reward-shaping method models sparse catastrophic events in a way that better matches the optimization expectation of the control strategy, increasing the optimization expectation of the objective function by 85.57%. The proposed human-like DDPG autonomous driving strategy improves the training efficiency of the traditional DDPG algorithm by 21%, the task success rate by 19%, and the task execution efficiency by 15.45%, significantly reducing collision accidents.
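The contrast the abstract draws between sparse catastrophic penalties and a continuous reward can be sketched as follows. This is an illustrative example only, not the paper's actual reward function: the function names, the -200 collision penalty, and the 10 m safety threshold are all assumptions chosen for the sketch.

```python
def sparse_reward(collided: bool, speed: float) -> float:
    """Sparse scheme: a large penalty fires only at the rare
    catastrophic event, giving almost no gradient signal otherwise."""
    if collided:
        return -200.0
    return speed  # reward forward progress


def continuous_reward(dist_to_obstacle: float, speed: float,
                      safe_dist: float = 10.0) -> float:
    """Continuous scheme (illustrative): the penalty ramps up smoothly
    as the vehicle approaches an obstacle, so the policy receives a
    dense signal long before a collision actually occurs."""
    if dist_to_obstacle >= safe_dist:
        proximity_penalty = 0.0
    else:
        # quadratic ramp from 0 (at safe_dist) to -200 (at contact)
        ratio = 1.0 - dist_to_obstacle / safe_dist
        proximity_penalty = -200.0 * ratio ** 2
    return speed + proximity_penalty
```

Under the sparse scheme the agent sees identical rewards at 9 m and 1 m from an obstacle; under the continuous scheme the penalty already differs, which is the kind of denser optimization signal the abstract credits for the improved training stability.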
Keywords:deep reinforcement learning  end-to-end control  autonomous driving  human-like driving  reward shaping
This article is indexed in Wanfang Data and other databases.
