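Longitudinal control of connected vehicle platoons based on deep reinforcement learning
Cite this article: LI Yong-fu, ZHOU Fa-tao, HUANG Long-wang, YU Shu-you, SHI Shu-ming. Longitudinal control of connected vehicle platoon based on deep reinforcement learning[J]. Control and Decision, 2024, 39(6): 1879-1887.
Authors: LI Yong-fu  ZHOU Fa-tao  HUANG Long-wang  YU Shu-you  SHI Shu-ming
Affiliations: Key Laboratory of Intelligent Air-Ground Cooperative Control for Universities in Chongqing, Chongqing University of Posts and Telecommunications, Chongqing 400065; Department of Control Science and Engineering, Jilin University, Changchun 130012; Transportation College, Jilin University, Changchun 130012
Funding: National Natural Science Foundation of China (U1964202, 62273027); Innovation and Development Joint Fund of the Chongqing Natural Science Foundation (CSTB2022NSCQ-LZX0025); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-M202300602).
Abstract: To address the multi-objective control optimization problem in vehicle platoons, a platoon control method based on reinforcement learning is studied. The controller takes as input the state information of each vehicle in the platoon and the state errors between vehicles, and outputs a desired acceleration based on the vehicle's longitudinal dynamics, achieving both individual vehicle stability and string stability of the platoon under V2X communication. A Markov decision process (MDP) model of the platoon is established according to the platoon driving scenario, the adopted spacing policy, and the communication topology. Because the platoon's multi-input, multi-output samples are high-dimensional, a prioritized experience replay strategy is introduced to improve the convergence efficiency of the algorithm. To approximate real platoon driving conditions, a multi-degree-of-freedom dynamics model of a fuel-powered vehicle is built in PreScan and combined with Matlab/Simulink to form the simulation environment, and noise is introduced when training the actor and critic networks of the platoon controller. Simulation results show that the reinforcement learning based platoon control consumes less fuel, that the controller has better real-time performance, and that vehicle control is smoother.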

Keywords: vehicle platoon; deep deterministic policy gradient; reinforcement learning; fuel consumption; V2X; PreScan

Longitudinal control of connected vehicle platoon based on deep reinforcement learning
LI Yong-fu, ZHOU Fa-tao, HUANG Long-wang, YU Shu-you, SHI Shu-ming. Longitudinal control of connected vehicle platoon based on deep reinforcement learning[J]. Control and Decision, 2024, 39(6): 1879-1887.
Authors:LI Yong-fu  ZHOU Fa-tao  HUANG Long-wang  YU Shu-you  SHI Shu-ming
Affiliation: Key Laboratory of Intelligent Air-Ground Cooperative Control for Universities in Chongqing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Department of Control Science and Engineering, Jilin University, Changchun 130012, China; Transportation College, Jilin University, Changchun 130012, China
Abstract: This paper presents a reinforcement learning (RL) based control method for vehicle platoons to address the multi-objective control optimization problem. The controller takes as input the state information of each vehicle in the platoon and the inter-vehicle state errors, and outputs a desired acceleration based on the vehicle's longitudinal dynamics. The approach achieves both individual vehicle stability and string stability of the platoon under V2X communication. A Markov decision process (MDP) model of the platoon is established according to the driving scenario, the spacing policy, and the communication topology. Because the platoon's multi-input, multi-output samples are high-dimensional, the deep deterministic policy gradient (DDPG) algorithm is combined with prioritized experience replay to improve convergence efficiency. To approximate real platoon driving conditions, a multi-degree-of-freedom dynamics model of a fuel-powered vehicle is built in PreScan and co-simulated with Matlab/Simulink, and noise is introduced while training the actor and critic networks of the platoon controller. Simulation results show that the RL-based platoon controller yields lower fuel consumption, better real-time performance, and smoother vehicle control.
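The abstract does not say which replay variant is used, so the following is only an illustrative sketch of one common form, proportional prioritized experience replay, in which a transition is sampled with probability proportional to its priority raised to a power. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and the use of the absolute TD error as the priority are assumptions made for illustration and are not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical minimal proportional prioritized replay buffer."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha    # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps        # keeps every priority strictly positive
        self.data = []        # stored transitions
        self.priorities = []  # one priority per stored transition
        self.pos = 0          # next write index (ring buffer)
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so that each
        # one has a good chance of being replayed at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalized by the batch maximum, a common simplification.
        raw = [(n * probs[i]) ** (-beta) for i in idx]
        w_max = max(raw)
        weights = [w / w_max for w in raw]
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # After a critic update, the priority becomes |TD error| + eps.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG training loop, the returned `weights` would typically scale each sample's critic loss, and `update_priorities` would be called with the new TD errors after every update step.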