18 similar articles found; search took 343 ms
1.
2.
3.
4.
Rolling Q-learning robot path planning with prior knowledge in unknown environments  Total citations: 1 (self-citations: 0, other citations: 1)
A rolling Q-learning path-planning algorithm with prior knowledge is proposed for robots in unknown environments. When the Q-values are initialized, prior knowledge of the environment is injected as heuristic search information, avoiding blind exploration in the early learning phase and speeding up convergence. Rolling (receding-horizon) learning is used to cope with the robot's limited field of view in large-scale environments and with the curse of dimensionality caused by the growth of the Q-learning state space. Simulation results show that, with this algorithm, the robot can quickly plan an optimized, collision-free path from start to goal in complex unknown environments, with satisfactory results.
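The prior-knowledge initialization described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 5x5 grid, the Manhattan-distance heuristic, and the scale factor are all assumptions.

```python
import numpy as np

GRID = 5       # assumed 5x5 grid world, states numbered row-major
GOAL = (4, 4)  # assumed goal cell

def heuristic_q_init(grid=GRID, goal=GOAL, scale=1.0):
    """Initialize the Q-table from prior knowledge: every action in a
    state gets a value that grows as the state's Manhattan distance to
    the goal shrinks, so early greedy exploration is biased toward the
    goal instead of starting from an uninformative all-zero table."""
    q = np.zeros((grid * grid, 4))
    for s in range(grid * grid):
        r, c = divmod(s, grid)
        d = abs(r - goal[0]) + abs(c - goal[1])
        q[s, :] = scale / (1.0 + d)  # higher initial value near the goal
    return q

q = heuristic_q_init()
```

Standard one-step Q-learning then proceeds from this table unchanged; only the starting point differs from the usual zero initialization.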
5.
To address traditional navigation methods' dependence on map accuracy and their poor adaptability to dynamic, complex scenes, a map-free autonomous navigation algorithm based on curriculum learning and deep reinforcement learning is proposed. To overcome the difficulty of learning under sparse rewards, a training method guided by a competence-circle curriculum is proposed, drawing on the idea of curriculum learning. In addition, to better exploit the robot's current collision information for action decisions, the notion of collision probability is introduced: the obstacle information currently perceived by the robot is represented as a high-level semantic quantity and encoded into the robot's current observation as part of the navigation policy's input, simplifying the observation-to-action mapping and further reducing the learning difficulty. Experimental results show that the proposed curriculum-guided training and collision probability markedly accelerate the convergence of the navigation policy; the learned policy achieves a success rate above 90% in larger scenes and reduces travel time by 53.5%–73.1%, providing reliable navigation for unmanned operation in unstructured unknown environments.
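The collision-probability encoding described above can be sketched as a simple range-to-scalar mapping. This is an assumption-level sketch: the linear mapping and the `d_safe`/`d_max` thresholds are illustrative, not the paper's definition.

```python
import numpy as np

def collision_probability(ranges, d_safe=0.3, d_max=2.0):
    """Compress raw range readings into one high-level scalar:
    0 when the nearest obstacle is beyond d_max, 1 when it is inside
    the safety distance d_safe, linear in between."""
    d = float(min(ranges))
    if d >= d_max:
        return 0.0
    if d <= d_safe:
        return 1.0
    return (d_max - d) / (d_max - d_safe)

def augment_observation(obs, ranges):
    """Append the collision probability to the policy's observation,
    giving the network a ready-made obstacle summary."""
    return np.append(np.asarray(obs, dtype=float), collision_probability(ranges))
```

The policy then consumes the augmented vector instead of raw ranges alone, which is the "high-level semantic" input the abstract refers to.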
6.
To remove signal-source seeking's dependence on multi-robot cooperative localization in unknown environments, a single-robot source-seeking algorithm based on beetle antennae search (BAS) is proposed for locating wireless base stations. The two antennae of BAS are reduced to one, an obstacle-avoidance strategy is introduced, and the robot's position and heading are adjusted dynamically. The algorithm is applied to single-robot wireless base-station search in both indoor and outdoor environments. Comparison with PSO and GA shows markedly faster convergence. The algorithm is computationally light, converges quickly, and can locate a wireless signal source in complex unknown environments.
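For reference, the classic two-antenna BAS step that the paper modifies can be sketched on a toy signal-seeking task. Everything here is an assumption for illustration: the paper uses a one-antenna variant with obstacle avoidance, and real RSSI is noisier than the negative-distance model below.

```python
import numpy as np

def bas_step(x, signal, d, step, rng):
    """One iteration of classic two-antenna beetle antennae search:
    probe the signal at two antennae along a random direction b and
    move toward the stronger reading."""
    b = rng.normal(size=x.shape)
    b /= np.linalg.norm(b)
    return x + step * b * np.sign(signal(x + d * b) - signal(x - d * b))

# Toy run: signal strength modeled as negative distance to the source.
src = np.array([2.0, 3.0])
signal = lambda p: -np.linalg.norm(p - src)
x, step = np.array([0.0, 0.0]), 1.0
rng = np.random.default_rng(0)
for _ in range(300):
    x = bas_step(x, signal, d=step / 2, step=step, rng=rng)
    step *= 0.97  # shrink the stride as the search closes in
```

The stride decay is what stops the beetle from orbiting the source indefinitely; the paper's variant replaces the two probes with a single antenna reading.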
7.
This paper proposes a novel rolling path-planning genetic algorithm based on a concentric-circle strategy for robots when the global static environment is unknown. The algorithm models the environment with several concentric circles inside the robot's field of view, then uses a genetic algorithm to plan a temporary guide path from the current field-of-view information and the goal. The robot advances one step along this guide path, after which the genetic algorithm replans a new guide path. The guide path is thus revised dynamically, steering the robot along a...
8.
Because robot navigation in unknown environments is prone to deadlock, a grid-based map model called the "data grid" is designed, and on this basis a behavior-based navigation method called "safe navigation" is proposed. The data grid records obstacle information from the surrounding environment together with the robot's path; safe navigation applies the data-grid technique to resolve the deadlocks encountered during navigation in unknown environments. Fuzzy logic is used to design and coordinate the navigation behaviors. Simulations and experiments in real environments confirm the method's good performance.
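A minimal sketch of such a "data grid" is shown below. The class and method names, and the revisit-count heuristic for flagging deadlocks, are assumptions; the paper's cell contents and deadlock logic are richer.

```python
import numpy as np

class DataGrid:
    """A grid map whose cells store both obstacle evidence and how often
    the robot has passed through them. Frequently revisited cells signal
    a potential deadlock, which the navigator can then penalize."""

    def __init__(self, width, height):
        self.obstacle = np.zeros((height, width), dtype=bool)
        self.visits = np.zeros((height, width), dtype=int)

    def mark_obstacle(self, x, y):
        """Record an obstacle detected at cell (x, y)."""
        self.obstacle[y, x] = True

    def visit(self, x, y):
        """Record that the robot's path passed through cell (x, y)."""
        self.visits[y, x] += 1

    def deadlock_risk(self, x, y, threshold=3):
        """Flag a cell the robot keeps returning to as a deadlock risk."""
        return self.visits[y, x] >= threshold
```

The behaviors coordinated by fuzzy logic would consult `deadlock_risk` before committing to a move, which is how the recorded path information breaks loops.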
9.
To keep mobile robots from falling into traps during autonomous navigation in completely or partially unknown environments, a navigation method based on multi-behavior control is proposed. The robot perceives its surroundings with a lidar and matches the collected information against behavior-switching conditions to decide when to change behaviors. Memory information is introduced through a grid map, strengthening the robot's awareness of its surroundings and thereby improving its decision-making. Simulations show that the algorithm is effective in simple environments and remains feasible in some complex ones, with good optimality, real-time performance, and intelligence.
10.
11.
State-chain sequential feedback reinforcement learning for path planning of autonomous mobile robots
This paper deals with a new approach based on Q-learning for solving the problem of mobile robot path planning in complex unknown static environments. As a computational approach to learning through interaction with the environment, reinforcement learning algorithms have been widely used for intelligent robot control, especially in the field of autonomous mobile robots. However, the learning process is slow and cumbersome. For practical applications, rapid rates of convergence are required. Aiming at the problem of slow convergence and long learning time for Q-learning based mobile robot path planning, a state-chain sequential feedback Q-learning algorithm is proposed for quickly searching for the optimal path of mobile robots in complex unknown static environments. The state chain is built during the searching process. After one action is chosen and the reward is received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. With the increasing number of Q-values updated after one action, the number of actual steps for convergence decreases and thus the learning time decreases, where a step is a state transition. Extensive simulations validate the efficiency of the newly proposed approach for mobile robot path planning in complex environments. The results show that the new approach has a high convergence speed and that the robot can find the collision-free optimal path in complex unknown static environments in a much shorter time, compared with the one-step Q-learning algorithm and the Q(λ)-learning algorithm.
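The backward sweep over the state chain can be sketched as follows. This is a minimal illustration of the update rule only; the paper's chain construction and bookkeeping are richer, and the toy transitions below are assumptions.

```python
import numpy as np

def chain_update(Q, chain, alpha=0.5, gamma=0.9):
    """After one real action, sweep backward along the chain of visited
    (state, action, reward, next_state) transitions and apply the
    one-step Q-learning update to each, so a single environment step
    refreshes many Q-values instead of one."""
    for s, a, r, s2 in reversed(chain):
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    return Q

# Toy chain: three transitions ending with a reward at state 2
# (treated as absorbing, for illustration).
Q = np.zeros((3, 2))
chain = [(0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 0, 1.0, 2)]
chain_update(Q, chain)
```

Because the sweep runs newest-to-oldest, each earlier state sees its successor's freshly updated value, which is what propagates the reward down the chain in a single pass.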
12.
13.
Advanced Robotics, 2013, 27(11): 1577–1593
In this paper, we report a robust and low-cost navigation algorithm for an unknown environment based on integration of a grid-based map building algorithm with behavior learning. The study focuses on mobile robots that utilize ultrasonic sensors as their prime interface with the outside world. The proposed algorithm takes into account environmental information to augment the readings from the low angular accuracy sonar measurements for behavior learning. The environmental information is obtained by an online grid-based map learning design that is concurrently operating with the behavior learning algorithm. The proposed algorithm is implemented and tested on an in-house-built mobile robot, and its performance is verified through online navigation in an indoor environment.
14.
In this paper, a new approach is developed for solving the problem of mobile robot path planning in an unknown dynamic environment based on Q-learning. Q-learning algorithms have been used widely for solving real-world problems, especially in robotics, since they have been proved to give reliable and efficient solutions thanks to a simple and well-developed theory. However, most researchers who tried to use Q-learning for mobile robot navigation dealt with static environments; they avoided dynamic environments because the problem is more complex and has an infinite number of states, which makes training the intelligent agent very difficult. In this paper, the Q-learning algorithm is applied to mobile robot navigation in a dynamic environment by limiting the number of states through a new definition of the state space. This reduces the size of the Q-table and hence increases the speed of the navigation algorithm. The conducted simulation scenarios indicate the strength of the proposed approach for mobile robot navigation in dynamic environments. The results show that the new approach has a high hit rate and that the robot succeeds in reaching its target along a collision-free path in most cases, which is the most desirable feature of any navigation algorithm.
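One way such a state-space limitation can look is sketched below. The exact state definition is the paper's contribution and is not reproduced here; the sector encoding, the near/far flag, and the sector count are all illustrative assumptions.

```python
import math

N_SECTORS = 8  # assumed angular resolution

def compact_state(goal_bearing, obstacle_bearing, obstacle_near, n=N_SECTORS):
    """Collapse a dynamic scene into a small discrete state: the goal's
    bearing sector, the nearest obstacle's bearing sector, and a
    near/far flag. This caps the Q-table at n * n * 2 rows no matter
    how many moving obstacles the environment contains."""
    def sector(angle):
        # map an angle in radians to one of n equal sectors
        return int((angle % (2 * math.pi)) / (2 * math.pi) * n) % n
    return (sector(goal_bearing) * n + sector(obstacle_bearing)) * 2 + int(obstacle_near)
```

With 8 sectors the table has only 128 rows, which is what makes training tractable where a raw obstacle-configuration state space would be unbounded.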
15.
To remedy the slow convergence of traditional genetic algorithms on robot path planning, a novel rolling planning method based on a positive-feedback adaptive genetic algorithm is proposed, combining the ant colony algorithm, simulated annealing, rolling planning, and the genetic algorithm. Simulations show that even in complex unknown environments the algorithm can plan a globally optimized path while avoiding collisions safely.
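The adaptive element of such a hybrid GA is often a fitness-dependent schedule for the crossover and mutation rates; one plausible version is sketched below. This is an assumption-level illustration of the adaptive ingredient only; the positive-feedback (pheromone) and annealing components are not shown.

```python
def adaptive_rates(f, f_max, f_avg, pc=(0.5, 0.9), pm=(0.01, 0.1)):
    """Self-adaptive crossover/mutation probabilities: individuals above
    the population-average fitness get lower rates (good genes are
    preserved), those at or below average get the maximum rates
    (poor individuals are explored more aggressively)."""
    pc_min, pc_max = pc
    pm_min, pm_max = pm
    if f < f_avg or f_max == f_avg:
        return pc_max, pm_max
    k = (f_max - f) / (f_max - f_avg)  # 0 at the best, 1 at the average
    return pc_min + (pc_max - pc_min) * k, pm_min + (pm_max - pm_min) * k
```

Each generation, the GA would call this per individual before deciding whether to cross over or mutate its path chromosome.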
16.
This paper analyzes the efficiency of evolutionary learning and the problem of knowledge updating during mobile robot navigation in unknown remote environments, proposes a parallel evolution model to solve these problems, and designs and demonstrates an efficient parallel evolutionary computation scheme. Finally, experiments and simulations confirm that mobile robot navigation in unknown environments based on the parallel evolution model is feasible and effective.
17.
For the autonomous navigation of a hexapod robot in unknown environments, a closed-loop autonomous navigation control algorithm based on a fuzzy neural network is designed, along with the corresponding navigation control system. The algorithm fuses the logical-inference capability of fuzzy control with the learning and training capability of neural networks, and introduces closed-loop control to optimize the algorithm. The control system comprises four modules: information input, fuzzy neural network, command execution, and information feedback. Environmental and positional information is sensed jointly by a GPS sensor, an electronic-compass sensor, and ultrasonic sensors. The fuzzy-neural-network control algorithm is re-implemented in C and applied to the system. Simulations show, in theory, that the closed-loop algorithm outperforms the open-loop one: it shortens the detour the hexapod takes around obstacles, raising travel speed by 6.14% and cutting travel time by 8.74%. Physical experiments were then carried out. The results show that the control system achieves autonomous navigation and obstacle avoidance for the hexapod robot and, relative to the open-loop system, effectively shortens the path, raising travel speed by 5.66% and cutting travel time by 7.25%, verifying the feasibility and practicality of the closed-loop control system.
18.
When existing policy-gradient-based deep reinforcement learning methods are applied to robot navigation in complex indoor scenes such as offices and corridors, training takes long and learning efficiency is low. This paper proposes a deep reinforcement learning navigation algorithm that combines an advantage structure with minimized target Q-values. The advantage structure is introduced into the policy-gradient algorithm to distinguish the differences between actions under the same state value and improve learning efficiency; in multi-goal navigation scenes, the state value is estimated separately, using map information to provide more accurate value judgments. Because the methods that mitigate target Q-value overestimation in discrete control are hard to apply under the mainstream actor-critic framework, a Gaussian-smoothing-based minimal target Q-value method is designed to reduce the impact of overestimation on training. Experiments show that the algorithm effectively speeds up learning: in both single-goal and multi-goal continuous navigation training it converges faster than Soft Actor-Critic (SAC), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Deep Deterministic Policy Gradient (DDPG), keeps the mobile robot well clear of obstacles, and yields a navigation model with good generalization.
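The target-value computation described above can be sketched in the spirit of the method: smooth the target action with clipped Gaussian noise, then take the minimum over the target critics. This is an assumption-level sketch using plain functions in place of networks; the paper's exact smoothing and architecture are not reproduced.

```python
import numpy as np

def smoothed_min_target(critics, target_actor, next_states,
                        noise_std=0.2, noise_clip=0.5, rng=None):
    """Compute the bootstrap target for the critics: perturb the target
    action with clipped Gaussian noise (smoothing) and take the
    elementwise minimum over the target critics, curbing the
    overestimation that a single max-style target would accumulate."""
    rng = rng or np.random.default_rng()
    a = target_actor(next_states)
    noise = np.clip(rng.normal(0.0, noise_std, a.shape), -noise_clip, noise_clip)
    a = np.clip(a + noise, -1.0, 1.0)  # keep actions in the valid range
    return np.min([c(next_states, a) for c in critics], axis=0)
```

In training, this value (scaled by the discount and added to the reward) replaces the single-critic target, so systematic optimism in either critic is filtered out by the minimum.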