Similar Documents
20 similar documents found (search time: 390 ms)
1.
Much research has been conducted on the application of reinforcement learning to robots, and learning time is a major concern. In reinforcement learning, information from sensors is projected onto a state space. A robot learns the correspondence between each state and action in the state space and determines the best correspondence. When the state space is expanded according to the number of sensors, the number of correspondences to be learnt by the robot increases, so learning the best correspondence becomes time-consuming. In this study, we focus on the importance of sensors for a robot performing a particular task. The sensors that are applicable to a task differ from task to task, and a robot does not need to use all installed sensors to perform a given task; the state space should consist of only those sensors that are essential to the task. Using a state space built from only the important sensors, a robot can learn correspondences faster than with a state space built from all installed sensors. We therefore propose a relatively fast learning system in which a robot autonomously selects the sensors that are essential to a task and constructs a state space from those sensors alone. We define the measure of a sensor's importance for a task as the coefficient of correlation between the value of that sensor and the reward in reinforcement learning. The robot determines the importance of its sensors from this correlation, and the state space is reduced accordingly, so the robot can learn correspondences efficiently. We confirm the effectiveness of the proposed system through a simulation.
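A minimal sketch of the correlation-based sensor selection described above, assuming logged per-step sensor readings and rewards; the threshold value and array names are illustrative, not taken from the paper:

```python
import numpy as np

def select_important_sensors(sensor_log, reward_log, threshold=0.3):
    """Rank sensors by |correlation(sensor value, reward)| and keep the strong ones.

    sensor_log: array of shape (T, n_sensors), one row per time step
    reward_log: array of shape (T,), reward received at each step
    threshold:  minimum absolute correlation for a sensor to be kept (assumed value)
    """
    importance = np.array([
        np.corrcoef(sensor_log[:, i], reward_log)[0, 1]
        for i in range(sensor_log.shape[1])
    ])
    kept = np.where(np.abs(importance) >= threshold)[0]
    return kept, importance

# Toy example: 3 sensors, only sensor 0 tracks the reward
T = 200
rng = np.random.default_rng(0)
reward = rng.random(T)
sensors = np.column_stack([reward + 0.1 * rng.standard_normal(T),
                           rng.random(T),
                           rng.random(T)])
kept, scores = select_important_sensors(sensors, reward)
print(kept)    # expected: [0]
```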

2.
Real robots with real sensors are not omniscient. When a robot's next course of action depends on information that is hidden from the sensors because of problems such as occlusion, restricted range, bounded field of view, and limited attention, we say the robot suffers from the hidden state problem. State identification techniques use history information to uncover hidden state. Previous approaches to encoding history include finite state machines, recurrent neural networks, and genetic programming with indexed memory; a chief disadvantage of all these techniques is their long training time. This paper presents instance-based state identification, a new approach to reinforcement learning with state identification that learns with far fewer training steps. Noting that learning with history and learning in continuous spaces both begin without knowing the granularity of the state space, the approach applies instance-based (or "memory-based") learning to history sequences: instead of recording instances in a continuous geometric space, we record instances in action-percept-reward sequence space. The first implementation of this approach, called Nearest Sequence Memory, learns with an order of magnitude fewer steps than several previous approaches.
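A schematic sketch, not the authors' implementation, of matching the recent action-percept-reward history against stored instances; the sequence encoding, match score, and names are assumptions:

```python
from collections import deque

def sequence_match_length(history, instance):
    """Count how many most-recent (action, percept, reward) triples agree,
    comparing backwards from the end of each sequence."""
    n = 0
    for h, i in zip(reversed(history), reversed(instance)):
        if h != i:
            break
        n += 1
    return n

def nearest_instances(history, memory, k=4):
    """Return the k stored instances whose suffix best matches the current history."""
    ranked = sorted(memory,
                    key=lambda inst: sequence_match_length(history, inst["seq"]),
                    reverse=True)
    return ranked[:k]

# memory holds dicts like {"seq": [(action, percept, reward), ...], "value": q_estimate}
memory = [
    {"seq": [(0, "wall", 0.0), (1, "open", 1.0)], "value": 1.0},
    {"seq": [(1, "wall", 0.0), (0, "wall", 0.0)], "value": -0.5},
]
history = deque([(0, "wall", 0.0), (1, "open", 1.0)], maxlen=8)
print(nearest_instances(list(history), memory, k=1)[0]["value"])  # 1.0
```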

3.
Research on a Reinforcement Learning Algorithm for Multi-Robot Dynamic Formation    Cited by: 8 (self-citations: 0, by others: 8)
In artificial intelligence, reinforcement learning has attracted wide attention for its self-learning and adaptive properties. With the development of multi-agent theory in distributed artificial intelligence, distributed reinforcement learning algorithms have gradually become a research focus. This paper first reviews the state of reinforcement learning research, then takes multi-robot dynamic formation as the study model and describes how distributed reinforcement learning can realize multi-robot behavior control. A SOM (self-organizing map) neural network is used to partition the state space autonomously, which speeds up learning; a BP (back-propagation) neural network implements the reinforcement learning, which strengthens the generalization ability of the system; and internal and external reinforcement signals are combined to balance each robot's individual interest against the interest of the group. To keep the control task explicit, the system uses blackboard communication for hierarchical control. Finally, simulation experiments demonstrate the effectiveness of the method.

4.
To enable a mobile robot to avoid obstacles efficiently and courteously in dense, complex crowds, this paper proposes a deep-reinforcement-learning obstacle avoidance algorithm for mobile robots in crowded environments. First, to address the limited learning capacity of the value network in deep reinforcement learning, the value network is improved using crowd interaction: an angular pedestrian grid extracts interaction information between pedestrians, and an attention mechanism extracts the temporal features of each pedestrian, learning the relative importance of the current state versus the historical trajectory states and their joint influence on the robot's avoidance policy, which provides prior knowledge for the subsequent multi-layer perceptron. Second, the reinforcement learning reward function is designed according to human spatial behavior, and states with excessively large changes in the robot's heading are penalized, so that the robot avoids obstacles comfortably. Finally, simulation experiments verify the feasibility and effectiveness of the proposed algorithm in dense, complex crowd environments.

5.
顾国昌, 仲宇, 张汝波. 《机器人》, 2003, 25(4): 344-348
In a multi-robot system, how good a robot's behavior is often depends on the behavior of the other robots, so combined (joint) actions must be used to achieve multi-robot cooperation. However, reinforcement learning with joint actions converges extremely slowly because the learning space is enormous. The new method proposed in this paper reduces the dimensionality of the learning space by predicting the probability of each robot's action, and it is applied to multi-robot cooperative tasks. Experimental results show that the prediction-based accelerated reinforcement learning algorithm obtains a multi-robot cooperation policy faster than the original algorithm.

6.
Application of Reinforcement Learning to Learning the Basic Actions of Soccer Robots    Cited by: 1 (self-citations: 0, by others: 1)
This paper studies reinforcement learning algorithms and their application to learning the technical actions used in robot soccer matches. When the state space and action space of reinforcement learning are too large or the variables are continuous, learning is often very slow or even fails to converge. To address this problem, a reinforcement learning method based on a T-S-model fuzzy neural network is proposed, which effectively realizes the mapping from the state space to the action space. The proposed method is then used to design the technical actions of a soccer robot, studying behavior learning without expert knowledge or an environment model. Finally, experiments demonstrate the effectiveness of the method, which meets the requirements of robot soccer matches.

7.
Many reinforcement learning methods have been studied on the assumption that the state is discretized and the environment size is predetermined. However, an operating environment may have a continuous state and its size may not be known in advance, for example in robot navigation and control. When these methods are applied to such an environment, learning may take a very long time or fail altogether. In this study, we improve our previous human-immunity-based reinforcement learning method so that it works in continuous state space environments. Since our method selects an action based on the distance between the present state and the memorized action, information about the environment (e.g., its size) is not required in advance. The validity of the method is demonstrated through simulations of the swing-up control of an inverted pendulum.

8.
The capacity to learn is a hallmark of intelligence in higher animals. To investigate how quadrupeds learn locomotion skills, this paper studies the gait-learning task of a quadruped robot and reproduces the rhythmic gait-learning process of quadrupeds. In recent years, the proximal policy optimization (PPO) algorithm, a representative deep reinforcement learning method, has been widely used for quadruped gait learning; it performs well and requires few hyperparameters. However, with high-dimensional inputs and outputs it easily converges to a local optimum, which shows up as disordered gait rhythm signals and severe oscillation of the robot's center of mass. To solve this problem, and inspired by meta-learning's ability to capture a high-dimensional abstract representation of the learning process, this paper proposes a meta proximal policy optimization (MPPO) algorithm that combines ideas from meta-learning and PPO and enables the quadruped robot to learn a better gait. Simulation results on the PyBullet platform show that the proposed algorithm enables the quadruped robot to learn a walking skill, and comparisons with soft actor-critic (SAC) and PPO show that MPPO yields a more regular gait rhythm and a faster walking speed.

9.
Conventional Q-learning requires a pre-defined quantized state space and action space. This is not practical for real robot applications, since a discrete, finite set of actions cannot precisely capture the variation between the different positions that map to the same state element on which the robot is located. In this paper, a continuous action generator composed with Q-learning, called the fuzzy cerebellar model articulation controller (FCMAC) method, is presented to solve this problem. The FCMAC generates continuous actions by a linear combination of the weighting distribution over the state space, where the optimal policy of each state is derived from Q-learning; this provides better resolution of the weighting distribution around the state in which the robot is located. The algorithm solves not only the single-agent problem but, by extension, the multi-agent problem as well. An experiment is implemented in a task where two robots act independently while connected by a straight bar; their goal is to cooperate to pass through a gate in the middle of a grid environment.
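A minimal sketch of the underlying idea of blending the Q-learning greedy actions of neighboring state cells by membership weights to produce a continuous action; the grid layout, Gaussian membership, and names are assumptions rather than the paper's FCMAC implementation:

```python
import numpy as np

def continuous_action(x, centers, width, q_table, actions):
    """Blend the greedy discrete action of each state cell, weighted by how
    strongly the continuous state x activates that cell.

    x:        continuous state (scalar here, for brevity)
    centers:  cell centers along the state dimension
    width:    receptive-field width of each cell
    q_table:  shape (n_cells, n_actions), learned by ordinary Q-learning
    actions:  discrete action values associated with the Q-table columns
    """
    weights = np.exp(-((x - centers) / width) ** 2)   # Gaussian membership per cell
    weights /= weights.sum()
    greedy = actions[np.argmax(q_table, axis=1)]      # best discrete action per cell
    return float(np.dot(weights, greedy))             # smoothly blended continuous action

centers = np.linspace(-1.0, 1.0, 5)
actions = np.array([-1.0, 0.0, 1.0])
q_table = np.random.default_rng(1).random((5, 3))
print(continuous_action(0.3, centers, width=0.5, q_table=q_table, actions=actions))
```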

10.
In this paper, a multi-agent reinforcement learning method based on predicting the actions of other agents is proposed. In a multi-agent system, the learning agent's action selection is unavoidably affected by the other agents' actions, so joint states and joint actions are involved in the multi-agent reinforcement learning system. A novel agent action prediction method based on the probabilistic neural network (PNN) is proposed: the PNN is used to predict the actions of the other agents. Furthermore, a policy-sharing mechanism is used to exchange the learned policies of multiple agents in order to speed up learning. Finally, the application of the presented method to robot soccer is studied. Through learning, robot players master the mapping from state information to the action space, and coordination and cooperation among multiple robots are well realized.
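A small sketch of Parzen-window style action prediction in the spirit of a probabilistic neural network; the feature layout, kernel width, and class names are illustrative assumptions:

```python
import numpy as np

class SimplePNN:
    """Parzen-window (PNN-style) classifier for predicting another agent's action."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma
        self.patterns = {}          # action label -> array of observed state vectors

    def fit(self, states, actions):
        for a in set(actions):
            self.patterns[a] = np.array([s for s, act in zip(states, actions) if act == a])

    def predict(self, state):
        state = np.asarray(state)
        scores = {}
        for a, pats in self.patterns.items():
            d2 = np.sum((pats - state) ** 2, axis=1)
            scores[a] = np.mean(np.exp(-d2 / (2 * self.sigma ** 2)))  # class-wise kernel density
        return max(scores, key=scores.get)

# Toy example: predict the other agent's action from its observed state
states  = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.1]]
actions = ["defend", "defend", "attack", "attack"]
pnn = SimplePNN()
pnn.fit(states, actions)
print(pnn.predict([0.95, 1.0]))   # expected: "attack"
```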

11.
This paper attempts to bridge the fields of machine learning, robotics, and distributed AI. It discusses the use of communication to reduce the undesirable effects of locality in fully distributed multi-agent systems with multiple agents (robots) learning in parallel while interacting with each other. Two key problems, hidden state and credit assignment, are addressed by applying local undirected broadcast communication in a dual role: as sensing and as reinforcement. The methodology is demonstrated in two multi-robot learning experiments: the first describes learning a tightly-coupled coordination task with two robots, the second a loosely-coupled task with four robots learning social rules. Communication is used to (1) share sensory data to overcome hidden state and (2) share reinforcement to overcome the credit assignment problem between the agents and bridge the gap between local individual and global group pay-off.

12.
Intelligent perception and automatic control in complex unknown environments is currently a hot topic in robot control, and the new generation of artificial intelligence makes such intelligent automation possible. In recent years, emerging methods that apply deep reinforcement learning to robot motion control in high-dimensional continuous state-action spaces have attracted the attention of researchers. This survey first reviews the rise and development of deep reinforcement learning and divides the algorithms used for robot motion control into value-function-based and policy-gradient-based classes, describing representative algorithms of each class and their characteristics. Next, it briefly introduces five simulation platforms commonly used for deep reinforcement learning of robot motion control before transfer from simulation to reality. It then reviews, by research type, the progress of deep-reinforcement-learning-based robot motion control in five areas: autonomous navigation, object grasping, gait control, human-robot collaboration, and swarm cooperation. Finally, it summarizes the challenges and future trends of the field.

13.
Path planning is a classic problem in artificial intelligence with wide applications in defense, road traffic, robot simulation, and many other fields. However, most existing path-planning algorithms assume a single, fixed environment or a discrete action space, or require a manually constructed model. Reinforcement learning is a machine learning method that interacts with the environment on its own without manually provided training data, and the development of deep reinforcement learning has further improved its ability to solve real-world problems. This paper applies the DDPG (Deep Deterministic Policy Gradient) deep reinforcement learning algorithm to path planning and achieves path planning in continuous spaces and complex environments.

14.
With the research and development of mobile robots in many fields, higher demands are placed on their path-planning ability. When the traditional deep Q-network (DQN) algorithm is applied to autonomous mobile-robot path planning in unknown environments, it converges slowly, produces a large iteration space early in training, and needs many iterations. To address this, when the traditional DQN initializes its Q-values, the attractive potential field of the artificial potential field method is added to encode prior knowledge of the environment. This guides the mobile robot toward the goal, reduces the large number of ineffective iterations in the first rounds of exploration, and thus reduces the number of iterations and speeds up convergence. The improved DQN with the initial attractive potential field is validated for path planning in a grid-map environment using the PyTorch framework. Simulation results show that the improved algorithm can quickly and effectively plan an optimal path from the start to the goal with a smaller iteration space and fewer iterations.
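A minimal sketch of seeding a tabular Q-table with the attractive potential field, in the spirit described above; the gain, normalization by the grid diagonal, and four-action layout are assumptions, not the paper's exact design:

```python
import numpy as np

def attractive_potential(pos, goal, k_att=1.0):
    """Attractive potential of the artificial potential field method:
    grows with the squared distance to the goal."""
    return 0.5 * k_att * np.sum((np.asarray(pos) - np.asarray(goal)) ** 2)

def init_q_table(grid_shape, goal, n_actions=4, k_att=1.0):
    """Initialize Q-values so cells closer to the goal start with higher values,
    instead of the usual all-zero initialization."""
    h, w = grid_shape
    q = np.zeros((h, w, n_actions))
    # Normalize by the grid diagonal, an upper bound on any distance to the goal
    max_pot = attractive_potential((0, 0), (h - 1, w - 1), k_att)
    for i in range(h):
        for j in range(w):
            # Lower potential (closer to goal) -> higher initial Q-value
            q[i, j, :] = (max_pot - attractive_potential((i, j), goal, k_att)) / max_pot
    return q

q0 = init_q_table((10, 10), goal=(9, 9))
print(q0[9, 9, 0], q0[0, 0, 0])   # goal cell starts highest, far corner lowest
```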

15.
刘物己, 敬忠良, 陈务军, 潘汉. 《机器人》, 2022, 44(3): 361-367
To overcome the limited degrees of freedom and poor environmental adaptability of traditional rigid space robots, a flexible robot for on-orbit servicing, inspired by the inchworm and the snake, is proposed based on biological structures. First, a prototype of the flexible robot was built, the driving characteristics of its nickel-titanium shape-memory-alloy (SMA) actuators were studied, a visual control interface was designed, and physical experiments verified that the prototype is controllable. Then, a Q-learning algorithm and a corresponding reward function were designed for the proposed flexible robot structure, a simulation model of the robot was built, and Q-learning-based autonomous planning experiments for the manipulator were carried out in simulation. The results show that the manipulator converges to a stable state within a relatively short time and completes the planning task autonomously, indicating that the proposed algorithm is effective and feasible and that reinforcement learning is promising for the intelligent planning and control of flexible robots.
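A generic tabular Q-learning loop of the kind referenced above; the toy corridor environment, state/action encoding, and hyperparameters are illustrative assumptions, not the paper's setup:

```python
import numpy as np

n_states, n_actions = 8, 2          # toy 1-D corridor; goal at the right end
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95
rng = np.random.default_rng(0)

def step(state, action):
    """Toy environment stub: action 0 moves left, action 1 moves right."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(300):
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions))   # random exploration; Q-learning is off-policy
        s_next, r, done = step(s, a)
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))   # greedy policy: non-terminal states should all choose 1 (right)
```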

16.
Recently, robot learning through deep reinforcement learning has tackled various robot tasks with deep neural networks, without using task-specific control or recognition algorithms. However, this learning method is difficult to apply to the contact tasks of a robot, because the random search process of reinforcement learning exerts excessive force. Therefore, when applying reinforcement learning to contact tasks, the contact problem must be handled by an existing force controller. In this study, a neural-network-based movement primitive (NNMP) is proposed that generates a continuous trajectory which can be passed to the force controller and learned through a deep deterministic policy gradient (DDPG) algorithm. In addition, an imitation learning algorithm suitable for the NNMP is proposed so that trajectories similar to the demonstration trajectory are generated stably. The performance of the proposed algorithms was verified on a square peg-in-hole assembly task with a tolerance of 0.1 mm. The results confirm that the complicated assembly trajectory can be learned stably through the NNMP with the proposed imitation learning algorithm, and that the assembly trajectory is further improved by training the NNMP with the DDPG algorithm.

17.
Mobile edge computing is one way to meet robots' demands for computation-heavy tasks. Traditional offloading algorithms rely on intelligent (heuristic) algorithms or convex optimization and require long iteration times. Deep reinforcement learning can produce a solution in a single forward pass, but existing methods only solve for a fixed number of robots. Building on an analysis of deep reinforcement learning, this paper performs input normalization before the input layer of the deep-reinforcement-learning network and adds a convolutional layer after the output layer, so that the network can adaptively handle the offloading requests of a dynamically changing number of mobile robots. Finally, simulation experiments comparing the method with an adaptive genetic algorithm and with plain reinforcement learning verify its effectiveness and feasibility.

18.
Multi-Agent Reinforcement Learning and Its Application to Role Assignment for Soccer Robots    Cited by: 2 (self-citations: 0, by others: 2)
A robot soccer system is a typical multi-agent system: each robot player's choice of action depends not only on its own state but also on the other players, so implementing the decision policy of soccer robots with reinforcement learning requires joint states and joint actions. This paper studies a multi-agent reinforcement learning algorithm based on predicting the actions of other agents, using a naive Bayes classifier for the prediction. A policy-sharing mechanism is introduced to exchange the policies learned by the agents and thereby speed up multi-agent reinforcement learning. Finally, the proposed method is applied to dynamic role assignment for soccer robots, realizing division of labor and cooperation among multiple robots.
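A minimal sketch of naive Bayes action prediction of the kind described above, using scikit-learn's GaussianNB; the feature layout and action labels are illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Each row: features observed about another player (e.g., position, distance to ball);
# label: the action it took. Both are toy values for illustration.
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
y = np.array(["defend", "defend", "attack", "attack"])

clf = GaussianNB().fit(X, y)

# Predict the other agent's next action, then condition our own action choice on it
predicted = clf.predict([[0.85, 0.15]])[0]
print(predicted)                          # expected: "attack"
print(clf.predict_proba([[0.85, 0.15]]))  # class probabilities
```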

19.
When existing policy-gradient-based deep reinforcement learning methods are applied to robot navigation in complex indoor scenes such as offices and corridors, training is slow and learning efficiency is low. This paper proposes a deep-reinforcement-learning navigation algorithm that combines an advantage structure with a minimized target Q-value. The advantage structure is introduced into the policy-gradient-based algorithm to distinguish between actions that share the same state value, which improves learning efficiency; in multi-goal navigation scenes the state value is estimated separately, using map information to provide a more accurate value estimate. In addition, because the usual discrete-control remedies for target Q-value overestimation are hard to apply under the Actor-Critic framework that dominates reinforcement learning, a Gaussian-smoothing-based minimum target Q-value method is designed to reduce the effect of overestimation on training. Experimental results show that the proposed algorithm learns faster: in both single-goal and multi-goal continuous navigation training it converges faster than soft actor-critic (SAC), twin delayed deep deterministic policy gradient (TD3), and deep deterministic policy gradient (DDPG); it keeps the mobile robot well away from obstacles; and the trained navigation model generalizes well.
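One plausible reading of a Gaussian-smoothed minimum target Q-value, sketched with two critics in the style of clipped double-Q with Gaussian-perturbed target actions; the noise scale, clipping, and network shapes are assumptions and this is not the paper's exact design:

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
q1 = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q2 = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())

def smoothed_min_target_q(next_state, reward, done, gamma=0.99, sigma=0.2, clip=0.5):
    """Target value: add clipped Gaussian noise to the target action (smoothing),
    then take the element-wise minimum of two critics to curb overestimation."""
    with torch.no_grad():
        a_next = actor(next_state)
        noise = (torch.randn_like(a_next) * sigma).clamp(-clip, clip)
        a_next = (a_next + noise).clamp(-1.0, 1.0)
        sa = torch.cat([next_state, a_next], dim=-1)
        q_min = torch.min(q1(sa), q2(sa))
        return reward + gamma * (1.0 - done) * q_min

batch = 4
target = smoothed_min_target_q(torch.randn(batch, state_dim),
                               torch.rand(batch, 1),
                               torch.zeros(batch, 1))
print(target.shape)   # torch.Size([4, 1])
```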

20.
In this study, we propose a novel use of reinforcement learning for estimating the hidden variables and parameters of nonlinear dynamical systems. A critical issue in hidden-state estimation is that we cannot directly observe estimation errors. However, by defining the errors of the observable variables as a delayed penalty, we can apply a reinforcement learning framework to state estimation problems. Specifically, we derive a method to construct a nonlinear state estimator by finding an appropriate feedback input gain using the policy gradient method. We tested the proposed method on single-pendulum dynamics and show that the joint angle can be successfully estimated by observing only the angular velocity, and vice versa. In addition, we show that a state estimator can be acquired for the pendulum swing-up task while a swing-up controller is simultaneously acquired by reinforcement learning. Furthermore, we demonstrate that the dynamics of the pendulum itself can be estimated while the hidden variables are estimated in the swing-up task. An application of the proposed method to a two-link biped model is also presented.
