Similar Documents
Found 20 similar documents (search time: 27 ms)
1.
Existing neural network processors are widely used in computer vision, natural language processing, and related fields. However, current on-chip acceleration schemes offer little support for reinforcement learning algorithms in the control domain, even though neural-network-based reinforcement learning is at the core of decision-making in intelligent systems. This paper adopts a reconfigurable array architecture with a system design featuring on-chip storage of configurations, actions, and rewards, enabling flexible deployment of multiple neural network algorithms and supporting a reinforcement learning usage mode. Logic synthesis results under a 65 nm CMOS process show that, at a clock frequency of 200 MHz, the computing module occupies only 0.32 mm² and consumes about 15.46 mW.

2.
Reinforcement learning is a research hotspot in machine learning: it studies how an agent interacts with its environment, makes sequential decisions, optimizes its policy, and maximizes cumulative reward. It has great research value and application potential and is a key step toward artificial general intelligence. This paper surveys progress and trends in reinforcement learning algorithms and applications. It first introduces the fundamentals of reinforcement learning, including Markov decision processes, value functions, and the exploration-exploitation problem. It then reviews classical algorithms, covering value-function-based methods, policy-search-based methods, and methods combining value functions with policy search, and surveys frontier research, mainly multi-agent reinforcement learning and meta reinforcement learning. Finally, it reviews successful applications in game playing, robot control, urban traffic, and business, and closes with a summary and outlook.
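To make the basic concepts above concrete, here is a minimal tabular Q-learning sketch on a toy chain MDP, showing a value function being learned under an epsilon-greedy exploration-exploitation trade-off; the environment and all hyperparameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy 5-state chain MDP: actions move left/right, reward 1 at the right end.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))          # tabular action-value function
alpha, gamma, eps = 0.1, 0.9, 0.2            # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

rng = np.random.default_rng(0)
for _ in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: explore with probability eps, otherwise exploit
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # temporal-difference update toward the Bellman optimality target
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
print(Q)                                     # learned values increase toward the goal
```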

3.
The success of any reinforcement learning (RL) application is in large part due to the design of an appropriate reinforcement function. A methodological framework to support the design of reinforcement functions has not yet been defined, and this critical, often underestimated activity is left to the skill of the RL application designer. We propose an approach to support reinforcement function design in RL applications concerning learning behaviors for autonomous agents. We define dimensions along which reinforcement functions can be described: the distribution of reinforcement values, their coherence, and their match with the designer's perspective. We suggest measures that objectively describe a reinforcement function, discuss the trade-offs to consider when improving learning, and introduce the dimensions along which improvement can be expected. The approach is general enough to be adopted in a large number of RL projects. We show how to apply it in the design of learning classifier system (LCS) applications. We consider a simple but quite complete case study in evolutionary robotics and discuss reinforcement function design issues in this sample context.
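A hypothetical reinforcement function for a navigation behavior, shown purely to illustrate the dimensions discussed above (distribution of values: dense progress term versus sparse terminal signals; coherence: terminal signals dominate shaping terms). All magnitudes are illustrative assumptions, not values from the paper:

```python
def reinforcement(dist_prev, dist_now, reached_goal, collided):
    """Hypothetical shaped reinforcement function for goal reaching."""
    if collided:
        return -10.0                        # strong, sparse failure signal
    if reached_goal:
        return 10.0                         # strong, sparse success signal
    # dense shaping: reward progress toward the goal, charge a small step cost
    return (dist_prev - dist_now) - 0.01
```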

4.
Reinforcement learning solves optimal decision problems in model-free settings and is one of the key technologies for realizing artificial intelligence, but traditional tabular reinforcement learning methods struggle with control problems that have large-scale or continuous spaces. Approximate reinforcement learning, inspired by function approximation, parameterizes the value function or the policy function and obtains the optimal behavior policy indirectly through parameter optimization; it has been applied with notable success to video games, board games, and robot control. On this basis, this paper reviews the research status and application progress of approximate reinforcement learning algorithms. It introduces the relevant foundational theory, categorizes and summarizes classical approximate reinforcement learning algorithms and their improvements, outlines research progress in robot control, and summarizes the main open problems, providing a reference for future research.
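The core idea of approximate reinforcement learning can be sketched in a few lines: the tabular value function is replaced by a parameterized one, and learning updates the parameters rather than table entries. The feature map and hyperparameters below are illustrative assumptions; practical systems use tile coding, RBFs, or neural networks as the survey describes:

```python
import numpy as np

def phi(s):
    # simple polynomial features of a scalar state; a stand-in for richer features
    return np.array([1.0, s, s * s])

n_actions, alpha, gamma = 2, 0.05, 0.95
w = np.zeros((n_actions, 3))                 # one weight vector per action

def q(s, a):
    # Q(s, a) is represented by parameters, not by a table entry
    return w[a] @ phi(s)

def semi_gradient_update(s, a, r, s2):
    # semi-gradient Q-learning: move w[a] toward the bootstrapped target
    target = r + gamma * max(q(s2, b) for b in range(n_actions))
    w[a] += alpha * (target - q(s, a)) * phi(s)
```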

5.
Advanced Robotics, 2013, 27(1): 91-118
In recent years, advances in engineering and robotics have in part been due to strengthened interactions with the biological sciences. Robots that mimic the complexity and adaptability of biological systems have become a central goal of research and development in robotics. Usually, such collaboration serves a twofold purpose: (i) setting up anthropomorphic platforms as test beds for studies in neuroscience and (ii) promoting new mechatronic and robotic technologies for the development of bio-inspired or humanoid high-performance robotic platforms. This paper provides a brief overview of recent studies on sensorimotor coordination in human motor control and proposes a novel paradigm of adaptive learning for sensorimotor control, based on a multi-network high-level control architecture. The proposed neurobiologically inspired model has been applied to a robotic platform purposely designed to provide anthropomorphic solutions to neuroscientific requirements. The goal of this work is to use the bio-inspired robotic platform as a test bed for validating the proposed model of high-level sensorimotor control, with the aim of demonstrating adaptive and modular control based on acquired competences, with a higher degree of flexibility and generality than conventional robotic controllers, while preserving their robustness. To this purpose, a set of object-dependent, visually guided reach-and-grasp tasks and the associated training phases were first implemented in a multi-network control architecture in simulation. Subsequently, the offline learning realized in simulation was used to produce the reach-and-grasp input commands to the low-level position control of the robotic platform. Experimental trials demonstrated that the adaptive and modular high-level control allowed reaching and grasping of objects located at different positions and of variable size, shape and orientation. A future goal is to address autonomous and progressive learning based on growing competences.

6.
This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent’s decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for three research fields. For machine learning (ML) researchers, emotion models may improve learning efficiency. For the interactive ML and human–robot interaction community, emotions can communicate state and enhance user investment. Lastly, it allows affective modelling researchers to investigate their emotion theories in a successful AI agent class. This survey provides background on emotion theory and RL. It systematically addresses (1) from what underlying dimensions (e.g. homeostasis, appraisal) emotions can be derived and how these can be modelled in RL-agents, (2) what types of emotions have been derived from these dimensions, and (3) how these emotions may either influence the learning efficiency of the agent or be useful as social signals. We also systematically compare evaluation criteria, and draw connections to important RL sub-domains like (intrinsic) motivation and model-based RL. In short, this survey provides both a practical overview for engineers wanting to implement emotions in their RL agents, and identifies challenges and directions for future emotion-RL research.
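As a minimal sketch of the homeostasis dimension mentioned above (not any specific model from the survey), an intrinsic reward can be derived from the reduction of a "drive", the weighted distance of internal variables from their setpoints; the variable names and weights below are illustrative assumptions:

```python
def drive(internal, setpoints, weights):
    # total 'drive': weighted deviation of internal variables from setpoints
    return sum(weights[k] * abs(internal[k] - setpoints[k]) for k in weights)

def homeostatic_reward(prev, now, setpoints, weights):
    # intrinsic reward = drive reduction; an extrinsic task reward can be added
    return drive(prev, setpoints, weights) - drive(now, setpoints, weights)

# Illustrative usage: 'eating' moves energy toward its setpoint -> positive reward
setpoints, weights = {"energy": 1.0}, {"energy": 1.0}
print(homeostatic_reward({"energy": 0.4}, {"energy": 0.7}, setpoints, weights))  # ~0.3
```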

7.
In recent years, a large number of relatively advanced and often ready-to-use robotic hardware components and systems have been developed for small-scale use. As these tools mature, there is now a shift towards advanced applications. These often require automation and demand reliability, efficiency and decisional autonomy. New software tools and algorithms for artificial intelligence (AI) and machine learning (ML) can help here. However, since there are many software-based control approaches for small-scale robotics, it is rather unclear how these can be integrated and which approach may be used as a starting point. Therefore, this paper attempts to shed light on existing approaches, with their advantages and disadvantages, measured against established requirements. For this purpose, a survey was conducted in the target group. The software categories presented include vendor-provided software, robotic software frameworks (RSF), scientific software and in-house developed software (IHDS). Typical representatives of each category are described in detail, including the SmarAct Precision Tool Commander, MathWorks MATLAB and National Instruments LabVIEW, as well as the Robot Operating System (ROS). The identified software categories and their representatives are rated for end-user satisfaction based on functional and non-functional requirements, recommendations and learning curves. The paper concludes with a recommendation of ROS as a basis for future work.
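For orientation, a minimal ROS 1 node using the Python client library rospy is sketched below; the topic name, message type, and rate are illustrative assumptions, and a ROS 2 version (rclpy) would differ:

```python
import rospy
from geometry_msgs.msg import Twist

# Minimal ROS 1 publisher node: send a constant forward velocity command.
rospy.init_node('demo_driver')
pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
rate = rospy.Rate(10)                        # publish at 10 Hz
while not rospy.is_shutdown():
    cmd = Twist()
    cmd.linear.x = 0.1                       # forward velocity in m/s
    pub.publish(cmd)
    rate.sleep()
```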

8.
In recent robotics research, much attention has focused on using reinforcement learning (RL) to design robot controllers, since the environments in which robots will be situated cannot be fully predicted by human designers in advance. However, some difficulties remain; one, well known, is the 'curse of dimensionality'. Thus, to apply RL to complicated systems, not only adaptability but also computational efficiency must be taken into account. The paper proposes an adaptive state recruitment strategy for NGnet-based actor-critic RL. The strategy enables the learning system to rearrange and divide its state space gradually according to task complexity and the progress of learning. Simulation results and real robot implementations show the validity of the method.
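The recruitment idea can be sketched as follows: a new Gaussian unit is added whenever no existing unit responds strongly to the current state, so the state representation grows with task complexity. Thresholds and widths here are illustrative assumptions, not the paper's values:

```python
import numpy as np

centers, sigma, act_thresh = [], 0.5, 0.3    # recruited units, width, threshold

def activations(s):
    # Gaussian (NGnet-style) unit activations for state s
    return np.array([np.exp(-np.sum((s - c) ** 2) / (2 * sigma ** 2))
                     for c in centers])

def maybe_recruit(s):
    s = np.asarray(s, dtype=float)
    # recruit a new unit if the state is poorly covered by existing ones
    if len(centers) == 0 or activations(s).max() < act_thresh:
        centers.append(s)                    # actor/critic weights would grow too

for s in ([0.0, 0.0], [0.1, 0.0], [2.0, 2.0]):
    maybe_recruit(s)
print(len(centers))                          # 2: the nearby state was not recruited
```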

9.
In general, symbiotic robotic systems can be considered three-agent systems in which the agents perform interdependent tasks. In such systems the robot's actions are generally controllable and observable, the environment's actions are partially controllable and partially observable, and the human's actions are not (directly) controllable but are partially observable. This new paradigm will enable us to incorporate techniques from AI and robotics into everyday smart environments. Symbiotic robotic systems will shift our focus from the development of fully autonomous, almighty robot servants to the development of an open, extensible environment in which humans cohabit with many cooperating robotic devices.

10.
Robots, with their efficient perception, decision-making, and execution capabilities, have great application value in artificial intelligence, information technology, and intelligent manufacturing. Robot learning and control has become one of the important frontier technologies in robotics research, and a variety of neural-network-based intelligent algorithms have been designed to provide planning frameworks for simultaneous learning and control of robotic systems. This paper first reviews the state of neural-network-based robot learning and control from four perspectives: neural dynamics (ND) algorithms, feedforward neural networks (FNNs), recurrent neural networks (RNNs), and reinforcement learning (RL), surveying the intelligent algorithms and related application technologies for robot learning and control developed over the past three decades. It concludes with open problems and development trends, with the aim of promoting the theory of robot learning and control and broadening its application scenarios.

11.
Research Progress on Reinforcement Learning for Autonomous Robots   Cited by: 9 (self-citations: 1, other citations: 8)
陈卫东, 席裕庚, 顾冬雷. 《机器人》(Robot), 2001, 23(4): 379-384
Although behavior-based autonomous robots are fairly robust, they lack the adaptability required in dynamic environments. Reinforcement learning allows a robot to accomplish a task through learning, without the designer fully specifying all of the robot's actions in advance. It is a novel learning method developed by combining dynamic programming and supervised learning: through trial-and-error interaction with the environment, the robot continually improves its performance using reward and penalty signals derived from successes and failures, thereby achieving its goal, and it tolerates delayed evaluation. Owing to its outstanding ability to solve complex problems, reinforcement learning has become a very promising approach to robot learning. This paper systematically reviews the state of research on reinforcement learning for autonomous robots, identifies open problems, analyzes several possible solutions, and looks ahead to future trends.

12.
We present a case study of reinforcement learning on a real robot that learns how to back up a trailer and discuss the lessons learned about the importance of proper experimental procedure and design. We identify areas of particular concern to the experimental robotics community at large. In particular, we address concerns pertinent to robotics simulation research, implementing learning algorithms on real robotic hardware, and the difficulties involved with transferring research between the two.

13.
The last decade witnessed increasingly rapid progress in self-driving vehicle technology, mainly backed by advances in the area of deep learning and artificial intelligence (AI). The objective of this paper is to survey the current state-of-the-art deep learning technologies used in autonomous driving. We start by presenting AI-based self-driving architectures, convolutional and recurrent neural networks, as well as the deep reinforcement learning paradigm. These methodologies form a base for the surveyed driving scene perception, path planning, behavior arbitration, and motion control algorithms. We investigate both the modular perception-planning-action pipeline, where each module is built using deep learning methods, and End2End systems, which directly map sensory information to steering commands. Additionally, we tackle current challenges encountered in designing AI architectures for autonomous driving, such as safety, training data sources, and computational hardware. The comparison presented in this survey helps gain insight into the strengths and limitations of deep learning and AI approaches for autonomous driving and assists with design choices.

14.
A problem with using reinforcement learning (RL) algorithms in real robot applications is the difficulty of measuring the learning level reached after some experience. Among RL algorithms, Q-learning is the most widely used for robotic tasks. The aim of this work is to evaluate a priori the optimal Q-values for problems where the distance between the current state and the goal state of the system can be computed. Starting from the Q-learning update formula, equations for the maximum Q-weights, for optimal and non-optimal actions, are derived for both delayed and immediate rewards. Deterministic and non-deterministic grid-world environments are used to test the obtained equations in simulation. In addition, the convergence rates of the Q-learning algorithm are compared for different learning rate parameters.
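For reference, the updating formula the derivation starts from is the standard Q-learning rule below; under the additional assumption of a single terminal reward R and no intermediate rewards, repeated optimal updates converge toward the closed form on the right, where d(s, a) (our illustrative notation, not necessarily the paper's) counts the optimal steps still needed after executing a in s:

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr],
\qquad
Q^{*}(s,a) = \gamma^{\,d(s,a)}\, R .
```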

15.
A Forgetting Algorithm for Value-Function Reinforcement Learning   Cited by: 14 (self-citations: 1, other citations: 13)
Reinforcement learning of value functions over large state spaces is a hot and difficult topic in the international reinforcement learning community. This paper introduces basic principles of forgetting from the psychology of memory into value-function reinforcement learning, yielding a class of forgetting algorithms suited to it. It first briefly introduces the basic concepts of Markov decision problems, compares off-policy and on-policy reinforcement learning algorithms, and outlines the standard SARSA(λ) algorithm. After analyzing characteristics of human memory and forgetting, it proposes an intelligent forgetting criterion and extends SARSA(λ) into the Forget-SARSA(λ) algorithm with forgetting capability. Experimental results are given at the end.
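For context, here is a sketch of the standard tabular SARSA(λ) update that Forget-SARSA(λ) builds on, with a simple uniform decay standing in for the paper's intelligent forgetting criterion; all constants are illustrative assumptions:

```python
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))          # action-value table
E = np.zeros_like(Q)                         # eligibility traces
alpha, gamma, lam, forget = 0.1, 0.95, 0.8, 1e-4

def sarsa_lambda_step(s, a, r, s2, a2):
    delta = r + gamma * Q[s2, a2] - Q[s, a]  # on-policy TD error
    E[s, a] += 1.0                           # accumulate trace for (s, a)
    Q[...] += alpha * delta * E              # update every traced entry
    E[...] *= gamma * lam                    # decay all traces
    Q[...] *= 1.0 - forget                   # illustrative stand-in for forgetting
```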

16.
Reinforcement learning is an appealing approach for allowing robots to learn new tasks. The relevant literature reveals a plethora of methods, but at the same time makes clear the lack of implementations that deal with real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded. Moreover, model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, in this survey, model-based methods that have been applied in robotics are covered. We categorize them based on the derivation of an optimal policy, the definition of the return function, the type of the transition model and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches in new applications, taking into consideration the state of the art in both algorithms and hardware.
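A minimal Dyna-style loop illustrates the model-based pattern the survey covers: real transitions both update the value estimates directly and train a one-step model, which then generates extra planning updates. Everything here is an illustrative sketch, not a method from the survey:

```python
import random

Q, model = {}, {}                            # value estimates and learned model
alpha, gamma, n_planning = 0.1, 0.95, 20

def q(s, a):
    return Q.get((s, a), 0.0)

def td_update(s, a, r, s2, actions):
    target = r + gamma * max(q(s2, b) for b in actions)
    Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

def dyna_step(s, a, r, s2, actions):
    td_update(s, a, r, s2, actions)          # direct RL from real experience
    model[(s, a)] = (r, s2)                  # learn a deterministic one-step model
    for _ in range(n_planning):              # planning from simulated experience
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        td_update(ps, pa, pr, ps2, actions)
```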

17.
Online learning time is an important metric for reinforcement learning algorithms. Traditional online reinforcement learning algorithms such as Q-learning and SARSA (state-action-reward-state-action) cannot provide quantitative upper bounds on online learning time from a theoretical standpoint. This paper introduces the probably approximately correct (PAC) principle to design data-driven online reinforcement learning algorithms for continuous-time deterministic systems. These algorithms record online data efficiently while accounting for the algorithm's need to explore the state space, and they output near-optimal control within a finite online learning time. We propose two implementations, using state discretization and kd-trees (k-dimensional trees) respectively, to store data and compute the online policy. Finally, we apply both algorithms to the motion control of a two-link manipulator, observe their performance, and compare them.
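The kd-tree bookkeeping used by the second implementation can be sketched with SciPy's cKDTree: visited states are stored, and a nearest-neighbor query decides whether the current state is already "known" or should trigger exploration. The radius and data below are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

visited = np.array([[0.0, 0.0], [0.5, 0.1]])  # states recorded online
tree = cKDTree(visited)
eps = 0.2                                     # assumed 'known' radius

def is_known(state):
    # a state is 'known' if some stored state lies within eps of it
    dist, _ = tree.query(state, k=1)
    return dist <= eps

print(is_known([0.45, 0.12]))                 # True: near a stored state
print(is_known([2.0, 2.0]))                   # False: unexplored region
```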

18.
An Online Hierarchical Reinforcement Learning Method Based on Path Matching   Cited by: 1 (self-citations: 0, other citations: 1)
Finding correct subgoals online is the key problem in option-based hierarchical reinforcement learning. By analyzing the agent's actions at subgoals, we find that the effective actions at subgoals are restricted, which turns subgoal discovery into finding the best-matching action-restricted states along paths. For grid learning environments, we propose a unidirectional-value method to represent this action-restriction property of subgoals, and an automatic option-discovery algorithm based on it. Experiments show that options generated by the unidirectional-value method significantly speed up Q-learning; we further analyze how the timing and size of option generation affect the performance of Q-learning.
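In the standard options framework this method builds on, each option bundles an initiation set, an internal policy, and a termination condition; discovered subgoals (the action-restricted states above) naturally become termination states of new options. A minimal sketch with illustrative types:

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation: Set[int]                     # states where the option may start
    policy: Callable[[int], int]             # internal policy pi(s) -> action
    subgoals: Set[int]                       # discovered subgoal states

    def terminates(self, s: int) -> bool:
        # the option ends when a subgoal (action-restricted state) is reached
        return s in self.subgoals
```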

19.
Robust motion control is fundamental to autonomous mobile robots. In the past few years, reinforcement learning (RL) has attracted considerable attention in the feedback control of wheeled mobile robots. However, it is still difficult for RL to solve problems with large or continuous state spaces, which are common in robotics. To improve the generalization ability of RL, this paper presents a novel hierarchical RL approach for optimal path tracking of wheeled mobile robots. In the proposed approach, a graph Laplacian-based hierarchical approximate policy iteration (GHAPI) algorithm is developed, in which the basis functions are constructed automatically using the graph Laplacian operator. In GHAPI, the state space of a Markov decision process is divided into several subspaces and approximate policy iteration is carried out on each subspace. A near-optimal path-tracking control strategy can then be obtained by GHAPI combined with proportional-derivative (PD) control. The performance of the proposed approach is evaluated on a P3-AT wheeled mobile robot. It is demonstrated that the GHAPI-based PD control obtains better near-optimal control policies than previous approaches.
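The basis construction behind GHAPI can be sketched in a few lines: from a state-adjacency graph W gathered by sampling, the smoothest eigenvectors of the graph Laplacian L = D - W serve as automatically constructed basis functions. The graph below is an illustrative 4-state chain, not the paper's setup:

```python
import numpy as np

W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)    # adjacency of a 4-state chain
D = np.diag(W.sum(axis=1))                   # degree matrix
L = D - W                                    # (combinatorial) graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
basis = eigvecs[:, :3]                       # smoothest eigenvectors as features
```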

20.
This paper provides an overview of the reinforcement learning and optimal adaptive control literature and its application to robotics. Reinforcement learning bridges the gap between traditional optimal control, adaptive control and bio-inspired learning techniques borrowed from animals. This work highlights some of the key techniques presented by well-known researchers from the combined areas of reinforcement learning and optimal control theory. At the end, an example implementation of a novel model-free Q-learning based discrete optimal adaptive controller for a humanoid robot arm is presented. The controller uses a novel adaptive dynamic programming (ADP) reinforcement learning (RL) approach to develop an optimal policy online. The RL joint-space tracking controller was implemented for two links (shoulder flexion and elbow flexion joints) of the arm of the humanoid Bristol-Elumotion-Robotic-Torso II (BERT II). The constrained case (joint limits) of the RL scheme was tested for a single link (elbow flexion) of the BERT II arm by modifying the cost function to deal with the extra nonlinearity due to the joint constraints.
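One standard instantiation of such a model-free Q-learning based optimal adaptive controller, shown here only as a hedged illustration (the paper's formulation for the constrained BERT II arm is more elaborate), is the linear-quadratic case, where the Q-function is quadratic in state and input and the greedy policy follows from its partition:

```latex
Q(x_k,u_k)=\begin{bmatrix}x_k\\ u_k\end{bmatrix}^{\!\top}
\begin{bmatrix}H_{xx}&H_{xu}\\ H_{ux}&H_{uu}\end{bmatrix}
\begin{bmatrix}x_k\\ u_k\end{bmatrix},
\qquad
u_k=-H_{uu}^{-1}H_{ux}\,x_k ,
```

with H estimated online from measured data via the Bellman equation Q(x_k, u_k) = r(x_k, u_k) + Q(x_{k+1}, u_{k+1}), so no system model is required.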
