Similar Documents
20 similar documents found (search time: 15 ms).
1.
The robot soccer game has been proposed as a benchmark problem for artificial intelligence and robotics research. The decision-making system is the most important part of a robot soccer system. Because the environment is dynamic and complex, a reinforcement learning (RL) method named FNN-RL is employed to learn the decision-making strategy. The FNN-RL system combines a fuzzy neural network (FNN) with RL: RL is used for structure identification and parameter tuning of the FNN, while the function-approximation capability of the FNN mitigates RL's curse of dimensionality. Furthermore, the residual algorithm is used to compute the gradient of the FNN-RL method in order to guarantee convergence and rapid learning. The complex decision-making task is divided into multiple learning subtasks, including dynamic role assignment, action selection, and action implementation, which together constitute a hierarchical learning system. We apply the proposed FNN-RL method to soccer agents that learn each subtask at the corresponding layer. The effectiveness of the proposed method is demonstrated in both simulation and real experiments.
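To make the residual-gradient idea concrete, here is a minimal sketch of a Baird-style residual update on a linear approximator; the paper's actual learner is a fuzzy neural network, so the features, step sizes, and mixing weight below are illustrative stand-ins only.

```python
import numpy as np

# Minimal sketch of the residual algorithm on a linear value
# approximator (the paper uses an FNN; linear features stand in
# here for brevity).

def residual_update(w, feats, feats_next, reward, gamma=0.95,
                    alpha=0.1, phi=0.5):
    """One residual-gradient step.

    phi = 0     -> pure direct (TD) gradient: fast, may diverge;
    phi = 1     -> pure residual gradient: guaranteed convergent;
    0 < phi < 1 -> the mix that trades speed against convergence.
    """
    v, v_next = w @ feats, w @ feats_next
    delta = reward + gamma * v_next - v          # Bellman residual
    # Gradient of the squared residual w.r.t. w, with the
    # next-state term weighted by phi (residual contribution).
    grad = phi * gamma * feats_next - feats
    return w - alpha * delta * grad

w = np.zeros(4)
w = residual_update(w, np.array([1., 0, 0, 0]),
                    np.array([0., 1, 0, 0]), reward=1.0)
```

The `phi` knob is what lets a residual method keep the rapid progress of the direct TD gradient while retaining the convergence guarantee the abstract appeals to.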

2.
Multi-agent reinforcement learning and its application to role assignment in robot soccer
A robot soccer system is a typical multi-agent system: each robot player's choice of action depends not only on its own state but also on the other players, so implementing robot soccer decision strategies through reinforcement learning requires combined states and combined actions. This paper studies a multi-agent reinforcement learning algorithm based on predicting the actions of other agents, using a naive Bayes classifier to make the predictions. A policy-sharing mechanism is introduced to exchange the policies learned by the agents and thereby accelerate multi-agent reinforcement learning. Finally, the application of the proposed method to dynamic role assignment in robot soccer is studied, realizing division of labor and cooperation among multiple robots.
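As a rough illustration of the action-prediction component, the sketch below counts how often another agent took each action in each discretized situation and predicts with a hand-rolled naive Bayes classifier; the feature and action names are hypothetical.

```python
from collections import defaultdict

# Sketch: predict another agent's next action with naive Bayes over
# discretized state features. Feature/action names are illustrative.

class ActionPredictor:
    def __init__(self, actions):
        self.actions = actions
        self.action_counts = defaultdict(int)
        self.feat_counts = defaultdict(int)   # (feature_idx, value, action)

    def observe(self, state_feats, action):
        self.action_counts[action] += 1
        for i, v in enumerate(state_feats):
            self.feat_counts[(i, v, action)] += 1

    def predict(self, state_feats):
        def score(a):
            n = self.action_counts[a] + 1     # Laplace smoothing
            p = n / (sum(self.action_counts.values()) + len(self.actions))
            for i, v in enumerate(state_feats):
                p *= (self.feat_counts[(i, v, a)] + 1) / (n + 1)
            return p
        return max(self.actions, key=score)

pred = ActionPredictor(["shoot", "pass", "dribble"])
pred.observe(("near_goal", "marked"), "pass")
print(pred.predict(("near_goal", "marked")))   # -> "pass"
```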

3.
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents in this paper. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides if it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multi-agent taxi problem.

4.
As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines, in a general form, the cooperative ability of multiple agents with the decision-making ability of reinforcement learning. By decomposing a complex reinforcement learning problem into several subproblems that are solved separately, it can effectively address the curse of dimensionality of the state space, which makes it a potential route to intelligent decision-making in large-scale, complex settings. This paper first describes the main techniques involved in multi-agent hierarchical reinforcement learning, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning. It then surveys, from the hierarchical perspective, the algorithmic principles and research status of four families of multi-agent hierarchical reinforcement learning methods: option-based, hierarchy-of-abstract-machines-based, value-function-decomposition-based, and end-to-end. Finally, it reviews applications of multi-agent hierarchical reinforcement learning in robot control, game decision-making, and task planning.
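For the option-based family the survey covers, an option is conventionally a triple of initiation set, internal policy, and termination condition. A minimal sketch, with a hypothetical grid-world example:

```python
import random

# Illustrative sketch of the "option" abstraction (the first family
# of methods in the survey): a temporally extended action with its
# own initiation set I, internal policy pi, and termination beta.

class Option:
    def __init__(self, name, can_start, policy, should_stop):
        self.name = name
        self.can_start = can_start      # I: states where the option may begin
        self.policy = policy            # pi: state -> primitive action
        self.should_stop = should_stop  # beta: state -> termination prob.

    def run(self, state, step):
        """Execute the option until beta fires; returns the final state."""
        assert self.can_start(state)
        while True:
            state = step(state, self.policy(state))
            if random.random() < self.should_stop(state):
                return state

# Hypothetical grid example: walk right until the column reaches 5.
walk_right = Option(
    "walk_right",
    can_start=lambda s: s[0] < 5,
    policy=lambda s: "right",
    should_stop=lambda s: 1.0 if s[0] >= 5 else 0.0,
)
print(walk_right.run((0, 0), lambda s, a: (s[0] + 1, s[1])))  # (5, 0)
```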

5.
Batch reinforcement learning methods provide a powerful framework for learning efficiently and effectively in autonomous robots. The paper reviews some recent work of the authors aimed at the successful application of reinforcement learning in a challenging and complex domain. It discusses several variants of the general batch learning framework, particularly tailored to the use of multilayer perceptrons to approximate value functions over continuous state spaces. The batch learning framework is successfully used to learn crucial skills in our soccer-playing robots participating in the RoboCup competitions. This is demonstrated in three different case studies.
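A minimal sketch of the batch idea, in the spirit of fitted Q iteration with a multilayer perceptron; the network size, data, and iteration counts here are placeholders, not the authors' setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Fitted-Q-iteration sketch: repeatedly regress Bellman targets over
# a fixed batch of transitions with an MLP value approximator.

def fitted_q_iteration(transitions, n_actions, n_iters=20, gamma=0.95):
    """transitions: list of (state, action, reward, next_state)."""
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])
    q = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500)
    targets = np.array([r for _, _, r, _ in transitions])
    q.fit(X, targets)                    # iteration 0: Q ~ immediate reward
    for _ in range(n_iters):
        # Regression targets from the current Q estimate (Bellman backup).
        targets = np.array([
            r + gamma * max(q.predict([np.append(s2, a2)])[0]
                            for a2 in range(n_actions))
            for _, _, r, s2 in transitions])
        q.fit(X, targets)                # re-fit on the whole batch
    return q

# Toy batch of random transitions, purely for illustration.
rng = np.random.default_rng(0)
T = [(rng.random(2), a, rng.random(), rng.random(2))
     for a in (0, 1) for _ in range(20)]
q = fitted_q_iteration(T, n_actions=2, n_iters=3)
```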

6.
Multi-agent reinforcement learning techniques are mainly investigated from two perspectives: concurrent learning and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both suffer from problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer, which employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer, which employs multi-agent reinforcement learning together with dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL can improve its generalization capability, adaptability, and coordination ability. Experiments show that LMRL can outperform both single-agent reinforcement learning and Nash-Q.

7.
Likas, A. Neural Computation, 1999, 11(8): 1915-1932.
A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns, through reinforcements, to follow the clustering strategy we wish to implement. In this spirit, the reinforcement-guided competitive learning (RGCL) algorithm is proposed, which constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.
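A toy sketch of one RGCL-style update, with the original stochastic reward scheme simplified to a scalar reinforcement that attracts or repels the winning prototype:

```python
import numpy as np

# Sketch in the spirit of reinforcement-guided competitive learning:
# the winning prototype is treated as an action, and a scalar
# reinforcement decides whether it moves toward or away from the
# input. The reward scheme of the original paper is simplified here.

def rgcl_step(prototypes, x, reward, lr=0.05):
    dists = np.linalg.norm(prototypes - x, axis=1)
    w = np.argmin(dists)                 # competitive winner
    # Positive reinforcement attracts the winner to x (the standard
    # LVQ move); negative reinforcement repels it, which is what
    # sustains exploration.
    prototypes[w] += lr * reward * (x - prototypes[w])
    return w

protos = np.random.rand(3, 2)
winner = rgcl_step(protos, np.array([0.5, 0.5]), reward=+1.0)
```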

8.
The paper achieves two outcomes. First, it summarizes previous work on concurrent Markov decision processes (CMDPs), currently demonstrated on multi-agent foraging problems. When using CMDPs, each agent models the environment using two Markov decision processes (MDPs): the two MDPs characterize a multi-agent foraging problem by modeling, for each agent, both a single-agent foraging problem and a multi-agent task-allocation problem. Second, the paper studies the effects of state uncertainty on a heterogeneous robot team that utilizes the aforementioned CMDP modeling approach, and presents a method to maintain performance despite that uncertainty. The resulting robust concurrent individual and social learning (RCISL) mechanism leads to enhanced team learning behavior despite state uncertainty. The paper analyzes the performance of the concurrent individual and social learning mechanism with and without a particle filter for a heterogeneous foraging scenario. The RCISL mechanism confers statistically significant performance improvements over the CISL mechanism.
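The abstract does not spell out the filter details, but a generic bootstrap particle filter of the kind used to handle state uncertainty looks like the following; the motion and sensor models are illustrative stand-ins, not the paper's.

```python
import numpy as np

# Generic bootstrap particle filter sketch for state estimation
# under uncertainty (1-D position belief, Gaussian noise models).

def particle_filter_step(particles, control, observation,
                         motion_noise=0.1, obs_noise=0.2):
    # 1. Predict: propagate each particle through the motion model.
    particles = particles + control + \
        np.random.normal(0, motion_noise, particles.shape)
    # 2. Weight: likelihood of the observation under each particle.
    weights = np.exp(-0.5 * ((observation - particles) / obs_noise) ** 2)
    weights /= weights.sum()
    # 3. Resample: draw a new particle set proportional to the weights.
    idx = np.random.choice(len(particles), len(particles), p=weights)
    return particles[idx]

particles = np.random.normal(0.0, 1.0, 200)   # initial belief
particles = particle_filter_step(particles, control=0.5, observation=0.6)
print(particles.mean())                        # point estimate of the state
```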

9.
Pedestrian simulation is complex because there are different levels of behavior modeling. At the lowest level, local interactions between agents occur; at the middle level, strategic and tactical behaviors appear, such as overtaking or route choice; and at the highest level, path planning is necessary. Agent-based pedestrian simulators either focus on a specific level (mainly the lowest one) or define strategies, such as layered architectures, to manage the different behavioral levels independently. In our Multi-Agent Reinforcement-Learning-based Pedestrian simulation framework (MARL-Ped), the situation is addressed as a whole. Each embodied agent uses a model-free Reinforcement Learning (RL) algorithm to learn autonomously to navigate in the virtual environment. The main goal of this work is to demonstrate empirically that MARL-Ped generates learned behaviors adapted to the level required by the pedestrian scenario. Three different experiments, described in the pedestrian modeling literature, are presented to test our approach: (i) choice of the shortest path vs. the quickest path; (ii) a crossing between two groups of pedestrians walking in opposite directions inside a narrow corridor; (iii) two agents that move in opposite directions inside a maze. The results show that MARL-Ped solves the different problems, learning individual behaviors with the characteristics of pedestrians (local control that produces adequate fundamental diagrams, route-choice capability, emergence of collective behaviors, and path planning). In addition, we compared our model with Helbing's social-force model, a well-known pedestrian model, showing similarities between the pedestrian dynamics generated by the two approaches. These results demonstrate empirically that MARL-Ped generates varied plausible behaviors, producing human-like macroscopic pedestrian flow.

10.
This paper presents a general framework for performing adaptive reconfiguration of a distributed system based on maximizing the long-term business value, defined as the discounted sum of all future rewards and penalties. The problem of dynamic resource allocation among multiple entities sharing a common set of resources is used as an example. A specific architecture (DRA-FRL) is presented, which uses the emerging methodology of reinforcement learning in conjunction with fuzzy rulebases to achieve the desired objective. This architecture can work in the context of existing resource allocation policies and learn the values of the states that the system encounters under these policies. Once the learning process begins to converge, the user can allow the DRA-FRL architecture to make some additional resource allocation decisions or override the ones suggested by the existing policies so as to improve the long-term business value of the system. The DRA-FRL architecture can also be deployed in an environment without any existing resource allocation policies. An implementation of the DRA-FRL architecture in Solaris 10 demonstrated a robust performance improvement in the problem of dynamically migrating CPUs and memory blocks between three resource partitions so as to match the stochastically changing workload in each partition, both in the presence and in the absence of resource migration costs.

11.
This paper describes a learning control system using a reinforcement technique. The controller is capable of controlling a plant that may be nonlinear and nonstationary. The only a priori information required by the controller is the order of the plant. The approach is to design a controller that partitions the control measurement space into sets called control situations and then learns the best control choice for each control situation. The control measurements are those indicating the state of the plant and environment. The learning is accomplished by reinforcing the probability of choosing a particular control choice for a given control situation. The system was simulated on an IBM 1710-GEDA hybrid computer facility. Experimental results obtained from the simulation are presented.
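The probability-reinforcement scheme can be sketched as a learning automaton; the linear reward-inaction rule below is a standard stand-in for the paper's exact update, and the situation and choice names are hypothetical.

```python
import random

# Sketch: each control situation keeps a probability vector over
# control choices; a successful outcome reinforces the chosen one
# (linear reward-inaction rule, probabilities stay normalized).

def choose(probs):
    return random.choices(range(len(probs)), weights=probs)[0]

def reinforce(probs, chosen, success, rate=0.1):
    if success:                      # reward: shift mass to the choice
        for i in range(len(probs)):
            if i == chosen:
                probs[i] += rate * (1.0 - probs[i])
            else:
                probs[i] *= (1.0 - rate)
    return probs                     # inaction on failure

situation_probs = {"high_error": [0.25, 0.25, 0.25, 0.25]}
a = choose(situation_probs["high_error"])
situation_probs["high_error"] = reinforce(
    situation_probs["high_error"], a, success=True)
```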

12.
Building on an analysis of the strengths and weaknesses of various multi-agent task-allocation mechanisms, this paper combines market-based and rule-based task allocation to propose a hybrid, distributed multi-robot task-allocation mechanism for role assignment in robot soccer. The role-assignment algorithm assigns roles dynamically while effectively avoiding undesired role oscillation. Both simulations and actual matches verify the effectiveness of the algorithm.
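One common way to obtain the anti-oscillation property in a market-based scheme is hysteresis in the auction: a challenger must outbid the current role holder by a margin. A sketch under that assumption; the utilities and margin value are illustrative.

```python
# Role assignment with hysteresis: keep the current holder unless a
# challenger's bid exceeds the holder's by SWITCH_MARGIN, which
# suppresses rapid back-and-forth role switches.

SWITCH_MARGIN = 0.15

def assign_role(role, bids, current_holder):
    """bids: {robot_id: utility of taking `role`}."""
    best = max(bids, key=bids.get)
    if current_holder is None:
        return best
    if bids[best] > bids[current_holder] + SWITCH_MARGIN:
        return best                  # challenger is clearly better
    return current_holder            # otherwise avoid oscillation

holder = None
for bids in [{"r1": 0.60, "r2": 0.55},    # r1 wins initially
             {"r1": 0.50, "r2": 0.58},    # r2 better, but within margin
             {"r1": 0.40, "r2": 0.62}]:   # r2 clearly better: switch
    holder = assign_role("attacker", bids, holder)
    print(holder)                         # r1, r1, r2
```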

13.
Shaping multi-agent systems with gradient reinforcement learning
An original reinforcement learning (RL) methodology is proposed for the design of multi-agent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordinated system is of crucial importance. To that end, we design simple reactive agents in a decentralized way as independent learners. To cope with the difficulties inherent in RL used in that framework, we have developed an incremental learning algorithm in which agents face a sequence of progressively more complex tasks. We illustrate this general framework with computer experiments in which agents have to coordinate to reach a global goal. This work was conducted in part in NICTA's Canberra laboratory.

14.
Xu Peng, Xie Guangming, Wen Jiayan, Gao Yuan. CAAI Transactions on Intelligent Systems, 2019, 14(1): 93-98.
To address the heavy communication and computation costs of classical reinforcement-learning-based multi-agent formation control, this paper introduces an event-driven control mechanism: agents need not make action decisions at a fixed period, but instead update their actions when an event-driven condition is met. The event-driven condition considers not only an agent's cumulative reward but also the deviation between its reward and those of its neighbors, and the agents interact to find the optimal joint policy that achieves the formation. Numerical simulations show that the event-driven reinforcement learning formation control algorithm effectively reduces the agents' decision frequency and resource consumption while maintaining system performance.
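A sketch of what such an event-driven condition might look like, combining reward stagnation with deviation from the neighbors' rewards; the thresholds and the exact trigger logic are illustrative, not the paper's.

```python
# Event-driven decision trigger: recompute an agent's action only
# when its cumulative reward stalls or drifts too far from its
# neighbours' rewards; otherwise the last action is kept, saving
# communication and computation.

REWARD_DEV_THRESHOLD = 0.5
STALL_THRESHOLD = 0.01

def should_update(my_return, prev_return, neighbour_returns):
    deviation = max(abs(my_return - r) for r in neighbour_returns)
    stalled = abs(my_return - prev_return) < STALL_THRESHOLD
    return stalled or deviation > REWARD_DEV_THRESHOLD

print(should_update(1.00, 0.995, [1.1, 0.9]))   # stalled -> True
print(should_update(1.50, 1.20, [1.4, 1.6]))    # improving, close -> False
```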

15.
Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationship in a network, are well suited for problems with topological constraints. In a TMAS system, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not be able to scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy, under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual agents and enable policy sharing across agents. Our complexity analysis indicates that multi-agent systems with the BTF have a much smaller state space and a higher level of flexibility, compared with the general form of n-ary (n > 2) tree formation. We have applied the proposed cooperative learning strategy to a class of reinforcement learning agents known as temporal difference-fusion architecture for learning and cognition (TD-FALCON). Comparative experiments based on a generic network routing problem, which is a typical TMAS domain, show that the TD-FALCON BTF teams outperform alternative methods, including TD-FALCON teams in single agent and n-ary tree formation, a Q-learning method based on the table lookup mechanism, as well as a classical linear programming algorithm. Our study further shows that TD-FALCON BTF can adapt and function well under various scales of network complexity and traffic volume in TMAS domains.

16.
Multi-agent systems are widely used in autonomous driving, intelligent logistics, collaborative healthcare, and many other fields. However, as technology advances and system requirements grow, these systems face challenges such as large scale and high complexity, and often suffer from low training efficiency and poor adaptability. To address these problems, this paper extends gradient-based meta-learning to multi-agent deep reinforcement learning and proposes multi-agent first-order meta proximal policy optimization (MAMPPO), a method for learning the initial model parameters of a multi-agent system, offering a new perspective on improving the performance of multi-agent deep reinforcement learning. The method makes full use of the experience data generated during multi-agent reinforcement learning: through repeated adaptation it finds the parameters most sensitive along the gradient-descent direction and learns the initial parameters, so that model training starts from the best starting point. This effectively improves the decision efficiency of the joint policy, markedly accelerates policy change, and greatly speeds up adaptation to new situations. Experiments on StarCraft II show that MAMPPO significantly improves training speed and adaptability, providing a new approach to improving the training efficiency and adaptability of multi-agent reinforcement learning.
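The first-order outer loop can be sketched in a few lines: a Reptile-style update standing in for MAMPPO's PPO inner loop, with toy tasks purely for illustration.

```python
import numpy as np

# First-order meta-learning sketch: adapt a copy of the shared
# initial parameters to a sampled task, then move the initialization
# toward the adapted weights. No second derivatives are needed,
# which is what "first-order" buys. `inner_update` stands in for
# several PPO epochs on one task.

def meta_train(theta0, tasks, inner_update, inner_steps=5,
               meta_lr=0.1, meta_iters=100):
    for _ in range(meta_iters):
        task = tasks[np.random.randint(len(tasks))]
        theta = theta0.copy()
        for _ in range(inner_steps):        # adapt to the sampled task
            theta = inner_update(theta, task)
        theta0 += meta_lr * (theta - theta0)  # first-order meta step
    return theta0

# Toy check: tasks are target vectors; adaptation is gradient descent
# on squared distance. The learned init lands near the task centroid.
tasks = [np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])]
init = meta_train(np.zeros(2), tasks,
                  lambda th, t: th - 0.2 * (th - t))
print(init)
```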

17.
Agents negotiate depending on individual perceptions of facts, events, trends, and special circumstances that define the negotiation context. The context affects each agent's preferences, bargaining strategies, and resulting benefits in different ways, given the possible negotiation outcomes. Despite the relevance of the context, the existing literature on automated negotiation says little about how to account for it when learning and adapting negotiation strategies. In this paper, a novel contextual representation of the negotiation setting is proposed, in which an agent resorts to private and public data to negotiate using an individual perception of its necessity and risk. A context-aware negotiation agent that learns through Self-Play and Reinforcement Learning (RL) how to use key contextual information to gain a competitive edge over its opponents is discussed at two levels of temporal abstraction. Learning to negotiate in an Eco-Industrial Park (EIP) is presented as a case study. In the Peer-to-Peer (P2P) market of an EIP, two instances of context-aware agents, in the roles of buyer and seller, bilaterally negotiate exchanges of electrical-energy surpluses over a discrete timeline, demonstrating that they can profit from learning to choose a negotiation strategy while selfishly accounting for contextual information under different circumstances in a data-driven way. Furthermore, several negotiation episodes are conducted in the proposed EIP between a context-aware agent and other types of agents from the existing literature. The results highlight that context-aware agents not only reap higher benefits for themselves, but also promote social welfare as they resort to contextual information while learning to negotiate.

18.
Designing coordinated robot behaviors in uncertain, dynamic, real-time, adversarial environments, such as in robot soccer, is very challenging. In this work we present a case-based reasoning approach for cooperative action selection, which relies on the storage, retrieval, and adaptation of example cases. We focus on cases of coordinated attacking passes between robots in the presence of the defending opponent robots. We present the case representation explicitly distinguishing between controllable and uncontrollable indexing features, corresponding to the positions of the team members and opponent robots, respectively. We use the symmetric properties of the domain to automatically augment the case library. We introduce a retrieval technique that weights the similarity of a situation in terms of the continuous ball positional features, the uncontrollable features, and the cost of moving the robots from the current situation to match the case controllable features. The case adaptation includes a best match between the positions of the robots in the past case and in the new situation. The robots are assigned an adapted position to which they move to maximize the match to the retrieved case. Case retrieval and reuse are achieved within the distributed team of robots through communication and sharing of own internal states and actions. We evaluate our approach, both in simulation and with real robots, in laboratory scenarios with two attacking robots versus two defending robots as well as versus a defender and a goalie. We show that we achieve the desired coordinated passing behavior, and also outperform a reactive action selection approach.
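A sketch of the weighted retrieval rule: cases are scored by ball and opponent (uncontrollable) similarity, discounted by the cost of repositioning our robots to match the case's controllable features. The weights and geometry are illustrative, not the paper's values.

```python
import math

# Case retrieval for coordinated passes: lower distances and lower
# repositioning cost give a higher (less negative) score.

def retrieval_score(case, situation, w_ball=1.0, w_opp=1.0, w_cost=0.5):
    d_ball = math.dist(case["ball"], situation["ball"])
    d_opp = sum(math.dist(c, s) for c, s in
                zip(case["opponents"], situation["opponents"]))
    # Cost of moving our robots from where they are to the case layout
    # (the controllable features).
    move = sum(math.dist(c, s) for c, s in
               zip(case["teammates"], situation["teammates"]))
    return -(w_ball * d_ball + w_opp * d_opp + w_cost * move)

def retrieve(library, situation):
    return max(library, key=lambda c: retrieval_score(c, situation))

library = [{"ball": (2, 1), "opponents": [(3, 1)], "teammates": [(1, 1)]},
           {"ball": (4, 3), "opponents": [(5, 3)], "teammates": [(3, 3)]}]
best = retrieve(library, {"ball": (2, 0), "opponents": [(3, 0)],
                          "teammates": [(1, 0)]})
print(best["ball"])   # the first case is the closer match: (2, 1)
```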

19.

Robot learning, such as reinforcement learning, generally needs a well-defined state space in order to converge. However, building such a state space is one of the main issues of robot learning because of the interdependence between state and action spaces, which resembles the well-known "chicken and egg" problem. This article proposes a method of action-based state space construction for vision-based mobile robots. The basic ideas for coping with the interdependence are that we define a state as a cluster of input vectors from which the robot can reach the goal state, or a state already obtained, by a sequence of one kind of action primitive regardless of its length, and that this sequence is defined as one action. To realize these ideas, we need a large amount of data (experiences) from the robot, and we must cluster the input vectors as hyperellipsoids so that the whole state space is segmented into a state transition map in terms of actions, from which the optimal action sequence is obtained. To show the validity of the method, we apply it to a soccer robot that tries to shoot a ball into a goal. Simulation and real experiments are presented.

20.
In humanoid robot soccer, many factors, both at low level (e.g., vision and motion control) and at high level (e.g., behaviors and game strategies), determine the quality of the robot's performance. In particular, the speed of individual robots, the precision of their trajectories, and the stability of their walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially of the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is time consumption: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account allows for better solutions when only a few experiments are available. The results of our experimental work show the effectiveness of the policy gradient learning method, as well as its higher convergence rate when the relevance of parameters is taken into account during learning.
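A sketch of a finite-difference policy-gradient step with the relevance extension: per-parameter relevance weights scale both the exploration perturbations and the update size. The evaluation function, relevance values, and toy objective are placeholders, not the article's setup.

```python
import random

# Finite-difference policy-gradient gait tuning, with per-parameter
# relevance weights so that sensitive gait parameters are explored
# and updated more cautiously. `evaluate` stands in for a walking
# trial returning a speed/stability score.

def policy_gradient_step(params, relevance, evaluate,
                         n_policies=8, epsilon=0.05, lr=0.1):
    base = evaluate(params)
    grad = [0.0] * len(params)
    for _ in range(n_policies):
        # Perturb each parameter by +/- epsilon, scaled by relevance.
        signs = [random.choice((-1, 1)) for _ in params]
        trial = [p + s * epsilon * rel
                 for p, s, rel in zip(params, signs, relevance)]
        score = evaluate(trial)
        for i, s in enumerate(signs):
            grad[i] += s * (score - base) / n_policies
    return [p + lr * rel * g
            for p, g, rel in zip(params, grad, relevance)]

# Toy gait objective with optimum at params = (0.3, 0.7).
target = (0.3, 0.7)
f = lambda p: -sum((pi - ti) ** 2 for pi, ti in zip(p, target))
params = [0.0, 0.0]
for _ in range(50):
    params = policy_gradient_step(params, relevance=[1.0, 0.5], evaluate=f)
print(params)   # approaches the target after a few dozen steps
```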
