Similar Documents
10 similar documents found; search time: 125 ms
1.
Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions for how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume that the opponent uses a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases, by learning the appropriate response without any prior policy on how to act. Thus, we focus on the setting in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, which poses a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that model to obtain an optimal policy, and (3) determines when it must re-learn because the opponent's strategy has changed. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, both in normal-form games such as the prisoner's dilemma and in a more realistic scenario, the Power TAC simulator.
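The switch-detection idea described in this abstract can be illustrated with a minimal sketch (not the authors' implementation of DriftER): an opponent model is learned as empirical action frequencies, and a drop in the model's recent prediction accuracy triggers re-learning. The class name, window size, and threshold below are illustrative assumptions.

```python
from collections import Counter, deque

class DriftDetectingOpponentModel:
    """Illustrative sketch: frequency-based opponent model plus an
    accuracy-window test that flags a possible strategy switch."""

    def __init__(self, window=30, threshold=0.5):
        self.counts = Counter()                    # empirical opponent action counts
        self.recent_hits = deque(maxlen=window)    # 1 if the last prediction was correct
        self.threshold = threshold                 # accuracy below this -> suspect a switch

    def predict(self):
        # Most frequent opponent action observed so far (arbitrary tie-break).
        return self.counts.most_common(1)[0][0] if self.counts else None

    def observe(self, opponent_action):
        prediction = self.predict()
        if prediction is not None:
            self.recent_hits.append(1 if prediction == opponent_action else 0)
        self.counts[opponent_action] += 1

    def switch_suspected(self):
        if len(self.recent_hits) < self.recent_hits.maxlen:
            return False                           # not enough evidence yet
        return sum(self.recent_hits) / len(self.recent_hits) < self.threshold

    def reset(self):
        self.counts.clear()
        self.recent_hits.clear()

# Toy usage: the opponent plays "C" for 200 rounds, then switches to "D".
model = DriftDetectingOpponentModel()
for t in range(400):
    action = "C" if t < 200 else "D"
    model.observe(action)
    if model.switch_suspected():
        print(f"switch suspected at round {t}; re-learning")
        model.reset()
```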

2.
We investigate the effectiveness of Stackelberg strategies for atomic congestion games with unsplittable demands. In our setting, only a fraction of the players are selfish, while the rest are willing to follow a predetermined strategy. A Stackelberg strategy assigns the coordinated players to appropriately selected strategies, trying to minimize the performance degradation due to the selfish players. We consider two orthogonal cases, namely congestion games with affine latency functions and arbitrary strategies, and congestion games on parallel links with arbitrary non-decreasing latency functions. We restrict our attention to pure Nash equilibria and derive strong upper and lower bounds on the pure Price of Anarchy (PoA) under different Stackelberg strategies.
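A small sketch of the parallel-links setting may help make the setup concrete. It is not one of the Stackelberg strategies analyzed in the paper: here the leader simply assigns the coordinated players greedily by marginal social cost, after which the selfish players run best-response dynamics to a pure Nash equilibrium. Link parameters and the leader rule are illustrative assumptions.

```python
# Parallel links with affine latencies l_e(x) = a_e * x + b_e and unit-demand players.
links = [(1.0, 0.0), (0.5, 2.0), (2.0, 1.0)]   # (a_e, b_e) for each link

def latency(e, load):
    a, b = links[e]
    return a * load + b

def marginal_cost(e, load):
    # Increase in total cost if one more player joins link e.
    return (load + 1) * latency(e, load + 1) - load * latency(e, load)

def stackelberg_loads(n_players, alpha):
    n_coord = round(alpha * n_players)
    loads = [0] * len(links)
    # Leader: place each coordinated player on the link with least marginal cost.
    for _ in range(n_coord):
        e = min(range(len(links)), key=lambda f: marginal_cost(f, loads[f]))
        loads[e] += 1
    # Followers: selfish players enter greedily, then best-respond until stable.
    follower_link = []
    for _ in range(n_players - n_coord):
        e = min(range(len(links)), key=lambda f: latency(f, loads[f] + 1))
        loads[e] += 1
        follower_link.append(e)
    changed = True
    while changed:
        changed = False
        for i, e in enumerate(follower_link):
            def cost_if_on(f):
                return latency(f, loads[f] + (0 if f == e else 1))
            best = min(range(len(links)), key=cost_if_on)
            if best != e and cost_if_on(best) < cost_if_on(e) - 1e-12:
                loads[e] -= 1
                loads[best] += 1
                follower_link[i] = best
                changed = True
    return loads

def total_latency(loads):
    return sum(load * latency(e, load) for e, load in enumerate(loads))

# Total latency for different coordinated fractions alpha.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, total_latency(stackelberg_loads(30, alpha)))
```

Since best-response dynamics in atomic congestion games follows a potential function, the follower loop always terminates.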

3.
In recent years, great strides have been made towards creating autonomous agents that can learn via interaction with their environment. When considering just an individual agent, it is often appropriate to model the world as stationary, meaning that the same action from the same state will always yield the same (possibly stochastic) effects. However, in the presence of other independent agents, the environment is not stationary: an action's effects may depend on the actions of the other agents. This non-stationarity poses the primary challenge of multiagent learning and is the main reason it is best considered distinctly from single-agent learning. The multiagent learning problem is often studied in the stylized settings provided by repeated matrix games. The goal of this article is to introduce a novel multiagent learning algorithm for such a setting, called Convergence with Model Learning and Safety (or CMLeS), that achieves a set of objectives which have not previously been achieved together. Specifically, CMLeS is the first multiagent learning algorithm to achieve all three of the following: (1) it converges to following a Nash equilibrium joint policy in self-play; (2) it achieves close to the best response when interacting with a set of memory-bounded agents whose memory size is upper bounded by a known value; and (3) it ensures an individual return that is very close to its security value when interacting with any other set of agents. Our presentation of CMLeS is backed by a rigorous theoretical analysis, including an analysis of sample complexity wherever applicable.
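The second objective, responding well to memory-bounded opponents, can be illustrated in isolation with a minimal sketch (this is not CMLeS itself): the opponent's action distribution is estimated conditionally on the last k joint actions, and the agent plays a one-step best response to the prediction. All names and the toy game are assumptions.

```python
from collections import defaultdict, Counter

def best_response_to_memory_bounded(history, k, payoff, my_actions):
    """Illustrative sketch: predict a memory-k opponent from the empirical
    distribution of its action given the last k joint actions, then pick the
    action maximizing expected one-step payoff against that prediction.

    history: list of (my_action, opp_action) pairs
    payoff:  dict mapping (my_action, opp_action) -> my payoff
    """
    model = defaultdict(Counter)
    for t in range(k, len(history)):
        context = tuple(history[t - k:t])       # last k joint actions
        model[context][history[t][1]] += 1      # opponent's action in that context

    context = tuple(history[-k:])
    counts = model.get(context)
    if not counts:                              # unseen context: crude uniform fallback
        return max(my_actions,
                   key=lambda a: sum(payoff[(a, o)] for o in {o for _, o in history}))
    total = sum(counts.values())
    return max(my_actions,
               key=lambda a: sum(c * payoff[(a, o)] / total for o, c in counts.items()))

# Toy usage: the opponent is tit-for-tat (memory 1) in the prisoner's dilemma.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
history = [("C", "C"), ("D", "C"), ("C", "D"), ("C", "C"), ("D", "C"), ("C", "D")]
print(best_response_to_memory_bounded(history, 1, payoff, ["C", "D"]))
```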

4.
Sampled fictitious play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of the best-response actions played in SFP is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests in which Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, no guarantee of convergence to a globally optimal Nash equilibrium has been established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small- to moderate-sized model-free problems.
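A minimal sketch of plain sampled fictitious play on a two-player identical-interest matrix game (not the dynamic-programming variant proposed in the paper): in each round every player samples one past action of the other player from its empirical history and plays a best response to that sample. The game matrix, initial actions, and round count are illustrative assumptions.

```python
import random

# Identical-interest game: both players receive payoff[a1][a2].
payoff = [[4, 0, 0],
          [0, 3, 0],
          [0, 0, 2]]

def sampled_fictitious_play(rounds=2000, seed=0):
    rng = random.Random(seed)
    history = [[0], [0]]            # each player starts with an arbitrary action
    for _ in range(rounds):
        # Each player samples a single past action of the other player ...
        sample_for_1 = rng.choice(history[1])
        sample_for_2 = rng.choice(history[0])
        # ... and plays a best response to that single sample.
        best_1 = max(range(3), key=lambda a: payoff[a][sample_for_1])
        best_2 = max(range(3), key=lambda a: payoff[sample_for_2][a])
        history[0].append(best_1)
        history[1].append(best_2)
    # Empirical frequencies of the best-response actions played so far.
    return [[h.count(a) / len(h) for a in range(3)] for h in history]

print(sampled_fictitious_play())   # concentrates on the payoff-4 equilibrium here
```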

5.
Conjectural Equilibrium in Multiagent Learning
Wellman, Michael P.; Hu, Junling. Machine Learning, 1998, 33(2-3): 179-200
Learning in a multiagent environment is complicated by the fact that as other agents learn, the environment effectively changes. Moreover, other agents' actions are often not directly observable, and the actions taken by the learning agent can strongly bias which range of behaviors is encountered. We define the concept of a conjectural equilibrium, in which all agents' expectations are realized and each agent responds optimally to its expectations. We present a generic multiagent exchange situation in which competitive behavior constitutes a conjectural equilibrium. We then introduce an agent that executes a more sophisticated strategic learning approach, building a model of the response of other agents. We find that the system reliably converges to a conjectural equilibrium, but that the final result achieved is highly sensitive to initial beliefs. In essence, the strategic learner's actions tend to fulfill its expectations. Depending on the starting point, the agent may be better or worse off than had it not attempted to learn a model of the other agents at all.
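The "strategic learner" idea can be sketched in a stylized market that is not the paper's exchange model: the agent conjectures a linear response of price to its own demand, fits the conjecture from past observations, and best-responds to the fitted model. A fixed point at which the observations confirm the conjecture is a conjectural equilibrium. The valuation, the hidden market response, and all names are illustrative assumptions.

```python
# Sketch: agent conjectures p = a + b * q, fits (a, b) by least squares, and chooses
# q to maximize the quasi-linear utility u(q) = v * q - p(q) * q under the conjecture.

def fit_line(qs, ps):
    n = len(qs)
    mq, mp = sum(qs) / n, sum(ps) / n
    cov = sum((q - mq) * (p - mp) for q, p in zip(qs, ps))
    var = sum((q - mq) ** 2 for q in qs) or 1e-9
    b = cov / var
    return mp - b * mq, b                        # (intercept a, slope b)

def true_price(q, others_demand=5.0):
    return 1.0 + 0.5 * (q + others_demand)       # hidden market response

v = 8.0                                          # agent's marginal valuation
qs, ps = [1.0, 2.0], [true_price(1.0), true_price(2.0)]   # initial observations

for step in range(20):
    a, b = fit_line(qs, ps)                      # current conjecture p = a + b*q
    # Maximize v*q - (a + b*q)*q  ->  q* = (v - a) / (2*b), clipped at 0.
    q = max(0.0, (v - a) / (2 * b)) if b > 0 else 0.0
    p = true_price(q)                            # the market reveals the realized price
    qs.append(q)
    ps.append(p)

print("converged demand:", round(qs[-1], 3), "conjecture:", fit_line(qs, ps))
```

Because the hidden response happens to match the conjectured linear form here, the agent's expectations are exactly realized at the fixed point; with a misspecified conjecture the outcome would depend on the initial beliefs, as the abstract notes.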

6.
We consider a group of agents on a graph who repeatedly play the prisoner’s dilemma game against their neighbors. The players adapt their actions to the past behavior of their opponents by applying the win-stay lose-shift strategy. On a finite connected graph, it is easy to see that the system learns to cooperate by converging to the all-cooperate state in a finite time. We analyze the rate of convergence in terms of the size and structure of the graph. Dyer et al. (2002) showed that the system converges rapidly on the cycle, but that it takes a time exponential in the size of the graph to converge to cooperation on the complete graph. We show that the emergence of cooperation is exponentially slow in some expander graphs. More surprisingly, we show that it is also exponentially slow in bounded-degree trees, where many other dynamics are known to converge rapidly. Editors: Amy Greenwald and Michael Littman.
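A small simulation sketch of one pairwise win-stay lose-shift (Pavlov) variant on a cycle illustrates the quantity being analyzed, the time until the all-cooperate state is reached. The asynchronous random-edge update schedule and payoff interpretation below are assumptions and need not match the exact dynamics studied in the paper.

```python
import random

def pavlov_steps_to_cooperation(n=30, max_steps=10**6, seed=1):
    """Illustrative sketch: asynchronous win-stay lose-shift (Pavlov) dynamics on an
    n-cycle. At each step a random edge is selected and its endpoints play the
    prisoner's dilemma: (C,C) stays, (D,D) becomes (C,C) (both lose and shift), and
    a mixed (C,D) pair becomes (D,D) (the exploited cooperator shifts).
    Returns the number of steps until everyone cooperates."""
    rng = random.Random(seed)
    actions = [rng.choice("CD") for _ in range(n)]
    for step in range(max_steps):
        if all(a == "C" for a in actions):
            return step                          # all-cooperate state is absorbing
        i = rng.randrange(n)                     # random edge (i, i+1) of the cycle
        j = (i + 1) % n
        if actions[i] == actions[j]:
            actions[i] = actions[j] = "C"        # win-stay on (C,C), lose-shift on (D,D)
        else:
            actions[i] = actions[j] = "D"        # exploited cooperator switches to D
    return max_steps

print(pavlov_steps_to_cooperation())
```

On the cycle this process converges quickly, consistent with the rapid-convergence result cited above; replacing the cycle with a denser graph makes the same experiment dramatically slower.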

7.
A widely accepted rational behavior for non-cooperative players is based on the notion of Nash equilibrium. Although the existence of a Nash equilibrium is guaranteed in the mixed framework (i.e., when players select their actions in a randomized manner), in many real-world applications the existence of “any” equilibrium is not enough. Rather, it is often desirable to single out equilibria satisfying some additional requirements (for instance, guaranteeing a minimum payoff to certain players), which we call constrained Nash equilibria. In this paper, a formal framework for specifying these kinds of requirements is introduced and investigated in the context of graphical games, where a player p may be directly interested in only some of the other players, called the neighbors of p. This setting is very useful for modeling large population games, where typically each player does not directly depend on all the other players, and representing her utility function extensively is either inconvenient or infeasible. Based on this framework, the complexity of deciding the existence of constrained equilibria and of computing them is then investigated, with the aim of showing how the intrinsic difficulty of these tasks is affected by the requirements prescribed at the equilibrium and by the structure of players’ interactions. The analysis is carried out for the setting of mixed strategies as well as for the setting of pure strategies, i.e., when players are forced to deterministically choose the action to perform. In particular, for this latter case, restrictions on players’ interactions and on constraints are identified that make the computation of Nash equilibria an easy problem, for which polynomial and highly parallelizable algorithms are presented.
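The pure-strategy case can be illustrated with a brute-force sketch on a tiny graphical game: enumerate pure profiles, check the Nash condition using only each player's neighborhood, and filter by a minimum-payoff constraint. The game, utilities, and constraint are made up for illustration; realistic instances need the structural restrictions the paper identifies to stay tractable.

```python
from itertools import product

# Illustrative graphical game: 3 players on a path 0 - 1 - 2, two actions each.
# Each player's utility depends only on its own action and its neighbors' actions.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
actions = [0, 1]

def utility(p, profile):
    # Toy utilities: +1 for matching each neighbor, plus a small preference for action 1.
    return sum(1 for q in neighbors[p] if profile[q] == profile[p]) + 0.1 * profile[p]

def is_pure_nash(profile):
    for p in neighbors:
        current = utility(p, profile)
        for a in actions:
            deviated = profile[:p] + (a,) + profile[p + 1:]
            if utility(p, deviated) > current + 1e-12:
                return False
    return True

def constrained_pure_nash(min_payoff, constrained_players):
    """All pure Nash equilibria guaranteeing min_payoff to each constrained player."""
    found = []
    for profile in product(actions, repeat=len(neighbors)):
        if is_pure_nash(profile) and all(utility(p, profile) >= min_payoff
                                         for p in constrained_players):
            found.append(profile)
    return found

print(constrained_pure_nash(min_payoff=1.0, constrained_players=[0, 2]))
```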

8.
To address the multi-robot coordination problem, this work exploits the similarity of agent strategies in coordination games and proposes a higher-order belief revision model for agents together with a learning method, PEL, which lets an agent reason from its opponent's perspective and, through belief revision, combines objectively observed behavior with subjective belief-based inference. It is proved that coordination succeeds when the reasoning confidence of the belief revision model is adjusted only between the two values 0 and 1. Simulations with multi-robot collision avoidance as the experimental setting show that the algorithm achieves better coordination performance than existing methods.
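A very rough sketch of the two-valued confidence idea, not the authors' PEL algorithm: a robot holds a belief about which passing convention its opponent follows, snaps the confidence to 1 or 0 after each observation, and then coordinates by reasoning from the opponent's viewpoint. The conventions, names, and coordination rule are all illustrative assumptions.

```python
# Hypothesis held by the robot: "the other robot passes on its own left".
def revise(confidence_left, observed_move):
    """Two-valued belief revision: confidence jumps to 1 or 0 after each observation."""
    return 1 if observed_move == "left" else 0

def respond(confidence_left):
    # Perspective taking: if the oncoming robot veers to its own left, veering to our
    # own left moves us to the opposite global side, so the robots avoid collision.
    return "left" if confidence_left == 1 else "right"

confidence = 1                      # initial belief: the opponent passes on the left
for observed in ["left", "left", "right", "right"]:
    confidence = revise(confidence, observed)
    print(f"observed {observed!r:8} -> my move: {respond(confidence)}")
```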

9.
The work deals with a class of discrete-time zero-sum Markov games under a discounted optimality criterion with random state-action-dependent discount factors, which depend on the current state x_n, the players' actions a_n and b_n, and a random disturbance ξ_{n+1} at time n, all taking values in Borel spaces. The one-stage payoff is assumed to be possibly unbounded. In addition, the process {ξ_n} is formed by observable, independent, and identically distributed random variables with common distribution θ, which is unknown to the players. By using the empirical distribution to estimate θ, we introduce a procedure to approximate the value V* of the game; this procedure yields construction schemes for stationary optimal strategies and asymptotically optimal Markov strategies.
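A minimal sketch of the estimation step only, not the paper's full approximation scheme: expectations under the unknown disturbance distribution θ are replaced by expectations under the empirical distribution of the observed i.i.d. samples, for example when evaluating an expected discount factor. The disturbance distribution and the discount function below are hypothetical.

```python
import random

def empirical_expectation(samples, f):
    """E_hat[f(xi)] under the empirical distribution of the observed samples."""
    return sum(f(x) for x in samples) / len(samples)

def discount(state, a, b, xi):
    # Hypothetical state-action-dependent discount factor taking values in (0, 1).
    return 0.90 + 0.05 * xi * (1 if a == b else -1) / (1 + state)

rng = random.Random(0)
def draw_disturbance():
    return rng.uniform(0.0, 1.0)                 # true theta, unknown to the players

samples = []
for n in [10, 100, 10_000]:
    while len(samples) < n:
        samples.append(draw_disturbance())       # observe another disturbance
    est = empirical_expectation(samples, lambda xi: discount(state=2, a=0, b=0, xi=xi))
    print(f"n = {n:6d}: estimated expected discount = {est:.4f}")
# As n grows the estimate converges to the expectation under the true theta (about 0.9083 here).
```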

10.
Multiagent Q-Learning Based on Role-Specific Context
One of the main problems in cooperative multiagent learning is that the joint action space grows exponentially with the number of agents. In this paper, we investigate a sparse representation of the coordination dependencies between agents, employing roles and context-specific coordination graphs to reduce the joint action space. In our framework, the global joint Q-function is decomposed into a number of local Q-functions. Each local Q-function is shared among a small group of agents and is composed of a set of value rules. We propose a novel multiagent Q-learning algorithm that learns the weights in each value rule automatically. We give empirical evidence showing that our learning algorithm converges to the same optimal policy significantly faster than traditional multiagent learning techniques.
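The decomposition idea can be sketched as follows (a minimal illustration, not the paper's value-rule representation): the global Q-function over joint actions is the sum of local Q-tables, each shared by a pair of agents on a coordination edge; joint actions are chosen by maximizing that sum, and a Q-learning update splits the TD error across the local components. The edges, environment, and learning parameters are illustrative assumptions.

```python
import itertools, random
from collections import defaultdict

# Three agents, coordination edges (0,1) and (1,2); Q(s, a) is represented as
# Q_01(s, a0, a1) + Q_12(s, a1, a2).
ACTIONS = [0, 1]
EDGES = [(0, 1), (1, 2)]
q_local = {e: defaultdict(float) for e in EDGES}    # (state, a_i, a_j) -> value

def global_q(state, joint):
    return sum(q_local[(i, j)][(state, joint[i], joint[j])] for i, j in EDGES)

def greedy_joint_action(state):
    # Tiny joint-action space, so plain enumeration stands in for the
    # variable-elimination step used with coordination graphs.
    return max(itertools.product(ACTIONS, repeat=3), key=lambda a: global_q(state, a))

def update(state, joint, reward, next_state, alpha=0.1, gamma=0.9):
    target = reward + gamma * global_q(next_state, greedy_joint_action(next_state))
    td_error = target - global_q(state, joint)
    for i, j in EDGES:                               # split the TD error evenly
        q_local[(i, j)][(state, joint[i], joint[j])] += alpha * td_error / len(EDGES)

# Toy environment: a single state, reward 1 only when all agents pick action 1.
rng = random.Random(0)
for episode in range(2000):
    if rng.random() < 0.2:
        joint = tuple(rng.choice(ACTIONS) for _ in range(3))   # exploration
    else:
        joint = greedy_joint_action(0)
    reward = 1.0 if joint == (1, 1, 1) else 0.0
    update(0, joint, reward, 0)

print(greedy_joint_action(0))   # expected to converge to (1, 1, 1)
```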
