Similar Documents
 Found 20 similar documents; search took 31 ms
1.
2.
Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct the learning of behavioral strategies in a number of 2×2 games. The agents effectively maximize the total wealth extracted, which often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes are largely achieved. The effect can select Pareto outcomes that are not Nash equilibria, and it can select Pareto optimal outcomes among Nash equilibria.

Acknowledgement: This material is based upon work supported, in whole or in part, by NSF grant SES-9709548. We wish to thank an anonymous referee for a number of very helpful suggestions.
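
As a rough illustration of the experimental setup (not the paper's results), the following Python sketch pits two independent Q-learners against each other in a repeated 2×2 game. The Prisoner's Dilemma payoffs, the stateless Q-tables, and all hyperparameters are illustrative assumptions.

# Minimal sketch: two independent Q-learners in a repeated 2x2 game.
# Prisoner's Dilemma payoffs are an illustrative choice, not from the paper.
import random

# Row player's payoffs first; the game is symmetric.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ["C", "D"]

def run(episodes=50_000, alpha=0.1, gamma=0.9, eps=0.1):
    # Stateless Q-values: one value per own action, per agent.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]
    for _ in range(episodes):
        acts = [max(q[i], key=q[i].get) if random.random() > eps
                else random.choice(ACTIONS) for i in range(2)]
        rewards = PAYOFF[(acts[0], acts[1])]
        for i in range(2):
            # Update toward reward plus discounted best continuation value.
            best_next = max(q[i].values())
            q[i][acts[i]] += alpha * (rewards[i] + gamma * best_next - q[i][acts[i]])
    return q

print(run())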

3.
Imitating successful behavior is a natural and frequently applied approach when facing complex decision problems. In this paper, we design protocols for distributed latency minimization in atomic congestion games based on imitation. We propose to study concurrent dynamics that emerge when each agent samples another agent and possibly imitates this agent’s strategy if the anticipated latency gain is sufficiently large. Our focus is on convergence properties. We show convergence in a monotonic fashion to stable states, in which none of the agents can improve their latency by imitating others. As our main result, we show rapid convergence to approximate equilibria, in which only a small fraction of agents sustains a latency significantly above or below average. Imitation dynamics behave like an FPTAS, and the convergence time depends only logarithmically on the number of agents. Imitation processes cannot discover unused strategies, and strategies may become extinct with non-zero probability. For singleton games we show that the probability of this event occurring is negligible. Additionally, we prove that the social cost of a stable state reached by our dynamics is not much worse than an optimal state in singleton games with linear latency functions. We concentrate on the case of symmetric network congestion games, but our results do not use the network structure and continue to hold accordingly for general symmetric games. They even apply to asymmetric games when agents sample within the set of agents with the same strategy space. Finally, we discuss how the protocol can be extended such that, in the long run, dynamics converge to a pure Nash equilibrium.
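
A minimal sketch of such an imitation protocol on a symmetric singleton congestion game with linear latencies follows; the latency coefficients, the deterministic gain threshold, and the sampling rule are illustrative assumptions rather than the paper's exact (probabilistic) protocol.

# Concurrent imitation dynamics on a singleton congestion game with
# linear latencies l_e(x) = a_e * x; all parameters are illustrative.
import random
from collections import Counter

def imitation_dynamics(n=200, coeffs=(1.0, 2.0, 3.0), rounds=100, threshold=0.5):
    strategy = [random.randrange(len(coeffs)) for _ in range(n)]
    for _ in range(rounds):
        load = Counter(strategy)
        new = list(strategy)
        for i in range(n):
            j = random.randrange(n)                # sample another agent
            mine, theirs = strategy[i], strategy[j]
            # Anticipated latency if i migrates to j's link (load shifts by one).
            current = coeffs[mine] * load[mine]
            anticipated = coeffs[theirs] * (load[theirs] + 1)
            if current - anticipated > threshold:  # imitate only on a clear gain
                new[i] = theirs
        strategy = new                             # concurrent update
    return Counter(strategy)

print(imitation_dynamics())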

4.
The suboptimal control program via memoryless state feedback strategies for LQ differential games with multiple players is studied in this paper. Sufficient conditions for the existence of the suboptimal strategies for LQ differential games are presented. It is shown that the suboptimal strategies of LQ differential games are associated with a coupled algebraic Riccati inequality. Furthermore, the problem of designing suboptimal strategies is considered. A non-convex optimization problem with BMI constraints is formulated to design the suboptimal strategies that minimize the performance indices of the closed-loop LQ differential games; it can be solved using the LMI Toolbox of MATLAB. An example is given to illustrate the proposed results.

5.
年晓红 《自动化学报》2005,31(2):216-222
The suboptimal control program via memoryless state feedback strategies for LQ differential games with multiple players is studied in this paper. Sufficient conditions for the existence of the suboptimal strategies for LQ differential games are presented. It is shown that the suboptimal strategies of LQ differential games are associated with a coupled algebraic Riccati inequality. Furthermore, the problem of designing suboptimal strategies is considered. A non-convex optimization problem with BMI constraints is formulated to design the suboptimal strategies that minimize the performance indices of the closed-loop LQ differential games; it can be solved using the LMI Toolbox of MATLAB. An example is given to illustrate the proposed results.
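
For a flavor of the Riccati-inequality condition, the sketch below checks whether a candidate feedback gain K satisfies a single-player algebraic Riccati inequality via a Lyapunov solve. This is a simplified stand-in with invented matrices: the paper instead couples such inequalities across players and searches over the gains with BMI/LMI machinery.

# Hedged sketch: verify that a candidate gain K satisfies
# Ac' P + P Ac + Q + K' R K < 0 with P > 0, where Ac = A + B K.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def satisfies_riccati_inequality(A, B, Q, R, K):
    Ac = A + B @ K                         # closed-loop dynamics
    W = Q + K.T @ R @ K
    # Solve Ac' P + P Ac = -W; K is admissible if P is positive definite.
    P = solve_continuous_lyapunov(Ac.T, -W)
    return np.all(np.linalg.eigvalsh(P) > 0), P

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # invented example system
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -1.0]])
ok, P = satisfies_riccati_inequality(A, B, np.eye(2), np.eye(1), K)
print(ok)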

6.
Task-incremental learning (Task-IL) aims to enable an intelligent agent to continuously accumulate knowledge from new learning tasks without catastrophically forgetting what it has learned in the past. It has drawn increasing attention in recent years, with many algorithms proposed to mitigate forgetting in neural networks. However, none of the existing strategies completely eliminates the issue. Moreover, explaining and fully understanding what knowledge is forgotten, and how it is forgotten, during the incremental learning process remains under-explored. In this paper, we propose KnowledgeDrift, a visual analytics framework, to interpret network forgetting with three objectives: (1) to identify when the network fails to memorize past knowledge, (2) to visualize what information has been forgotten, and (3) to diagnose how knowledge attained in the new model interferes with what was learned in the past. Our analytical framework first identifies the occurrence of forgetting by tracking task performance during the incremental learning process and then provides in-depth inspections of drifted information at various levels of data granularity. KnowledgeDrift allows analysts and model developers to enhance their understanding of network forgetting and to compare the performance of different incremental learning algorithms. Three case studies are conducted in the paper to provide further insights and guidance for users to effectively diagnose catastrophic forgetting over time.
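
One simple way to operationalize objective (1), identifying when forgetting occurs, is to track a task-accuracy matrix and compute per-task forgetting, as in this hedged sketch. The numbers are invented; the paper's framework adds visual drill-downs on top of such tracking.

# acc[i][j] = accuracy on task j after training on task i (illustrative values).
import numpy as np

acc = np.array([[0.95, 0.00, 0.00],
                [0.70, 0.93, 0.00],
                [0.40, 0.85, 0.91]])

def forgetting(acc):
    # Per-task forgetting: best past accuracy minus final accuracy.
    T = acc.shape[0]
    return {j: acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)}

print(forgetting(acc))  # e.g. task 0 forgot 0.55, task 1 forgot 0.08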

7.
Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response, without any prior policy on how to act. Thus, we focus on the problem in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, which is difficult for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that model to obtain an optimal policy, and (3) determines when it must re-learn because the opponent's strategy has changed. We provide theoretical results showing that DriftER detects strategy switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, first in normal-form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.
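
A hedged sketch of step (3), deciding when to re-learn: monitor the opponent model's prediction accuracy over a sliding window and flag a switch when it drops. The window size and threshold here are illustrative choices, not DriftER's actual statistical test.

# Sliding-window switch detector over opponent-model prediction accuracy.
from collections import deque

class SwitchDetector:
    def __init__(self, window=50, threshold=0.6):
        self.hits = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, predicted, actual):
        self.hits.append(predicted == actual)
        full = len(self.hits) == self.hits.maxlen
        rate = sum(self.hits) / len(self.hits)
        return full and rate < self.threshold  # True => re-learn the opponent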

8.
General Game Playing (GGP) aims to develop an artificial intelligence capable of playing a wide variety of games well without the support of game-specific experience. Building on existing reinforcement learning algorithms, this paper proposes a simplified, experience-based learning method: by filtering game states and generalizing game experience, it reduces the amount of experience required for decision making, improves decision efficiency, and can reach a designated goal of winning, drawing, or losing. Experiments playing against human players under three different sets of game rules show that the learning method effectively achieves the expected results.

9.
Informally, a first-past-the-post game is a (probabilistic) game where the winner is the person who predicts the event that occurs first among a set of events. Examples of first-past-the-post games include so-called block and hidden patterns and the Penney-Ante game invented by Walter Penney. We formalise the abstract notion of a first-past-the-post game, and the process of extending a probability distribution on symbols of an alphabet to the plays of a game. We establish a number of properties of such games, for example, the property that an incomplete first-past-the-post game is also a first-past-the-post game.

Penney-Ante games are multi-player games characterised by a collection of regular, prefix-free languages. Analysis of such games is facilitated by a collection of simultaneous (non-linear) equations in languages. Essentially, the equations are due to Guibas and Odlyzko. However, they did not formulate them as equations in languages but as equations in generating functions detailing lengths of words. For such games, we show how to use the equations in languages to calculate the probability of winning and how to calculate the expected length of a game for a given outcome. We also exploit the properties of first-past-the-post games to show how to calculate the probability of winning in the course of a play of the game. In this way, we avoid the construction of a deterministic finite-state machine or the use of generating functions, the two methods traditionally used for the task.

We observe that Aho and Corasick's generalisation of the Knuth–Morris–Pratt pattern-matching algorithm can be used to construct the deterministic finite-state machine that recognises the language underlying a Penney-Ante game. The two methods of calculating the probabilities and expected values, one based on the finite-state machine and the other based on the non-linear equations in languages, have been implemented and verified to yield the same results.
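
As a quick cross-check of such calculations, winning probabilities can also be estimated by direct simulation, avoiding both the finite-state machine and the language equations. This sketch plays the classic HHT-versus-HTT instance; the choice of patterns is illustrative.

# Monte Carlo estimate of Penney-Ante winning probabilities.
import random

def penney(p1="HHT", p2="HTT", trials=100_000):
    wins = [0, 0]
    for _ in range(trials):
        s = ""
        while True:
            s += random.choice("HT")   # fair coin
            if s.endswith(p1):
                wins[0] += 1; break
            if s.endswith(p2):
                wins[1] += 1; break
    return [w / trials for w in wins]

print(penney())  # HHT beats HTT with probability about 2/3 in the classic analysis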

10.
This paper analyzes a legendary Chinese horse race problem involving the King of Qi and General Tianji which took place more than 2000 years ago. In this problem each player owns three horses of different speed classes and must choose the sequence of horses to compete against each other. Depending on the payoffs received by the players as a result of the horse races, we analyze two groups of constant-sum games. In each group, we consider three separate cases where the outcomes of the races are (i) deterministic, (ii) probabilistic within the same class, and (iii) probabilistic across classes. In the first group, the player who wins the majority of races receives a one-unit payoff. For this group we show analytically that the three different games with non-singular payoff matrices have the same solution, where each player has a unique optimal mixed strategy with equal probabilities. For the second group of games, where the payoff to a player is the total number of races his horses have won, we use linear programming with non-numeric data to show that the solutions of the three games are mixed strategies given as a convex combination of two extreme points. We invoke results from information theory to prove that, to maximize the opponent's “entropy”, the players should use the equal-probability mixed strategy that was found for the one-unit games.
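
The deterministic one-unit game can be reproduced numerically as a 6×6 zero-sum matrix game solved by linear programming. In this sketch the horse "speeds" are invented numbers chosen only to encode the class ordering (the King's horse is faster within each class), and scipy's linprog stands in for the paper's non-numeric LP analysis.

# Solve the deterministic one-unit horse race as a zero-sum matrix game.
from itertools import permutations
import numpy as np
from scipy.optimize import linprog

KING, TIANJI = [5, 3, 1], [4, 2, 0]       # illustrative within-class ordering
perms = list(permutations(range(3)))       # 6 orderings of (fast, medium, slow)

def payoff(p, q):                          # +1 if Tianji wins the majority, else -1
    wins = sum(TIANJI[p[i]] > KING[q[i]] for i in range(3))
    return 1 if wins >= 2 else -1

A = np.array([[payoff(p, q) for q in perms] for p in perms])  # rows: Tianji

# Row player maximizes v subject to x^T A >= v per column and sum(x) = 1.
c = np.zeros(7); c[-1] = -1.0              # variables: 6 probabilities and v
A_ub = np.hstack([-A.T, np.ones((6, 1))])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(6),
              A_eq=[[1.0] * 6 + [0.0]], b_eq=[1.0],
              bounds=[(0, 1)] * 6 + [(None, None)])
x, v = res.x[:6], res.x[6]
print(np.round(x, 3), round(v, 3))         # uniform mixing; value -2/3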

11.
This paper studies mean-field games for multiagent systems with control-dependent multiplicative noises. For general systems with nonuniform agents, we obtain a set of decentralized strategies by solving an auxiliary limiting optimal control problem subject to consistent mean-field approximations. The set of decentralized strategies is further shown to be an ε-Nash equilibrium. For integrator multiagent systems, we design a set of ε-Nash strategies by exploiting the convexity of the limiting problem. It is shown that, under mild conditions, all the agents achieve mean-square consensus.

12.
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other’s actions but not the payoffs received by the other player. The concept of Nash equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-dominated outcome in games like the Prisoner’s Dilemma. We therefore prefer learning strategies that converge to a Pareto-optimal outcome that also produces a Nash equilibrium payoff for repeated two-player, n-action general-sum games. The Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that, under self-play and if the payoff structure of the Prisoner’s Dilemma game satisfies certain conditions, a CJAL learner, using a random exploration strategy followed by a completely greedy exploitation technique, will learn to converge to a Pareto-optimal solution. We also show that such learning generates Pareto-optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WoLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
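
A minimal sketch of the conditional-probability idea behind CJAL: count opponent actions conditioned on one's own action, explore randomly for a fixed phase, then exploit greedily. The Prisoner's Dilemma payoffs, Laplace prior, and phase length are illustrative assumptions, not the paper's exact settings.

# Conditional joint action learning, stripped to its core.
import random
from collections import defaultdict

ACTIONS = ["C", "D"]
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

class CJAL:
    def __init__(self, explore=200):
        self.counts = defaultdict(lambda: {a: 1 for a in ACTIONS})  # Laplace prior
        self.explore, self.t = explore, 0

    def act(self):
        self.t += 1
        if self.t <= self.explore:                  # random exploration phase
            return random.choice(ACTIONS)
        def expected(a):                            # E[payoff | own action a]
            c = self.counts[a]; n = sum(c.values())
            return sum(PAYOFF[(a, o)] * c[o] / n for o in ACTIONS)
        return max(ACTIONS, key=expected)           # greedy exploitation

    def update(self, own, opp):
        self.counts[own][opp] += 1                  # empirical P(opp action | own)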

13.
This paper provides a rationale for a class of mobile, casual, and educational games, which we call UbiqGames. The study is motivated by the desire to understand how students use educational games in light of the other distractions on their devices, and how game design can make such games appealing, educationally useful, and practical. In particular, we explain the choices made to build an engaging and educational first example of this line of games, namely Weatherlings. Further, we report results from a pilot study with 20 students suggesting that students are engaged by the game and are interested in learning more about the academic content, specifically weather and climate, after playing it. Further research should determine whether Weatherlings itself increases learning in these areas and, more generally, whether any learning gains and similar engagement results can be replicated in other content areas following the general model for game design.

14.
Learning to act in a multiagent environment is a difficult problem, since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents, creating a situation of learning a moving target. Previous learning algorithms have one of two shortcomings, depending on their approach: they either converge to a policy that may not be optimal against the specific opponents’ policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms, showing how each fails one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, “Win or Learn Fast”, for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
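
A hedged, single-state sketch of the WoLF principle inside policy hill-climbing: the policy steps toward the greedy action with a small rate when "winning" (the current policy outperforms the average policy under the current Q-values) and with a larger rate when losing. The stateless simplification and all hyperparameters are illustrative, not the article's full algorithm.

# WoLF-style policy hill-climbing for a single-state repeated game.
class WoLFPHC:
    def __init__(self, n_actions, alpha=0.1, d_win=0.01, d_lose=0.04):
        self.q = [0.0] * n_actions
        self.pi = [1.0 / n_actions] * n_actions          # current policy
        self.avg = [1.0 / n_actions] * n_actions         # average policy
        self.alpha, self.d_win, self.d_lose, self.t = alpha, d_win, d_lose, 0

    def update(self, action, reward):
        self.t += 1
        self.q[action] += self.alpha * (reward - self.q[action])
        for a in range(len(self.avg)):                   # running average policy
            self.avg[a] += (self.pi[a] - self.avg[a]) / self.t
        winning = sum(p * q for p, q in zip(self.pi, self.q)) > \
                  sum(p * q for p, q in zip(self.avg, self.q))
        delta = self.d_win if winning else self.d_lose   # the WoLF step size
        best = max(range(len(self.q)), key=self.q.__getitem__)
        for a in range(len(self.pi)):
            step = delta if a == best else -delta / (len(self.pi) - 1)
            self.pi[a] = min(1.0, max(0.0, self.pi[a] + step))
        total = sum(self.pi)
        self.pi = [p / total for p in self.pi]           # renormalize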

15.
In this paper, we consider feedback control for nonzero-sum linear quadratic (LQ) differential games in finite horizon for discrete-time stochastic systems with Markovian jump parameters and multiplicative noise. Four coupled generalized difference Riccati equations (GDREs) are obtained, which are essential for finding the optimal Nash equilibrium strategies and the optimal cost values of the LQ differential games. Furthermore, an iterative algorithm is given to solve the four coupled GDREs. Finally, a suboptimal solution of the LQ differential games is proposed based on a convex optimization approach, and a simplification of the suboptimal solution is given. Simulation examples are presented to illustrate the effectiveness of the iterative algorithm and the suboptimal solution.
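
The iterative flavor of such difference-Riccati solves can be seen in a stripped-down, single-player, noise-free sketch: a backward recursion of the standard discrete-time Riccati difference equation. The paper's four coupled GDREs extend this recursion across players and Markovian jump modes; the matrices below are invented.

# Backward recursion for a discrete-time LQ Riccati difference equation.
import numpy as np

def riccati_backward(A, B, Q, R, QT, N):
    P = QT
    gains = []
    for _ in range(N):                            # iterate backward in time
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)       # feedback gain at this stage
        P = Q + A.T @ P @ (A - B @ K)             # Riccati difference equation
        gains.append(K)
    return P, gains[::-1]                         # gains in forward time order

A = np.array([[1.0, 0.1], [0.0, 1.0]])            # invented example system
B = np.array([[0.0], [0.1]])
P, K = riccati_backward(A, B, np.eye(2), np.eye(1), np.eye(2), N=50)
print(K[0])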

16.
A sense of ‘we-ness’ – enacted through collective identity and culture – is both crucial in online, remote contexts and particularly difficult to develop in such settings. Using Wittgenstein's concept of language games, we examine how participants in two online forums construct collective identity and culture through their discursive practices. We suggest a strong performative interpretation of the notion of language games, i.e. members of a community produce a sense of we-ness through their participation in the language game while also defining their expected behaviours and actions. We illustrate how the notion of language games offers an approach for researching and analysing the emergence of collective identity and culture in online forums.

17.
Adversarial decision making aims to determine optimal decision strategies against an adaptive opponent. A clear example of such a situation is the repeated imitation game presented here. Two agents compete in an adversarial model where one agent wants to learn to imitate the actions taken by the other agent by observing and memorizing its past actions. One defense against this adversary is to make decisions intended to confuse him. To achieve this, randomized strategies that change over time are proposed for one of the agents, and their performance is analysed from both a theoretical and an empirical point of view. We also study the ability of the imitator to avoid deception and adapt to new behaviour by forgetting the oldest observations. The results confirm that wrong assumptions about the imitator’s behaviour lead to dramatic losses due to a failure to cause deception.
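
A small simulation sketch of this interplay, with invented parameters: the imitator predicts the most frequent action within a bounded memory, while the other agent periodically re-randomizes its strategy to cause deception. The bounded memory is what lets the imitator re-adapt after each switch.

# Repeated imitation game: time-varying randomized play vs. a forgetful imitator.
import random
from collections import Counter, deque

def play(rounds=3000, memory=100, switch=1000):
    history = deque(maxlen=memory)       # forgetting the oldest observations
    imitator_hits = 0
    for t in range(rounds):
        # The agent flips its action distribution every `switch` rounds.
        weights = [0.8, 0.2] if (t // switch) % 2 == 0 else [0.2, 0.8]
        action = random.choices([0, 1], weights=weights)[0]
        if history:
            guess = Counter(history).most_common(1)[0][0]
            imitator_hits += guess == action
        history.append(action)
    return imitator_hits / rounds

print(play())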

18.
To improve the ability of self-organizing teams to achieve good performance, this paper presents a self-adaptive learning algorithm for team members. Members of the self-organizing teams are simulated by agents. In the virtual self-organizing team, agents adapt their knowledge according to cooperative principles. The self-adaptive learning algorithm is designed to learn from other agents at minimal cost and to improve the performance of the self-organizing team. In the algorithm, agents learn how to behave (choose different game strategies) and how much to think about how to behave (choose the learning radius). The virtual team is self-adaptively improved according to the strategies’ ability to generate better-quality solutions in past generations. Six experiments are conducted to demonstrate the validity of the adaptive learning algorithm. It is found that the adaptive learning algorithm often causes agents to converge to optimal actions, based on agents’ continually updated cognitive maps of how actions influence the performance of the virtual self-organizing team. Unlike existing works, this paper considers the influence of relationships within self-organizing teams. It is shown that the adaptive learning algorithm benefits both the development of self-organizing teams and the performance of individual agents.

19.
We present systems of logic programming agents (LPAS) to model the interactions between decision-makers while evolving to a conclusion. Such a system consists of a number of agents connected by means of unidirectional communication channels. Agents communicate with each other by passing answer sets obtained by updating the information received from connected agents with their own private information. We introduce a credulous answer set semantics for logic programming agents. As an application, we show how extensive games with perfect information can be conveniently represented as logic programming agent systems, where each agent embodies the reasoning of a game player, such that the equilibria of the game correspond with the semantics agreed upon by the agents in the LPAS.

20.
We use case-injected genetic algorithms (CIGARs) to learn to competently play computer strategy games. CIGARs periodically inject individuals that were successful in past games into the population of the GA working on the current game, biasing search toward known successful strategies. Computer strategy games are fundamentally resource allocation games characterized by complex long-term dynamics and by imperfect knowledge of the game state. CIGAR plays by extracting and solving the game's underlying resource allocation problems. We show how case injection can be used to learn to play better from a human's or system's game-playing experience, and our approach to acquiring experience from human players showcases an elegant solution to the knowledge acquisition bottleneck in this domain. Results show that with an appropriate representation, case injection effectively biases the GA toward producing plans that contain important strategic elements from previously successful strategies.
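
A hedged sketch of the injection step alone: every few generations, the cases in the case base most similar to the current best individual replace the worst of the population. The Hamming-distance similarity, bit-string encoding, and all parameters are illustrative choices, not the CIGAR system's.

# Periodic case injection into a GA population of bit-string individuals.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def inject(population, fitness, case_base, period, generation, k=3):
    if generation % period != 0 or not case_base:
        return population
    best = max(population, key=fitness)
    # Cases most similar to the current best individual.
    cases = sorted(case_base, key=lambda c: hamming(c, best))[:k]
    ranked = sorted(population, key=fitness)     # worst individuals first
    return cases + ranked[len(cases):]           # replace the worst k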
