Similar Documents
20 similar documents found.
1.
Graphical games are a new representation for games, and computing Nash equilibria is their core problem. This paper treats computing a Nash equilibrium of a graphical game as an optimization problem over a discrete space and presents an iterative optimization algorithm for computing ε-Nash equilibria. To speed up convergence, a method for obtaining strategy profiles with high iteration efficiency is also proposed: multi-strategy updating based on the game's graph structure. Experimental results show that the algorithm is feasible and efficient.
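To make the idea concrete, here is a minimal, hypothetical sketch (not the paper's algorithm) of best-response iteration on a tiny graphical game: a 4-player coordination game on a path graph, where each player picks 0 or 1 and earns one point per matched neighbor. All payoffs and names are invented for illustration.

```python
# Toy graphical coordination game on a path graph (illustrative only).
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def payoff(player, actions):
    # A player's payoff depends only on its neighbors' actions.
    return sum(actions[player] == actions[n] for n in NEIGHBORS[player])

def best_response_dynamics(actions, max_rounds=20):
    for _ in range(max_rounds):
        changed = False
        for p in NEIGHBORS:
            for a in (0, 1):
                if payoff(p, {**actions, p: a}) > payoff(p, actions):
                    actions[p] = a
                    changed = True
        if not changed:
            break  # no player can improve: a pure Nash equilibrium
    return actions
```

For coordination games like this one, the iteration settles on a profile from which no unilateral deviation helps.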

2.
柴玉梅  张靖 《计算机应用》2007,27(9):2287-2289
In game problems, many learning mechanisms can only drive agents to converge to a Nash equilibrium, which often falls short of practical needs. This paper recasts the game as a multi-objective optimization problem and proposes a new multi-objective strategy mechanism that retains dominated strategies. Applied to the Prisoner's Dilemma, it yields Pareto-optimal solutions that are more meaningful than the Nash equilibrium and achieves high satisfaction in self-play experiments. The results demonstrate the mechanism's effectiveness at finding Pareto-optimal solutions.
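As a small illustration of the Pareto side of the argument, the sketch below filters the Prisoner's Dilemma outcomes down to the Pareto-optimal set; the Nash outcome (defect, defect) is not among them. The payoff values are the textbook ones, not necessarily the paper's.

```python
# Joint action -> (payoff1, payoff2); 0 = cooperate, 1 = defect.
OUTCOMES = {
    (0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1),
}

def dominates(u, v):
    # u Pareto-dominates v: at least as good for both, different somewhere.
    return all(a >= b for a, b in zip(u, v)) and u != v

def pareto_front(outcomes):
    return {k: u for k, u in outcomes.items()
            if not any(dominates(v, u) for v in outcomes.values())}
```

Running `pareto_front(OUTCOMES)` keeps (C,C), (C,D), (D,C) and drops the Nash profile (D,D), which (3,3) dominates.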

3.
Every continuous game has at least one mixed-strategy Nash equilibrium, but relatively little work, domestic or international, addresses methods for computing mixed-strategy Nash equilibria over infinite strategy sets, or equilibrium problems in which the players' strategy sets or payoff functions are uncertain. Exploiting the advantages of particle swarm optimization — no strict requirements on the objective function, few parameters, and simple encoding — this paper develops an approximation algorithm for mixed strategies over infinite strategy sets. Building on this, it introduces the notion of rough game theory and, grounded in rough set and Vague set theory, derives a method for converting a rough game into a classical one. Together, the approximation algorithm and rough game theory provide a theoretical basis for games whose strategy sets and payoff functions are uncertain. Worked examples show that both the improved-PSO approximation algorithm and the rough-game solution method are effective and feasible.
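A bare-bones illustration of the PSO idea (not the paper's algorithm, and on a finite 2x2 game rather than an infinite strategy set): plain particle swarm search over the mixed-strategy square of matching pennies, minimizing exploitability, which vanishes exactly at the unique mixed equilibrium (0.5, 0.5). All parameter values are assumptions.

```python
import random

random.seed(0)

def u_row(p, q):
    # Expected row payoff in matching pennies (+1 match, -1 mismatch);
    # the algebra collapses to (2p-1)(2q-1).
    return (2 * p - 1) * (2 * q - 1)

def exploitability(p, q):
    # Total gain available to both players from best pure deviations.
    gain_row = max(u_row(1, q), u_row(0, q)) - u_row(p, q)
    gain_col = max(-u_row(p, 1), -u_row(p, 0)) + u_row(p, q)
    return gain_row + gain_col

def pso(n=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.random(), random.random()] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    pbest = [list(x) for x in pos]
    gbest = min(pbest, key=lambda x: exploitability(*x))
    for _ in range(iters):
        for i in range(n):
            for d in range(2):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            if exploitability(*pos[i]) < exploitability(*pbest[i]):
                pbest[i] = list(pos[i])
        gbest = min(pbest, key=lambda x: exploitability(*x))
    return gbest
```

The swarm drives exploitability toward zero, i.e. toward the mixed equilibrium, without needing gradients of the payoff functions.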

4.
A Function Optimization Algorithm Based on Reinforcement Learning of Game Strategies
This paper proposes a function optimization algorithm based on game theory. The algorithm maps the search space of the optimization problem to the strategy-profile space of a game and the objective function to the game's utility function, then solves the optimization problem intelligently through reinforcement learning of game strategies. The paper gives a formal definition and description of the algorithm and validates its effectiveness by simulation on a standard suite of function optimization benchmarks.

5.
For the problem of computing multiple Nash equilibria of n-player non-cooperative games, an adaptive niching particle swarm algorithm is proposed. The algorithm combines sequential niching with particle swarm optimization and adds a mutation operator and a mechanism for automatically generating niche radii, so that the particles spread across distinct local-peak regions of the search space and the game's multiple Nash equilibria can be found effectively. Several numerical examples show that the proposed algorithm performs well.

6.
This paper gives an algorithm for computing approximate Nash equilibria of a class of n×n matrix games. By subdividing the simplex, the mixed-strategy space is discretized; starting from the initial simplex, an approximate Nash equilibrium is computed using a labeling function and replacement rules. The computational error between the optimal and approximate solutions is also analyzed.

7.
To search for satisfactory solutions in airspace sector optimization, this paper studies the sector optimization problem by combining computational geometry with simulated annealing. Based on the structure of controlled airspace and the spatial distribution of traffic flow, fuzzy multi-objective and constraint functions for sector partitioning are constructed; a bisection strategy for partitioning the airspace is proposed and combined with simulated annealing to solve the sector design problem. A case study shows that simulated annealing with the bisection strategy yields satisfactory solutions: the overall satisfaction of the multi-objective sector design is 2.1% higher than when only the average sector flow is balanced.
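A generic simulated-annealing skeleton of the kind used here, applied to a toy one-dimensional weighted two-objective cost; the two objectives and all constants are illustrative stand-ins, not the paper's sector model.

```python
import math
import random

random.seed(1)

def cost(x):
    balance = (x - 0.5) ** 2   # stand-in for a flow-balance objective
    shape = abs(x - 0.4)       # stand-in for a geometric objective
    return 0.5 * balance + 0.5 * shape  # equal-weight combination

def anneal(x=0.0, t=1.0, t_min=1e-4, alpha=0.95, steps=50):
    best = x
    while t > t_min:
        for _ in range(steps):
            cand = min(1.0, max(0.0, x + random.uniform(-0.1, 0.1)))
            d = cost(cand) - cost(x)
            # Accept improvements always, uphill moves with prob e^(-d/t).
            if d < 0 or random.random() < math.exp(-d / t):
                x = cand
                if cost(x) < cost(best):
                    best = x
        t *= alpha  # geometric cooling schedule
    return best
```

With this toy cost, the minimizer sits at x = 0.4; the occasional uphill acceptances are what let annealing escape poor local basins in harder landscapes.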

8.
To handle incomplete demand information and the conflict between customer satisfaction and enterprise product cost in product configuration, a multi-objective reasoning method based on customer requirements is proposed. The method uses Bayes-Nash equilibrium theory to build an equilibrium model across the product's multi-domain nodes. With the enterprise and the customer as decision makers, product cost and customer satisfaction serve as the game's payoff functions; the two players' strategy sets are determined by searching for maximum similarity across the structure, performance, and cost domains. Simulated annealing is used to compute the Nash equilibrium, yielding a product configuration that serves both players' interests. A configuration-reasoning example for a workshop automated guided vehicle demonstrates the method's effectiveness.

9.
亢良伊  王建飞  刘杰  叶丹 《软件学报》2018,29(1):109-130
Machine learning problems are usually cast as an objective function to be solved, and optimization algorithms are the essential tool for finding its parameters. In big-data settings, parallel and distributed optimization algorithms must be designed to accelerate training through multi-core and distributed computing. A large body of work has emerged in recent years, and some of these algorithms are widely used in machine learning platforms. This survey examines the five most common families of optimization methods — gradient descent, second-order methods, proximal gradient methods, coordinate descent, and the alternating direction method of multipliers — analyzing, for each family, work on both single-machine parallelism and distributed parallelism, and comparing the algorithms in detail with respect to model characteristics, input-data characteristics, algorithm evaluation, and parallel computation models. It then compares how these optimizers are implemented and applied in representative scalable machine learning platforms. All surveyed algorithms are further organized into a multi-level taxonomy, so that users can pick a suitable optimizer for their objective function type, or use the taxonomy to explore applying optimizers to new objective types. Finally, the survey analyzes open problems in existing optimization algorithms, suggests possible solutions, and outlines directions for future research.
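As a concrete instance of the simplest family surveyed (gradient descent), here is a minimal single-machine mini-batch SGD sketch on a noiseless least-squares problem; the learning rate and data are assumptions. Distributed variants would shard `data` across workers and aggregate the gradients.

```python
import random

random.seed(2)

# Synthetic noiseless data from y = 2x + 1.
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(50)]

def sgd(lr=0.02, epochs=500, batch=10):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)                 # re-sample batch order each epoch
        for i in range(0, len(data), batch):
            chunk = data[i:i + batch]
            # Gradients of mean squared error over the mini-batch.
            gw = sum(2 * (w * x + b - y) * x for x, y in chunk) / len(chunk)
            gb = sum(2 * (w * x + b - y) for x, y in chunk) / len(chunk)
            w, b = w - lr * gw, b - lr * gb
    return w, b
```

Because the data are noiseless, the iterates converge to the exact parameters (w, b) = (2, 1).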

10.
Multidisciplinary Design Optimization Based on Co-evolutionary Games
The design of a complex system can be decomposed non-hierarchically into multiple parallel subspace optimization problems, and the iterative process of multidisciplinary optimization can be viewed as a game among the subspaces. When the conflicting sub-objectives reach consensus, the equilibrium of the cooperative subspace game attains the overall optimum of the original system; a co-evolutionary computational framework for finding the game's Nash equilibrium is given. Taking the overall design optimization of a civil airliner as an example, the problem is decomposed into aerodynamic and weight subspaces. The design variables are distributed among the subspaces without overlap, the two conflicting sub-objectives are given equal weights, and their linear weighted combination serves as the single objective shared by all subspaces. Computational results show that the method is effective.

11.
姜永  胡博  陈山枝 《计算机学报》2012,35(6):1249-1261
For the load-balancing problem in Heterogeneous Wireless Networks (HWNs), a user-network association scheme based on population games is proposed. The user-network association problem in an HWN system is first abstracted as a population game; from the utility users obtain in each network, the population game is shown to satisfy the conditions of a potential game. Using replicator dynamics as the evolutionary tool, the evolution is proven to converge to a Nash equilibrium, which ensures that every user associates with a utility-optimal network. The Nash equilibrium is then shown to maximize the throughput of the entire HWN system, establishing its efficiency. Finally, a user-network association algorithm based on replicator dynamics is presented. Simulations of the network-selection process reach the equilibrium point and confirm the theoretical analysis.
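A toy sketch of the replicator-dynamics machinery the paper uses, for a hypothetical two-network association game with load-dependent utilities; all constants are assumptions, chosen so the rest point (where the two utilities equalize) sits at x = 0.75.

```python
# Users on network i receive utility c_i - a_i * (share of users on i).
def replicator(x=0.5, c=(10.0, 8.0), a=(4.0, 4.0), dt=0.01, steps=5000):
    for _ in range(steps):
        u1 = c[0] - a[0] * x            # utility on network 1
        u2 = c[1] - a[1] * (1 - x)      # utility on network 2
        avg = x * u1 + (1 - x) * u2     # population-average utility
        x += dt * x * (u1 - avg)        # replicator update for share on net 1
    return x
```

At the rest point both networks offer equal utility, so no user gains by switching — the Nash equilibrium of the association game.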

12.
This paper is concerned with anti-disturbance Nash equilibrium seeking for games with partial information. First, reduced-order disturbance observer-based algorithms are proposed to achieve Nash equilibrium seeking for games with first-order and second-order players, respectively. In the developed algorithms, the observed disturbance values are included in control signals to eliminate the influence of disturbances, based on which a gradient-like optimization method is implemented for each player. Second, a signum-function-based distributed algorithm is proposed to attenuate disturbances for games with second-order integrator-type players. To be more specific, a signum function is involved in the proposed seeking strategy to dominate disturbances, based on which the feedback of the velocity-like states and the gradients of the functions associated with players achieves stabilization of system dynamics and optimization of players' objective functions. Through Lyapunov stability analysis, it is proven that the players' actions can approach a small region around the Nash equilibrium by utilizing disturbance observer-based strategies with appropriate control gains. Moreover, exponential (asymptotic) convergence can be achieved when the signum-function-based control strategy (with an adaptive control gain) is employed. The performance of the proposed algorithms is tested by utilizing an integrated simulation platform of virtual robot experimentation platform (V-REP) and MATLAB.

13.
We consider a continuous-time form of repeated matrix games in which player strategies evolve in reaction to opponent actions. Players observe each other's actions, but do not have access to other player utilities. Strategy evolution may be of the best response sort, as in fictitious play, or a gradient update. Such mechanisms are known to not necessarily converge. We introduce a form of "dynamic" fictitious and gradient play strategy update mechanisms. These mechanisms use derivative action in processing opponent actions and, in some cases, can lead to behavior converging to Nash equilibria in previously nonconvergent situations. We analyze convergence in the case of exact and approximate derivative measurements of the dynamic update mechanisms. In the ideal case of exact derivative measurements, we show that convergence to Nash equilibrium can always be achieved. In the case of approximate derivative measurements, we derive a characterization of local convergence that shows how the dynamic update mechanisms can converge if the traditional static counterparts do not. We primarily discuss two player games, but also outline extensions to multiplayer games. We illustrate these methods with convergent simulations of the well known Shapley and Jordan counterexamples.
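For reference, the static baseline that these "dynamic" variants extend is classical discrete-time fictitious play. The sketch below (an illustration, not the paper's mechanism) runs it on the Prisoner's Dilemma, a case where the baseline does converge: defection dominates, so both players' observed cooperation frequencies vanish.

```python
# (row_action, col_action) -> (row_payoff, col_payoff); 0 = C, 1 = D.
PAYOFF = {
    (0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1),
}

def best_response(opp_coop_freq, is_row):
    # Best reply against the opponent's empirical cooperation frequency.
    def expected(a):
        def pay(opp):
            joint = (a, opp) if is_row else (opp, a)
            return PAYOFF[joint][0 if is_row else 1]
        return opp_coop_freq * pay(0) + (1 - opp_coop_freq) * pay(1)
    return max((0, 1), key=expected)

def fictitious_play(rounds=1000):
    row_coop = col_coop = 1      # smoothed counts of observed cooperation
    total = 2
    for _ in range(rounds):
        ra = best_response(col_coop / total, True)
        ca = best_response(row_coop / total, False)
        row_coop += ra == 0
        col_coop += ca == 0
        total += 1
    return row_coop / total, col_coop / total
```

On cyclic games such as matching pennies the same update can fail to settle, which is exactly the gap the derivative-action mechanisms above target.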

14.
We study graphical games where the payoff function of each player satisfies one of four types of symmetry in the actions of his neighbors. We establish that deciding the existence of a pure Nash equilibrium is NP-hard in general for all four types. Using a characterization of games with pure equilibria in terms of even cycles in the neighborhood graph, as well as a connection to a generalized satisfiability problem, we identify tractable subclasses of the games satisfying the most restrictive type of symmetry. Hardness for a different subclass leads us to identify a satisfiability problem that remains NP-hard in the presence of a matching, a result that may be of independent interest. Finally, games with symmetries of two of the four types are shown to possess a symmetric mixed equilibrium which can be computed in polynomial time. We thus obtain a natural class of games where the pure equilibrium problem is computationally harder than the mixed equilibrium problem, unless P=NP.

15.
《Automatica》2014,50(12):3038-3053
This paper introduces a new class of multi-agent discrete-time dynamic games, known in the literature as dynamic graphical games. Accordingly, a local performance index is defined for each agent that depends only on the local information available to that agent. Nash equilibrium policies and best-response policies are given in terms of the solutions to the discrete-time coupled Hamilton–Jacobi equations. Since in these games the interactions between the agents are prescribed by a communication graph structure, a new notion of Nash equilibrium must be introduced. It is proved that this notion holds if all agents are in Nash equilibrium and the graph is strongly connected. A novel reinforcement learning value iteration algorithm is given to solve the dynamic graphical games in an online manner, along with its proof of convergence. The policies of the agents form a Nash equilibrium when all the agents in the neighborhood update their policies, and a best-response outcome when the agents in the neighborhood are kept constant. The paper brings together discrete Hamiltonian mechanics, distributed multi-agent control, optimal control theory, and game theory to formulate and solve these multi-agent dynamic graphical games. A simulation example shows the effectiveness of the proposed approach in a leader-synchronization case, along with optimality guarantees.

16.
We consider the learning problem faced by two self-interested agents repeatedly playing a general-sum stage game. We assume that the players can observe each other's actions but not the payoffs received by the other player. The concept of Nash equilibrium in repeated games provides an individually rational solution for playing such games and can be achieved by playing the Nash equilibrium strategy for the single-shot game in every iteration. Such a strategy, however, can sometimes lead to a Pareto-dominated outcome in games like the Prisoner's Dilemma. We therefore prefer learning strategies that converge to a Pareto-optimal outcome that also produces a Nash equilibrium payoff for repeated two-player, n-action general-sum games; the Folk Theorem enables us to identify such outcomes. In this paper, we introduce the Conditional Joint Action Learner (CJAL), which learns the conditional probability of an action taken by the opponent given its own actions and uses it to decide its next course of action. We empirically show that under self-play, and if the payoff structure of the Prisoner's Dilemma game satisfies certain conditions, a CJAL learner using a random exploration strategy followed by a completely greedy exploitation technique will learn to converge to a Pareto-optimal solution. We also show that such learning generates Pareto-optimal payoffs in a large majority of other two-player general-sum games. We compare the performance of CJAL with that of existing algorithms such as WoLF-PHC and JAL on all structurally distinct two-player conflict games with ordinal payoffs.
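A stripped-down learner in the spirit of CJAL (a sketch, not the paper's implementation): it estimates P(opponent action | own action) from counts and, after a random exploration phase, greedily maximizes the implied expected payoff. The deterministic "mirror" opponent is an invented stand-in for self-play, chosen so that the conditional estimates matter: an unconditional learner would defect here, while the conditional learner discovers that cooperation pays.

```python
import random

random.seed(3)

# Row payoffs of the Prisoner's Dilemma; 0 = cooperate, 1 = defect.
ROW_PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

def cjal(rounds=500, explore=200):
    counts = {(a, o): 1 for a in (0, 1) for o in (0, 1)}  # Laplace smoothing

    def value(act):
        # Expected payoff of `act` under the learned P(opponent | own action).
        total = counts[(act, 0)] + counts[(act, 1)]
        return sum(counts[(act, o)] / total * ROW_PAYOFF[(act, o)]
                   for o in (0, 1))

    for t in range(rounds):
        a = random.choice((0, 1)) if t < explore else max((0, 1), key=value)
        counts[(a, a)] += 1   # mirror opponent deterministically copies us
    return max((0, 1), key=value)
```

Against the mirror opponent the learner estimates value(C) ≈ 3 and value(D) ≈ 1, so its greedy choice is the Pareto-optimal cooperate action rather than the single-shot Nash action.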

17.
In this paper, by examining some games, we show that classical techniques are not always effective even for games with few stages and players, and that it cannot be claimed that these techniques always obtain the optimal, actual Nash equilibrium point. To address this, two population-based evolutionary algorithms are presented for solving general dynamic games. The first is based on the genetic algorithm: genetic algorithms are used to model the players' learning process in several models, which are evaluated in terms of their convergence to the Nash equilibrium. In the second algorithm, a particle swarm optimization (PSO) technique is presented to accelerate the solutions' convergence. It is claimed that both techniques can find the actual Nash equilibrium point of the game while preserving the problem's generality, without imposing any limitation on it, and without being trapped at a local Nash equilibrium point. The results clearly show the benefits of the proposed approach in terms of both solution quality and efficiency.

18.
Solving the optimization problem of approaching a Nash equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP, called MC-NFSP, to improve performance in real-time zero-sum imperfect-information games. Through experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate the Nash equilibrium in games with large-scale search depth, while NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP), which uses an asynchronous, parallel architecture to collect game experience and improve both training efficiency and policy quality. Experiments with games with hidden state information (Texas Hold'em) and FPS (first-person shooter) games demonstrate the effectiveness of our algorithms.

19.
This paper addresses output regulation of heterogeneous linear multi-agent systems. We first show that output regulation can be achieved through local controller design, then we formulate output regulation in the graphical game framework. To solve output regulation of heterogeneous linear multi-agent systems in the graphical game framework, one needs to derive a solution to the coupled Hamilton–Jacobi equations. Both offline and online algorithms are suggested for that solution. Using the online method, the profile policy converges to a Nash equilibrium. Besides, it is shown that the graphical formulation is robust to multiplicative uncertainty satisfying an upper bound and has an infinite gain margin. Copyright © 2015 John Wiley & Sons, Ltd.

20.
In game theory, the interaction among players obligates each player to develop a belief about the possible strategies of the other players, to choose a best reply given those beliefs, and to adjust both the best reply and the beliefs through a learning mechanism until an equilibrium point is reached. Usually, the behavior of an individual cost function under such best-reply strategies turns out to be non-monotonic, and concluding that such strategies lead to some equilibrium point is a non-trivial task. Even in repeated games, convergence to a stationary equilibrium is not always guaranteed. The best-reply strategies analyzed in this paper represent the most frequent type of behavior applied in practice to problems of bounded rationality of agents within the Artificial Intelligence research area. They are naturally related to the so-called fixed-local-optimal actions or, in other words, to the one-step-ahead optimization algorithms widely used in modern Intelligent Systems theory. This paper shows that for an ergodic class of finite controllable Markov games, best-reply strategies necessarily lead to a Lyapunov/Nash equilibrium point. One of the most interesting properties of this approach is that an expedient (or absolutely expedient) behavior of an ergodic system (repeated game) can be represented by a Lyapunov-like function that is non-decreasing in time. We present a method for constructing such a Lyapunov-like function: it replaces the recursive mechanism with the elements of the ergodic system that model how players are likely to behave in one-shot games. To establish this, we first propose a non-converging state-value function that fluctuates (increases and decreases) between states of the Markov game. We then prove that this function can be represented in a recursive format using a one-step-ahead fixed-local-optimal strategy.
As a result, a Lyapunov-like function can be built from this recursive expression for the Markov game; the resulting function is monotonic, able only to decrease (or remain the same) over time, whatever the initial distribution of probabilities. This suggests a new concept, called Lyapunov games, for a class of repeated games. Lyapunov games make it possible to conclude during the game whether the applied strategy converges to an equilibrium point or not. The time for constructing a potential (Lyapunov-like) function is exponential. Our algorithm tractably computes the Nash, Lyapunov, and correlated equilibria: a Lyapunov equilibrium is a Nash equilibrium, and it is also a correlated equilibrium. The validity of the proposed method is demonstrated both theoretically and practically by a simulated experiment on the Duel game.


