Similar Literature
18 similar documents found (search time: 62 ms)
1.
Partially observable Markov decision processes (POMDPs) are solved by introducing a belief-state space, which converts a non-Markovian problem into a Markov-chain problem; their fidelity in describing the real world has made them an important branch of research on stochastic decision processes. This paper introduces the basic principles and decision procedure of POMDPs and proposes a POMDP algorithm based on policy iteration and value iteration. Drawing on ideas from linear programming and dynamic programming, the algorithm alleviates the "curse of dimensionality" that arises when the belief-state space is large and obtains an approximately optimal solution of the Markov decision problem. Experimental data show that the algorithm is feasible and effective.
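To make the belief-state construction concrete, here is a minimal sketch of the standard Bayesian belief update that such solvers rely on (not the paper's code; the model arrays T and Z and their numbers are invented for illustration):

    import numpy as np

    def belief_update(b, a, o, T, Z):
        """Bayes update of a POMDP belief: b'(s') ~ Z[a, s', o] * sum_s b(s) T[s, a, s']."""
        b_next = Z[a, :, o] * (b @ T[:, a, :])
        norm = b_next.sum()
        if norm == 0.0:
            raise ValueError("observation has zero probability under this belief")
        return b_next / norm

    # Illustrative 2-state, 2-action, 2-observation model (numbers made up).
    T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[s, a, s']
                  [[0.1, 0.9], [0.8, 0.2]]])
    Z = np.array([[[0.8, 0.2], [0.3, 0.7]],    # Z[a, s', o]
                  [[0.8, 0.2], [0.3, 0.7]]])
    b = np.array([0.5, 0.5])
    print(belief_update(b, a=0, o=1, T=T, Z=Z))   # -> [0.222..., 0.777...]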

2.
Xu Ming, Liu Guangzhong. Journal of Computer Applications, 2015, 35(11): 3047-3050
To address the space-time uncertainty caused by the low bandwidth and high latency of underwater acoustic sensor networks, and the fact that the network state can only be partially observed, this paper proposes a medium access control (MAC) protocol for underwater acoustic sensor networks based on partially observable Markov decision processes (POMDPs). The protocol first quantizes each sensor node's link quality and residual energy into several discrete levels that represent the node's state. The receiving node then predicts the channel occupancy probability from the history of channel-state observations and access actions, and derives an optimal channel-scheduling policy for the sending nodes; following the scheduling sequence, each sending node communicates with the receiver and transmits its data packets in its allocated time slot. After communication completes, the nodes involved estimate the state of the next slot from statistics of the network's transition probabilities. Simulation experiments show that, compared with traditional MAC protocols for underwater acoustic sensor networks, the POMDP-based protocol improves packet delivery ratio and network throughput while reducing energy consumption.
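As a rough illustration of the protocol's first step, the sketch below quantizes link quality and residual energy into discrete state levels; the level counts and thresholds are invented for the example, not taken from the paper:

    import numpy as np

    def quantize(value, thresholds):
        """Map a continuous reading to a discrete level: level i such that
        thresholds[i-1] <= value < thresholds[i]."""
        return int(np.searchsorted(thresholds, value))

    # Hypothetical 3-level discretization of link SNR (dB) and residual energy (fraction).
    snr_levels    = [5.0, 15.0]    # <5 dB -> 0 (poor), <15 dB -> 1 (fair), else 2 (good)
    energy_levels = [0.2, 0.6]
    node_state = (quantize(12.3, snr_levels), quantize(0.45, energy_levels))
    print(node_state)              # -> (1, 1)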

3.
In recent years crowd animation has been widely studied and applied in robotics, film, games, and other fields, but traditional crowd-animation techniques involve complex motion planning or collision-avoidance operations and are computationally expensive. This paper proposes a trajectory-generation algorithm for crowd animation based on Markov decision processes (MDPs) that produces collision-free trajectories for the agents without any collision detection. It also proposes an improved value iteration algorithm for computing the MDP's state values. Experiments in a grid environment show that this algorithm is clearly more efficient than value iteration with a Euclidean-distance heuristic and than Dijkstra's algorithm. Crowd-animation simulations in three-dimensional (3D) scenes show that the proposed trajectory-generation algorithm moves the crowd toward its goal without collisions and with diverse motion.
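For reference, textbook value iteration over a finite MDP, the baseline the paper improves upon, fits in a few lines; the array shapes and discount below are illustrative assumptions, and the paper's improved variant is not reproduced here:

    import numpy as np

    def value_iteration(R, P, gamma=0.95, tol=1e-6):
        """V(s) = max_a [ R(s, a) + gamma * sum_s' P[a, s, s'] V(s') ].
        R: (S, A) reward table; P: (A, S, S) transition matrices."""
        V = np.zeros(R.shape[0])
        while True:
            Q = R + gamma * np.einsum("ast,t->sa", P, V)   # one Bellman backup
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)             # values and a greedy policy
            V = V_new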

4.
Zhang Rubo, Meng Lei, Shi Changting. Journal of Computer Applications, 2015, 35(8): 2375-2379
To address the high cost of repairing software faults in autonomous underwater vehicles (AUVs) and the fact that the system environment is only partially observable, this paper proposes an AUV software fault-repair method based on microreboot technology and a partially observable Markov decision process (POMDP) model. Exploiting the layered structure of AUV software, the method builds a three-level reboot hierarchy on top of microreboots, which eases the deployment of fine-grained self-repairing microreboot policies. A POMDP model of AUV software self-repair is formulated according to POMDP theory, and a point-based value iteration (PBVI) algorithm solves it to generate the repair policy, taking minimization of the cumulative repair cost as the objective, so that the system can execute repair actions at low cost in a partially observable environment. Simulation results show that the method can repair AUV software faults caused by software aging and system calls, and that it clearly outperforms both a two-level microreboot policy and a fixed three-level microreboot policy in cumulative repair time and running stability.
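The PBVI solver mentioned here keeps one α-vector per sampled belief point and improves them with point-based backups; a bare-bones backup step, under the usual array conventions (R: (A, S), T: (A, S, S'), Z: (A, S', O), all hypothetical here rather than the authors' implementation), might look like:

    import numpy as np

    def pbvi_backup(beliefs, alphas, R, T, Z, gamma=0.95):
        """One point-based value-iteration backup over sampled belief points.
        beliefs: (B, S); alphas: (K, S) current alpha-vectors."""
        A, _, O = Z.shape
        new_alphas = []
        for b in beliefs:
            best_val, best_alpha = -np.inf, None
            for a in range(A):
                g = R[a].astype(float).copy()
                for o in range(O):
                    back = (T[a] * Z[a, :, o]) @ alphas.T      # (S, K)
                    g = g + gamma * back[:, np.argmax(b @ back)]
                if b @ g > best_val:
                    best_val, best_alpha = b @ g, g
            new_alphas.append(best_alpha)
        return np.array(new_alphas)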

5.
Two abstraction modes for Markov decision processes
Introducing Markov decision processes at abstraction levels lets people express complex MDPs concisely and declaratively, and addresses the large state-space representation problem that conventional MDPs meet in practice. This paper introduces the basic concepts of the two types of abstract MDPs, structural and aggregate, together with exact and approximate algorithms for optimal policies in the typical abstract MDPs, including one algorithm fundamentally different from those for conventional MDPs: the generalization of the Bellman equation to the abstract state space. It also reviews the history of this research and sketches some prospects for its development, aiming at a thorough, comprehensive, and well-focused understanding.
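In one common form, the generalized Bellman equation referred to here evaluates an abstract state \bar{s} (a set of ground states) by weighting its members; the weighting w is an illustrative choice, not necessarily the one used in the surveyed algorithms:

    V(\bar{s}) = \max_a \sum_{s \in \bar{s}} w(s \mid \bar{s}) \Big[ R(s, a) + \gamma \sum_{\bar{s}'} \Pr(\bar{s}' \mid s, a) \, V(\bar{s}') \Big]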

6.
The introduction of logical Markov decision processes and relational Markov decision processes makes it possible to express complex MDPs concisely and declaratively. This paper first introduces the concepts of logical and relational MDPs, then focuses on the algorithms that differ fundamentally from those for ordinary MDPs: (1) conversion methods that rely on reinforcement learning over the ground state space; (2) generalization of the Bellman equation to the abstract state space; (3) searching for near-optimal policies with a policy-bias space. Finally, it summarizes the current state of this research and offers some prospects for its development.

7.
Using the Markov chain of reference (1), this paper implements a FORTRAN program that computes the average durations of stock-price rises and falls, and applies Markov decision theory in a practical program for choosing optimal times to buy and sell stocks.
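The average rise and fall times such a program computes reduce, for a two-state up/down chain, to geometric run lengths; a quick check in Python (the transition probabilities are made up, not from the paper):

    # P[i][j] = Pr(tomorrow j | today i), with 0 = price up, 1 = price down.
    P = [[0.6, 0.4],
         [0.3, 0.7]]

    mean_up_run   = 1.0 / (1.0 - P[0][0])   # 2.5 consecutive rising days on average
    mean_down_run = 1.0 / (1.0 - P[1][1])   # about 3.33 consecutive falling days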

8.
Research on decentralized communication decisions in multi-agent systems
Zheng Yanbin, Guo Lingyun, Liu Jingjing. Journal of Computer Applications, 2012, 32(10): 2875-2878
Communication is the most effective and direct means of coordination and cooperation in a multi-agent system (MAS), but its cost limits the method's use. To reduce the amount of communication during MAS coordination, a heuristic algorithm is proposed in which an agent communicates only the observations that improve the team's expected reward. Experimental results show that selecting which information to communicate uses the bandwidth efficiently and helps improve system performance.
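A generic form of such a heuristic weighs the expected team reward with and without sharing an observation against the cost of sending it; the sketch below is a schematic restatement, where team_value and update stand in for whatever value estimate and belief update the agents maintain:

    def should_communicate(belief, observation, team_value, update, cost):
        """Share an observation only if its expected gain in team reward
        exceeds the communication cost."""
        gain = team_value(update(belief, observation)) - team_value(belief)
        return gain > cost

    # Toy usage: belief = Pr(goal is left); a "left" cue sharpens it to 0.9.
    print(should_communicate(
        0.5, "left",
        team_value=lambda b: max(b, 1 - b),              # reward of acting greedily
        update=lambda b, o: 0.9 if o == "left" else 0.1,
        cost=0.05))                                      # True: gain 0.4 > 0.05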

9.
Advances in adaptive decision-making for Markov decision processes
Starting from an introduction to general Markov decision processes, this paper analyzes the basic ideas, concrete algorithmic implementations, and main conclusions of the current adaptive decision methods for Markov processes, summarizes the characteristics of the existing algorithms, and points out the problems that remain to be solved.

10.
Entropy measures of the complexity of Markov decision processes
Shannon entropy and other entropy indices are applied to measure the complexity of Markov decision processes. Measures of the complexity, uncertainty, and unpredictability of Markov chains are extended to Markov decision processes, giving a set of information-theoretic complexity measures that apply to fully observed and partially observed Markov decision processes under both stochastic and deterministic policies. The relevant quantities are studied in simulation and computational results are reported.
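The Markov-chain baseline that these measures extend is the entropy rate H = -sum_i pi_i sum_j P_ij log2 P_ij, with pi the stationary distribution; a small self-contained computation (the example chain is made up):

    import numpy as np

    def entropy_rate(P):
        """Entropy rate of an ergodic Markov chain, in bits per step."""
        evals, evecs = np.linalg.eig(P.T)
        pi = np.real(evecs[:, np.argmax(np.real(evals))])   # eigenvector for eigenvalue 1
        pi = pi / pi.sum()                                  # stationary distribution
        logP = np.zeros_like(P)
        logP[P > 0] = np.log2(P[P > 0])
        return float(-(pi[:, None] * P * logP).sum())

    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
    print(entropy_rate(P))    # about 0.56 bits per step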

11.
Wang Xuening, He Hangen, Xu Xin. Control and Decision, 2004, 19(11): 1263-1266
In partially observable Markov decision processes (POMDPs), perceptual aliasing can make the memoryless policies obtained with algorithms such as Sarsa oscillate. To solve this problem, a memory-based reinforcement learning algorithm, the CPnSarsa(λ) learning algorithm, is studied. By redefining states, the agent combines its observation history to identify aliased states. Applied to several typical POMDPs, CPnSarsa(λ) obtains optimal or near-optimal policies, and its convergence speed is much higher than that of previous algorithms.
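The state redefinition amounts to keying the value table on a short window of recent observations and actions rather than on the latest observation alone; a schematic fragment (the window length N is a tunable assumption, not the paper's notation):

    from collections import defaultdict, deque

    N = 3                                  # history window length (tunable)
    history = deque(maxlen=N)              # most recent (observation, action) pairs
    Q = defaultdict(float)                 # Q[(state_key, action)]

    def history_state(obs):
        """Fold recent history into the state key so that two situations that
        produce the same observation can still be distinguished."""
        return (tuple(history), obs)

After each step the agent appends (obs, action) to history, so subsequent state keys separate the previously aliased states.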

12.
This paper presents a method for learning decision theoretic models of human behaviors from video data. Our system learns relationships between the movements of a person, the context in which they are acting, and a utility function. This learning makes explicit that the meaning of a behavior to an observer is contained in its relationship to actions and outcomes. An agent wishing to capitalize on these relationships must learn to distinguish the behaviors according to how they help the agent to maximize utility. The model we use is a partially observable Markov decision process, or POMDP. The video observations are integrated into the POMDP using a dynamic Bayesian network that creates spatial and temporal abstractions amenable to decision making at the high level. The parameters of the model are learned from training data using an a posteriori constrained optimization technique based on the expectation-maximization algorithm. The system automatically discovers classes of behaviors and determines which are important for choosing actions that optimize over the utility of possible outcomes. This type of learning obviates the need for labeled data from expert knowledge about which behaviors are significant and removes bias about what behaviors may be useful to recognize in a particular situation. We show results in three interactions: a single player imitation game, a gestural robotic control problem, and a card game played by two people.

13.
Decision processes with incomplete state feedback have been traditionally modelled as partially observable Markov decision processes. In this article, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalises the recently reported work on language measure theoretic optimal control for perfectly observable situations and shows that such a framework is far more computationally tractable than the classical alternative. In particular, we show that the infinite horizon decision problem under partial observation, modelled in the proposed framework, is λ-approximable and, in general, is not harder to solve compared to the fully observable case. The approach is illustrated via two simple examples.

14.
For the problem of repairing AUV software faults in partially observable environments, and following the theory of partially observable Markov decision processes, this paper proposes an AUV software fault-repair method based on a POMDP model and microreboot technology. A multi-level microreboot repair scheme is designed around the layered structure of the AUV, a POMDP model of AUV software self-repair is constructed, and a point-based value iteration algorithm solves it to generate a repair policy, so that the system can execute repair actions at low repair cost in a partially observable environment. Simulation experiments verify the effectiveness of the algorithm and the applicability of the model.

15.
Artificial Intelligence, 2007, 171(8-9): 453-490
This study extends the framework of partially observable Markov decision processes (POMDPs) to allow their parameters, i.e., the probability values in the state transition functions and the observation functions, to be imprecisely specified. It is shown that this extension can reduce the computational costs associated with the solution of these problems. First, the new framework, POMDPs with imprecise parameters (POMDPIPs), is formulated. We consider (1) the interval case, in which each parameter is imprecisely specified by an interval that indicates possible values of the parameter, and (2) the point-set case, in which each probability distribution is imprecisely specified by a set of possible distributions. Second, a new optimality criterion for POMDPIPs is introduced. As in POMDPs, the criterion is to regard a policy, i.e., an action-selection rule, as optimal if it maximizes the expected total reward. The expected total reward, however, cannot be calculated precisely in POMDPIPs, because of the parameter imprecision. Instead, we estimate the total reward by adopting arbitrary second-order beliefs, i.e., beliefs in the imprecisely specified state transition functions and observation functions. Although there are many possible choices for these second-order beliefs, we regard a policy as optimal as long as there is at least one of such choices with which the policy maximizes the total reward. Thus there can be multiple optimal policies for a POMDPIP. We regard these policies as equally optimal, and aim at obtaining one of them. By appropriately choosing which second-order beliefs to use in estimating the total reward, computational costs incurred in obtaining such an optimal policy can be reduced significantly. We provide an exact solution algorithm for POMDPIPs that does this efficiently. Third, the performance of such an optimal policy, as well as the computational complexity of the algorithm, are analyzed theoretically. Last, empirical studies show that our algorithm quickly obtains satisfactory policies to many POMDPIPs.
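One concrete reading of "choosing second-order beliefs to cut cost" in the interval case: inside each probability interval, commit to a single representative distribution, for instance one that is optimistic with respect to the current value estimates, and solve the resulting ordinary POMDP. The fragment below shows only that selection step, with invented bounds; it is not the paper's exact algorithm:

    import numpy as np

    def pick_distribution(lo, hi, value):
        """Choose a distribution within elementwise bounds [lo, hi] that puts
        as much mass as possible on high-value successor states."""
        p, slack = lo.copy(), 1.0 - lo.sum()     # remaining probability mass
        for s in np.argsort(-value):             # best successors first
            add = min(hi[s] - p[s], slack)
            p[s] += add
            slack -= add
        return p

    lo = np.array([0.1, 0.2, 0.1])               # made-up interval bounds
    hi = np.array([0.6, 0.5, 0.4])
    V  = np.array([1.0, 0.0, 0.5])               # values of successor states
    print(pick_distribution(lo, hi, V))          # -> [0.6, 0.2, 0.2]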

16.
In a spoken dialog system, determining which action a machine should take in a given situation is a difficult problem because automatic speech recognition is unreliable and hence the state of the conversation can never be known with certainty. Much of the research in spoken dialog systems centres on mitigating this uncertainty and recent work has focussed on three largely disparate techniques: parallel dialog state hypotheses, local use of confidence scores, and automated planning. While in isolation each of these approaches can improve action selection, taken together they currently lack a unified statistical framework that admits global optimization. In this paper we cast a spoken dialog system as a partially observable Markov decision process (POMDP). We show how this formulation unifies and extends existing techniques to form a single principled framework. A number of illustrations are used to show qualitatively the potential benefits of POMDPs compared to existing techniques, and empirical results from dialog simulations are presented which demonstrate significant quantitative gains. Finally, some of the key challenges to advancing this method – in particular scalability – are briefly outlined.

17.
In intelligent planning, finding a plan is NP-hard or even NP-complete, and when action effects are uncertain, as in planning problems over Markov decision processes, solving becomes still harder. Existing planning algorithms for Markov decision processes usually describe the actual effect of an action with a single whole-state node, trying to sidestep the complexity inside a state, whereas in reality many actions produce several propositional effects corresponding to several proposition nodes. To handle this, the concepts of image actions, image path nodes, and image planning graphs are proposed, and on this basis an ant colony planning algorithm for Markov decision processes is developed that solves the problem. It is also proved that the solutions the algorithm obtains remain reliable with at least a certain probability even in an uncertain execution environment.

18.
For the optimization of continuous-time partially observable Markov decision processes (CTPOMDPs), this paper proposes a policy-gradient estimation method. Using uniformization, the gradient-estimation algorithm for discrete-time partially observable Markov decision processes (DTPOMDPs) is extended to the continuous-time model; the convergence and error estimation of the algorithm are studied, and a numerical example illustrates its application.
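Uniformization, the bridge used here, replaces a continuous-time chain with generator Q by a discrete-time chain sampled at the jumps of a Poisson clock, after which the DTPOMDP gradient machinery applies. In standard notation (which may differ from the paper's):

    P = I + \frac{Q}{\Lambda}, \qquad \Lambda \ge \max_i |q_{ii}|,

so the continuous-time process is statistically equivalent to the discrete chain P observed at the event times of a Poisson process of rate \Lambda.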
