Similar Documents
17 similar documents found (search time: 234 ms)
1.
仵博  吴敏 《控制与决策》2013,28(6):925-929
To address the doubly exponential size of the belief-state space in partially observable Markov decision processes (POMDPs), an online POMDP algorithm based on Monte Carlo particle filtering is proposed. First, particle filtering and particle projection are used to update and expand the belief states respectively, building an AND/OR tree of reachable belief states; then, branch-and-bound pruning is applied to the tree to reduce the size of the problem to be solved. Experimental results show that the proposed algorithm has low error and fast convergence and can meet the real-time requirements of the system.
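A minimal Python sketch of the particle-filter belief update this abstract relies on: particles are propagated through the transition model, weighted by the observation likelihood, and resampled. The `step` and `obs_likelihood` callables are problem-specific stand-ins, not functions from the paper.

```python
import random

def particle_filter_update(particles, action, observation, step, obs_likelihood, n=None):
    """Update a particle-based belief after taking `action` and seeing `observation`.

    `step(s, a)` samples a next state; `obs_likelihood(o, s)` returns P(o | s).
    Both are problem-specific stand-ins."""
    n = n or len(particles)
    # Propagate each particle through the (sampled) transition model.
    propagated = [step(s, action) for s in particles]
    # Weight particles by how well they explain the observation.
    weights = [obs_likelihood(observation, s) for s in propagated]
    total = sum(weights)
    if total == 0:                      # observation inconsistent with all particles
        return propagated               # fall back to the unweighted propagation
    # Resample with replacement to obtain an (approximate) posterior belief.
    return random.choices(propagated, weights=weights, k=n)
```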

2.
In continuous-state partially observable Markov decision processes, online planning cannot simultaneously meet the requirements of high real-time performance and low error. To address this, an online planning algorithm based on posterior-belief clustering is proposed. KL divergence is used to analyse the error between posterior beliefs in the continuous state space, the posterior beliefs are clustered according to this error analysis, reward values are computed on the clustered beliefs, and branch-and-bound pruning is applied to the AND/OR tree of posterior beliefs. Experimental results show that the algorithm effectively reduces the size of the problem to be solved, eliminates redundant computation, and achieves good real-time performance with low error.
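An illustrative sketch of clustering posterior beliefs by KL divergence, assuming each belief is summarised as a 1-D Gaussian (mean, std); the paper's beliefs are more general, and the greedy merge rule here is only a stand-in for its clustering step.

```python
import numpy as np

def kl_gaussian(m0, s0, m1, s1):
    """KL divergence KL(N(m0, s0^2) || N(m1, s1^2)) between two 1-D Gaussian beliefs."""
    return np.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

def cluster_beliefs(beliefs, threshold):
    """Greedy clustering: merge a belief into the first cluster whose representative
    is within `threshold` KL divergence; otherwise open a new cluster.
    `beliefs` is a list of (mean, std) pairs -- a simplified stand-in."""
    clusters = []   # each cluster keeps its representative (first member) and its members
    for b in beliefs:
        for rep, members in clusters:
            if kl_gaussian(b[0], b[1], rep[0], rep[1]) < threshold:
                members.append(b)
                break
        else:
            clusters.append((b, [b]))
    return clusters
```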

3.
To address the difficulty of balancing target-tracking performance against sensor energy consumption in wireless sensor networks (WSNs), an energy-efficient WSN tracking algorithm with belief reuse is proposed. The WSN operating in a dynamic, uncertain environment is modelled as a partially observable Markov decision process (POMDP), turning the trade-off between tracking performance and energy consumption into the problem of solving the optimal POMDP value function. A maximum-reward heuristic search is used to approximate the optimal tracking performance, and belief reuse avoids re-acquiring beliefs, effectively reducing the energy consumed by sensor communication. Experimental results show that the belief-reuse algorithm effectively optimises the balance between tracking performance and energy consumption, achieving high tracking performance at low energy cost.
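A small sketch of the belief-reuse idea as a memoisation wrapper around a belief-update function; the cache key (rounded probability vector plus action and observation) is an assumption made purely for illustration.

```python
def make_belief_updater(update_fn):
    """Wrap a belief-update function with a cache so identical
    (belief, action, observation) queries are computed only once.
    This mirrors belief reuse at the level of a simple memo table."""
    cache = {}
    def update(belief, action, observation):
        key = (tuple(round(p, 6) for p in belief), action, observation)
        if key not in cache:
            cache[key] = update_fn(belief, action, observation)
        return cache[key]
    return update
```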

4.
In large-scale partially observable Markov decision process (POMDP) algorithms, the policy tree grows exponentially and witness points (WPs) are hard to compute. Exploiting the fact that the policy-tree value function is piecewise-linear and convex, a belief-point-based incremental pruning and value-iteration algorithm for policy trees is proposed. During policy-tree generation, boundary points are used for lossless pruning and interior points for lossy pruning, and the real-time belief-state distribution is used to obtain an approximately optimal solution. Comparative experiments show that the algorithm converges quickly, obtaining rewards of comparable accuracy in less time.
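A hedged sketch of the lossy, belief-point-based part of the pruning: only alpha vectors that are optimal at some sampled belief point are kept. Array shapes are illustrative, and the paper's lossless boundary-point pruning is not shown.

```python
import numpy as np

def prune_alpha_vectors(alpha_vectors, belief_points):
    """Point-based (lossy) pruning: keep only alpha vectors that are optimal at
    at least one sampled belief point.  `alpha_vectors` is an (m, |S|) array,
    `belief_points` an (n, |S|) array; both shapes are illustrative."""
    values = belief_points @ alpha_vectors.T    # (n, m): value of each vector at each point
    keep = np.unique(values.argmax(axis=1))     # indices of vectors that win somewhere
    return alpha_vectors[keep]
```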

5.
冯延蓬  仵博  郑红燕  孟宪军 《计算机工程》2012,38(11):96-99,103
To address the limited node energy and uncertain sensing information in target-tracking wireless sensor networks, an online node-scheduling algorithm based on partially observable Markov decision processes is proposed. The uncertainty of the moving target is described by the state-transition and observation functions, the reward function balances tracking performance against node energy consumption, and a depth-bounded AND/OR tree of reachable beliefs is constructed to reduce computational complexity, so that the scheduling policy can be solved online. Experimental results show that the algorithm balances target-tracking quality against node energy consumption while meeting real-time requirements.
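An illustrative reward and a one-step greedy scheduler showing how tracking quality can be traded against node energy; the weights and the `expected_error` callable are assumptions, not values or functions from the paper.

```python
from itertools import combinations

def scheduling_reward(tracking_error, n_active, w_track=1.0, w_energy=0.1):
    """Illustrative reward trading tracking quality against energy: both high
    tracking error and many active nodes reduce the reward.  Weights are assumed."""
    return -(w_track * tracking_error + w_energy * n_active)

def greedy_schedule(nodes, k, expected_error):
    """One-step greedy stand-in for the online scheduler: activate the subset of at
    most k nodes with the best immediate reward.  `expected_error(subset)` estimates
    tracking error under that subset and is problem-specific."""
    candidates = [c for r in range(1, k + 1) for c in combinations(nodes, r)]
    return max(candidates, key=lambda c: scheduling_reward(expected_error(c), len(c)))
```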

6.
A partially observable Markov decision process converts a non-Markovian problem into a Markov-chain problem by introducing a belief-state space; its ability to describe the real world makes it an important branch in the study of stochastic decision processes. This paper introduces the basic principles and decision process of partially observable Markov decision processes and proposes an algorithm based on policy iteration and value iteration. The algorithm uses linear programming and dynamic programming to address the "curse of dimensionality" that arises when the belief-state space is large, obtaining an approximately optimal Markov decision. Experimental data show that the algorithm is feasible and effective.
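A minimal sketch of the exact belief update that turns the non-Markovian problem into a Markov one over belief states; the tensor layouts for T and O are assumptions made for the sketch.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact POMDP belief update b'(s') ∝ O[a, s', o] * sum_s T[s, a, s'] * b(s).
    T is the |S|x|A|x|S| transition tensor and O the |A|x|S|x|Ω| observation tensor;
    these array layouts are an assumption."""
    pred = T[:, a, :].T @ b            # predicted state distribution after action a
    new_b = O[a, :, o] * pred          # weight by observation likelihood
    return new_b / new_b.sum()         # normalise (assumes observation o is possible)
```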

7.
A sampling-based approximate algorithm for POMDPs
A partially observable Markov decision process (POMDP) is a problem model describing action selection for a robot in a dynamic, uncertain environment. For POMDP models with sparse transition matrices, this paper proposes a fast approximate solution algorithm. The algorithm first samples the belief space using the policy produced by the QMDP algorithm, then quickly generates a POMDP value function by point-based iteration, yielding an approximately optimal action-selection policy. On the same POMDP test models, the policy produced by this algorithm obtains rewards comparable to those of other approximate algorithms, but it runs faster and represents its policy with a smaller vector set than existing approximate algorithms produce. It is therefore better suited to solving large-scale POMDP models with sparse state-transition matrices.
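A short sketch of the QMDP step the abstract uses for belief-space sampling: value iteration on the underlying MDP, then an action chosen by belief-weighted Q values. Array layouts and hyper-parameters are assumptions.

```python
import numpy as np

def qmdp_policy(T, R, gamma=0.95, iters=200):
    """Compute Q_MDP values by value iteration on the underlying MDP and return a
    policy over beliefs: pick argmax_a sum_s b(s) * Q(s, a).
    T is |S|x|A|x|S|, R is |S|x|A|; layouts are assumptions for this sketch."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = R + gamma * np.einsum('sat,t->sa', T, V)
    return lambda belief: int(np.argmax(belief @ Q))
```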

8.
Constrained model predictive control (MPC) has high optimisation complexity in practice and may fail to complete the optimisation within one sampling period, compromising real-time operation. To address this, a fast predictive control algorithm based on a dual-rate framework (DSF-MPC) is proposed. The algorithm splits the computation of the real-time control input across two time scales: every few sampling periods, the slow-rate layer solves the full MPC optimisation problem once; in every sampling period, the fast-rate layer corrects the slow-rate result along the negative gradient of the objective function, using the difference between system feedback and the slow-rate layer's prediction, to obtain the actual control input and satisfy real-time requirements. Because the algorithm does not require the online MPC optimisation to finish in every sampling period, it inherits the advantages of MPC while meeting the real-time control requirements of fast systems. Simulation results on a combined DC-motor and inverted-pendulum system verify the effectiveness of the algorithm and indicate its potential for fast systems.
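A hedged sketch of the fast-rate layer's correction in a dual-rate scheme: the slow-rate control is nudged along the negative objective gradient evaluated at the feedback/prediction mismatch. The interface of `grad_J` and the step size are assumptions, not the paper's formulation.

```python
def fast_layer_correction(u_slow, x_measured, x_predicted, grad_J, step=0.05):
    """Fast-rate correction in a dual-rate MPC scheme: adjust the slow-rate control
    u_slow along the negative gradient of the objective, evaluated at the mismatch
    between the measured state and the slow layer's prediction.
    `grad_J(u, dx)` is a problem-specific gradient stand-in."""
    dx = x_measured - x_predicted      # feedback/prediction mismatch
    return u_slow - step * grad_J(u_slow, dx)
```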

9.
Trial-based value iteration is an effective class of algorithms for solving partially observable Markov decision process (POMDP) models, and FSVI is currently one of the fastest among them. For larger POMDP problems, however, the time FSVI spends computing the MDP value function cannot be ignored. This paper proposes a value-iteration algorithm based on the shortest Hamiltonian path (shortest Hamiltonian path-based value iteration, SHP-VI). The method computes an optimal belief-state trajectory with an ant colony algorithm for the shortest-Hamiltonian-path problem and then updates the value function backwards over these belief states. Experimental comparison with FSVI shows that SHP-VI greatly improves the efficiency with which trial-based algorithms compute belief-state trajectories.
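A sketch of ordering beliefs along a short Hamiltonian path before backing values up along the path in reverse. The paper uses an ant colony algorithm for this ordering; the greedy nearest-neighbour heuristic below is a simpler stand-in used only to illustrate the idea.

```python
def hamiltonian_path_heuristic(beliefs, dist):
    """Greedy nearest-neighbour ordering of belief states along a short Hamiltonian
    path (visit every belief exactly once).  Value backups are then applied in
    reverse order of this path.  `dist` is a problem-specific belief distance."""
    remaining = list(range(1, len(beliefs)))
    path = [0]
    while remaining:
        last = path[-1]
        nxt = min(remaining, key=lambda j: dist(beliefs[last], beliefs[j]))
        path.append(nxt)
        remaining.remove(nxt)
    return [beliefs[i] for i in path]
```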

10.
陈道琨  刘芳芳  杨超 《软件学报》2022,33(8):4452-4463
Many reinforcement learning methods pay little attention to the safety of decisions, yet both research and industrial applications require that the decisions an agent makes be safe. Traditional approaches to this problem, such as modifying the objective function or the agent's exploration process, ignore the damage and cost incurred by the agent and therefore cannot effectively guarantee decision safety. Building on constrained Markov decision processes, safety constraints are added to the action space and safe Sarsa(λ) and safe Sarsa methods are designed. During solving, the agent is required not only to maximise the state-action value but also to satisfy the safety constraints, thereby obtaining a safe optimal policy. Because traditional reinforcement-learning solution methods no longer apply to the constrained safe Sarsa(λ) and safe Sarsa models, a solution model for safe reinforcement learning is proposed to obtain the globally optimal state-action value function under the constraints. Based on linearised multi-dimensional constraints, the model applies the method of Lagrange multipliers; provided the state-action value function and the constraint functions are differentiable, the safe reinforcement-learning model is converted into a convex model, avoiding local optima during solving and improving efficiency and accuracy. A feasibility proof of the algorithm is given, and experiments verify its effectiveness.
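An illustrative tabular sketch of the Lagrangian treatment described above: a Sarsa update on the penalised reward, a parallel cost estimate, and dual ascent on the multiplier. All names and hyper-parameters are assumptions, not the paper's formulation.

```python
def safe_sarsa_update(Q, C, lam, s, a, r, cost, s2, a2,
                      alpha=0.1, gamma=0.99, lr_lam=0.01, budget=1.0):
    """One tabular safe-Sarsa step: Sarsa update on the Lagrangian reward r - lam*cost,
    a parallel Sarsa update on the cost estimate, and a dual-ascent step on lam.
    Q and C are dicts keyed by (state, action); everything here is illustrative."""
    q_next, c_next = Q.get((s2, a2), 0.0), C.get((s2, a2), 0.0)
    # Sarsa update on the Lagrangian-penalised reward
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * ((r - lam * cost) + gamma * q_next - Q.get((s, a), 0.0))
    # Sarsa update on the accumulated cost
    C[(s, a)] = C.get((s, a), 0.0) + alpha * (cost + gamma * c_next - C.get((s, a), 0.0))
    # Dual ascent: raise lam when the estimated cost exceeds the budget
    lam = max(0.0, lam + lr_lam * (C[(s, a)] - budget))
    return lam
```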

11.
A partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, solving POMDPs in real-time systems is notoriously computationally intractable. To address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm that performs value backups at specific reachable belief points, rather than over the entire belief simplex, to speed up computation; exploits a branch-and-bound pruning approach to prune the AND/OR tree of belief states online; and reuses belief states that have already been searched to avoid repeated computation. Experiment and simulation results show that the proposed algorithm can simultaneously satisfy the requirements of low error and high timeliness in real-time systems.
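A minimal sketch of the branch-and-bound pruning step: children of a belief node whose upper bound cannot beat the best known lower bound are discarded. The bound functions are problem-specific stand-ins, not parts of PBOVI.

```python
def prune_children(node, children, lower_bound, upper_bound):
    """Branch-and-bound pruning at one node of the belief AND/OR tree: a child whose
    upper bound falls below the best known lower bound cannot improve the policy
    and is discarded.  `children`, `lower_bound`, `upper_bound` are stand-ins."""
    kids = list(children(node))
    if not kids:
        return []
    best_lb = max(lower_bound(c) for c in kids)
    return [c for c in kids if upper_bound(c) >= best_lb]
```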

12.
Continuous-state partially observable Markov decision processes (POMDPs) are an intuitive choice of representation for many stochastic planning problems with a hidden state. We consider continuous-state POMDPs with finite action and observation spaces, where the POMDP is parametrised by weighted sums of Gaussians, or Gaussian mixture models (GMMs). In particular, we study the problem of optimising the selection of the measurement channel in such a framework. A new error bound for a point-based value iteration algorithm is derived, and a method for constructing a subset of belief states that attempts to reduce the error bound is implemented. In the experiments, the application of continuous-state POMDPs to optimal selection of the measurement channel is demonstrated, and the performance of three GMM simplification methods is compared. Convergence of a point-based value iteration algorithm is investigated by considering various metrics for the obtained control policies.
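A small sketch of evaluating a 1-D Gaussian-mixture belief, the kind of weighted-sum-of-Gaussians parameterisation the abstract refers to; the 1-D restriction is an assumption made for brevity.

```python
import numpy as np

def gmm_pdf(x, weights, means, stds):
    """Evaluate a 1-D Gaussian-mixture belief b(x) = sum_i w_i * N(x; mu_i, sigma_i^2).
    A minimal stand-in for the GMM belief parameterisation in the abstract."""
    weights, means, stds = map(np.asarray, (weights, means, stds))
    x = np.asarray(x, dtype=float)[..., None]
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comps @ weights
```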

13.
The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function and is thus applicable to domains with longer horizons, even with relatively large state spaces. Many extensions to this basic idea have been suggested, focusing on various aspects of the algorithm, mainly the selection of the belief space subset and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
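A hedged sketch of a single point-based Bellman backup, the basic operation this survey is about: one new alpha vector is generated at a given belief point. The tensor layouts are assumptions made for the sketch.

```python
import numpy as np

def point_backup(b, alphas, T, O, R, gamma=0.95):
    """One point-based Bellman backup at belief b, returning the new alpha vector.
    T: |S|x|A|x|S| transitions, O: |A|x|S'|x|Ω| observation probabilities,
    R: |S|x|A| rewards, alphas: (m, |S|) existing vectors.  Layouts are assumed."""
    S, A = R.shape
    n_obs = O.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(A):
        vec = R[:, a].astype(float)
        for o in range(n_obs):
            # g_i(s) = gamma * sum_{s'} T(s,a,s') O(a,s',o) alpha_i(s'), one column per alpha_i
            g = gamma * (T[:, a, :] * O[a, :, o]) @ alphas.T   # (|S|, m)
            vec = vec + g[:, np.argmax(b @ g)]                 # keep the alpha best at b
        if b @ vec > best_val:
            best_vec, best_val = vec, b @ vec
    return best_vec
```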

14.
Solving partially observable Markov decision processes (POMDPs) is a complex task that is often intractable. This paper examines the problem of finding an optimal policy for POMDPs. While much effort has been made to develop algorithms for solving POMDPs, the question of automatically finding good low-dimensional spaces in multi-agent co-operative learning domains has not been explored thoroughly. To address this question, an online algorithm, CMEAS, is presented to improve the POMDP model. The algorithm is based on a look-ahead search that finds the best action to execute at each cycle, so the overwhelming complexity of computing a policy for every possible situation is avoided. A series of simulations demonstrates the good strategy and performance of the proposed algorithm when multiple agents co-operate to find an optimal policy for POMDPs.
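An illustrative depth-one look-ahead over actions and observations, the kind of per-cycle search the abstract describes; all the callables are problem-specific stand-ins rather than parts of CMEAS.

```python
def lookahead_action(belief, actions, observations, update, obs_prob, value, reward, gamma=0.95):
    """Depth-one look-ahead: score each action by its expected immediate reward plus
    the discounted value of the updated belief, and return the best action.
    `update`, `obs_prob`, `value`, `reward` are problem-specific stand-ins."""
    def score(a):
        q = reward(belief, a)
        for o in observations:
            q += gamma * obs_prob(o, belief, a) * value(update(belief, a, o))
        return q
    return max(actions, key=score)
```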

15.
In network service systems, satisfying quality of service (QoS) is one of the main objectives. Admission control and resource-allocation strategies can be used to guarantee QoS requirements. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS; elastic QoS is also considered in the resource-allocation strategy. A policy-gradient algorithm can be used to solve the resulting POMDP problem with a satisfactory convergence rate. Numerical examples show that the proposed admission-control strategy outperforms the complete admission-control strategy.
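A minimal REINFORCE-style policy-gradient update, shown only to illustrate the policy-gradient step the abstract mentions; the `grad_log_pi` interface and the hyper-parameters are assumptions.

```python
def reinforce_update(theta, episode, grad_log_pi, lr=0.01, gamma=0.99):
    """REINFORCE-style policy-gradient update over one episode of
    (observation, action, reward) tuples.  `grad_log_pi(theta, obs, a)` returns the
    gradient of log pi_theta(a | obs); the interface is an assumption."""
    G = 0.0
    for obs, a, r in reversed(episode):
        G = r + gamma * G                       # return from this step onwards
        theta = theta + lr * G * grad_log_pi(theta, obs, a)
    return theta
```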

16.
Uncertainty in motion planning is often caused by three main sources: motion error, sensing error, and an imperfect environment map. Despite the significant effect of all three sources of uncertainty on motion planning problems, most planners take into account only one, or at most two, of them. We propose a new motion planner, called Guided Cluster Sampling (GCS), that takes into account all three sources of uncertainty for robots with active sensing capabilities. GCS uses the Partially Observable Markov Decision Process (POMDP) framework and the point-based POMDP approach. Although point-based POMDP solvers have shown impressive progress over the past few years, they perform poorly when the environment map is imperfect. This poor performance is due to the extremely high-dimensional state space, which translates to an extremely large belief space B. We alleviate this problem by constructing a more suitable sampling distribution, based on the observations that when the robot has active sensing capability, B can be partitioned into a collection of much smaller sub-spaces, and an optimal policy can often be generated by sufficient sampling of a small subset of the collection. Utilizing these observations, GCS samples B in two stages: a sub-space is sampled from the collection, and then a belief is sampled from that sub-space. It uses information from the set of sampled sub-spaces and sampled beliefs to guide subsequent sampling. Simulation results on marine robotics scenarios suggest that GCS can generate reasonable policies for motion planning problems with uncertain motion, sensing, and environment maps that are unsolvable by the best point-based POMDP solvers today. Furthermore, GCS handles POMDPs with continuous state, action, and observation spaces. We show that for a class of POMDPs that often occur in robot motion planning, given enough time, GCS converges to the optimal policy. To the best of our knowledge, this is the first convergence result for point-based POMDPs with continuous action space.
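A tiny sketch of the two-stage sampling idea: first draw a sub-space from the collection, then draw a belief within it. The weighting scheme and `sample_belief` are assumptions made for illustration.

```python
import random

def two_stage_sample(subspaces, subspace_weights, sample_belief):
    """Two-stage belief sampling: pick a sub-space from the collection (here by
    weight), then sample a belief within it.  `sample_belief(subspace)` is a
    problem-specific stand-in for the within-sub-space sampler."""
    subspace = random.choices(subspaces, weights=subspace_weights, k=1)[0]
    return subspace, sample_belief(subspace)
```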

17.
戴帅  殷苌茗  张欣 《计算机工程》2009,35(13):190-192
A new factored TD(λ) algorithm is proposed. Its basic idea is a factored representation of states: the state-transition probability function of the Markov decision process (MDP) is represented by a dynamic Bayesian network, and the state-value function in the TD(λ) algorithm is represented by a decision tree, reducing the search and computation complexity over the state space. The algorithm is therefore suited to solving MDPs with large state spaces, and experiments show that the representation is effective.
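A tabular TD(λ) update with eligibility traces, shown for reference; the paper's factored version replaces the flat value table with a decision tree over a dynamic-Bayesian-network state representation, which this sketch does not attempt.

```python
def td_lambda_update(V, traces, s, r, s2, alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular TD(lambda) step with accumulating eligibility traces.
    V and traces are dicts keyed by state; hyper-parameters are illustrative."""
    delta = r + gamma * V.get(s2, 0.0) - V.get(s, 0.0)
    traces[s] = traces.get(s, 0.0) + 1.0
    for state, e in traces.items():
        V[state] = V.get(state, 0.0) + alpha * delta * e
        traces[state] = gamma * lam * e
    return delta
```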
