Similar Literature
20 similar records found (search time: 140 ms)
1.
In a multi-agent system (MAS), every agent is learning continuously, so from the perspective of any single agent the problem is one of learning a moving target. The PHC (policy hill-climbing) algorithm is rational but does not converge in self-play. However, the average policy of PHC in self-play does converge quickly and accurately to a Nash equilibrium (NE). In algorithms that require an NE as prior knowledge, or that must converge to an NE, a PHC self-play phase can therefore be added to estimate the NE, after which further measures can be taken according to the opponent's policy. This not only avoids computing the NE with a separate algorithm, but also guarantees the learner at least the equilibrium payoff. The Exploiter-PHC algorithm (Exploiter) can beat most fair opponents but requires an NE as prior knowledge and likewise fails to converge in self-play. Adding a pre-testing phase yields ExploiterWT (Exploiter with testing), which converges and needs no prior knowledge; the same phase can also be added to other algorithms.
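The PHC update underlying the abstract above is simple to state: a standard Q-learning step followed by a hill-climbing step in policy space that shifts probability mass toward the greedy action. A minimal tabular sketch (function and variable names are ours, not the paper's):

```python
from collections import defaultdict

def phc_update(Q, pi, s, a, r, s2, actions,
               alpha=0.1, gamma=0.9, delta=0.01):
    """One PHC step: a Q-learning update followed by hill-climbing the
    mixed policy pi toward the currently greedy action by step delta."""
    # Standard Q-learning update on the sampled transition
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
    # Shift probability mass toward the greedy action
    greedy = max(actions, key=lambda b: Q[(s, b)])
    for b in actions:
        if b == greedy:
            pi[(s, b)] = min(1.0, pi[(s, b)] + delta)
        else:
            pi[(s, b)] = max(0.0, pi[(s, b)] - delta / (len(actions) - 1))
    # Renormalize so pi(s, .) remains a probability distribution
    z = sum(pi[(s, b)] for b in actions)
    for b in actions:
        pi[(s, b)] /= z
```

In self-play one would additionally maintain the running average of `pi`, since it is that average policy, not `pi` itself, which converges to the NE.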

2.
This paper studies a reinforcement learning algorithm with human-computer interaction. Through the interaction, the operator provides a performance evaluation of the learning result, giving the agent a measure of the distance between the current state and the goal state. This effectively incorporates the operator's prior and domain knowledge, allowing the agent to search the state space more efficiently and simplifying the learning of complex tasks. Taking a number-guessing game as an example, the proposed framework is used to train an agent to guess numbers. Experimental results show that reinforcement learning combined with human-computer interaction greatly improves learning efficiency and speeds up convergence.

3.
To address the slow convergence of reinforcement learning, a heuristic Q-learning algorithm guided by an online-updated information strength is proposed. Building on heuristic reinforcement learning, the algorithm introduces an information strength that is updated online according to the return of each training episode. The policy is determined by combining the state-action value function with a heuristic function updated from action information strengths of differing magnitudes, which raises the convergence rate. The algorithm is presented together with a proof of its convergence, and comparative path-planning simulations under different environments and parameter settings verify its performance. The results show that information-strength-guided heuristic Q-learning obtains high-return policies faster, avoids being trapped in local convergence, and effectively improves the convergence rate.
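Heuristic action selection of the kind described above typically takes the form argmax over Q(s,a) + ξ·H(s,a), where H is built from the information strengths. A minimal sketch under that assumption (the abstract does not give the exact combination rule):

```python
def heuristic_action(Q, H, state, actions, xi=1.0):
    """Greedy choice over Q(s,a) + xi * H(s,a): the heuristic H (here,
    derived from per-action information strengths) biases action
    selection without modifying the learned Q-values themselves."""
    return max(actions,
               key=lambda a: Q.get((state, a), 0.0) + xi * H.get((state, a), 0.0))
```

Because H only re-ranks actions at selection time, the underlying Q-learning convergence argument is unaffected, which is what makes heuristic acceleration of this kind attractive.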

4.
Most exploration/exploitation strategies in reinforcement learning ignore the risk incurred when the agent selects actions at random during exploration. To address this, a Q-table initialization method for safe exploration based on factorization machines (FM) is proposed. First, the already-explored Q-values in the table are introduced as prior knowledge; then an FM models the latent interactions between states and actions within that prior knowledge; finally, the model predicts the unknown Q-values in the table, further guiding the agent's exploration. In A/B tests on Cliffwalk, a grid reinforcement learning environment from OpenAI Gym, the number of episodes with unsafe exploration under the Boltzmann and upper confidence bound (UCB) exploration/exploitation strategies initialized by the proposed method dropped by 68.12% and 89.98%, respectively. The results show that the method makes the traditional strategies safer to explore with while also accelerating convergence.
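The FM predictor used to fill in unknown Q-values can be sketched as follows. This is the standard second-order factorization-machine form (Rendle), with x standing in for some encoding of a state-action pair; the paper's exact feature encoding is not specified in the abstract:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization-machine prediction for a feature
    vector x (e.g. an encoding of a state-action pair).  V holds one
    k-dimensional latent vector per feature; the pairwise term uses
    the O(k*n) rewriting of the sum over all feature pairs."""
    linear = w0 + w @ x
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + pairwise
```

The latent vectors in V are what let the model generalize from observed (state, action) Q-values to unobserved combinations, which is exactly the property the initialization method exploits.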

5.
Q-Learning Based on Experiential Knowledge
To improve the learning and convergence speed of Q-learning, a typical reinforcement learning method for agent systems, and to make full use of environmental information during learning, this paper proposes a Q-learning algorithm based on experiential knowledge. Using a function that carries experiential knowledge, the agent learns a model of the system while performing model-free learning, avoiding repeated learning of the environment model and thereby accelerating learning. Simulation results show that the algorithm builds the learning process on a better foundation and approaches the optimal state faster; its learning efficiency and convergence speed are clearly superior to standard Q-learning.
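For reference, the standard model-free Q-learning update that the experience-knowledge variant extends is:

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One standard model-free Q-learning update on a dict-backed table:
    move Q(s,a) toward the bootstrapped target r + gamma * max_b Q(s',b)."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = old + alpha * (target - old)
```

The paper's contribution sits on top of this update: an experience-knowledge function reuses what earlier transitions revealed about the model, so the agent does not have to rediscover the same dynamics through repeated sampling.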

6.
Fu Qiming, Liu Quan, Fu Yuchen, Zhou Yicheng, Yu Jun. Journal of Software, 2013, 24(11): 2676-2686
Combining function approximation with reinforcement learning in large-scale or continuous state spaces is a current focus of machine learning research, and balancing exploration and exploitation during learning is a recognized difficulty in the field. For the exploration-exploitation trade-off in large-scale or continuous state spaces with deterministic environments, an approximate policy iteration algorithm based on Gaussian processes is proposed. The algorithm models the parameterized value function with a Gaussian process and, combined with a generative model, uses Bayesian inference to obtain the posterior distribution of the value function. During learning, the information-value gain of each action is computed from this distribution and combined with the expected value of the value function to select actions. To a certain extent, the algorithm resolves the exploration-exploitation trade-off and speeds up convergence. Applied to the classic Mountain Car problem, the algorithm converges quickly and with good accuracy.

7.
To address the slow convergence of existing reinforcement learning algorithms for robot path planning, an initialization method for mobile-robot reinforcement learning based on an artificial potential field is proposed. The robot's workspace is virtualized as an artificial potential field, and prior knowledge determines the potential of every point, which represents the maximum cumulative reward obtainable under the optimal policy: obstacle regions have zero potential, for example, while the goal has the global maximum. The initial Q-value is then defined as the immediate reward at the current point plus the maximum discounted cumulative reward of its successor. With this Q-value initialization, the learning process converges faster and more stably. The improved algorithm is validated on robot paths in a grid map; the results show that the method raises learning efficiency in the initial phase and improves algorithm performance.
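The initialization rule described above, immediate reward plus discounted potential of the successor state, can be sketched as follows; `step`, `reward`, and `potential` are problem-specific callables we introduce for illustration, not names from the paper:

```python
def init_q_from_potential(states, actions, step, reward, potential, gamma=0.9):
    """Initialize a Q-table from an artificial potential field:
    Q0(s, a) = r(s, a) + gamma * potential(s'), where potential(s)
    approximates the best achievable discounted return from s
    (zero inside obstacles, maximal at the goal)."""
    Q = {}
    for s in states:
        for a in actions:
            s2 = step(s, a)
            Q[(s, a)] = reward(s, a) + gamma * potential(s2)
    return Q
```

Because the potential already encodes "which way the return grows", the greedy policy over the initial table points roughly toward the goal from the first episode onward, which is where the claimed early-phase speedup comes from.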

8.
Most convergence analyses of existing reinforcement learning methods address discrete-state problems; for continuous states, convergence analysis has been limited to the simple LQR control problem. This paper analyzes two existing reinforcement learning methods that converge for the LQR problem and, to remedy their shortcomings, proposes a reinforcement learning method that requires only partial model information. The method estimates value-function parameters with recursive least-squares TD (RLS-TD) and estimates the greedy improved policy with recursive least squares (RLS). A theoretical convergence analysis is given for the ideal case, and simulations show that the method converges to the optimal control policy.

9.
The path integral method, which originates in stochastic optimal control, is a numerical iterative method for solving optimal control problems of continuous nonlinear systems; it does not depend on a system model and converges quickly. This paper applies a policy improvement method based on path-integral reinforcement learning to the goal-directed locomotion of a snake-like robot. Path-integral reinforcement learning is used to learn the parameters of the robot's gait equation: the robot can avoid obstacles and reach the goal in simulation, and, using prior knowledge from the simulated environment, it can also complete the same task quickly in the real environment. Experimental results verify the correctness of the method.

10.
Reinforcement learning learns by interacting with the environment, but its learning efficiency is low in larger state spaces. Injecting prior knowledge can speed up learning, yet inappropriate prior knowledge can instead mislead the learning process and hurt performance. This paper proposes NNH-QL, a two-layer heuristic reinforcement learning method based on a BP neural network that removes the blindness of the conventional reinforcement learning process. The upper, qualitative layer is a BP network that needs no externally supplied background knowledge; using shaping, the dynamic knowledge it acquires online provides directional heuristics to the lower, table-based Q-learning layer. Eligibility traces are used to train the network and improve learning efficiency. NNH-QL retains the flexibility of standard Q-learning while exploiting the generalization of neural networks, offering a feasible approach to reinforcement learning in larger state spaces. Experimental results show that the method improves reinforcement learning performance with a clear acceleration effect.
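The shaping signal that the qualitative layer supplies can take the standard potential-based form F(s, s') = γΦ(s') − Φ(s) (Ng et al.), with Φ provided by the BP network's value estimate; that NNH-QL uses exactly this form is our assumption, since the abstract names shaping but not a formula:

```python
def shaped_reward(r, phi_s, phi_s2, gamma=0.99):
    """Potential-based reward shaping: the tabular learner is trained
    on r + gamma * phi(s') - phi(s).  When the shaping term is built
    from a potential function phi, the optimal policy is provably
    unchanged while early learning is steered by the heuristic."""
    return r + gamma * phi_s2 - phi_s
```

The appeal of this form is that the shaping terms telescope along any trajectory, so the heuristic accelerates learning without biasing which policy is ultimately optimal.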

11.
In this introduction, we define the term bias as it is used in machine learning systems. We motivate the importance of automated methods for evaluating and selecting biases using a framework of bias selection as search in bias and meta-bias spaces. Recent research in the field of machine learning bias is summarized.

12.
Using Genetic Algorithms for Concept Learning
In this article, we explore the use of genetic algorithms (GAs) as a key element in the design and implementation of robust concept learning systems. We describe and evaluate a GA-based system called GABIL that continually learns and refines concept classification rules from its interaction with the environment. The use of GAs is motivated by recent studies showing the effects of various forms of bias built into different concept learning systems, resulting in systems that perform well on certain concept classes (generally, those well matched to the biases) and poorly on others. By incorporating a GA as the underlying adaptive search mechanism, we are able to construct a concept learning system that has a simple, unified architecture with several important features. First, the system is surprisingly robust even with minimal bias. Second, the system can be easily extended to incorporate traditional forms of bias found in other concept learning systems. Finally, the architecture of the system encourages explicit representation of such biases and, as a result, provides for an important additional feature: the ability to dynamically adjust system bias. The viability of this approach is illustrated by comparing the performance of GABIL with that of four other more traditional concept learners (AQ14, C4.5, ID5R, and IACL) on a variety of target concepts. We conclude with some observations about the merits of this approach and about possible extensions.

13.
Kernel-Based Reinforcement Learning
Ormoneit, Dirk; Sen, Śaunak. Machine Learning, 2002, 49(2-3): 161-178
We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
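The core of such kernel-based value estimation is a Nadaraya-Watson average of one-step Bellman targets over stored transitions. A one-dimensional sketch (the Gaussian kernel, scalar states, and sample layout are our assumptions, not the paper's construction):

```python
import math

def kernel_value(s, samples, gamma=0.95, bandwidth=1.0):
    """Nadaraya-Watson estimate of V(s) from stored transitions.
    samples is a list of (s_i, r_i, v_next_i) tuples; each contributes
    the one-step Bellman target r_i + gamma * v_next_i, weighted by a
    Gaussian kernel in the distance between s and s_i."""
    weights = [math.exp(-((s - si) / bandwidth) ** 2) for si, _, _ in samples]
    targets = [r + gamma * vn for _, r, vn in samples]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)
```

Because the estimate is a fixed convex combination of targets, iterating such an operator is non-expansive, which is the intuition behind the convergence-regardless-of-initialization property claimed above.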

14.
Prior knowledge, or bias, regarding a concept can reduce the number of examples needed to learn it. Probably Approximately Correct (PAC) learning is a mathematical model of concept learning that can be used to quantify the reduction in the number of examples due to different forms of bias. Thus far, PAC learning has mostly been used to analyze syntactic bias, such as limiting concepts to conjunctions of Boolean propositions. This paper demonstrates that PAC learning can also be used to analyze semantic bias, such as a domain theory about the concept being learned. The key idea is to view the hypothesis space in PAC learning as that consistent with all prior knowledge, syntactic and semantic. In particular, the paper presents an analysis of determinations, a type of relevance knowledge. The results of the analysis reveal crisp distinctions and relations among different determinations, and illustrate the usefulness of an analysis based on the PAC learning model.

15.
Multitask Learning
Caruana, Rich. Machine Learning, 1997, 28(1): 41-75
Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need of supervisory signals, and presents new results for MTL with k-nearest neighbor and kernel regression. In this paper we demonstrate multitask learning in three domains. We explain how multitask learning works, and show that there are many opportunities for multitask learning in real domains. We present an algorithm and results for multitask learning with case-based methods like k-nearest neighbor and kernel regression, and sketch an algorithm for multitask learning in decision trees. Because multitask learning works, can be applied to many different kinds of domains, and can be used with different learning algorithms, we conjecture there will be many opportunities for its use on real-world problems.

16.
Deep learning is an important area of machine learning research: it has powerful feature-extraction capabilities, exhibits state-of-the-art performance in many applications, and is therefore widely used in industry. However, because biases exist in training-data annotation and model design, existing research shows that deep learning may reinforce human bias and discrimination in some applications, producing unfairness in decision making, with potential negative effects on individuals and society. To improve the reliability of deep learning applications and advance fairness research, this survey reviews existing work from the two perspectives of data and models: the sources of bias in deep learning applications, debiasing methods for different types of bias, fairness metrics for evaluating debiasing effectiveness, and the current mainstream debiasing platforms. It concludes with open problems in the fairness research field and future trends.

17.
Subramanian, Devika. Machine Learning, 1995, 20(1-2): 155-191
In this paper, we describe a domain-independent principle for justified shifts of vocabulary bias in speedup learning. This principle advocates the minimization of wasted computational effort. It explains as well as generates a special class of granularity shifts. We describe its automation for definite as well as stratified Horn theories, and present an implementation for a general class of reachability computations.

18.
It is well known that prior knowledge or bias can speed up learning, at least in theory. It has proved difficult to make constructive use of prior knowledge, so that approximately correct hypotheses can be learned efficiently. In this paper, we consider a particular form of bias which consists of a set of determinations. A set of attributes is said to determine a given attribute if the latter is purely a function of the former. The bias is tree-structured if there is a tree of attributes such that the attribute at any node is determined by its children, where the leaves correspond to input attributes and the root corresponds to the target attribute for the learning problem. The set of allowed functions at each node is called the basis. The tree-structured bias restricts the target functions to those representable by a read-once formula (a Boolean formula in which each variable occurs at most once) of a given structure over the basis functions. We show that efficient learning is possible using a given tree-structured bias from random examples and membership queries, provided that the basis class itself is learnable and obeys some mild closure conditions. The algorithm uses a form of controlled experimentation in order to learn each part of the overall function, fixing the inputs to the other parts of the function at appropriate values. We present empirical results showing that when a tree-structured bias is available, our method significantly improves upon knowledge-free induction. We also show that there are hard cryptographic limitations to generalizing these positive results to structured determinations in the form of a directed acyclic graph.

19.
Sequence-to-sequence models based on recurrent neural networks achieve very good results on text summarization, but most such models suffer from repeated generated text and exposure bias. For the repetition problem, a mixed attention composed of memory attention and decoder self-attention is proposed, which overcomes repetition by storing historical attention and increasing attention to previously generated words. Reinforcement learning is adopted as a new training scheme to eliminate exposure bias, and the loss function is revised accordingly. The model is tested on the CNN/Daily Mail dataset with ROUGE as the evaluation metric; the results show that the mixed attention substantially alleviates repetition, reinforcement learning removes exposure bias, and the integrated model surpasses state-of-the-art algorithms on the test set.

20.
A well-designed learning-rate schedule can markedly speed up the convergence of a deep learning model and reduce training time. Observing that AdaGrad and AdaDec provide only a single learning-rate scheme for all model parameters, this paper proposes a combined schedule, AdaMix, tailored to the characteristics of different parameters: connection weights receive a learning rate that depends only on the current gradient, while biases use a power-exponential learning rate. A deep Autoencoder is used to reconstruct the MNIST image database, with the reconstruction error in the test phase of backward fine-tuning as the evaluation metric, to assess how the schedules affect model convergence. Experiments show that AdaMix achieves lower reconstruction error than AdaGrad and AdaDec with less computation, converging faster.
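The two kinds of per-parameter schedule that AdaMix contrasts can be sketched as follows; the power-law form for biases is our guess at the "power-exponential" rate, since the abstract gives no formula, and `kappa` is a name we introduce:

```python
import math

def adagrad_lr(base_lr, grad_sq_sum, eps=1e-8):
    """AdaGrad-style per-parameter rate: the effective learning rate
    shrinks as the accumulated squared gradient of that parameter grows."""
    return base_lr / (math.sqrt(grad_sq_sum) + eps)

def power_lr(base_lr, t, kappa=0.5):
    """Power-law decay base_lr * (1 + t)^(-kappa): one plausible
    reading of the "power-exponential" bias schedule (assumption)."""
    return base_lr * (1.0 + t) ** (-kappa)
```

The design point the paper makes is that weights and biases behave differently during training, so giving each group its own schedule can beat any single uniform rule.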


Copyright © Beijing Qinyun Technology Development Co., Ltd.    京ICP备09084417号-23
