Similar literature
 19 similar articles found (search time: 281 ms)
1.
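Multi-agent robot reinforcement learning with an adaptive fuzzy RBF neural network   Total citations: 3, self-citations: 0, citations by others: 3
Learning in a multi-robot environment involves continuous states, continuous actions, and multiple robots, so the learning space is enormous and applying the Q-learning algorithm directly rarely yields satisfactory results. To address learning in multi-agent robot systems, this paper proposes a reinforcement learning algorithm based on an adaptive fuzzy RBF neural network. The network itself offers fuzzy inference, strong function approximation, and generalization; it therefore combines human expert knowledge with machine learning methods, reduces the complexity of the learning problem, and enables policy learning over continuous state and action spaces.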

2.
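Duan Yong (段勇), Xu Xinhe (徐心和). 《控制与决策》 (Control and Decision), 2007, 22(5): 525-529
This paper studies behavior-based control of mobile robots. A fuzzy neural network is combined with reinforcement learning theory to form a fuzzy reinforcement system. The system can learn both the consequent parts of the fuzzy rules and the parameters of the fuzzy membership functions, and it also handles reinforcement learning over continuous state and action spaces. The residual algorithm is used to train the neural network, which ensures fast and convergent function approximation. The learned result serves as the behavior controller of a reactive autonomous robot and effectively solves robot navigation in complex environments.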

3.
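A cooperative Q-V value function approximation model based on an adaptive normalized RBF network   Total citations: 1, self-citations: 0, citations by others: 1
Radial basis function (RBF) network approximation models can effectively solve reinforcement learning problems with continuous state spaces. However, because reinforcement learning is online, an RBF network approximation model is exposed to "catastrophic interference": a new sample applied to the learning model can easily destroy the input-output mapping learned earlier. To address this problem, the paper proposes a cooperative Q-V value function approximation model based on an adaptive normalized RBF (ANRBF) network, together with a corresponding cooperative approximation algorithm named QV(λ). The algorithm normalizes the feature vector extracted by the RBFs and adapts the number, centers, and widths of the ANRBF hidden-layer nodes online, which effectively improves the robustness to interference and the flexibility of the approximation model. The cooperative model uses the Q and V value functions jointly to shape the TD error, which exploits prior knowledge of the environment model to some extent and therefore improves both the convergence speed and the initial performance of the algorithm. The convergence of QV(λ) is analyzed theoretically, and experiments comparing it with other function approximation algorithms show that QV(λ) performs better.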
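
The normalization step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's ANRBF network: the function name `normalized_rbf_features`, the Gaussian form of the basis functions, and the fixed centers and widths are all assumptions made for illustration, and the online adaptation of nodes is left out.

```python
import numpy as np

def normalized_rbf_features(x, centers, widths):
    """Return Gaussian RBF activations rescaled so that they sum to 1.

    Hypothetical helper: centers has shape (n_nodes, dim), widths (n_nodes,).
    """
    x = np.asarray(x, dtype=float)
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    activations = np.exp(-sq_dist / (2.0 * widths ** 2))
    total = activations.sum()
    if total == 0.0:  # every activation underflowed; fall back to uniform
        return np.full(len(centers), 1.0 / len(centers))
    return activations / total

# Three one-dimensional nodes; the input sits exactly on the middle center.
centers = np.array([[0.0], [1.0], [2.0]])
widths = np.array([0.5, 0.5, 0.5])
phi = normalized_rbf_features([1.0], centers, widths)
```

Because the features sum to 1, a linear value estimate built on `phi` is a weighted average of the node weights, which is one common motivation for normalizing RBF features.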

4.
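Self-balancing control of a two-wheeled robot based on reinforcement learning rules   Total citations: 1, self-citations: 0, citations by others: 1
A two-wheeled robot is a typical unstable, nonlinear, strongly coupled self-balancing system. With the system model unknown and no prior experience available, a reinforcement learning algorithm is combined with a fuzzy neural network, which ensures fast and convergent function approximation, achieves self-learned balance control of the two-wheeled robot, and solves the reinforcement learning problem over the robot's continuous state and action spaces. Simulations and experiments show that the method balances the two-wheeled robot within a very short time and keeps it balanced even when the robot's parameters change substantially.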

5.
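As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines, in a general form, the cooperative ability of multiple agents with the decision-making ability of reinforcement learning. By decomposing a complex reinforcement learning problem into several subproblems and solving each separately, it can effectively mitigate the curse of dimensionality, which makes it a potential route to intelligent decision-making in large-scale, complex settings. This survey first describes the main techniques involved in multi-agent hierarchical reinforcement learning, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning. It then reviews, from a hierarchical perspective, the principles and research status of four families of methods: option-based, hierarchical-abstract-machine-based, value-function-decomposition-based, and end-to-end. Finally, it introduces applications of multi-agent hierarchical reinforcement learning in robot control, game-theoretic decision-making, and task planning.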

6.
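In reinforcement learning based on dynamic programming, state aggregation reduces the size of the state space, which partly overcomes the curse of dimensionality and also speeds up learning. State aggregation is an approximation method, however, which raises the question of how large the error is between the optimal Q-value function that Q-hat reinforcement learning converges to after aggregation and the corresponding optimal Q-value function before aggregation. To answer this, the paper proposes an error estimate for reinforcement learning based on minimax approximation.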
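
As a rough illustration of the state aggregation idea in the abstract above, the sketch below runs one Q-learning update on a table indexed by aggregated blocks rather than raw states. The uniform block partition, the names `aggregate` and `q_update`, and the values of `alpha` and `gamma` are assumptions for illustration; the abstract does not specify the paper's aggregation scheme or its Q-hat algorithm.

```python
import numpy as np

def aggregate(state, block_size):
    """Map a raw integer state to the index of its aggregated block."""
    return state // block_size

def q_update(q, state, action, reward, next_state, block_size,
             alpha=0.1, gamma=0.9):
    """One Q-learning step on the aggregated table q[block, action]."""
    s = aggregate(state, block_size)
    s_next = aggregate(next_state, block_size)
    td_target = reward + gamma * q[s_next].max()
    q[s, action] += alpha * (td_target - q[s, action])
    return q

# 20 raw states aggregated into 5 blocks of 4 states, with 2 actions.
q = np.zeros((5, 2))
q = q_update(q, state=7, action=1, reward=1.0, next_state=8, block_size=4)
```

All raw states in a block share one row of the table, so the table shrinks from 20 rows to 5; the price is that states with different true Q-values are forced to share an estimate, which is the approximation error the abstract sets out to bound.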

7.
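In recent years, learning classifier systems (LCS) have been widely used in reinforcement learning based on inductive learning, but rarely in the multi-robot domain. This paper proposes a learning classifier that integrates reinforcement learning and a genetic algorithm for multi-robot path planning. Because genetic algorithms suffer from premature convergence, local optima, and large storage requirements, and because static and dynamic environmental factors affect multi-robot path planning differently, different fitness functions are designed for static and dynamic environments. The convergence of the credit assignment algorithm is derived and proved theoretically, which provides a theoretical guarantee for the convergence of the path planning algorithm. Simulation results also show that combining a genetic algorithm with a learning classifier is effective for multi-robot path planning: the genetic algorithm's premature convergence, local optima, large storage requirements, and slow convergence are greatly alleviated, and the robots' ability to find safe paths improves. LCS therefore has broad application prospects in robotics and is a direction that merits further research.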

8.
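Chen Xin (陈鑫), Wei Haijun (魏海军), Wu Min (吴敏), Cao Weihua (曹卫华). 《自动化学报》 (Acta Automatica Sinica), 2013, 39(12): 2021-2031
Improving adaptability, generalizing over continuous spaces, and reducing dimensionality are key to applying multi-agent reinforcement learning (MARL) to continuous systems. To meet these needs, this paper proposes a model-based tracking learning mechanism and algorithm for agents in continuous multi-agent systems (MAS), called MAS MBRL-CPT. Starting from the goal of having the learning agent adapt to its companions' policies, an individual expected immediate reward is defined so that the agent's observations of its companions' policies are folded into the effect of interacting with the environment, and stochastic approximation is used to learn this reward online. A reduced-dimension Q function is defined, which lowers the dimensionality of the learning space and establishes a Markov decision process (MDP) for agent tracking learning in the MAS environment. After a state transition probability model is built with Gaussian regression, the Q-value function over a generalization sample set is solved online by dynamic programming. Based on the Q function over the discrete sample set, Gaussian regression is then used to build generalized models of the value function and the policy. Simulation experiments with MAS MBRL-CPT on a continuous-space multi-cart-pole control system show that the algorithm lets the learning agent learn an adaptive cooperative policy when the system dynamics model and the companions' policies are unknown, with high learning efficiency and strong generalization.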

9.
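In multi-robot systems, the size of the reinforcement learning space for cooperative environment exploration grows exponentially with the number of robots, and this very large learning space makes convergence extremely slow. To address this, a reinforcement learning method based on action prediction, together with its action selection policy, is applied to multi-robot cooperation: predicting the probabilities of the actions the robots may take speeds up the convergence of the learning algorithm. Experimental results show that the action-prediction-based method obtains multi-robot cooperation policies faster than the original algorithm.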

10.
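Reinforcement learning addresses optimal decision-making problems in the model-free setting and is one of the important techniques for realizing artificial intelligence, but traditional tabular reinforcement learning methods struggle with control problems that have large-scale, continuous spaces. Inspired by function approximation, approximate reinforcement learning parameterizes the value function or the policy function and obtains the optimal behavior policy indirectly through parameter optimization; it has produced notable results in video games, board-game competition, and robot control. On this basis, the paper reviews the research status and application progress of approximate reinforcement learning algorithms. It introduces the underlying theory, classifies and summarizes the classical algorithms and some of their improvements, outlines research progress in robot control, and summarizes several major open problems to provide a reference for future research.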

11.
The accuracy-based XCS classifier system has been shown to solve typical data mining problems in a machine-learning competitive way. However, successful applications in multistep problems, modeled by a Markov decision process, were restricted to very small problems. Until now, the temporal difference learning technique in XCS was based on deterministic updates. However, since a prediction is actually generated by a set of rules in XCS and Learning Classifier Systems in general, gradient-based update methods are applicable. The extension of XCS to gradient-based update methods results in a classifier system that is more robust and more parameter independent, solving large and difficult maze problems reliably. Additionally, the extension to gradient methods highlights the relation of XCS to other function approximation methods in reinforcement learning.

12.
Recent analyses of the XCS classifier system have shown that successful genetic learning strongly depends on the amount of fitness pressure towards accurate classifiers. Since the traditionally used proportionate selection is dependent on fitness scaling and fitness distribution, the resulting evolutionary fitness pressure may be neither stable nor sufficiently strong. Thus, we apply tournament selection to XCS. In particular, we exhibit the weakness of proportionate selection and suggest tournament selection as a more reliable alternative. We show that tournament selection results in a learning classifier system that is more parameter independent, noise independent, and more efficient in exploiting fitness guidance in single-step problems as well as multistep problems. The evolving population is more focused on promising subregions of the problem space and thus finds the desired accurate, maximally general representation faster and more reliably.
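
The contrast between the two selection schemes can be sketched generically. The code below is a textbook form of roulette-wheel and tournament selection, not the XCS-specific variant from the paper; the function names, the fixed tournament size, and the example fitness values are assumptions chosen for illustration.

```python
import random

def proportionate_select(fitnesses, rng):
    """Roulette-wheel selection: probability proportional to fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for index, fitness in enumerate(fitnesses):
        running += fitness
        if pick <= running:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off

def tournament_select(fitnesses, tournament_size, rng):
    """Return the fittest of `tournament_size` randomly drawn candidates."""
    candidates = rng.sample(range(len(fitnesses)), tournament_size)
    return max(candidates, key=lambda i: fitnesses[i])

fitnesses = [0.90, 0.91, 0.92, 0.93]
scaled = [f + 100.0 for f in fitnesses]  # same ranking, shifted scale

# With identical random draws, the tournament winner does not change
# when the fitness values are shifted, because only the ranking matters.
winner_raw = tournament_select(fitnesses, 2, random.Random(1))
winner_scaled = tournament_select(scaled, 2, random.Random(1))
```

Under proportionate selection, shifting every fitness by 100 makes the selection probabilities nearly uniform, so the pressure toward the best classifier almost vanishes; tournament selection depends only on rank and is unaffected, which matches the scaling dependence the abstract describes.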

13.
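A metric, also called a distance function, is a special function on a metric space that satisfies certain conditions and is generally used to capture important distance relationships among data. Because distance strongly affects classification and clustering, metric learning matters for this class of machine learning problems. Under the noise found in real data, existing metric learning algorithms often show low classification accuracy and large fluctuations in accuracy when applied to classification problems. To address this, the paper proposes a robust metric learning algorithm based on the maximum correntropy criterion. The core of the maximum correntropy criterion is the Gaussian kernel function, which the paper introduces into metric learning: a loss function built around the Gaussian kernel is optimized by gradient descent, the parameters are tuned through repeated testing, and the output is the learned metric matrix. A metric matrix learned this way is more robust and effectively improves classification accuracy on noisy classification problems. Validation experiments are carried out on several commonly used machine learning data sets (UCI) and on face data sets.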
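
The role of the Gaussian kernel in a correntropy-style loss can be seen in a small sketch. The abstract does not give the paper's loss function, so the form `1 - exp(-e^2 / (2 * sigma^2))` used here is the commonly cited correntropy-induced loss and is an assumption, as are the helper names `mahalanobis_sq` and `correntropy_loss` and the kernel width `sigma`.

```python
import numpy as np

def mahalanobis_sq(x, y, metric):
    """Squared distance (x - y)^T M (x - y) under a metric matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ metric @ diff)

def correntropy_loss(error, sigma=1.0):
    """Gaussian-kernel loss; values lie in [0, 1] for any error size."""
    error = np.asarray(error, dtype=float)
    return 1.0 - np.exp(-error ** 2 / (2.0 * sigma ** 2))

# With the identity matrix as the metric, the distance is Euclidean.
dist_sq = mahalanobis_sq([0.0, 0.0], [3.0, 4.0], np.eye(2))

# A squared-error loss grows without bound; this loss saturates,
# so one very noisy sample cannot dominate the objective.
errors = np.array([0.0, 1.0, 10.0])
losses = correntropy_loss(errors)
squared = errors ** 2
```

In a metric learning setting the error would typically be derived from distances such as `dist_sq` for labeled pairs, and the metric matrix would be updated by gradient descent on the summed loss; that outer loop is omitted here.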

14.
Feedforward neural networks (FNN) have been proposed to solve complex problems in pattern recognition, classification and function approximation. Despite the general success of learning methods for FNN, such as the backpropagation (BP) algorithm and second-order algorithms, long learning time for convergence remains a problem to be overcome. In this paper, we propose a new hybrid algorithm for an FNN that combines unsupervised training for the hidden neurons (Kohonen algorithm) and supervised training for the output neurons (gradient descent method). Simulation results show the effectiveness of the proposed algorithm compared with other well-known learning methods.

15.
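Research on gradient algorithms for neural network reinforcement learning   Total citations: 11, self-citations: 1, citations by others: 11
Xu Xin (徐昕), He Hangen (贺汉根). 《计算机学报》 (Chinese Journal of Computers), 2003, 26(2): 227-233
For Markov decision problems with continuous state spaces and discrete action spaces, this paper proposes a new gradient descent reinforcement learning algorithm that uses a multilayer feedforward neural network for value function approximation. The algorithm adopts a Boltzmann-distribution action selection policy that is approximately greedy and continuously differentiable, and it approximates the optimal value function of the Markov decision process by minimizing a performance index defined as the sum of squared Bellman residuals under a non-stationary behavior policy. The convergence of the algorithm and the performance of the near-optimal policy are analyzed theoretically, and a simulation study on the Mountain-Car learning control problem further verifies the algorithm's learning efficiency and generalization performance.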
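
The Boltzmann action selection policy named in the abstract above can be sketched as a softmax over action values. The function name `boltzmann_policy`, the `temperature` parameter, and the example Q-values are assumptions for illustration; the paper's network and its Bellman-residual objective are not reproduced here.

```python
import numpy as np

def boltzmann_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q / temperature)."""
    q = np.asarray(q_values, dtype=float)
    logits = (q - q.max()) / temperature  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

q_values = [1.0, 2.0, 3.0]
probs = boltzmann_policy(q_values, temperature=0.5)
sharper = boltzmann_policy(q_values, temperature=0.1)

# Sample an action index according to the policy.
rng = np.random.default_rng(0)
action = int(rng.choice(len(q_values), p=probs))
```

Lowering the temperature moves the distribution toward the greedy action while keeping the probabilities a smooth function of the Q-values, which is the "approximately greedy and continuously differentiable" property the abstract relies on.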

16.
Reinforcement learning has been widely applied to solve a diverse set of learning tasks, from board games to robot behaviours. In some of them, results have been very successful, but some tasks present several characteristics that make the application of reinforcement learning harder to define. One of these areas is multi-robot learning, which has two important problems. The first is credit assignment, or how to define the reinforcement signal to each robot belonging to a cooperative team depending on the results achieved by the whole team. The second one is working with large domains, where the amount of data can be large and different in each moment of a learning step. This paper studies both issues in a multi-robot environment, showing that domain knowledge and machine learning algorithms can be combined to achieve successful cooperative behaviours.

17.
Michigan-style learning classifier systems (LCSs) are online machine learning techniques that incrementally evolve distributed subsolutions which individually solve a portion of the problem space. As in many machine learning systems, extracting accurate models from problems with class imbalances (that is, problems in which one of the classes is poorly represented with respect to the other classes) has been identified as a key challenge to LCSs. Empirical studies have shown that Michigan-style LCSs fail to provide accurate subsolutions that represent the minority class in domains with moderate and large disproportion of examples per class; however, the causes of this failure have not been analyzed in detail. Therefore, the aim of this paper is to carefully examine the effect of class imbalances on different LCS components. The analysis focuses on XCS, which is the most-relevant Michigan-style LCS, although the models could be easily adapted to other LCSs. Design decomposition is used to identify five elements that are crucial to guaranteeing the success of LCSs in domains with class imbalances, and facetwise models that explain these different elements for XCS are developed. All theoretical models are validated with artificial problems. The integration of all these models enables us to identify the sweet spot where XCS is able to scalably and efficiently evolve accurate models of rare classes; furthermore, facetwise analysis is used as a tool for designing a set of configuration guidelines that have to be followed to ensure convergence. When properly configured, XCS is shown to be able to solve highly unbalanced problems that previously eluded solution.

18.
The evolutionary learning mechanism in XCS strongly depends on its accuracy-based fitness approach. The approach is meant to result in an evolutionary drive from classifiers of low accuracy to those of high accuracy. Since, given inaccuracy, lower specificity often corresponds to lower accuracy, fitness pressure most often also results in a pressure towards higher specificity. Moreover, fitness pressure should cause the evolutionary process to be innovative in that it combines low-order building blocks of less accurate classifiers into higher-order building blocks with higher accuracy. This paper investigates how, when, and where accuracy-based fitness results in successful rule evolution in XCS. Along the way, a weakness in the current proportionate selection method in XCS is identified. Several problem bounds are derived that need to be obeyed to enable proper evolutionary pressure. Moreover, a fitness dilemma is identified that causes accuracy-based fitness to be misleading. Improvements are introduced to XCS to make fitness pressure more robust and overcome the fitness dilemma. Specifically, (1) tournament selection results in a much better fitness-bias exploitation, and (2) bilateral accuracy prevents the fitness dilemma. While the improvements stand for themselves, we believe they also contribute to the ultimate goal of an evolutionary learning system that is able to solve decomposable machine-learning problems quickly, accurately, and reliably. The paper also contributes to the further understanding of XCS in general and the fitness approach in XCS in particular.

19.
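Reinforcement learning (RL) is a machine intelligence method that successfully combines dynamic programming with control problems; it brings dynamic programming and supervised learning together in a machine learning system and is typically used to solve two kinds of problems, prediction and control. This paper proposes an evaluation function expressed in vector form. To realize multi-dimensional reinforcement learning, a dedicated neural network (a Q network) implements the critic network, and its application to behavior planning for mobile robots is studied.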
