期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

连续状态自适应离散化基于K-均值聚类的强化学习方法 总被引：5，自引：1，他引：5

文锋陈宗海卓睿周光明《控制与决策》2006,21(2):143-0148

使用聚类算法对连续状态空间进行自适应离散化．得到了基于K-均值聚类的强化学习方法．该方法的学习过程分为两部分：对连续状态空间进行自适应离散化的状态空间学习，使用K-均值聚类算法；寻找最优策略的策略学习．使用替代合适迹Sarsa学习算法．对连续状态的强化学习基准问题进行仿真实验，结果表明该方法能实现对连续状态空间的自适应离散化，并最终学习到最优策略．与基于CMAC网络的强化学习方法进行比较．结果表明该方法具有节省存储空间和缩短计算时间的优点．相似文献

2.

基于核方法的连续动作Actor-Critic学习

陈兴国高阳范顺国俞亚君《模式识别与人工智能》2014,(2):103-110

强化学习算法通常要处理连续状态及连续动作空间问题以实现精确控制.就此文中结合Actor-Critic方法在处理连续动作空间的优点及核方法在处理连续状态空间的优势,提出一种基于核方法的连续动作Actor-Critic学习算法(KCACL).该算法中,Actor根据奖赏不作为原则更新动作概率,Critic采用基于核方法的在线选择时间差分算法学习状态值函数.对比实验验证该算法的有效性. 相似文献

3.

基于核方法的强化学习算法

何源张文生《微计算机信息》2008,24(4):243-245

传统的强化学习算法通常假设状态空间和行动空间是离散的,而实际上很多问题的状态空间是连续的,这就大大地限制了强化学习在实际中的应用.为克服以上不足,本文提出了一种基于核方法的强化学习算法,能直接处理具有连续状态空间的问题.最后,通过具有连续状态空间和离散行动空间的mountain car问题来验证算法.实验表明,这种算法在处理具有连续状态空间的问题时,和传统的先把状态空间离散化的方法相比,能以较少的训练数据收敛到更好的策略. 相似文献

4.

基于自组织模糊RBF网络的连续空间Q学习

程玉虎王雪松易建强孙伟《信息与控制》2008,37(1):1-1

针对连续空间下的强化学习控制问题,提出了一种基于自组织模糊RBF网络的Q学习方法．网络的输入为状态,输出为连续动作及其Q值,从而实现了“连续状态—连续动作”的映射关系．首先将连续动作空间离散化为确定数目的离散动作,采用完全贪婪策略选取具有最大Q值的离散动作作为每条模糊规则的局部获胜动作．然后采用命令融合机制对获胜的离散动作按其效用值进行加权,得到实际作用于系统的连续动作．另外,为简化网络结构和提高学习速度,采用改进的RAN算法和梯度下降法分别对网络的结构和参数进行在线自适应调整．倒立摆平衡控制的仿真结果验证了所提Q学习方法的有效性．相似文献

5.

一种状态自动划分的模糊小脑模型关节控制器值函数拟合方法

闵华清曾嘉安罗荣华朱金辉《控制理论与应用》2011,28(2):256-260

在庞大离散状态空间或连续状态空间中,强化学习(RL)需要进行值函数拟合以寻找最优策略.但函数拟合器的结构往往由设计者预先设定,在学习过程中不能动态调整缺乏自适应性.为了自动构建函数拟合器的结构,提出一种可以进行状态自动划分的模糊小脑模型关节控制(FCMAC)值函数拟合方法.该方法利用Bellman误差的变化趋势实现状态自动划分,并且探讨了两种选择划分区域的机制.汽车爬坡问题和机器人足球仿真平台中的实验结果表明新算法能有效拟合值函数,而且利用所提出的函数拟合器智能体可以进行有效的强化学习. 相似文献

6.

一种模糊强化学习算法及其在RoboCup中的应用 总被引：1，自引：0，他引：1

高建清王浩于磊方宝富《计算机工程与应用》2006,42(6):52-54

传统的强化学习算法只能解决离散状态空间和动作空间的学习问题。论文提出一种模糊强化学习算法,通过模糊推理系统将连续的状态空间映射到连续的动作空间,然后通过学习得到一个完整的规则库。这个规则库为Agent的行为选择提供了先验知识,通过这个规则库可以实现动态规划。作者在RoboCup环境中验证了这个算法,实现了踢球策略的优化。相似文献

7.

基于最小二乘策略迭代的无人机航迹规划方法

下载免费PDF全文

陈晓倩刘瑞祥《计算机工程与应用》2020,56(1):191-195

针对传统强化学习方法因对状态空间进行离散化而无法保证无人机在复杂应用场景中航迹精度的问题,使用最小二乘策略迭代（Least-Squares Policy Iteration,LSPI）算法开展连续状态航迹规划问题研究。该算法采用带参线性函数逼近器近似表示动作值函数,无需进行空间离散化,提高了航迹精度,并基于样本数据离线计算策略,直接对策略进行评价和改进。与Q学习算法的对比仿真实验结果表明LSPI算法规划出的三维航迹更为平滑,有利于飞机实际飞行。相似文献

8.

一种高斯过程的带参近似策略迭代算法

傅启明刘全伏玉琛周谊成于俊《软件学报》2013,24(11):2676-2686

在大规模状态空间或者连续状态空间中,将函数近似与强化学习相结合是当前机器学习领域的一个研究热点;同时,在学习过程中如何平衡探索和利用的问题更是强化学习领域的一个研究难点.针对大规模状态空间或者连续状态空间、确定环境问题中的探索和利用的平衡问题,提出了一种基于高斯过程的近似策略迭代算法.该算法利用高斯过程对带参值函数进行建模,结合生成模型,根据贝叶斯推理,求解值函数的后验分布.在学习过程中,根据值函数的概率分布,求解动作的信息价值增益,结合值函数的期望值,选择相应的动作.在一定程度上,该算法可以解决探索和利用的平衡问题,加快算法收敛.将该算法用于经典的Mountain Car 问题,实验结果表明,该算法收敛速度较快,收敛精度较好. 相似文献

9.

一种新的连续动作集学习自动机

刘晓毛宁《数据采集与处理》2015,30(6):1310-1317

学习自动机（Learning automation,LA）是一种自适应决策器。其通过与一个随机环境不断交互学习从一个允许的动作集里选择最优的动作。在大多数传统的LA模型中,动作集总是被取作有限的。因此,对于连续参数学习问题,需要将动作空间离散化,并且学习的精度取决于离散化的粒度。本文提出一种新的连续动作集学习自动机（Continuous action set learning automaton,CALA）,其动作集为一个可变区间,同时按照均匀分布方式选择输出动作。学习算法利用来自环境的二值反馈信号对动作区间的端点进行自适应更新。通过一个多模态学习问题的仿真实验,演示了新算法相对于3种现有CALA算法的优越性。相似文献

10.

自适应RBF网络Q学习控制 总被引：1，自引：0，他引：1

徐明亮须文波《控制与决策》2010,25(2):303-306

利用RBF网络逼近连续空间的Q值函数,实现连续空间的Q学习.RBF网络输入为状态-动作对,输出为该状态-动作对的Q值.状态由系统的状态转移特性确定,动作由优化网络输出得到的贪婪动作与服从高斯分布的噪声干扰动作两部分叠加而成.利用RNA算法和梯度下降法自适应调整网络的结构和参数.倒立摆平衡控制的实验结果验证了该方法的有效性. 相似文献

11.

A proposition of adaptive state space partition in reinforcement learning with Voronoi tessellation

Takayasu Fuchida Kathy Thi Aung 《Artificial Life and Robotics》2013,18(3-4):172-177

This paper presents a new adaptive segmentation of continuous state space based on vector quantization algorithm such as Linde–Buzo–Gray for high-dimensional continuous state spaces. The objective of adaptive state space partitioning is to develop the efficiency of learning reward values with an accumulation of state transition vector in a single-agent environment. We constructed our single-agent model in continuous state and discrete actions spaces using Q-learning function. Moreover, the study of the resulting state space partition reveals a Voronoi tessellation. In addition, the experimental results show that this proposed method can partition the continuous state space appropriately into Voronoi regions according to not only the number of actions, but also achieve a good performance of reward-based learning tasks compared with other approaches such as square partition lattice on discrete state space. 相似文献

12.

Semi-supervised clustering with metric learning: An adaptive kernel method

Xuesong Yin Author Vitae Songcan Chen Author Vitae Enliang Hu Author Vitae Author Vitae 《Pattern recognition》2010,43(4):1320-1333

Most existing representative works in semi-supervised clustering do not sufficiently solve the violation problem of pairwise constraints. On the other hand, traditional kernel methods for semi-supervised clustering not only face the problem of manually tuning the kernel parameters due to the fact that no sufficient supervision is provided, but also lack a measure that achieves better effectiveness of clustering. In this paper, we propose an adaptive Semi-supervised Clustering Kernel Method based on Metric learning (SCKMM) to mitigate the above problems. Specifically, we first construct an objective function from pairwise constraints to automatically estimate the parameter of the Gaussian kernel. Then, we use pairwise constraint-based K-means approach to solve the violation issue of constraints and to cluster the data. Furthermore, we introduce metric learning into nonlinear semi-supervised clustering to improve separability of the data for clustering. Finally, we perform clustering and metric learning simultaneously. Experimental results on a number of real-world data sets validate the effectiveness of the proposed method. 相似文献

13.

A fuzzy Actor-Critic reinforcement learning network

Xue-Song Wang Yu-Hu Cheng 《Information Sciences》2007,177(18):3764-3781

One of the difficulties encountered in the application of reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. In order to solve the curse of the dimensionality problem, resulting from making continuous state or action spaces discrete, a new fuzzy Actor-Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that is used to approximate both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and the Critic networks share the input, rule and normalized layers of the FRBF network, which can reduce the demands for storage space from the learning system and avoid repeated computations for the outputs of the rule units. Moreover, the FRBF network is able to adjust its structure and parameters in an adaptive way with a novel self-organizing approach according to the complexity of the task and the progress in learning, which ensures an economic size of the network. Experimental studies concerning a cart-pole balancing control illustrate the performance and applicability of the proposed FACRLN. 相似文献

14.

具备历史借鉴能力的软划分聚类模型

孙寿伟钱鹏江陈爱国蒋亦樟《计算机应用》2015,35(2):435-439

在数据稀少或失真等场景下,传统软划分聚类算法无法获得满意的聚类效果。为解决该问题,以极大熵聚类算法为基础,基于历史知识利用的途径,提出两种新的具备历史借鉴能力的软划分聚类模型(分别简称SPBC-RHK-1和SPBC-RHK-2)。SPBC-RHK-1是仅借鉴历史类中心的基础模型,SPBC-RHK-2则是以历史类中心和历史隶属度相融合为手段的高级模型。通过历史知识借鉴,两种模型的聚类有效性均得到有效提高,比较而言具备更高知识利用能力的SPBC-RHK-2模型在聚类有效性和鲁棒性上具有更好的表现。由于所用历史知识不暴露历史源数据,因此两种方法还具有良好的历史数据隐私保护效果。最后在模拟数据集和真实数据集上的实验验证了上述优点。相似文献

15.

Distance metric learning guided adaptive subspace semi-supervised clustering

Xuesong YIN Enliang HU 《Frontiers of Computer Science》2011,5(1):100-108

Most existing semi-supervised clustering algorithms are not designed for handling high-dimensional data. On the other hand, semi-supervised dimensionality reduction methods may not necessarily improve the clustering performance, due to the fact that the inherent relationship between subspace selection and clustering is ignored. In order to mitigate the above problems, we present a semi-supervised clustering algorithm using adaptive distance metric learning (SCADM) which performs semi-supervised clustering and distance metric learning simultaneously. SCADM applies the clustering results to learn a distance metric and then projects the data onto a low-dimensional space where the separability of the data is maximized. Experimental results on real-world data sets show that the proposed method can effectively deal with high-dimensional data and provides an appealing clustering performance. 相似文献

16.

熵加权多视角协同划分模糊聚类算法

蒋亦樟邓赵红王骏钱鹏江王士同《软件学报》2014,25(10):2293-2311

当前,基于协同学习机制的多视角聚类技术存在如下两点不足：第一,以往构造的用于各视角协同学习的逼近准则物理含义不明确且控制简单;第二,以往算法均默认各视角的重要性程度是相等的,缺少各视角重要性自适应调整的能力。针对上述不足：首先,基于具有良好物理解释性的Havrda-Charvat熵构造了一个全新的异视角空间划分逼近准则,该准则能有效地控制异视角间的空间划分相似程度;其次,基于香农熵理论提出了多视角自适应加权策略,可有效地控制各视角的重要性程度,提高算法的聚类性能;最后,基于FCM框架提出了熵加权多视角协同划分模糊聚类算法(entropy weight-collaborative partition-multi-view fuzzy clustering algorithm,简称EW-CoP-MVFCM)。在模拟数据集以及 UCI 数据集上的实验结果均显示,所提算法较之已有多视角聚类算法在应对多视角聚类任务时具有更好的适应性。相似文献

17.

Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data

Mahdieh Soleymani Baghshah Author Vitae Saeed Bagheri Shouraki^{Author Vitae} 《Pattern recognition》2010,43(8):2982-2992

The problem of clustering with side information has received much recent attention and metric learning has been considered as a powerful approach to this problem. Until now, various metric learning methods have been proposed for semi-supervised clustering. Although some of the existing methods can use both positive (must-link) and negative (cannot-link) constraints, they are usually limited to learning a linear transformation (i.e., finding a global Mahalanobis metric). In this paper, we propose a framework for learning linear and non-linear transformations efficiently. We use both positive and negative constraints and also the intrinsic topological structure of data. We formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem. The proposed non-linear method can be considered as an efficient kernel learning method that yields an explicit non-linear transformation and thus shows out-of-sample generalization ability. Experimental results on synthetic and real-world data sets show the effectiveness of our metric learning method for semi-supervised clustering tasks. 相似文献

18.

属性加权的类属型数据非模聚类

陈黎飞郭躬德《软件学报》2013,24(11):2628-2641

类属型数据广泛分布于生物信息学等许多应用领域,其离散取值的特点使得类属数据聚类成为统计机器学习领域一项困难的任务.当前的主流方法依赖于类属属性的模进行聚类优化和相关属性的权重计算.提出一种非模的类属型数据统计聚类方法.首先,基于新定义的相异度度量,推导了属性加权的类属数据聚类目标函数.该函数以对象与簇之间的平均距离为基础,从而避免了现有方法以模为中心导致的问题.其次,定义了一种类属型数据的软子空间聚类算法.该算法在聚类过程中根据属性取值的总体分布,而不仅限于属性的模,赋予每个属性衡量其与簇类相关程度的权重,实现自动的特征选择.在合成数据和实际应用数据集上的实验结果表明,与现有的基于模的聚类算法和基于蒙特卡罗优化的其他非模算法相比,该算法有效地提高了聚类结果的质量. 相似文献

19.

ADAPTIVE HYPER-FUZZY PARTITION PARTICLE SWARM OPTIMIZATION CLUSTERING ALGORITHM

Hsuan-Ming Feng Ching-Yi Chen Fun Ye 《控制论与系统》2013,44(5):463-479

This article presents an adaptive hyper-fuzzy partition particle swarm optimization clustering algorithm to optimally classify different geometrical structure data sets into correct groups. In this architecture, we use a novel hyper-fuzzy partition metric to improve the traditional common-used Euclidean norm metric clustering method. Since one fuzzy rule describes one pattern feature and implies the detection of one cluster center, it is encouraged to decrease the number of fuzzy rules with the hyper-fuzzy partition metric. According to the adaptive particle swarm optimization, it is very suitable to manage the clustering task for a complex, irregular, and high dimensional data set. To demonstrate the robustness of the proposed adaptive hyper-fuzzy partition particle swarm optimization clustering algorithms, various clustering simulations are experimentally compared with K-means and fuzzy c-means learning methods. 相似文献