Similar Documents
20 similar documents found
1.
Reinforcement learning (RL) for large and complex problems faces the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been proposed, each with its own advantages and disadvantages. This paper proposes a new method, inspired by the strategies of hierarchies of abstract machines (HAMs), that creates a high-level controller layer for reinforcement learning with options. The proposed framework uses a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state-space clustering. The method can be viewed as a bridge between the option and HAM frameworks: by creating connecting structures between them, it aims to reduce the disadvantages of both while retaining their advantages. Experimental results on different test environments show the significant efficiency of the proposed method.

2.
A distributed autonomous robotic system offers robustness and adaptability to dynamic environments; however, mutually cooperative behavior among the robots is required for the system to act optimally. Acquiring actions through reinforcement learning is one known approach to making multiple robots cooperate on a complex task. This paper deals with a multi-robot transport problem using the Q-learning algorithm. When a robot carries luggage, we regard it as leaving a trace along its own path; this trace is volatile, and other robots can use the trace information to help the robot carrying the luggage. To address the resulting multi-agent reinforcement learning problems, a learning control method using a stress-antibody allotment reward is used. Moreover, we propose using the robots' trace information to encourage cooperative behavior in carrying luggage to a destination. The effectiveness of the proposed method is shown by simulation. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
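The combination of Q-learning with a volatile trace can be sketched in a few lines. The grid size, decay rate, and reward shaping below are illustrative assumptions, not the authors' parameters:

```python
import numpy as np

# Minimal sketch of Q-learning with a volatile trace field, loosely
# following the abstract above; all constants are illustrative.
GRID = 5
N_ACTIONS = 4                       # up, down, left, right
ALPHA, GAMMA, DECAY = 0.5, 0.9, 0.8

q = np.zeros((GRID, GRID, N_ACTIONS))
trace = np.zeros((GRID, GRID))      # volatile trace left by the carrying robot

def deposit_and_decay(pos):
    """The carrying robot marks pos; the whole field evaporates a little."""
    global trace
    trace *= DECAY
    trace[pos] = 1.0

def q_update(s, a, r, s2):
    """Standard Q-learning update; the trace value shapes the reward so
    helper robots are drawn toward the carrying robot's path."""
    shaped = r + trace[s2]
    q[s][a] += ALPHA * (shaped + GAMMA * q[s2].max() - q[s][a])

deposit_and_decay((2, 2))
deposit_and_decay((2, 3))
q_update((2, 2), 3, 0.0, (2, 3))    # helper moves onto the fresh trace
```

The trace plays the role of a decaying pheromone: cells visited recently by the carrying robot contribute extra reward, biasing helpers toward cooperation without any explicit communication.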

3.
Multi-agent reinforcement learning technologies are mainly investigated from two perspectives: concurrency and game theory. The former chiefly applies to cooperative multi-agent systems, while the latter usually applies to coordinated multi-agent systems. However, both face problems such as credit assignment and multiple Nash equilibria. In this paper, we propose a new multi-agent reinforcement learning model and algorithm, LMRL, from a layered perspective. The LMRL model is composed of an off-line training layer, which employs single-agent reinforcement learning to acquire stationary strategy knowledge, and an online interaction layer, which employs multi-agent reinforcement learning and dynamically revisable strategy knowledge to interact with the environment. An agent with LMRL improves its generalization capability, adaptability, and coordination ability. Experiments show that LMRL can outperform both single-agent reinforcement learning and Nash-Q.

4.
To address the slow convergence of existing reinforcement learning algorithms for robot path planning, this paper proposes an initialization method for mobile-robot reinforcement learning based on an artificial potential field. The robot's workspace is virtualized as an artificial potential field, and prior knowledge is used to determine the potential of every point, which represents the maximum cumulative return obtainable under the optimal policy; for example, obstacle regions have zero potential, and the goal point has the global maximum. The initial Q-value is then defined as the immediate reward at the current point plus the maximum discounted cumulative return of the successor point. Through this Q-value initialization, the learning process converges faster and more stably. Finally, the improved algorithm is validated on robot paths in a grid map; the results show that the method raises learning efficiency in the initial stage and improves the algorithm's performance.
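The Q-value initialization described above can be sketched directly. The grid layout, the reward of 1 at the goal, and the gamma-to-the-distance potential are illustrative assumptions standing in for the paper's prior knowledge:

```python
# Sketch of potential-field Q initialization: each cell's potential is the
# maximum discounted return obtainable from it, and Q0(s, a) is the
# immediate reward plus the discounted potential of the successor.
GAMMA = 0.9
GRID = 4
GOAL, OBSTACLE = (0, 3), (1, 1)     # illustrative map

def potential(cell):
    """Prior knowledge: 0 at obstacles, gamma**d elsewhere (d = Manhattan
    distance to the goal), so the goal has the global maximum of 1."""
    if cell == OBSTACLE:
        return 0.0
    d = abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])
    return GAMMA ** d

def init_q(cell, action):
    """Q0(s, a) = r(s, a) + gamma * potential(successor)."""
    nxt = (cell[0] + action[0], cell[1] + action[1])
    if not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID):
        nxt = cell                       # bumping a wall keeps the state
    r = 1.0 if nxt == GOAL else 0.0      # immediate reward
    return r + GAMMA * potential(nxt)

q0 = init_q((0, 2), (0, 1))   # stepping right, directly into the goal
```

Because every initial Q-value already points "downhill" toward the goal, early exploration is far less random than with zero-initialized Q-tables, which is exactly where the claimed convergence speed-up comes from.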

5.
Personalized production has emerged as a result of increasing customer demand for more personalized products. Personalized production systems carry a greater amount of uncertainty and variability than traditional manufacturing systems. In this paper, we present a smart manufacturing system using a multi-agent system and reinforcement learning, characterized by machines with intelligent agents that give the system autonomy of decision making, sociability to interact with other systems, and the intelligence to learn in dynamically changing environments. In the proposed system, machines with intelligent agents evaluate the priorities of jobs and distribute them through negotiation. In addition, we propose methods by which machines with intelligent agents learn to make better decisions. The performance of the proposed system is demonstrated by comparing it with a dispatching rule on a scheduling problem in terms of early completion, productivity, and delay. The obtained results show that a manufacturing system with distributed artificial intelligence is competitive in a dynamic environment.

6.
At AROB5, we proposed a solution to the path planning of a mobile robot. In our approach, we formulated the problem as a discrete optimization problem at each time step. To solve the optimization problem, we used an objective function consisting of a goal term, a smoothness term, and a collision term. While the results of our simulation showed the effectiveness of our approach, the values of the weights in the objective function were not given by any theoretical method. This article presents a theoretical method for adjusting the weight parameters using reinforcement learning. We applied Williams' learning algorithm, episodic REINFORCE, to derive a learning rule for the weight parameters, and verified the learning rule through experiments. This work was presented, in part, at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, Japan, January 15–17, 2001.
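The shape of an episodic REINFORCE update can be shown on a toy problem. Here a single sigmoid-Bernoulli policy parameter stands in for the objective-function weight vector; this reduction, the learning rate, and the reward rule are illustrative assumptions, not the authors' formulation:

```python
import math
import random

# Minimal REINFORCE sketch: ascend reward * grad log pi(action; theta).
# For a Bernoulli policy with p = sigmoid(theta), that gradient is
# (action - p). Action 1 always pays reward 1, so theta should grow.
random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.0     # the "weight parameter" being adjusted
lr = 0.5        # learning rate

for _ in range(200):
    p = sigmoid(theta)
    action = 1 if random.random() < p else 0
    reward = 1.0 if action == 1 else 0.0
    theta += lr * reward * (action - p)

final_prob = sigmoid(theta)   # probability of the rewarded action
```

In the article's setting, the same gradient estimate would be accumulated over an episode and applied to each weight of the objective function rather than to a single scalar.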

7.
To address adaptive routing toward mobile sink nodes in wireless sensor networks, and to optimize the use of node energy and of computing, storage, and communication resources while also optimizing quality-of-service metrics such as data-transmission delay and delivery ratio, this paper proposes an adaptive routing method based on reinforcement learning, with a composite reward function designed to jointly optimize energy, delay, delivery ratio, and other metrics. The routing protocol is specified in detail with respect to packet structure, route initialization, and path selection, and sink-node announcements together with a periodic flooding mechanism are used to accelerate convergence, thereby supporting fast sink movement. Theoretical analysis shows that the reinforcement-learning-based routing method converges quickly, has low protocol overhead, and has small storage and computation requirements, making it suitable for energy- and resource-constrained sensor nodes. Performance evaluation and comparative analysis on a simulation platform verify the feasibility and superiority of the proposed adaptive routing algorithm.
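A composite reward of the kind described above can be written as a weighted sum over normalized metrics. The weights and normalization below are illustrative assumptions, not the paper's design:

```python
# Sketch of a composite routing reward combining residual energy,
# delay, and delivery ratio into one scalar; weights are illustrative.
W_ENERGY, W_DELAY, W_DELIVERY = 0.4, 0.3, 0.3

def reward(residual_energy, delay, delivery_ratio, max_delay=1.0):
    """Higher residual energy and delivery ratio raise the reward;
    longer delay lowers it. All inputs are normalized to [0, 1]."""
    return (W_ENERGY * residual_energy
            + W_DELAY * (1.0 - min(delay / max_delay, 1.0))
            + W_DELIVERY * delivery_ratio)

# A Q-learning router would then score next-hop choices with this reward:
#   Q[node][hop] += alpha * (reward(...) + gamma * max(Q[hop]) - Q[node][hop])
r = reward(0.8, 0.2, 0.95)
```

Because the reward is a single scalar, the routing agent needs only a small Q-table per node, which is consistent with the paper's claim of low storage and computation requirements.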

8.
In this paper, a dynamic fuzzy energy-state-based AODV (DFES-AODV) routing protocol for Mobile Ad-hoc NETworks (MANETs) is presented. In the DFES-AODV route discovery phase, each node uses a Mamdani fuzzy logic system (FLS) to decide its Route REQuest (RREQ) forwarding probability. The FLS inputs are the residual battery level and the energy drain rate of the mobile node. Unlike previous related work, the membership function of the residual-energy input is made dynamic. Also, a zero-order Takagi–Sugeno FLS with the same inputs is used as a means of state-space generalization in SARSA-AODV, a reinforcement-learning-based energy-aware routing protocol. The simulation study confirms that using a dynamic fuzzy system ensures more energy efficiency than its static counterpart. Moreover, DFES-AODV exhibits performance similar to that of SARSA-AODV and its fuzzy extension, FSARSA-AODV. Therefore, the use of dynamic fuzzy logic for adaptive routing in MANETs is recommended.

9.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-values has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting, we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion, which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization, which leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL. Editors: David Page and Akihiro Yamamoto
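The numerical point about QR-factorization can be illustrated on a small system. The kernel values below are synthetic stand-ins for graph-kernel evaluations (the first two training points are nearly identical, making the matrix close to rank-deficient); the jitter term and sizes are illustrative assumptions:

```python
import numpy as np

# GP mean prediction k_star^T K^{-1} y computed via QR instead of a
# direct inverse, which stays well behaved on a near-low-rank kernel.
K = np.array([[1.0,   0.999, 0.3],
              [0.999, 1.0,   0.3],
              [0.3,   0.3,   1.0]]) + 1e-6 * np.eye(3)  # near-duplicate rows
y = np.array([1.0, 1.0, 0.0])            # observed Q-values
k_star = np.array([0.999, 0.999, 0.3])   # kernel between query and data

Q_mat, R_mat = np.linalg.qr(K)
alpha = np.linalg.solve(R_mat, Q_mat.T @ y)  # solves K @ alpha = y
mean = k_star @ alpha                        # predicted Q-value
```

Since the query point is essentially a copy of the two training points with Q-value 1, the prediction lands near 1.0; replacing the QR solve with an explicit `inv(K)` gives the same answer here but degrades first as the kernel matrix approaches exact rank deficiency.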

10.
In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of Reinforcement Learning (RL) to the self-emergence of a common lexicon in robot teams. By modeling the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e. the objects encountered by the robots, or the states of the environment itself) into symbols or signals, we check whether it is possible for the robot team to converge in an autonomous, decentralized way to a common lexicon by means of RL, so that the communication efficiency of the entire robot team is optimal. We have conducted several experiments aimed at testing whether it is possible to converge with RL to an optimal Saussurean communication system, organized along two main lines: first, we investigate the effect of team size, focusing on teams of moderate size on the order of 5 to 10 individuals, typical of multi-robot systems; second, and foremost, we investigate the effect of lexicon size on the convergence results. To analyze the convergence of the robot team, we define team consensus as the state in which all the robots (i.e. 100% of the population) share the same association matrix or lexicon. As a general conclusion, we show that RL allows convergence to lexicon consensus in a population of autonomous agents.
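The association-matrix representation and the consensus test can be sketched compactly. The matrix sizes, update rule, and reward values are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

# Each agent's lexicon is an association matrix over meanings x signals;
# consensus holds when every robot induces the same meaning-to-signal map.
N_MEANINGS = 3  # objects/states the robots can refer to

def preferred_signals(assoc):
    """The signal each meaning maps to: argmax over that meaning's row."""
    return tuple(int(np.argmax(row)) for row in assoc)

def consensus(team):
    """True when 100% of the population shares the same lexicon mapping."""
    return len({preferred_signals(a) for a in team}) == 1

def reinforce(assoc, meaning, signal, reward, lr=0.1):
    """RL-style update: strengthen (or weaken) one association entry."""
    assoc[meaning, signal] += lr * reward

team = [np.eye(N_MEANINGS) for _ in range(3)]  # already-aligned lexicons
reinforce(team[1], 0, 0, 1.0)                  # reward a successful exchange
agreed = consensus(team)                       # still a common lexicon

team[0][0, 1] = 2.0                            # perturb one robot's lexicon
diverged = not consensus(team)
```

In the paper's experiments, repeated pairwise communication games with such reward updates are what drive an initially random population toward the `consensus` condition.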

11.
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach to analyse a selected number of relevant publications in detail, and a top-down approach in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem; both of which will help kick-start research on the subject or boost existing research efforts.

12.
Recently, deep-learning detection methods have achieved huge success in the vision-based monitoring of construction sites in terms of safety control and productivity analysis. However, deep-learning detection methods require large-scale datasets for training purposes, and such datasets are difficult to develop due to the limited accessibility of construction images and the need for labor-intensive annotations. To address this problem, this research proposes a semi-supervised learning detection method for construction site monitoring based on teacher–student networks and data augmentation. The proposed method requires only a limited number of labeled data to achieve high detection performance in construction scenarios. Initially, the proposed method trains the teacher object detector with labeled data following weak data augmentation. Next, the trained teacher object detector generates pseudo-detection results from unlabeled images that have been weakly augmented. Finally, the student object detector is trained with the pseudo-detection results and unlabeled images that have been both weakly and strongly augmented. In our experiments, 10,000 annotated construction images from the Alberta Construction Image Dataset (ACID) were divided into a training set (70%) and a validation set (30%). The proposed method achieved 91% mean average precision (mAP) on the validation set while requiring only 30% of the training set. In comparison, the existing supervised learning method, a ResNet50 Faster R-CNN, achieved a mAP of 90.8% when trained on the full training set. These experimental results show the potential of the proposed method in terms of reducing the time, effort, and costs spent on developing construction datasets. As such, this research has explored the potential of semi-supervised learning methods and increased the practicality of vision-based monitoring systems in the construction industry.
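The pseudo-labeling step at the heart of the teacher-student scheme can be shown schematically in plain Python. Detector internals, the augmentation pipeline, the class names, and the confidence threshold are illustrative assumptions:

```python
# Schematic of the teacher-student pseudo-labeling step: the teacher's
# detections on a weakly augmented unlabeled image become training
# targets for the student once they clear a confidence threshold.
CONF_THRESHOLD = 0.7

def pseudo_labels(teacher_detections):
    """Keep only confident teacher detections as pseudo ground truth."""
    return [(box, cls) for box, cls, score in teacher_detections
            if score >= CONF_THRESHOLD]

# Hypothetical teacher output on one unlabeled construction image:
# (bounding box as x1, y1, x2, y2, class label, confidence score)
detections = [((10, 10, 50, 50), "worker",     0.92),
              ((60, 20, 90, 70), "excavator",  0.55),
              ((15, 60, 40, 95), "dump_truck", 0.81)]

targets = pseudo_labels(detections)
# The student is then trained on strongly augmented copies of the image
# against `targets`, while the labeled 30% keeps supervising directly.
```

Thresholding is what keeps the noisy teacher from polluting the student: low-confidence detections (the `excavator` above) are simply dropped rather than propagated as labels.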

13.
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit the concept of autonomous agents very well: the driver, the pedestrian, the traffic expert; in some cases, the intersection and the traffic signal controller can also be regarded as autonomous agents. However, the "agentification" of a transportation system is associated with some challenging issues: the number of agents is high; agents are typically highly adaptive; they react to changes in the environment at the individual level but cause unpredictable collective patterns; and they act in a highly coupled environment. Therefore, this domain poses many challenges for standard multiagent-system techniques such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.

14.
This paper presents a reinforcement learning algorithm that allows a robot with a single camera mounted on a pan-tilt platform to learn simple skills, such as "watch" and "orientation", and to obtain the complex skill called "approach" by combining the previously learned ones. The reinforcement signal the robot receives is a real continuous value, so it is not necessary to estimate an expected reward. Skills are implemented with a generic structure that permits the creation of complex skills from the sequencing, output addition, and data flow of available simple skills.

15.
Evolutionary reinforcement learning and its application to robot path tracking
This paper studies a mobile-robot path-tracking control method based on adaptive heuristic critic (AHC) reinforcement learning. The critic element (ACE) of the AHC is implemented with a multilayer feedforward neural network, whose weights are updated by combining the TD(λ) algorithm with gradient descent. The action selection element (ASE) of the AHC consists of a fuzzy inference system (FIS) optimized by a genetic algorithm. The output of the ACE network forms a secondary reinforcement signal that guides the learning of the ASE. Finally, the proposed algorithm is applied to mobile-robot behavior learning, successfully solving a complex robot path-tracking problem.

16.
A cognitive radio network (CRN) enables unlicensed users (secondary users, SUs) to sense and opportunistically operate in underutilized licensed channels, which are owned by the licensed users (primary users, PUs). The CRN has been regarded as the next-generation wireless network centered on the application of artificial intelligence, which helps the SUs learn about, and adaptively and dynamically reconfigure, their operating parameters, including the sensing and transmission channels, for network performance enhancement. This motivates the use of artificial intelligence to enhance security schemes for CRNs. Provisioning security in CRNs is challenging because existing techniques, such as entity authentication, are not feasible in the dynamic environment that a CRN presents, since they require pre-registration; in addition, these techniques cannot prevent an authenticated node from acting maliciously. In this article, we advocate the use of reinforcement learning (RL) to achieve optimal or near-optimal solutions for security enhancement through the detection of various malicious nodes and their attacks in CRNs. RL, an artificial intelligence technique, has the ability to learn new attacks and to detect previously learned ones, and has been perceived as a promising approach to enhancing the overall security of CRNs. RL, which has been applied to address the dynamic aspects of security schemes in other wireless networks, such as wireless sensor networks and wireless mesh networks, can be leveraged to design security schemes in CRNs. We believe that these RL solutions will complement and enhance existing security solutions applied to CRNs. To the best of our knowledge, this is the first survey article that focuses on the use of RL-based techniques for security enhancement in CRNs.

17.
The petrochemical industry is one of the major sectors contributing to the worldwide economy, and digital transformation is urgently needed to enhance its core competence. In general, ethylene, propylene and butadiene, which are associated with synthetic chemicals, are the main raw materials of this industry, accounting for around 70–80% of the cost structure. In particular, butadiene is one of the key materials for producing synthetic rubber and is used in several daily commodities. However, the price of butadiene fluctuates with demand–supply mismatches and with international economic and political events. This study proposes a two-stage data science framework to predict the weekly price of butadiene and optimize the procurement decision. The first stage builds several price-prediction models using comprehensive information, including the contract price, supply rate, demand rate, and upstream and downstream information. The second stage applies the analytic hierarchy process and a reinforcement learning technique to derive an optimal procurement policy and reduce the total procurement cost. An empirical study validates the proposed framework; the results show improved price-forecast accuracy and reduced procurement costs for the raw materials.

18.
Application of reinforcement learning to the autonomous navigation of mobile robots
This paper surveys the autonomous navigation algorithms commonly used by mobile robots, along with their advantages and disadvantages, and on this basis proposes a reinforcement learning approach. The principle of the reinforcement learning algorithm is described, and a neural network is used to address the generalization problem. A reinforcement learning method for robot autonomous navigation based on obstacle-detection sensor information is designed, and mathematical models of each element of the learning algorithm are given. Simulation verifies that the algorithm is correct and effective, with good convergence and generalization ability.

19.
This special section features six articles that provide an overview of emerging research topics at the intersection of learning, security, and multi-agent systems. Recent years have witnessed a surge in the number of works at this intersection, appearing in the systems and control communities as well as in artificial intelligence, cyber-physical systems, and economics. The articles in this special section give accessible and comprehensive tutorials and surveys for a broad systems and control audience, covering topics including adversarial machine learning, multi-agent reinforcement learning, cyber resilience, resilient control systems, and game design. It is hoped that this special section will spawn future interest and cross-disciplinary collaborations in this emerging transdisciplinary research area.

20.