Similar Documents
20 similar documents found (search time: 15 ms)
1.
Reinforcement learning (RL) faces the curse of dimensionality when applied to large and complex problems. To overcome this, frameworks based on temporal abstraction have been proposed, each with its own advantages and disadvantages. This paper proposes a new method, inspired by the strategies of hierarchies of abstract machines (HAMs), that creates a high-level controller layer for reinforcement learning with options. The proposed framework uses a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state-space clustering. The method can be viewed as a bridge between the option and HAM frameworks: by building connecting structures between them, it aims to reduce the disadvantages of both while retaining their advantages. Experimental results on different test environments show that the proposed method is significantly more efficient.
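Under the options formalism this abstract builds on, a temporally extended action runs an internal policy until a termination condition fires. A minimal sketch, where the corridor environment, the dictionary-based option structure, and all names are illustrative assumptions rather than the paper's code:

```python
def run_option(env_step, state, option, gamma=0.9):
    """Execute a temporally extended action (option): follow the option's
    internal policy until its termination predicate fires, accumulating the
    discounted return an SMDP-style Q-update over options would consume."""
    ret, disc, steps = 0.0, 1.0, 0
    while not option["terminate"](state):
        action = option["policy"](state)
        state, reward = env_step(state, action)
        ret += disc * reward
        disc *= gamma
        steps += 1
    return ret, steps, state

# Hypothetical corridor 0..3: reward 1 on reaching cell 3, else 0.
def env_step(s, a):
    s2 = s + 1 if a == "right" else max(s - 1, 0)
    return s2, 1.0 if s2 == 3 else 0.0

go_right = {"policy": lambda s: "right", "terminate": lambda s: s == 3}
ret, steps, final = run_option(env_step, 0, go_right)
```

A higher-level controller would then treat `(ret, steps, final)` as one macro-transition, e.g. `Q[s][o] += alpha * (ret + gamma**steps * best_next - Q[s][o])`.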

2.
Distributed autonomous robotic systems offer robustness and adaptability to dynamic environments, but they require mutually cooperative behavior for the system to act optimally. Acquiring actions by reinforcement learning is one known approach for getting multiple robots to cooperate on a complex task. This paper deals with a transportation problem for multiple robots using the Q-learning algorithm. When a robot carries luggage, we model it as leaving a volatile trace along its own path; another robot can then use this trace information to help the carrying robot. To address the resulting multi-agent reinforcement learning problems, a learning control method using a stress-antibody allotment reward is used. Moreover, we propose using the robots' trace information to encourage cooperative behavior in carrying luggage to a destination. The effectiveness of the proposed method is shown by simulation. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
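As a rough illustration of the two ingredients this abstract combines, the sketch below pairs a standard tabular Q-learning update with a decaying ("volatile") trace map; the two-state task and all names are hypothetical, not the authors' implementation:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def decay_traces(trace, rho=0.9):
    """Volatile trace left along a robot's path: markers fade each time
    step, so other robots are guided only by recent routes."""
    for cell in trace:
        trace[cell] *= rho

# Hypothetical two-state task: carry luggage from "start" to "goal".
Q = {"start": {"left": 0.0, "right": 0.0}, "goal": {}}
trace = {"start": 1.0}
for _ in range(100):
    q_update(Q, "start", "right", 1.0, "goal")
    decay_traces(trace)
```

After repeated rewarded transitions the Q-value of the useful action approaches 1, while an untouched trace marker decays toward zero.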

3.
At AROB5, we proposed a solution to the path-planning problem for a mobile robot. In our approach, we formulated the problem as a discrete optimization problem at each time step, solved using an objective function consisting of a goal term, a smoothness term, and a collision term. While our simulation results showed the effectiveness of the approach, the weights in the objective function were not set by any theoretical method. This article presents a theoretical method, based on reinforcement learning, for adjusting the weight parameters. We applied Williams' learning algorithm, episodic REINFORCE, to derive a learning rule for the weights, and verified the rule experimentally. This work was presented, in part, at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, Japan, January 15–17, 2001.
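Episodic REINFORCE adjusts a parameter in the direction of `reward × ∇ log π`. The sketch below applies it to a one-dimensional Gaussian policy over a single objective-function weight; the reward function standing in for path-planning quality is an invented example, not the article's setup:

```python
import math
import random

def episodic_reinforce(mean, sigma, reward_fn, alpha=0.05, episodes=2000, seed=0):
    """Williams' episodic REINFORCE for a 1-D Gaussian policy over a weight.
    The gradient of log N(w; mean, sigma) w.r.t. mean is (w - mean) / sigma**2."""
    rng = random.Random(seed)
    for _ in range(episodes):
        w = rng.gauss(mean, sigma)             # sample a candidate weight
        r = reward_fn(w)                       # episodic return under that weight
        mean += alpha * r * (w - mean) / sigma ** 2
    return mean

# Hypothetical reward peaked at w = 2.0 (stands in for tracking quality).
best = episodic_reinforce(0.0, 0.5, lambda w: math.exp(-(w - 2.0) ** 2))
```

In expectation the update performs stochastic gradient ascent on the mean episodic reward, so `mean` drifts toward the reward peak.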

4.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-values has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. To employ Gaussian processes in a relational setting, we propose graph kernels as the covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion, which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization, which leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL. Editors: David Page and Akihiro Yamamoto.
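The numerical point here, replacing a direct inversion of the kernel matrix with a QR factorization when computing the GP mean `k_*ᵀ K⁻¹ y`, can be illustrated as follows. An RBF kernel over scalars stands in for a graph kernel over relational state-action pairs, and NumPy is assumed; this is a sketch, not the RRL code:

```python
import numpy as np

def gp_predict_qr(K, k_star, y):
    """GP mean prediction k_*^T K^{-1} y, with K^{-1} y obtained via a QR
    factorization of K rather than a direct inverse (more stable when the
    kernel matrix is near rank-deficient, as the abstract notes)."""
    Q, R = np.linalg.qr(K)
    alpha = np.linalg.solve(R, Q.T @ y)   # solves K alpha = y
    return k_star @ alpha

# Toy RBF kernel over scalar "states" (stand-in for a graph kernel).
def rbf(a, b, ell=1.0):
    return np.exp(-0.5 * ((a - b) / ell) ** 2)

X = np.array([0.0, 1.0, 2.0])             # training inputs
y = np.array([0.0, 1.0, 0.0])             # training targets (toy Q-values)
K = rbf(X[:, None], X[None, :]) + 1e-6 * np.eye(3)   # jittered kernel matrix
mu = gp_predict_qr(K, rbf(1.0, X), y)     # predict at a training point
```

Predicting at a training input recovers its target up to the jitter, while the triangular solve avoids forming `K⁻¹` explicitly.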

5.
In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of reinforcement learning (RL) to the self-emergence of a common lexicon in robot teams. By modeling the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e., the objects encountered by the robots, or the states of the environment itself) into symbols or signals, we check whether it is possible for the robot team to converge in an autonomous, decentralized way to a common lexicon by means of RL, so that the communication efficiency of the entire team is optimal. We have conducted several experiments aimed at testing whether it is possible to converge with RL to an optimal Saussurean communication system. Our experiments are organized along two main lines: first, we investigate the effect of team size, focusing on teams of moderate size on the order of 5 to 10 individuals, typical of multi-robot systems. Second, and foremost, we investigate the effect of lexicon size on the convergence results. To analyze the convergence of the robot team, we define team consensus as the state in which all the robots (i.e., 100% of the population) share the same association matrix or lexicon. As a general conclusion, we show that RL allows convergence to lexicon consensus in a population of autonomous agents.
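A minimal sketch of the kind of dynamics studied: each agent holds a meanings × signals association matrix, and a simple reinforcement rule strengthens entries after successful communication rounds. The team size, lexicon size, update rule, and consensus check below are illustrative assumptions, not the paper's exact algorithm:

```python
import random

def play_round(agents, n_meanings, n_signals, rng, delta=0.1):
    """One communication round: a speaker names a meaning with its strongest
    signal; the hearer decodes; both reinforce the association on success,
    and the speaker weakens it on failure."""
    speaker, hearer = rng.sample(agents, 2)
    m = rng.randrange(n_meanings)
    sig = max(range(n_signals), key=lambda s: speaker[m][s])
    decoded = max(range(n_meanings), key=lambda mm: hearer[mm][sig])
    if decoded == m:
        speaker[m][sig] += delta
        hearer[m][sig] += delta
    else:
        speaker[m][sig] -= delta

rng = random.Random(1)
N, M, S = 5, 3, 3                          # team size, meanings, signals
agents = [[[rng.random() for _ in range(S)] for _ in range(M)] for _ in range(N)]
for _ in range(5000):
    play_round(agents, M, S, rng)

# Consensus check: every agent maps each meaning to the same preferred signal.
prefs = [tuple(max(range(S), key=lambda s: a[m][s]) for m in range(M)) for a in agents]
consensus = all(p == prefs[0] for p in prefs)
```

With enough rounds, the positive-feedback loop typically drives all agents toward a shared meaning-to-signal mapping, mirroring the consensus criterion defined in the abstract.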

6.
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. Moreover, many actors in a transportation system fit the concept of an autonomous agent very well: the driver, the pedestrian, the traffic expert, and in some cases even the intersection and the traffic signal controller. However, the "agentification" of a transportation system raises some challenging issues: the number of agents is high; agents are typically highly adaptive; they react to changes in the environment at the individual level but produce unpredictable collective patterns; and they act in a highly coupled environment. This domain therefore poses many challenges for standard multiagent techniques such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.

7.
Evolutionary reinforcement learning and its application to robot path tracking
This paper studies a mobile-robot path-tracking control method based on adaptive heuristic critic (AHC) reinforcement learning. The critic element (ACE) of the AHC is implemented with a multilayer feedforward neural network, whose weights are updated by combining the TD(λ) algorithm with gradient descent. The action-selection element (ASE) consists of a fuzzy inference system (FIS) optimized by a genetic algorithm. The output of the ACE network forms a secondary reinforcement signal that guides the learning of the ASE. Finally, the proposed algorithm is applied to behavior learning of a mobile robot, where it effectively solves a complex path-tracking problem.
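The TD(λ) rule that the critic combines with gradient descent can be shown in its tabular form, where eligibility traces spread each temporal-difference error backward over recently visited states. The two-state chain below is an invented example, not the paper's task:

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) with accumulating eligibility traces, the update
    the neural-network critic approximates with gradient descent.
    `episode` is a list of (state, reward, next_state) transitions."""
    e = {s: 0.0 for s in V}
    for s, r, s_next in episode:
        delta = r + gamma * V.get(s_next, 0.0) - V[s]   # TD error
        e[s] += 1.0                                     # accumulate trace
        for st in V:
            V[st] += alpha * delta * e[st]
            e[st] *= gamma * lam                        # traces decay geometrically
    return V

# Two-state chain A -> B -> terminal, reward 1 on leaving B.
V = {"A": 0.0, "B": 0.0}
for _ in range(200):
    td_lambda_episode(V, [("A", 0.0, "B"), ("B", 1.0, None)])
```

The values converge to V(B) = 1 and V(A) = γ·V(B) = 0.9, the discounted returns of the chain.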

8.
It is argued that the backpropagation learning algorithm is unsuited to tackling real-world problems such as sensory-motor coordination learning or the encoding of large amounts of background knowledge in neural networks. One difficulty of the real world, the unavailability of "teachers" who already know the solution to a problem, may be overcome by the use of reinforcement learning algorithms in place of backpropagation. It is suggested that the complexity of the search space in real-world neural network learning problems may be reduced if learning is divided into two components: one abstracts structure from the environment and hence develops representations of stimuli; the other associates and refines these representations on the basis of feedback from the environment. Time-dependent learning problems are also considered in this hybrid framework. Finally, an "open systems" approach, in which subsets of a network may adapt independently on the basis of spatio-temporal patterns, is briefly discussed.

9.
In this work, reinforcement learning (RL) is used to find an optimal policy for a marketing campaign. The data show a complex characterization of the state and action spaces, and two approaches are proposed to circumvent this problem. The first is based on the self-organizing map (SOM), which is used to aggregate states. The second uses a multilayer perceptron (MLP) to carry out a regression of the action-value function. The results indicate that both approaches can improve a targeted marketing campaign. Moreover, the SOM approach allows an intuitive interpretation of the results, while the MLP approach yields robust results with generalization capabilities.
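A minimal sketch of the SOM-based state-aggregation idea: prototype vectors (here one-dimensional) move toward inputs while dragging their neighbors along, and the neighborhood then shrinks so each unit settles on a cluster of similar states. All data and parameters below are invented for illustration:

```python
import random

def train_som(data, n_units=4, epochs=50, lr=0.3, seed=0):
    """Minimal 1-D self-organizing map: each input pulls its best-matching
    unit (and, early on, that unit's immediate neighbors) toward itself;
    the neighborhood and learning rate are annealed over time."""
    rng = random.Random(seed)
    units = [rng.uniform(min(data), max(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        radius = 1 if epoch < epochs // 2 else 0      # annealed neighborhood
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                if abs(i - bmu) <= radius:
                    units[i] += lr * (x - units[i])
        lr *= 0.95                                     # annealed learning rate
    return sorted(units)

# Hypothetical 1-D customer "states" forming two clusters.
data = [0.0, 0.1, 0.2, 1.8, 1.9, 2.0]
units = train_som(data)
```

Each trained unit can then serve as one aggregated state for the RL policy, collapsing many raw customer states into a few prototypes.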

10.
This paper presents a distributed, smooth, time-varying feedback control law for coordinating the motions of multiple nonholonomic mobile robots of the Hilare type to capture/enclose a target by making troop formations. This motion coordination is a cooperative behavior for security against invaders in surveillance areas. Under this control law, each robot has its own coordinate system and senses the target/invader, other robots, and obstacles, achieving the cooperative behavior without any collisions. In particular, each robot has a two-dimensional control input referred to as a "formation vector", and the formation is controllable through these vectors. The validity of the control law is supported by computer simulations.

11.
Multi-tier storage systems are becoming more and more widespread in the industry. They have more tunable parameters and built-in policies than traditional storage systems, and an adequate configuration of these parameters and policies is crucial for achieving high performance. A very important performance indicator for such systems is the response time of the file I/O requests. The response time can be minimized if the most frequently accessed (“hot”) files are located in the fastest storage tiers. Unfortunately, it is impossible to know a priori which files are going to be hot, especially because the file access patterns change over time. This paper presents a policy-based framework for dynamically deciding which files need to be upgraded and which files need to be downgraded based on their recent access pattern and on the system’s current state. The paper also presents a reinforcement learning (RL) algorithm for automatically tuning the file migration policies in order to minimize the average request response time. A multi-tier storage system simulator was used to evaluate the migration policies tuned by RL, and such policies were shown to achieve a significant performance improvement over the best hand-crafted policies found for this domain.
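The tuning loop this abstract describes can be caricatured as a bandit-style RL sketch: the action is a candidate migration threshold, and the reward is the negated response time returned by a simulator. The simulator, candidate set, and epsilon-greedy rule below are our assumptions, not the paper's algorithm:

```python
import random

def tune_threshold(simulate, candidates, steps=3000, eps=0.1, seed=0):
    """Epsilon-greedy RL over candidate migration thresholds: each pull runs
    the simulator once and folds the negated response time into an
    incremental mean estimate of that threshold's value."""
    rng = random.Random(seed)
    q = {c: 0.0 for c in candidates}
    n = {c: 0 for c in candidates}
    for _ in range(steps):
        c = rng.choice(candidates) if rng.random() < eps else max(q, key=q.get)
        r = -simulate(c)                  # reward = negated response time
        n[c] += 1
        q[c] += (r - q[c]) / n[c]         # incremental mean of observed reward
    return max(q, key=q.get)

# Hypothetical noisy simulator: response time is minimized at threshold 0.6.
def sim(th, rng=random.Random(42)):
    return (th - 0.6) ** 2 + rng.uniform(0.0, 0.01)

best = tune_threshold(sim, [0.2, 0.4, 0.6, 0.8])
```

Averaging over many simulated requests washes out the noise, so the learner settles on the threshold with the lowest mean response time.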

12.
This paper examines whether and how a primitive form of communication emerges between adaptive agents through their excess degrees of freedom in action and perception. As a case study, we consider a game in which two reinforcement learning agents learn to earn rewards by intruding into each other's territory. Our simulation shows that agents with lights and light sensors can learn turn-taking behavior for avoiding collisions using visual communication. Further analysis reveals variety in the mapping of messages to signals; in some cases, a differentiation of roles into sender and receiver was observed. The results confirm that proto-communication can emerge through interaction between agents with generic reinforcement learning capability. This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, Japan, January 25–27, 2007.

13.
The design of fuzzy controllers for implementing behaviors in mobile robotics is a complex and highly time-consuming task. Machine learning techniques such as evolutionary algorithms or artificial neural networks make it possible to automate the design process. In this paper, the automated design of a fuzzy controller using genetic algorithms for implementing the wall-following behavior in a mobile robot is described. The algorithm is based on the iterative rule learning approach and is characterized by three main points. First, learning places no restrictions on either the number of membership functions or their values. Second, the training set is composed of examples uniformly distributed along the universe of discourse of the variables. This guarantees that the quality of the learned behavior does not depend on the environment, and that the robot will be able to face different situations. Finally, the trade-off between the number of rules and the quality/accuracy of the controller can be adjusted by selecting the value of a parameter. Once the knowledge base has been learned, a process of reduction and tuning is applied, increasing the cooperation between rules and reducing their number.

14.
The emergence of mobile and ubiquitous technologies as important tools to complement formal learning has been accompanied by a growing interest in their educational benefits and applications. Mobile devices can be used to promote learning anywhere and anytime, to foster social learning and knowledge sharing, or to visualize augmented reality applications for learning purposes. However, the development of these applications is difficult for many researchers because it requires understanding many different protocols; dealing with distributed schemas, processes, platforms, and services; learning new programming languages; and interacting with different hardware sensors and drivers. For that reason, the use of frameworks and middleware that encapsulate part of this complexity appears to be fundamental to the further development of mobile learning projects. This study analyzes the state of the art of frameworks and middleware devoted to simplifying the development of mobile and ubiquitous learning applications. The results can be useful to many researchers involved in the development of projects using these technologies by providing an overview of the features implemented in each of these frameworks.

15.
16.
17.
Experimental learning environments based on simulation usually require monitoring of, and adaptation to, the actions users carry out. Some systems provide this functionality, but they do so in a way that is static or cannot be applied to problem-solving tasks. In response, we propose a method based on intermediate languages to provide adaptation in design-learning scenarios. Although we use some approaches familiar from other domains (e.g., programming tutors), they are novel in their application to a very different domain, and as a result we have incorporated new strategies. The purpose of our proposal is to provide monitoring, guidance and adaptive features for PlanEdit, a tool for learning integral automation methods in buildings and housing by design. This tool is part of a collaborative environment, called DomoSim-TPC, which supports distance learning of domotical design. We have carried out an experiment whose data confirm that our approach can be effective for group learning of domotical design, studying the relationship between the amount of modeling work carried out and the errors made.

18.
Recent research on online learning suggests that virtual worlds are becoming an important environment in which to observe the experience of flow. From these simulated spaces, researchers may gather a deeper understanding of cognition in the context of game-based learning. Csikszentmihalyi (1997) describes flow as a feeling of increased psychological immersion and energized focus, with outcomes that evoke disregard for external pressures and the loss of time consciousness, resulting in a sense of pleasure. Past studies suggest that flow is encountered in an array of activities and places, including virtual worlds. The authors posit that flow in virtual worlds, such as Second Life (SL), can be positively associated with degrees of the cognitive phenomena of immersion and telepresence. Flow may also contribute to better attitudes and behavior during virtual game-based learning. This study tested three hypotheses related to flow and telepresence using SL. Findings suggest that both flow and telepresence are experienced in SL and that there is a significant correlation between them. These findings shed light on the complex interrelationships and interactions that lead to the flow experience in virtual gameplay and learning, while engendering hope that learners who experience flow may acquire an improved attitude toward learning online.

19.
Design plays a central role in a range of subjects at different educational levels. Students have to acquire the knowledge necessary to execute tasks that enable them to construct an artefact or model that can be tested by simulation, satisfies some requirements, and verifies some constraints. They achieve this by means of a design process. In some design domains there is a lack of teaching tools built from a learner-centred perspective. Moreover, when these domains are complex, the design problems that students have to solve during their learning process require the design activity to be carried out in groups. In response to this situation, we have developed a design model and a collaborative learning method. Using this conceptual framework, we have built a collaborative environment for learning domotical design by means of complex problem solving, with an emphasis on synchronous collaboration for work distribution, discussion, design on shared surfaces, and simulation. This environment has already been evaluated and used in real teaching experiences.

20.
The main objective of this research was to validate the effectiveness of the Wikideas and Creativity Connector tools in stimulating the generation of ideas and originality by university students organized into groups according to their indexes of creativity and affinity. Another goal of the study was to evaluate the classroom climate created by these tools and the method "Think Actively in a Social Context" (TASC) proposed by Wallace and Adams (1993) and framed within project-based learning (PBL). The research was conducted with a sample of 34 students in the third year of a Computer Engineering degree, who, during a period of 15 weeks, were required to design and implement an innovative distributed application project. The procedure consisted of the implementation of the eight phases of the TASC method integrated with the Wikideas and Creativity Connector tools. The information provided by the tools, along with interviews and questionnaires administered to students, was used to test our hypotheses. The results show that the tools helped the students to generate, evaluate and select the most relevant ideas and to form teams for project execution. They also revealed that teams with high indexes of creativity and affinity (type α) achieved the best grades in academic performance and project originality. Furthermore, the research data show that Wikideas and Creativity Connector, along with the TASC approach, created a positive classroom climate for students. Based on this work, several suggestions can be extracted on the use of the tools and the TASC method for project-based learning.
