Found 20 similar documents (search time: 15 ms)
1.
Reinforcement learning (RL) for large and complex problems suffers from the curse of dimensionality. To overcome this problem, frameworks based on temporal abstraction have been proposed, each with its own advantages and disadvantages. This paper proposes a new method, similar to the strategies introduced in hierarchical abstract machines (HAMs), that creates a high-level controller layer of reinforcement learning which uses options. The proposed framework uses a non-deterministic automaton as a controller to make more effective use of temporally extended actions and state space clustering. The method can be viewed as a bridge between the option and HAM frameworks: it builds connection structures between them in order to reduce the disadvantages of both while retaining their advantages. Experimental results on different test environments show the significant efficiency of the proposed method.
2.
Tomofumi Ohshita Ji-Sun Shin Michio Miyazaki Hee-Hyol Lee 《Artificial Life and Robotics》2008,13(1):144-147
A distributed autonomous robotic system is robust and adaptable to dynamic environments; however,
the robots must cooperate with one another for the system to behave optimally. Acquiring actions through reinforcement
learning is a known approach when multiple robots must cooperate on a complex task. This paper
deals with a multi-robot transport problem using the Q-learning algorithm from reinforcement learning. When a robot
carries luggage, we regard it as leaving a trace along its own path of movement; the trace is volatile,
and other robots can use the trace information to help the robot that is carrying the luggage. To solve these problems
in multi-agent reinforcement learning, a learning control method using stress antibody allotment reward is used. Moreover,
we propose using the robots' trace information to encourage cooperative behavior among the robots in carrying luggage to a destination.
The effectiveness of the proposed method is shown by simulation.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January
31–February 2, 2008.
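The abstract above names Q-learning and a volatile trace left by the carrying robot. As a minimal sketch only, the standard tabular Q-learning update and a simple trace decay might look like the following. The function names, the state encoding, the default learning parameters, and the multiplicative decay model are all assumptions for illustration; the authors' stress antibody allotment reward is not modeled here.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def evaporate(trace, rate=0.5):
    """Decay every trace value by a fixed fraction (hypothetical volatility model)."""
    return {cell: value * (1.0 - rate) for cell, value in trace.items()}


# Usage: a robot at grid cell (0, 0) moves right and receives a reward of 1.
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["up", "down", "left", "right"]
q_update(q, (0, 0), "right", 1.0, (0, 1), actions)
```

In a sketch like this, the decayed trace values could be added to the state observed by the other robots, but how the trace enters the learning problem is not stated in the abstract.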
3.
Harukazu Igarashi 《Artificial Life and Robotics》2002,6(1-2):59-65
At AROB5, we proposed a solution to the path planning of a mobile robot. In our approach, we formulated the problem as a discrete
optimization problem at each time step. To solve the optimization problem, we used an objective function consisting of a goal
term, a smoothness term, and a collision term. While the results of our simulation showed the effectiveness of our approach,
the values of the weights in the objective function were not given by any theoretical method. This article presents a theoretical
method using reinforcement learning for adjusting the weight parameters. We applied Williams' learning algorithm, episodic
REINFORCE, to derive a learning rule for the weight parameters. We verified the learning rule in several experiments.
This work was presented, in part, at the Sixth International Symposium on Artificial Life and Robotics, Tokyo, Japan, January
15–17, 2001.
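To make the idea of tuning objective-function weights with episodic REINFORCE concrete, here is a minimal sketch. It assumes a Boltzmann policy over the candidate moves at each time step, with energy equal to the weighted sum of the goal, smoothness, and collision terms. The policy form, the function names, and the feature layout are assumptions for illustration, not the learning rule derived in the article.

```python
import math


def action_probs(features, weights):
    """Boltzmann policy: p(a) is proportional to exp(-E(a)), E(a) = sum_k w_k * f_k(a)."""
    energies = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    lowest = min(energies)  # shift energies for numerical stability
    unnorm = [math.exp(-(e - lowest)) for e in energies]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def grad_log_prob(features, weights, chosen):
    """Gradient of log p(chosen) w.r.t. each weight: E_p[f_k] - f_k(chosen)."""
    probs = action_probs(features, weights)
    n_terms = len(weights)
    expected = [
        sum(p * feats[k] for p, feats in zip(probs, features)) for k in range(n_terms)
    ]
    return [expected[k] - features[chosen][k] for k in range(n_terms)]


def reinforce_update(weights, episode, ret, baseline=0.0, lr=0.01):
    """Episodic REINFORCE: w += lr * (R - b) * sum_t grad log p(a_t).

    `episode` is a list of (features, chosen_index) pairs, one per time step,
    where features[a] holds the term values [goal, smoothness, collision] of move a.
    """
    total = [0.0] * len(weights)
    for features, chosen in episode:
        step_grad = grad_log_prob(features, weights, chosen)
        total = [t + g for t, g in zip(total, step_grad)]
    return [w + lr * (ret - baseline) * t for w, t in zip(weights, total)]
```

With a positive return, the update lowers the weights of terms on which the chosen move scored higher than average, which raises the probability of similar moves in later episodes.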
4.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable
agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement
learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-values
has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper
we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian
processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard
prediction mechanism for Gaussian processes requires a matrix inversion which can become unstable when the kernel matrix has
low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance
of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the
Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and
instance-based regression as a generalization algorithm for RRL.
Editors: David Page and Akihiro Yamamoto
5.
In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of Reinforcement Learning (RL) to the self-emergence of a common lexicon in robot teams. We model the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e. the objects encountered by the robots or the states of the environment itself) to symbols or signals. We then check whether the robot team can converge, in an autonomous and decentralized way, to a common lexicon by means of RL, so that the communication efficiency of the entire robot team is optimal. We have conducted several experiments aimed at testing whether it is possible to converge with RL to an optimal Saussurean Communication System. We have organized our experiments along two main lines: first, we have investigated the effect of team size, focusing on teams of moderate size on the order of 5 and 10 individuals, typical of multi-robot systems. Second, and foremost, we have investigated the effect of lexicon size on the convergence results. To analyze the convergence of the robot team, we define team consensus as the state in which all the robots (i.e. 100% of the population) share the same association matrix or lexicon. As a general conclusion, we have shown that RL allows convergence to lexicon consensus in a population of autonomous agents.
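The association-matrix lexicon described above can be illustrated with a small sketch. Each agent holds a matrix indexed as matrix[meaning][symbol]; a speaker picks the strongest symbol for a meaning, a hearer decodes it, and both receive the same reward. The update rule, the reward values, and the consensus check on the induced meaning-to-symbol mapping (rather than on raw matrix entries) are assumptions made for illustration, not the procedure used in the paper.

```python
def best_symbol(matrix, meaning):
    """Speaker side: index of the symbol most strongly associated with a meaning."""
    row = matrix[meaning]
    return max(range(len(row)), key=lambda s: row[s])


def best_meaning(matrix, symbol):
    """Hearer side: index of the meaning most strongly associated with a symbol."""
    return max(range(len(matrix)), key=lambda m: matrix[m][symbol])


def play_round(speaker, hearer, meaning, lr=0.1):
    """One communication round with a shared reward: +1 on success, -1 on failure."""
    symbol = best_symbol(speaker, meaning)
    guess = best_meaning(hearer, symbol)
    reward = 1.0 if guess == meaning else -1.0
    # Reinforce (or weaken) the associations that were actually used.
    speaker[meaning][symbol] += lr * reward
    hearer[guess][symbol] += lr * reward
    return reward


def has_consensus(lexicons):
    """True when every agent maps each meaning to the same preferred symbol."""
    mappings = [
        [best_symbol(m, meaning) for meaning in range(len(m))] for m in lexicons
    ]
    return all(mapping == mappings[0] for mapping in mappings)
```

A simulation would repeatedly draw random speaker and hearer pairs and random meanings, call play_round, and stop once has_consensus holds for the whole team.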
6.
Ana L. C. Bazzan 《Autonomous Agents and Multi-Agent Systems》2009,18(3):342-375
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general,
and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional
capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to
multiagent systems, as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation
system fit the concept of autonomous agents very well: the driver, the pedestrian, the traffic expert; in some cases, the
intersection and the traffic signal controller can also be regarded as autonomous agents. However, the “agentification” of
a transportation system is associated with some challenging issues: the number of agents is high, agents are typically highly
adaptive, they react to changes in the environment at the individual level but cause an unpredictable collective pattern, and
act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent
systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches
and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and
challenges so that future research in multiagent systems can address them.
7.
8.
R.W. Kentridge 《Parallel Computing》1990,14(3):405-414
It is argued that the backpropagation learning algorithm is unsuited to tackling real-world problems such as sensory-motor coordination learning or the encoding of large amounts of background knowledge in neural networks. One difficulty in the real world, the unavailability of ‘teachers’ who already know the solution to problems, may be overcome by the use of reinforcement learning algorithms in place of backpropagation. It is suggested that the complexity of the search space in real-world neural network learning problems may be reduced if learning is divided into two components. One component is concerned with abstracting structure from the environment and hence with developing representations of stimuli. The other component involves associating and refining these representations on the basis of feedback from the environment. Time-dependent learning problems are also considered in this hybrid framework. Finally, an ‘open systems’ approach in which subsets of a network may adapt independently on the basis of spatio-temporal patterns is briefly discussed.
9.
Gabriel Gómez-Pérez José D. Martín-Guerrero Emilio Soria-Olivas Emili Balaguer-Ballester Alberto Palomares Nicolás Casariego 《Expert systems with applications》2009,36(4):8022-8031
In this work, RL is used to find an optimal policy for a marketing campaign. Data show a complex characterization of state and action spaces. Two approaches are proposed to circumvent this problem. The first approach is based on the self-organizing map (SOM), which is used to aggregate states. The second approach uses a multilayer perceptron (MLP) to carry out a regression of the action-value function. The results indicate that both approaches can improve a targeted marketing campaign. Moreover, the SOM approach allows an intuitive interpretation of the results, and the MLP approach yields robust results with generalization capabilities.
10.
A distributed motion coordination strategy for multiple nonholonomic mobile robots in cooperative hunting operations (total citations: 1; self-citations: 0; citations by others: 1)
Hiroaki 《Robotics and Autonomous Systems》2003,43(4):257-282
This paper presents a distributed smooth time-varying feedback control law for coordinating the motions of multiple nonholonomic mobile robots of the Hilare type to capture or enclose a target by making troop formations. This motion coordination is a cooperative behavior for security against invaders in surveillance areas. Under this control law, each robot has its own coordinate system and senses the target/invader, other robots, and obstacles in order to achieve this cooperative behavior without any collisions. In particular, each robot has a two-dimensional control input referred to as a “formation vector”, and the formation is controllable through these vectors. The validity of this control law is supported by computer simulations.
11.
David Vengerov 《The Journal of supercomputing》2008,43(1):1-19
Multi-tier storage systems are becoming more and more widespread in the industry. They have more tunable parameters and built-in
policies than traditional storage systems, and an adequate configuration of these parameters and policies is crucial for achieving
high performance. A very important performance indicator for such systems is the response time of the file I/O requests. The
response time can be minimized if the most frequently accessed (“hot”) files are located in the fastest storage tiers. Unfortunately,
it is impossible to know a priori which files are going to be hot, especially because the file access patterns change over
time. This paper presents a policy-based framework for dynamically deciding which files need to be upgraded and which files
need to be downgraded based on their recent access pattern and on the system’s current state. The paper also presents a reinforcement
learning (RL) algorithm for automatically tuning the file migration policies in order to minimize the average request response
time. A multi-tier storage system simulator was used to evaluate the migration policies tuned by RL, and such policies were
shown to achieve a significant performance improvement over the best hand-crafted policies found for this domain.
12.
This paper examines whether and how a primitive form of communication emerges between adaptive agents by using their excess
degrees of freedom in action and perception. As a case study, we consider a game in which two reinforcement learning agents
learn to earn rewards by intruding into the other’s territory. Our simulation shows that agents with lights and light sensors
can learn turn-taking behavior for avoiding collisions using visual communication. Further analysis reveals a variety in the
mapping of messages to signals. In some cases, the differentiation of roles into a sender and a receiver was observed. The
result confirmed that protocommunication can emerge through interaction between agents having generic reinforcement learning
capability.
This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, Japan, January
25–27, 2007.
13.
M. Mucientes D. L. Moreno A. Bugarín S. Barro 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(10):881-889
The design of fuzzy controllers for the implementation of behaviors in mobile robotics is a complex and highly time-consuming task. The use of machine learning techniques such as evolutionary algorithms or artificial neural networks to learn these controllers makes it possible to automate the design process. In this paper, the automated design of a fuzzy controller using genetic algorithms for the implementation of the wall-following behavior in a mobile robot is described. The algorithm is based on the iterative rule learning approach, and is characterized by three main points. First, learning places no restrictions on either the number of membership functions or their values. Second, the training set is composed of a set of examples uniformly distributed along the universe of discourse of the variables. This guarantees that the quality of the learned behavior does not depend on the environment, and also that the robot will be capable of handling different situations. Finally, the trade-off between the number of rules and the quality/accuracy of the controller can be adjusted by selecting the value of a parameter. Once the knowledge base has been learned, a process for its reduction and tuning is applied, increasing the cooperation between rules and reducing their number.
14.
Sergio Martin Gabriel Diaz Inmaculada Plaza Elena Ruiz Manuel Castro Juan Peire 《Journal of Systems and Software》2011,84(11):1883-1891
The emergence of mobile and ubiquitous technologies as important tools to complement formal learning has been accompanied by a growing interest in their educational benefits and applications. Mobile devices can be used to promote learning anywhere and anytime, to foster social learning and knowledge sharing, or to visualize augmented reality applications for learning purposes. However, the development of these applications is difficult for many researchers because it requires understanding many different protocols; dealing with distributed schemas, processes, platforms, and services; learning new programming languages; and interacting with different hardware sensors and drivers. For that reason, the use of frameworks and middleware that encapsulate part of this complexity appears to be fundamental to the further development of mobile learning projects. This study analyzes the state of the art of frameworks and middleware devoted to simplifying the development of mobile and ubiquitous learning applications. The results can be useful to many researchers involved in the development of projects using these technologies by providing an overview of the features implemented in each of these frameworks.
15.
16.
17.
Miguel Á. Redondo Crescencio Bravo Manuel Ortega M. Felisa Verdejo 《Computers & Education》2007,48(4):642-657
Experimental learning environments based on simulation usually require monitoring of, and adaptation to, the actions the users carry out. Some systems provide this functionality, but they do so in a way which is static or cannot be applied to problem-solving tasks. In response to this problem, we propose a method based on the use of intermediate languages to provide adaptation in design learning scenarios. Although we use some approaches which are familiar from other domains (e.g., programming tutors), they are novel as regards their application to a very different domain, and as a result we have incorporated new strategies. The purpose of our proposal is to provide monitoring, guidance and adaptive features for PlanEdit, a tool for the learning of integral automation methods in buildings and housing by design. This tool is part of a collaborative environment, called DomoSim-TPC, which supports distance learning of domotical design. We have carried out an experiment to obtain data which confirm that our approach can be effective for group learning of domotical design, studying the relationship between the quantity of model work carried out and the errors made.
18.
Recent research on online learning suggests that virtual worlds are becoming an important environment in which to observe the experience of flow. From these simulated spaces, researchers may gather a deeper understanding of cognition in the context of game-based learning. Csikszentmihalyi (1997) describes flow as a feeling of increased psychological immersion and energized focus, with outcomes that evoke disregard for external pressures and the loss of time consciousness, issuing in a sense of pleasure. Past studies suggest that flow is encountered in an array of activities and places, including those in virtual worlds. The authors posit that flow in virtual worlds, such as Second Life (SL), can be positively associated with degrees of the cognitive phenomena of immersion and telepresence. Flow may also contribute to a better attitude and behavior during virtual game-based learning. This study tested three hypotheses related to flow and telepresence, using SL. Findings suggest that both flow and telepresence are experienced in SL and that there is a significant correlation between them. These findings shed light on the complex interrelationships and interactions that lead to flow experience in virtual gameplay and learning, while engendering hope that learners who experience flow may acquire an improved attitude toward learning online.
19.
Design plays a central role in a range of subjects at different educational levels. Students have to acquire the knowledge necessary for the execution of tasks that enable them to construct an artefact or model that can be tested by simulation and that satisfies some requirements and verifies some constraints. They achieve this by means of a design process. In some design domains there is a lack of teaching tools from a learner-centred perspective. Moreover, when these domains are complex, the design problems that the students have to solve during their learning process require the design activity to be carried out in groups. In response to this situation, we have developed a design model and a collaborative learning method. Using this conceptual framework, we have built a collaborative environment for the learning of domotical design by means of complex problem solving, with an emphasis on synchronous collaboration for work distribution, discussion, design in shared surfaces and simulation. This environment has already been evaluated and used in real teaching experiences.
20.
Oscar Ardaiz-Villanueva Xabier Nicuesa-Chacón Oscar Brene-Artazcoz María Luisa Sanz de Acedo Lizarraga María Teresa Sanz de Acedo Baquedano 《Computers & Education》2011
The main objective of this research was to validate the effectiveness of the Wikideas and Creativity Connector tools in stimulating the generation of ideas and originality by university students organized into groups according to their indexes of creativity and affinity. Another goal of the study was to evaluate the classroom climate created by these tools and the method “Think Actively in a Social Context” (TASC) proposed by Wallace and Adams (1993) and framed within project-based learning (PBL). The research was conducted with a sample of 34 students in the third year of a Computer Engineering degree, which, during a period of 15 weeks, required them to design and implement an innovative distributed application project. The procedure consisted of the implementation of the eight phases of the TASC method integrated with the Wikideas and Creativity Connector tools. The information provided by the tools, together with interviews and questionnaires administered to students, was used to analyze our hypothesis. The results show that the tools helped the students to generate, evaluate and select the most relevant ideas and to form teams for project execution. They also revealed that teams with high indexes of creativity and affinity (type α) achieved the best grades in academic performance and project originality. Furthermore, research data show that Wikideas and Creativity Connector along with the TASC approach created a positive classroom climate for students. Based on this work, several suggestions can be drawn on the use of the tools and the TASC method for project-based learning.