Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
In this work, we address a relatively unexplored aspect of designing agents that learn from human reward. We investigate how an agent’s non-task behavior can affect a human trainer’s training and the agent’s learning. We use the TAMER framework, which facilitates the training of agents by human-generated reward signals, i.e., judgements of the quality of the agent’s actions, as the foundation for our investigation. Then, starting from the premise that the interaction between the agent and the trainer should be bi-directional, we propose two new training interfaces to increase a human trainer’s active involvement in the training process and thereby improve the agent’s task performance. One provides information on the agent’s uncertainty, a metric calculated from data coverage; the other on its performance. Our results from a 51-subject user study show that these interfaces can induce trainers to train longer and give more feedback. The agent’s performance, however, increases only in response to the addition of performance-oriented information, not to the sharing of uncertainty levels. These results suggest that the organizational maxim about human behavior, “you get what you measure”—i.e., sharing metrics with people causes them to focus on optimizing those metrics while de-emphasizing other objectives—also applies to the training of agents. Using principal component analysis, we show how trainers in the two conditions train agents differently. In addition, by simulating the influence of the agent’s uncertainty-informative behavior on a human’s training behavior, we show that trainers can be distracted by the agent sharing its uncertainty levels about its actions, giving poor feedback for the sake of reducing the agent’s uncertainty without improving the agent’s performance.
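The study builds on TAMER, in which the agent learns a predictive model of the human's reward signal and acts greedily with respect to it. Below is a minimal sketch of that learn-from-human-reward loop, assuming a linear model over state-action features; the feature map, learning rate, and interface are illustrative assumptions rather than the paper's implementation (the full framework also handles credit assignment over delayed human responses).

```python
import numpy as np

class TamerStyleAgent:
    """Minimal sketch: learn a model H(s, a) of human reward and act greedily on it."""

    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = np.zeros((n_actions, n_features))  # one linear model per action
        self.lr = lr

    def phi(self, state):
        # Illustrative feature map; a real system would use task-specific features.
        return np.asarray(state, dtype=float)

    def act(self, state):
        # Greedy action with respect to the learned human-reward model.
        return int(np.argmax(self.w @ self.phi(state)))

    def update(self, state, action, human_reward):
        # Supervised update: move H(s, a) toward the human's reward signal.
        x = self.phi(state)
        error = human_reward - self.w[action] @ x
        self.w[action] += self.lr * error * x

agent = TamerStyleAgent(n_features=4, n_actions=3)
a = agent.act([0.2, 0.0, 1.0, -0.5])
agent.update([0.2, 0.0, 1.0, -0.5], a, human_reward=1.0)  # trainer pressed "good"
```

In the paper's setting, the proposed interfaces would additionally surface the model's uncertainty or a performance metric to the trainer while this loop runs.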

2.
Recent out-of-classroom studies have shown that interaction between children can facilitate correct solution of problem-solving tasks, and that the benefits of interactive experience can carry over to subsequent individual post-tests. These results have also been demonstrated in the context of schoolchildren's microcomputer use, although with caveats that may have more to do with the tasks than the medium. The present study investigated peer interaction in a more naturalistic, classroom-like context. Groups of children, with access to different levels of microcomputer resource, were given an introduction to using micro-PROLOG.

3.
Model-based average reward reinforcement learning (total citations: 7, self-citations: 0, others: 7)
Artificial Intelligence, 1998, 100(1–2): 177–224
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based average-reward reinforcement learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this “Auto-exploratory H-learning” performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks.
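To make the average-reward criterion concrete, here is a heavily simplified tabular sketch in the spirit of H-learning: the agent learns a transition and reward model and backs up relative values h(s) against an average-reward estimate rho. The update schedule, exploration strategy, and the exact form of the published algorithm's rho estimate differ, so treat every parameter here as an illustrative assumption.

```python
import numpy as np

class HLearningSketch:
    """Simplified model-based average-reward updates in the spirit of H-learning."""

    def __init__(self, n_states, n_actions, alpha=0.1):
        self.n_s, self.n_a = n_states, n_actions
        self.counts = np.zeros((n_states, n_actions, n_states))  # transition counts
        self.r_sum = np.zeros((n_states, n_actions))             # reward sums
        self.h = np.zeros(n_states)                              # relative values
        self.rho = 0.0                                           # average-reward estimate
        self.alpha = alpha

    def _q(self, s):
        n = self.counts[s].sum(axis=1, keepdims=True)
        p = np.where(n > 0, self.counts[s] / np.maximum(n, 1), 1.0 / self.n_s)
        r = self.r_sum[s] / np.maximum(n[:, 0], 1)
        return r - self.rho + p @ self.h   # one-step lookahead on the learned model

    def act(self, s):
        return int(np.argmax(self._q(s)))  # greedy w.r.t. the current h and rho

    def update(self, s, a, r, s_next, greedy):
        self.counts[s, a, s_next] += 1
        self.r_sum[s, a] += r
        self.h[s] = np.max(self._q(s))     # Bellman backup, average-reward criterion
        if greedy:                         # adjust rho only on greedy steps
            self.rho += self.alpha * (r - self.h[s] + self.h[s_next] - self.rho)
```

The Auto-exploratory variant, the dynamic-Bayesian-network models, and the local linear regression described in the abstract all build on this tabular core.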

4.
By reviewing the development history and the main current work on apprenticeship learning via reward-function learning, this paper surveys apprenticeship learning methods based on learning a reward function. The discussion covers both linear and nonlinear reward functions; under the linear assumption, two families of methods are compared: apprenticeship learning based on inverse reinforcement learning (IRL) and on maximum margin planning (MMP). The former has comparatively fast approximate algorithms but makes strong assumptions about the optimality of the demonstrations; the latter is formally easier to extend but computationally expensive. Finally, open problems and future research directions in this area are identified, such as applying apprenticeship learning in POMDP settings, and using approximate algorithms such as PBVI, or extracting learned features through dimensionality-reduction methods such as PCA, to mitigate the heavy computation incurred by high dimensionality.
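As context for the linear case the survey compares, the standard linear-reward formulation used in IRL-based apprenticeship learning can be stated as follows (this follows Abbeel and Ng's feature-expectation matching; the notation is the common convention, not taken from this survey):

```latex
R(s) = w^{\top}\phi(s), \qquad \|w\|_2 \le 1, \qquad
\mu(\pi) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t) \,\middle|\, \pi \right]
% If the learner matches the expert's feature expectations,
% Cauchy-Schwarz bounds the value gap for every admissible w:
\left\| \mu(\pi) - \mu(\pi_E) \right\|_2 \le \varepsilon
\;\Longrightarrow\;
\left| w^{\top}\mu(\pi) - w^{\top}\mu(\pi_E) \right| \le \varepsilon
```

This is why learning a reward function from demonstrations suffices for imitating the expert in the linear setting: matching feature expectations guarantees near-expert value under any admissible reward weights.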

5.
The focus of this study is to explore the advances that Social Network Analysis (SNA) can bring, in combination with other methods, to the study of Networked Learning/Computer-Supported Collaborative Learning (NL/CSCL). We present a general overview of how SNA is applied in NL/CSCL research; we then illustrate how this research method can be integrated with existing studies on NL/CSCL, using an example from our own data, as a way to synthesize and extend our understanding of teaching and learning processes in networked learning communities (NLCs). The example study reports empirical work using content analysis (CA), critical event recall (CER) and social network analysis (SNA). The aim is to use these methods to study the nature of the interaction patterns within an NLC, and the way its members share and construct knowledge. The paper also examines some of the current findings of SNA work elsewhere in the literature, and discusses future prospects for SNA. This paper is part of a continuing international study investigating NL/CSCL among a community of learners engaged in a master’s program in e-learning.

6.
Applied Intelligence - A multi-agent system (MAS) is expected to be applied to various real-world problems where a single agent cannot accomplish given tasks. Due to the inherent complexity in the...

7.
《Knowledge》2007,20(5):439-456
HRI (Human–Robot Interaction) is often frequent and intense in assistive service environments, and realizing human-friendly interaction is known to be a very difficult task because a human is present as a subsystem of the interaction process. After briefly discussing typical HRI models and characteristics of humans, we point out that learning plays an important role in designing the interaction process of such human-in-the-loop systems. We then show that the soft computing toolbox approach, especially with fuzzy set-based learning techniques, can be effectively adopted for modeling human behavior patterns as well as for processing human bio-signals, including facial expressions, hand/body gestures, EMG and so forth. Two projects are briefly described to illustrate how fuzzy logic-based learning techniques and the soft computing toolbox approach are successfully applied to human-friendly HRI systems. Next, we observe that probabilistic fuzzy rules can handle inconsistent data patterns originating from humans, and show that combining fuzzy logic, fuzzy clustering, and probabilistic reasoning in a single framework leads to an algorithm of iterative fuzzy clustering with supervision. Further, we discuss the possibility of using this algorithm to inductively construct a probabilistic fuzzy rule base in a learning system for a smart home. Finally, we propose a life-long learning system architecture for the HRI type of human-in-the-loop systems.

8.
This article proposes a reinforcement learning procedure for mobile robot navigation using a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and is not apparent until reinforcement is introduced. This concept considers that part of a task can be learned before the agent receives any indication of how to perform it. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. Propagating the reinforcement signal throughout the topological neighborhoods of the map permits the estimation of a value function that requires, on average, fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four different environments designed to present a growing level of complexity in accomplishing navigation tasks. The tests suggested that the TRLA chooses shorter trajectories (in number of steps) and/or requires fewer value-function updates per trial than the other six reinforcement learning (RL) algorithms.
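The core idea the abstract describes, propagating value through the neighborhoods of a learned topological map, can be illustrated with a toy sketch; the graph construction and the sweep-based propagation below are illustrative assumptions, not the TRLA algorithm itself.

```python
def propagate_values(neighbors, rewards, gamma=0.95, sweeps=100):
    """Toy value propagation over a topological map (not the TRLA algorithm itself).

    neighbors: dict mapping each node to a list of adjacent nodes
    rewards:   dict mapping each node to its (possibly zero) reward
    Returns a value estimate per node after repeated local backups.
    """
    v = {n: 0.0 for n in neighbors}
    for _ in range(sweeps):
        for n in neighbors:
            # Greedy Bellman-style backup from the best adjacent node.
            v[n] = rewards[n] + gamma * max((v[m] for m in neighbors[n]), default=0.0)
    return v

# A 4-node corridor with reward at the far end: values decay with graph distance,
# so a greedy walk over v heads toward the reward without further trials.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rew = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}
print(propagate_values(graph, rew))
```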

9.
Reinforcement learning is an area of machine learning that does not require detailed teaching signals from a human, and it is expected to be applied to real robots. In such applications, the learning process must finish within a short period of time. Model-free reinforcement learning methods converge quickly on tasks such as Sutton’s maze problem, where the aim is to reach the target state in minimum time. However, these methods have difficulty learning tasks whose goal is to keep a stable state as long as possible. In this study, we improve the reward allocation method for stabilizing control tasks. For such tasks, we use the semi-Markov decision process as the environment model. The validity of our method is demonstrated through simulation of the stabilizing control of an inverted pendulum.

10.
An efficient learning algorithm for associative memories (total citations: 1, self-citations: 0, others: 1)
Associative memories (AMs) can be implemented using networks with or without feedback. We utilize a two-layer feedforward neural network and propose a learning algorithm that efficiently implements the association rule of a bipolar AM. The hidden layer of the network employs p neurons, where p is the number of prototype patterns. In the first layer, the input pattern activates at most one hidden layer neuron, the "winner". In the second layer, the "winner" associates the input pattern to the corresponding prototype pattern. The underlying association principle is minimum Hamming distance, and the proposed scheme can also be viewed as an approximate minimum-Hamming-distance decoder. Theoretical analysis supported by simulations indicates that, in comparison with other suboptimum minimum Hamming distance association schemes, the proposed structure exhibits the following favorable characteristics: 1) it operates in one shot, which implies no convergence-time requirements; 2) it does not require any feedback; and 3) our case studies show that it exhibits superior performance to the popular linear system in a saturated mode. The network also exhibits 4) exponential capacity and 5) easy performance assessment (no asymptotic analysis is necessary). Finally, since it does not require any hidden layer interconnections or tree-search operations, it exhibits low structural as well as operational complexity.
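A minimal NumPy sketch of the two-layer, winner-take-all association scheme described above, assuming bipolar prototypes: for vectors in {+1, -1}^n the inner product equals n minus twice the Hamming distance, so the arg-max winner is the minimum-Hamming-distance prototype. The tie-breaking and the "at most one winner" gating of the actual network are simplified here.

```python
import numpy as np

class BipolarAssociativeMemory:
    """Two-layer feedforward AM: winner-take-all over p prototype neurons."""

    def __init__(self, prototypes):
        # One hidden neuron per stored prototype; rows are bipolar (+1/-1) patterns.
        self.P = np.asarray(prototypes)

    def recall(self, x):
        # For bipolar vectors, <p, x> = n - 2 * HammingDistance(p, x), so the
        # largest inner product picks the minimum-Hamming-distance prototype.
        winner = int(np.argmax(self.P @ np.asarray(x)))
        return self.P[winner]

protos = [[1, -1, 1, -1], [-1, -1, 1, 1], [1, 1, 1, 1]]
am = BipolarAssociativeMemory(protos)
print(am.recall([1, -1, -1, -1]))  # one-bit-corrupted first prototype -> [ 1 -1  1 -1]
```

The one-shot property claimed in the abstract is visible here: recall is a single matrix-vector product followed by an arg-max, with no iteration to convergence.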

11.
Nowadays, owing to the rapid evolution of information technology, companies are accumulating explosively large amounts of customer information with very high-dimensional features. These companies, in turn, are exerting every effort to develop more efficient churn prediction models for managing customer relationships effectively. In this paper, a novel method is proposed for handling high-dimensional large data sets when constructing better churn prediction models. The proposed method starts by partitioning a data set into small-sized data subsets, and applies sequential manifold learning to reduce the high-dimensional features while giving consistent results for the combined data subsets. The performance of the churn prediction model constructed with the proposed method is tested on an e-commerce data set, in comparison with other existing methods. The proposed method works better and is much faster for high-dimensional large data sets, without requiring retraining on the original data set to reduce the dimensions of new test samples.

12.
We present a novel formulation of quasi-supervised learning that extends the learning paradigm to large datasets. Quasi-supervised learning computes the posterior probabilities of overlapping datasets at each sample and labels those samples that are highly specific to their respective datasets. The proposed formulation partitions the data into sample groups so as to compute the dataset posterior probabilities at reduced computational complexity. In experiments on synthetic as well as real datasets, the proposed algorithm achieved a significant reduction in computation time at similar recognition performance compared to the original algorithm, effectively generalizing the quasi-supervised learning paradigm to applications characterized by very large datasets.

13.
Sparse Bayesian learning for efficient visual tracking (total citations: 4, self-citations: 0, others: 4)
This paper extends the use of statistical learning algorithms for object localization. It has been shown that object recognizers using kernel-SVMs can be elegantly adapted to localization by means of spatial perturbation of the SVM. While this SVM applies to each frame of a video independently of other frames, the benefits of temporal fusion of data are well known. This is addressed here by using a fully probabilistic relevance vector machine (RVM) to generate observations with Gaussian distributions that can be fused over time. Rather than adapting a recognizer, we build a displacement expert which directly estimates displacement from the target region. An object detector is used in tandem, for object verification, providing the capability for automatic initialization and recovery. This approach is demonstrated in real-time tracking systems where the sparsity of the RVM means that only a fraction of CPU time is required to track at frame rate. An experimental evaluation compares this approach to the state of the art, showing it to be a viable method for long-term region tracking.

14.
Support vector machines (SVMs) have been broadly applied to classification problems. However, a successful application of SVMs depends heavily on choosing the right type of kernel function and suitable hyperparameter settings for it. Recently, multiple-kernel learning (MKL) algorithms have been developed to deal with these issues by combining different kernels; the weight of each kernel in the combination is obtained through learning. Lanckriet et al. proposed a way of deriving the weights by transforming the learning into a semidefinite programming (SDP) problem with a transduction setting. However, the time and space required by this method are demanding. In this paper, we reformulate the SDP problem with an induction setting and incorporate two strategies to reduce the search complexity of the learning process, based on the comments discussed in the Lanckriet et al. paper. The primal and dual forms of the SDP are derived, and a discussion of computational complexity is given. Experimental results obtained from synthetic and benchmark datasets show that the proposed method runs efficiently in multiple-kernel learning.
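To make the combined-kernel idea concrete, here is a small sketch that evaluates a fixed convex combination of base kernels with a precomputed-kernel SVM; the weights below are set by hand for illustration, whereas the paper learns them by solving a semidefinite program.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = X[:150], X[150:], y[:150], y[150:]

# Base kernels; MKL replaces the hand-set weights below with learned ones.
kernels = [linear_kernel, polynomial_kernel, rbf_kernel]
weights = np.array([0.2, 0.3, 0.5])  # convex combination: w >= 0, sum(w) == 1

K_train = sum(w * k(Xtr, Xtr) for w, k in zip(weights, kernels))
K_test = sum(w * k(Xte, Xtr) for w, k in zip(weights, kernels))

clf = SVC(kernel="precomputed").fit(K_train, ytr)
print("accuracy:", clf.score(K_test, yte))
```

A convex combination of valid kernels is itself a valid kernel, which is what lets the weight-learning step be folded into the SVM optimization as an SDP.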

15.
Over the last few years, multiply-annotated data has become a very popular source of information. Online platforms such as Amazon Mechanical Turk have revolutionized the labelling process needed for any classification task, sharing the effort between a number of annotators instead of the classical single expert. This crowdsourcing approach has introduced new challenging problems, such as handling disagreements on the annotated samples or combining the unknown expertise of the annotators. Probabilistic methods, such as Gaussian Processes (GP), have proven successful at modelling this crowdsourcing scenario. However, GPs do not scale well with the training set size, which makes them prohibitive for medium-to-large datasets (beyond 10K training instances). This constitutes a serious limitation for current real-world applications. In this work, we introduce two scalable and efficient GP-based crowdsourcing methods that allow for processing previously prohibitive datasets. The first is an efficient and fast approximation to a GP with a squared exponential (SE) kernel. The second allows for learning a more flexible kernel at the expense of heavier training, while remaining scalable to large datasets; since it is not a GP-SE approximation, it can also be considered a new scalable and efficient crowdsourcing method in its own right, useful for any dataset size. Both methods use Fourier features and variational inference, can predict the class of new samples, and estimate the expertise of the involved annotators. A complete set of experiments compares them with state-of-the-art probabilistic approaches on synthetic and real crowdsourcing datasets of different sizes. They stand out as the best-performing approach for large-scale problems, and the second method is also competitive with the current state of the art for small datasets.
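Both methods rely on Fourier features. A minimal sketch of the underlying random Fourier feature approximation to the SE kernel follows (Rahimi and Recht's standard construction, stated here under common conventions rather than taken from the paper; the feature count and lengthscale are illustrative).

```python
import numpy as np

def random_fourier_features(X, n_features=500, lengthscale=1.0, seed=0):
    """Map X so that z(x) @ z(y) approximates the SE kernel exp(-||x-y||^2 / (2 l^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)             # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The inner product of the features approximates the exact kernel value.
x = np.array([[0.3, -1.2]])
y = np.array([[0.5, -0.7]])
z = random_fourier_features(np.vstack([x, y]), n_features=5000)
approx = z[0] @ z[1]
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)
print(approx, exact)  # the two values should be close
```

The payoff is that inference can then work with an explicit finite feature map instead of an n-by-n kernel matrix, which is what makes the GP-based crowdsourcing models scale.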

16.
The zeroth-level classifier system ZCS, a genetics-based machine learning technique, has shown practical value in solving multi-step learning problems. However, the standard ZCS uses discounted-reward reinforcement learning, which makes it hard to adapt to a wider range of application domains. Building on the existing ZCS framework, this paper proposes a classifier system that adopts average-reward reinforcement learning (the R-learning algorithm): the discounted-reward reinforcement learning in ZCS is replaced by R-learning, so that ZCS can be applied to problem domains that require optimizing the average reward, and can also solve larger multi-step learning problems that require the support of long action chains. Experiments show that on multi-step learning problems the system produces satisfactory solutions, and that it has better properties in maintaining long action chains and in overcoming the over-generalization problem.
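For reference, R-learning (Schwartz's average-reward counterpart of Q-learning, which this paper substitutes into ZCS) maintains a table of relative action values and a running average-reward estimate rho. The tabular sketch below follows the commonly stated formulation, with step sizes and the greedy-only rho update as illustrative assumptions rather than the paper's tuned settings.

```python
import numpy as np

class RLearning:
    """Tabular R-learning: average-reward reinforcement learning."""

    def __init__(self, n_states, n_actions, alpha=0.05, beta=0.1):
        self.R = np.zeros((n_states, n_actions))  # relative action values
        self.rho = 0.0                            # running average-reward estimate
        self.alpha, self.beta = alpha, beta

    def act(self, s, epsilon=0.1):
        if np.random.random() < epsilon:
            return np.random.randint(self.R.shape[1])  # explore
        return int(np.argmax(self.R[s]))               # exploit

    def update(self, s, a, r, s_next):
        # Value update uses the average reward rho in place of discounting.
        target = r - self.rho + np.max(self.R[s_next])
        self.R[s, a] += self.beta * (target - self.R[s, a])
        # rho is adjusted only when the executed action is greedy.
        if self.R[s, a] == np.max(self.R[s]):
            self.rho += self.alpha * (r + np.max(self.R[s_next]) - np.max(self.R[s]) - self.rho)
```

Because no discount factor shrinks credit along the way, value differences persist over long action chains, which is the property the paper exploits for large multi-step problems.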

17.
18.
In multi-agent reinforcement learning systems, it is important to share a reward among all agents. We focus on the Rationality Theorem of Profit Sharing 5) and analyze how to share a reward among all profit-sharing agents. When an agent gets a direct reward R (R > 0), an indirect reward μR (μ ≥ 0) is given to the other agents. We derived the necessary and sufficient condition on μ that preserves rationality, where M and L are the maximum numbers of conflicting rules and of rational rules for the same sensory input, W and W_o are the maximum episode lengths of a direct-reward and an indirect-reward agent, and n is the number of agents. The theorem is derived by ruling out the least desirable situation, in which the expected reward per action is zero. Using this theorem, we can therefore exploit several efficient aspects of reward sharing. Through numerical examples, we confirm the effectiveness of the theorem.

Kazuteru Miyazaki, Dr. Eng.: He is an associate professor in the Faculty of Assessment and Research for Degrees at the National Institution for Academic Degrees. He obtained his B.Eng. from Meiji University in 1991, and his Dr. Eng. from the Tokyo Institute of Technology in 1996. His research interests are in machine learning and robotics. He has published over 30 research papers and received several awards. He is a member of the Japan Society of Mechanical Engineers (JSME), the Japanese Society for Artificial Intelligence (JSAI), and the Society of Instrument and Control Engineers of Japan (SICE).

Shigenobu Kobayashi, Dr. Eng.: He received his Dr. Eng. from the Tokyo Institute of Technology in 1974. He is a professor in the Dept. of Computational Intelligence and Systems Science, Tokyo Institute of Technology. His research interests include artificial intelligence, emergent systems, evolutionary computation and reinforcement learning.

19.
Dealing with changing situations is a major issue in building agent systems. When time is limited, knowledge is unreliable, and resources are scarce, the issue becomes more challenging. The BDI (Belief-Desire-Intention) agent architecture provides a model for building agents that addresses this issue. The model can be used to build intentional agents that reason based on explicit mental attitudes while behaving reactively in changing circumstances. However, despite its reactive and deliberative features, a classical BDI agent is not capable of learning: plans, as the recipes that guide the agent's activities, are assumed to be static. In this paper, an architecture for an intentional learning agent is presented. The architecture extends the BDI architecture so that the learning process is explicitly described as plans. Learning plans are meta-level plans that allow the agent to introspectively monitor its mental states and update other plans at run time. In order to acquire the intricate structure of a plan, a process pattern called manipulative abduction is encoded as a learning plan. This work advances the state of the art by combining the strengths of learning and BDI agent frameworks in a rich language for describing deliberation processes and reactive execution. It enables domain experts to specify learning processes and strategies explicitly, while allowing the agent to benefit from procedural domain knowledge expressed in plans.

20.
Researchers in biomechanics, neuroscience, human–machine interaction and other fields are interested in inferring human intentions and objectives from observed actions. The problem of inferring objectives from observations has received extensive theoretical and methodological development from both the controls and machine learning communities. In this paper, we provide an integrating view of objective learning from human demonstration data. We differentiate algorithms based on the assumptions made about the objective function structure, how the similarity between the inferred objectives and the observed demonstrations is assessed, the assumptions made about the agent and environment model, and the properties of the observed human demonstrations. We review the application domains and validation approaches of existing works and identify the key open challenges and limitations. The paper concludes with an identification of promising directions for future work.
