Similar Literature
A total of 20 similar documents were retrieved.
1.
A very important aspect of developing robots capable of human-robot interaction (HRI) is research into natural, human-like communication and, subsequently, the development of a research platform with multiple HRI capabilities for evaluation. Besides a flexible dialog system and speech understanding, an anthropomorphic appearance has the potential to support intuitive usage and understanding of a robot; for example, human-like facial expressions and deictic gestures can be both produced and understood by the robot. As a consequence of our effort to create an anthropomorphic appearance and to come close to a human-human interaction model for a robot, we decided to use human-like sensors only, i.e., two cameras and two microphones, in analogy to human perceptual capabilities. Despite the perceptual challenges resulting from these limits, a robust attention system for tracking and interacting with multiple persons simultaneously in real time is presented. The tracking approach is sufficiently generic to work on robots with varying hardware, as long as stereo audio data and video camera images are available. To easily implement different interaction capabilities such as deictic gestures, natural adaptive dialogs, and emotion awareness on the robot, we apply a modular integration approach utilizing XML-based data exchange. The paper focuses on our efforts to bring together different interaction concepts and perception capabilities on a humanoid robot to achieve comprehending, human-oriented interaction.
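To make the XML-based module integration mentioned above concrete, here is a minimal Python sketch of how a perception module might serialize a result for the dialog components; the element names, attributes, and helper functions are illustrative assumptions, not the system's actual schema.

```python
# Hypothetical XML message passing between perception and dialog modules;
# tag and attribute names are illustrative, not the system's actual schema.
import xml.etree.ElementTree as ET

def build_percept_message(person_id, azimuth_deg, speech_text):
    """Serialize one perception result as an XML message."""
    root = ET.Element("percept", attrib={"source": "attention_system"})
    person = ET.SubElement(root, "person", attrib={"id": str(person_id)})
    ET.SubElement(person, "azimuth").text = f"{azimuth_deg:.1f}"
    ET.SubElement(person, "utterance").text = speech_text
    return ET.tostring(root, encoding="unicode")

def parse_percept_message(xml_string):
    """Deserialize the message on the receiving (dialog) side."""
    root = ET.fromstring(xml_string)
    person = root.find("person")
    return {
        "person_id": int(person.get("id")),
        "azimuth_deg": float(person.findtext("azimuth")),
        "utterance": person.findtext("utterance"),
    }

if __name__ == "__main__":
    msg = build_percept_message(3, 42.5, "hello robot")
    print(parse_percept_message(msg))
```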

2.
Humans use a combination of gesture and speech to interact with objects and usually do so more naturally without holding a device or pointer. We present a system that incorporates user body-pose estimation, gesture recognition, and speech recognition for interaction in virtual reality environments. We describe a vision-based method for tracking the pose of a user in real time and introduce a technique that provides parameterized gesture recognition. More precisely, we train a support vector classifier to model the boundary of the space of possible gestures, and train Hidden Markov Models (HMM) on specific gestures. Given a sequence, we can find the start and end of various gestures using a support vector classifier, and find gesture likelihoods and parameters with an HMM. A multimodal recognition process is performed using rank-order fusion to merge speech and vision hypotheses. Finally, we describe the use of our multimodal framework in a virtual world application that allows users to interact using gestures and speech.
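The rank-order fusion step can be illustrated with a small sketch: each modality contributes a ranked hypothesis list and the lists are merged Borda-count style. This is a generic illustration on invented hypotheses, not the paper's exact fusion scheme.

```python
# Borda-count-style rank-order fusion of speech and vision hypothesis lists.
# Hypothesis labels and rankings below are invented for illustration.
def rank_order_fusion(*ranked_lists):
    """Each argument is a hypothesis list ordered best-first. Returns all
    hypotheses sorted by summed rank; a hypothesis absent from a list is
    charged that list's worst rank."""
    all_hyps = {h for ranked in ranked_lists for h in ranked}
    def total_rank(hyp):
        return sum(ranked.index(hyp) if hyp in ranked else len(ranked)
                   for ranked in ranked_lists)
    return sorted(all_hyps, key=total_rank)

speech = ["grab the red ball", "grab the bed wall", "drop the red ball"]
vision = ["point at the red ball", "grab the red ball"]
print(rank_order_fusion(speech, vision))   # "grab the red ball" ranks first
```

A hypothesis supported by both modalities outranks one that only a single modality proposes, which is the effect the fusion step is after.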

3.
4.
A multimodal interactive dialogue automaton (kiosk) for self-service is presented in the paper. The multimodal user interface allows people to interact with the kiosk through natural speech and gestures in addition to the standard input and output devices. The architecture of the kiosk contains key modules for speech processing and computer vision. An array of four microphones is used for far-field capture and recording of the user's speech commands; it allows the kiosk to detect voice activity, to localize the sources of desired speech signals, and to suppress environmental acoustic noise. A noise-robust, speaker-independent recognition system is applied to the automatic interpretation and understanding of continuous Russian speech. The distant speech recognizer uses a grammar of voice queries as well as garbage and silence models to improve recognition accuracy. A pair of portable video cameras is used for vision-based detection and tracking of the user's head and body position inside the working area. A Russian-speaking talking head serves both for bimodal audio-visual speech synthesis and for improving communication intelligibility by turning toward an approaching client. A dialogue manager controls the flow of the dialogue and synchronizes sub-modules for input modality fusion and output modality fission. The experiments with the multimodal kiosk were directed at cognitive and usability studies of human-computer interaction through different communication means.
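Localizing the desired speech source is central to the microphone-array front end described above. One standard way to do this (not necessarily the kiosk's actual method) is to estimate the time difference of arrival between two microphones with a GCC-PHAT cross-correlation, sketched below on a synthetic signal.

```python
# GCC-PHAT time-difference-of-arrival estimate between two microphone signals,
# a common building block for speaker localization. Signals are synthetic.
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """Return the estimated delay (seconds) of sig relative to ref."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
source = np.sin(2 * np.pi * 440 * t)
delay_samples = 8                             # simulate a 0.5 ms inter-mic delay
mic1 = np.concatenate((np.zeros(delay_samples), source))
mic2 = np.concatenate((source, np.zeros(delay_samples)))
print(f"estimated TDOA: {gcc_phat_tdoa(mic1, mic2, fs) * 1000:.3f} ms")
```

The sign and magnitude of the delay, together with the known microphone geometry, give the direction of the talker.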

5.
For pt. 1, see ibid., vol. 9, p. 3 (2007). In this paper, the task and user interface modules of a multimodal dialogue system development platform are presented. The main goal of this work is to provide a simple, application-independent solution to the problem of multimodal dialogue design for information-seeking applications. The proposed system architecture clearly separates the task and interface components of the system. A task manager is designed and implemented that consists of two main submodules: the electronic form module, which handles the list of attributes that have to be instantiated by the user, and the agenda module, which contains the sequence of user and system tasks. Both the electronic forms and the agenda can be dynamically updated by the user. Next, a spoken dialogue module is designed that implements the speech interface for the task manager. The dialogue manager can handle complex error-correction and clarification user input, building on the semantic and pragmatic modules presented in Part I of this paper. The spoken dialogue system is evaluated on a travel reservation task of the DARPA Communicator research program and shown to yield over 90% task completion and good performance on both objective and subjective evaluation metrics. Finally, a multimodal dialogue system, which combines graphical and speech interfaces, is designed, implemented, and evaluated. Minor modifications to the unimodal semantic and pragmatic modules were required to build the multimodal system. It is shown that the multimodal system significantly outperforms the unimodal speech-only system both in terms of efficiency (task success and time to completion) and user satisfaction for a travel reservation task.
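A minimal sketch of the electronic-form idea: a list of attributes to be instantiated by the user plus an agenda of prompts for the attributes still missing. The attribute names and the prompting logic are illustrative assumptions, not the platform's actual data structures.

```python
# Toy form-filling task manager: an "electronic form" of attributes to be
# instantiated by the user, and an "agenda" of prompts for the missing ones.
# Attribute names are illustrative (a travel-reservation-style task).
from collections import OrderedDict

class FormTaskManager:
    def __init__(self, attributes):
        self.form = OrderedDict((name, None) for name in attributes)

    def update(self, **filled):
        """Merge attribute values extracted from the latest user turn."""
        for name, value in filled.items():
            if name in self.form:
                self.form[name] = value

    def agenda(self):
        """Remaining system tasks: ask for each attribute still missing."""
        return [f"ask({name})" for name, value in self.form.items() if value is None]

    def complete(self):
        return not self.agenda()

tm = FormTaskManager(["origin", "destination", "date"])
tm.update(destination="Boston")
print(tm.agenda())      # ['ask(origin)', 'ask(date)']
print(tm.complete())    # False
```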

6.
Human-Robot Interaction Through Gesture-Free Spoken Dialogue
We present an approach to human-robot interaction through gesture-free spoken dialogue. Our approach is based on passive knowledge rarefication through goal disambiguation, a technique that allows a human operator to collaborate with a mobile robot on various tasks through spoken dialogue without making bodily gestures. A key assumption underlying our approach is that the operator and the robot share a common set of goals. Another key idea is that language, vision, and action share common memory structures. We discuss how our approach achieves four types of human-robot interaction: command, goal disambiguation, introspection, and instruction-based learning. We describe the system we developed to implement our approach and present experimental results.

7.
8.
The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue providing access to the services it is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. In service robot deployment conditions, speech recognition limitations, together with noisy speech input and inexperienced users, may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. For the speech input to be treated as sufficiently grounded (correctly understood) by the robot, the four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate the probability that each grounding state has been reached. These probabilities serve as a basis for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated by comparing a conversational system for the mobile service robot RoboX that employs only speech recognition for user goal identification with a system equipped with multimodal grounding. The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation with multimodal data collected during conversations of the robot RoboX with users.
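To make the grounding model concrete, the sketch below combines speech and non-speech evidence with Bayes' rule for a single binary grounding state; all conditional probabilities are invented for illustration, and the paper's Bayesian networks are considerably richer.

```python
# Naive-Bayes-style combination of evidence from two modalities for one
# binary grounding state ("user goal correctly understood").
# All probabilities below are invented for illustration.
def grounding_posterior(prior, likelihoods):
    """likelihoods: list of (P(evidence | grounded), P(evidence | not grounded))."""
    p_g, p_ng = prior, 1.0 - prior
    for p_e_given_g, p_e_given_ng in likelihoods:
        p_g *= p_e_given_g
        p_ng *= p_e_given_ng
    return p_g / (p_g + p_ng)

evidence = [
    (0.6, 0.3),   # speech recognizer confidence was high
    (0.8, 0.4),   # vision: the user is facing the robot (attending)
]
p = grounding_posterior(prior=0.5, likelihoods=evidence)
print(f"P(grounded | evidence) = {p:.2f}")
# A low posterior could trigger fallback to a more reliable modality (buttons).
```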

9.
In this paper, we present a human-robot teaching framework that uses “virtual” games as a means for adapting a robot to its user through natural interaction in a controlled environment. We present an experimental study in which participants instruct an AIBO pet robot while playing different games together on a computer-generated playfield. By playing the games and receiving instruction and feedback from its user, the robot learns to understand the user’s typical way of giving multimodal positive and negative feedback. The games are designed in such a way that the robot can reliably predict positive or negative feedback based on the game state and explore its user’s reward behavior by making good or bad moves. We implemented a two-stage learning method combining Hidden Markov Models and a mathematical model of classical conditioning to learn how to discriminate between positive and negative feedback. The system combines multimodal speech and touch input for reliable recognition. After finishing the training, the system was able to recognize positive and negative reward with an average accuracy of 90.33%.
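The abstract does not name its conditioning model; the classic Rescorla-Wagner rule is used below purely as an assumed illustration of how association strengths between user cues and reward could be learned.

```python
# Rescorla-Wagner-style associative update: learn how strongly each observed
# user cue (word, prosody class, touch pattern) predicts positive reward.
# The abstract does not name its conditioning model; this rule is an assumption.
def rescorla_wagner(associations, present_cues, reward, lr=0.1):
    """Update association strengths for the cues present on this trial."""
    prediction = sum(associations.get(c, 0.0) for c in present_cues)
    error = reward - prediction
    for c in present_cues:
        associations[c] = associations.get(c, 0.0) + lr * error
    return associations

assoc = {}
trials = [({"good", "pat"}, 1.0), ({"no", "bad"}, -1.0), ({"good"}, 1.0)]
for cues, reward in trials * 20:
    rescorla_wagner(assoc, cues, reward)
print({cue: round(v, 2) for cue, v in assoc.items()})
# Cues co-occurring with positive feedback acquire positive strength;
# cues co-occurring with negative feedback acquire negative strength.
```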

10.
Human-computer interaction for virtual-real fusion involves computer science, cognitive psychology, ergonomics, multimedia technology, virtual reality, and related fields. It aims to improve the efficiency of human-computer interaction while responding to human cognitive and emotional needs, and it is widely applied in office and education settings, robotics, and virtual/augmented reality devices. This paper systematically reviews the state of development of human-computer interaction for virtual-real fusion along four dimensions: perceptual computing, human-robot interaction and collaboration, personalized human-machine dialogue, and data visualization. Research at home and abroad is compared, and future development trends are discussed. We argue that transferable yet personalized perceptual computing, human-machine collaboration with deep understanding of user behavior, and user-adaptive dialogue systems are important research directions in this field.

11.
12.
Visual interpretation of gestures can be useful in accomplishing natural human-robot interaction (HRI). Previous HRI research focused on issues such as hand gestures, sign language, and command gesture recognition. Automatic recognition of whole-body gestures is required in order for HRI to operate naturally. This presents a challenging problem, because describing and modeling meaningful gesture patterns from whole-body gestures is a complex task. This paper presents a new method for recognition of whole-body key gestures in HRI. A human subject is first described by a set of features encoding the angular relationships between a dozen body parts in 3-D. A feature vector is then mapped to a codeword of hidden Markov models. In order to spot key gestures accurately, a sophisticated method of designing a transition gesture model is proposed. To reduce the number of states in the transition gesture model, a model reduction technique that merges similar states based on data-dependent statistics and relative entropy is used. The experimental results demonstrate that the proposed method can be efficient and effective in HRI for automatic recognition of whole-body key gestures from motion sequences.
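The state-reduction step can be illustrated as follows: compute the symmetric relative entropy between the discrete emission distributions of two states and merge them when it falls below a threshold. The distributions and threshold are invented, and the paper's actual reduction procedure may differ.

```python
# Merge HMM states whose discrete emission distributions are close in
# symmetric relative entropy (KL divergence). Distributions are invented.
import numpy as np

def sym_kl(p, q, eps=1e-12):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def merge_similar_states(emissions, threshold=0.05):
    """Greedily group states whose pairwise symmetric KL is under threshold."""
    groups = []
    for idx, dist in enumerate(emissions):
        for group in groups:
            if all(sym_kl(dist, emissions[j]) < threshold for j in group):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups

emissions = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],   # nearly identical to state 0 -> merged
    [0.10, 0.10, 0.80],
]
print(merge_similar_states(emissions))   # [[0, 1], [2]]
```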

13.
Assistance is currently a pivotal research area in robotics, with huge societal potential. Since assistant robots interact directly with people, finding natural and easy-to-use user interfaces is of fundamental importance. This paper describes a flexible multimodal interface based on speech and gesture modalities for controlling our mobile robot named Jido. The vision system uses a stereo head mounted on a pan-tilt unit and a bank of collaborative particle filters devoted to the upper human body extremities to track and recognize pointing and symbolic gestures, both single-handed and bi-manual. This framework constitutes our first contribution, as it is shown to properly handle natural artifacts (self-occlusion, a hand leaving the camera's field of view, hand deformation) when 3D gestures are performed with either hand or with both. A speech recognition and understanding system based on the Julius engine is also developed and embedded in order to process deictic and anaphoric utterances. The second contribution is a probabilistic, multi-hypothesis interpreter framework that fuses results from the speech and gesture components. This interpreter is shown to improve the classification rates of multimodal commands compared to using either modality alone. Finally, we report on successful live experiments in human-centered settings. Results are reported in the context of an interactive manipulation task, where users specify local motion commands to Jido and perform safe object exchanges.
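As a rough illustration of the multi-hypothesis interpreter idea (not Jido's actual implementation), scores from the speech and gesture components over the same set of candidate commands can be combined multiplicatively and renormalized:

```python
# Toy multi-hypothesis fusion: each modality assigns scores to candidate
# multimodal commands; combining them resolves ambiguity (e.g., "take this"
# plus a pointing gesture). Scores and command labels are invented.
def fuse(speech_scores, gesture_scores):
    commands = set(speech_scores) | set(gesture_scores)
    floor = 0.05  # small score for hypotheses a modality did not propose
    fused = {c: speech_scores.get(c, floor) * gesture_scores.get(c, floor)
             for c in commands}
    total = sum(fused.values())
    return {c: round(v / total, 3) for c, v in fused.items()}

speech = {"take(red_box)": 0.4, "take(green_box)": 0.4, "stop": 0.2}
gesture = {"take(red_box)": 0.7, "take(green_box)": 0.2}   # pointing at red box
print(fuse(speech, gesture))
# The pointing gesture disambiguates between the two "take" readings.
```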

14.
Traditional dialogue systems use a fixed silence threshold to detect the end of users’ turns. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn affects user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate smooth exchange of speaking turns. However, little effort has been made towards implementing these models in dialogue systems and verifying how well they model the turn-taking behaviour in human–computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human–computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On this data, we trained various models that use automatically extractable prosodic, contextual and lexico-syntactic features for detecting response locations. Next, we implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn-transitions and more responsive system behaviour.
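A minimal sketch of the data-driven idea: train a simple classifier on per-pause feature vectors (e.g., final pitch slope, pause duration, syntactic completeness) labeled with whether a feedback response was appropriate there. The features, toy data, and the plain logistic-regression model are illustrative assumptions; the paper evaluates richer feature sets and models.

```python
# Logistic regression over per-pause features deciding "respond here or keep
# listening". Feature values and labels below are synthetic toy data.
import numpy as np

def train_logreg(X, y, lr=0.5, epochs=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# columns: [falling pitch (1/0), pause length (s), clause complete (1/0)]
X = np.array([[1, 0.60, 1], [1, 0.80, 1], [0, 0.20, 0],
              [0, 0.30, 0], [1, 0.50, 1], [0, 0.25, 1]], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0], dtype=float)    # 1 = good response location

w, b = train_logreg(X, y)
new_pause = np.array([1, 0.7, 1], dtype=float)
p_respond = 1.0 / (1.0 + np.exp(-(new_pause @ w + b)))
print(f"P(respond now) = {p_respond:.2f}")
```

Running such a classifier at each detected pause lets the system respond earlier than a fixed silence threshold would allow while withholding responses at incomplete clauses.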

15.
Advanced Robotics, 2013, 27(15): 1725-1741
In this paper, we present a wearable interaction system to enhance interaction between a human user and a humanoid robot. The wearable interaction system assists the user and enhances interaction with the robot by intuitively imitating the user motion while expressing multimodal commands to the robot and displaying multimodal sensory feedback. AMIO, the biped humanoid robot of the AIM Laboratory, was used in experiments to confirm the performance and effectiveness of the proposed system, including the overall performance of motion tracking. Through an experimental application of this system, we successfully demonstrated human and humanoid robot interactions.

16.

Most of today's virtual environments are populated with some kind of autonomous, life-like agents. Such agents follow a preprogrammed sequence of behaviors that excludes the user as a participating entity in the virtual society. In order to make inhabited virtual reality an attractive place for information exchange and social interaction, we need to equip the autonomous agents with some perception and interpretation skills. In this paper we present one such skill: human action recognition. In contrast to human-computer interfaces that focus on speech or hand gestures, we propose a full-body integration of the user. We present a model of human actions along with a real-time recognition system. To cover the bilateral aspect of human-computer interfaces, we also discuss some action response issues. In particular, we describe a motion management library that solves animation continuity and mixing problems. Finally, we illustrate our system with two examples and discuss what we have learned.

17.
This paper introduces the Neem Platform, a generic research test bed for the development of a novel class of adaptive intelligent collaborative applications. These applications provide support for groups of people working together by facilitating their communication and by reacting adaptively to the perceived contexts of an ongoing interaction. Applications in Neem explore the multimodal context of mostly human-to-human communication by “overhearing” conversations, monitoring user actions, and reacting to perceived opportunities for augmentation via intelligent system interventions. The Neem Platform is designed to facilitate the integration of functionality for capture and reification of user communicative actions over a variety of modalities (e.g., speech, text, gestures), as well as to facilitate analysis and reasoning based on this reified context. The platform promotes an exploratory and evolutionary style of development by affording rapid development of components, enactment of human-controlled real-time (Wizard of Oz) experiments, and remote collection of data for off-line analysis of interactions. This paper presents a novel coordination mechanism and shows how this underlying mechanism can be used to implement systems that support intelligent reaction to multimodal group interactions. This work was developed at the University of Colorado at Boulder.

18.
One of the most important issues in developing an entertainment robot is human-robot interaction, in which the robot is expected to learn new behaviors specified by the user. In this article we present an imitation-based mechanism to support robot learning, and use evolutionary computing to learn new behavior sequences. We also propose several advanced techniques at the task level and the computational level to evolve complex sequences. To evaluate our approach, we use it to evolve different behaviors for a humanoid robot. The results show the promise of our approach.
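A toy sketch of the evolutionary idea: a genetic algorithm evolving a fixed-length behavior sequence toward a target sequence that stands in for user-demonstrated behavior. The behavior set, fitness function, and GA parameters are all illustrative assumptions, not the paper's actual method.

```python
# Toy genetic algorithm evolving a fixed-length behavior sequence toward a
# target sequence (standing in for user-demonstrated behavior).
import random

BEHAVIORS = ["walk", "turn", "sit", "wave", "nod"]
TARGET = ["walk", "turn", "wave", "sit"]

def fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

def mutate(seq, rate=0.2):
    return [random.choice(BEHAVIORS) if random.random() < rate else g for g in seq]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

random.seed(0)
population = [[random.choice(BEHAVIORS) for _ in TARGET] for _ in range(30)]
for generation in range(40):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[:10]
    population = parents + [mutate(crossover(random.choice(parents),
                                             random.choice(parents)))
                            for _ in range(20)]
population.sort(key=fitness, reverse=True)
print("best after", generation, "generations:", population[0])
```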

19.
Gesture and speech are co-expressive and complementary channels of a single human language system. While speech carries the major load of symbolic presentation, gesture provides the imagistic content. We investigate the role of oscillatory/cyclical hand motions in ‘carrying’ this image content. We present our work on the extraction of hand motion oscillation frequencies of gestures that accompany speech. The key challenges are that such motions are characterized by non-stationary oscillations and that multiple frequencies may be simultaneously present. Also, the oscillations may extend over only a few cycles. We apply the windowed Fourier transform and the wavelet transform to detect and extract gesticulatory oscillations. We tested these against synthetic signals (stationary and non-stationary) and real data sequences of gesticulatory hand movements in natural discourse. Our results show that both filters functioned well for the synthetic signals. For the real data, the wavelet bandpass filter bank is better at detecting and extracting hand gesture oscillations. We relate the hand motion oscillatory gestures detected by wavelet analysis to speech in natural conversation and apply this to multimodal language analysis. We demonstrate the ability of our algorithm to extract gesticulatory oscillations and show how oscillatory gestures reveal portions of the multimodal discourse structure.
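As a concrete illustration of the windowed-Fourier approach (a sketch on synthetic data, not the authors' filter bank), the dominant oscillation frequency of a hand-coordinate trace can be tracked per analysis window:

```python
# Track the dominant oscillation frequency of a hand-coordinate trace with a
# sliding windowed FFT. The trace is synthetic: a 2 Hz oscillation switching
# to 4 Hz halfway through, mimicking a non-stationary gesticulation.
import numpy as np

fs = 30.0                                  # video frame rate (Hz)
t = np.arange(0, 8, 1.0 / fs)
x = np.where(t < 4, np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 4 * t))

win = 60                                   # 2 s analysis window
freqs = np.fft.rfftfreq(win, d=1.0 / fs)
for start in range(0, len(x) - win + 1, win):
    segment = x[start:start + win] * np.hanning(win)
    spectrum = np.abs(np.fft.rfft(segment))
    dominant = freqs[np.argmax(spectrum)]
    print(f"t = {start / fs:4.1f} s  dominant oscillation ≈ {dominant:.1f} Hz")
```

The short window trades frequency resolution for the ability to follow the non-stationary change, which is the motivation the abstract gives for preferring wavelet analysis on real data.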

20.
This paper presents an experimental study of an agent system with multimodal interfaces for a smart office environment. The agent system is based upon multimodal interfaces such as recognition modules for both speech and pen-mouse gestures, and identification modules for both face and fingerprint. As essential modules, speech recognition and synthesis were used for virtual interaction between the user and the system. In this study, a real-time speech recognizer based on a Hidden Markov Network (HM-Net) was incorporated into the proposed system. In addition, identification techniques based on both face and fingerprint were adopted to provide a specific user with a user-customized and secure interaction service in an office environment. In the evaluation, results showed that the proposed system was easy to use and would prove useful in a smart office environment, even though the performance of the speech recognizer was not satisfactory, mainly due to noisy environments.
