期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

王恒升李熙印《计算机工程与科学》2017,39(8):1538-1545

提出了一种基于层叠条件随机场进行救灾机器人自然语言导航命令理解的方法。该方法由三层条件随机场(CRFs)构成:第一层用于导航词性标注,选取词、词性以及上下文作为特征模板生成导航词性标签;第二层用于导航过程提取,选择词、导航词性标签以及上下文构建特征模板生成导航过程标签;第三层用于起点终点识别,选取词、导航词性标签、导航过程标签以及上下文构建特征模板判断出地名词为起点还是终点。根据导航词性与导航要素的对应关系便可从命令中提取出导航信息。该方法能够处理完全不受限的自然语言导航命令,总体正确率达到78.6%,无需依赖特定的指令与地图,对完成救灾机器人导航的人机交互任务具有重要意义。相似文献

2.

采用自然语言的移动机器人任务编程

聂仙丽蒋平陈辉堂《机器人》2003,25(4):308-312

本文在机器人具备基本运动技能的基础上［1］，采用基于指令教导的学习方法．通过自然语言教会机器人完成抽象化任务，并以程序体方式保存所学知识，也即通过自然语言对话自动生成程序流．通过让机器人完成导航等任务，验证所提自然语言编程方法的可行性．相似文献

3.

基于智能语音的翻译机器人自动化控制系统设计

下载免费PDF全文

杨维秦波涛《计算机测量与控制》2024,32(5):102-108

为提升自动控制效果,加快翻译速率,设计基于智能语音的翻译机器人自动化控制系统。采集外界智能语音信号,利用A/D转换器得到数字信号,启动语音唤醒模块激活翻译机器人,听写模式识别复杂语音信号,命令模式识别简单语音信号,得到语言文本识别结果,通过深度学习关键词检测方法提取关键词作为翻译机器人的自动化控制指令,通过单片机识别自动化控制指令。实验结果表明,该系统可有效采集外界智能语音信号,提取智能语音信号的关键词,完成翻译机器人自动化控制。相似文献

4.

基于混合深度学习的多模态场景指令分类方法

吴桂玲《计算机应用与软件》2022,39(1):176-180,187

为提高家庭服务机器人指令中目标对象预测的准确率,提出一种基于混合深度学习的多模态自然语言理处理(Natural Language Processing,NLP)指令分类方法.该方法从语言特征、视觉特征和关系特征多模态入手,采用两种深度学习方法分别以多模态特征进行编码.对于语言指令,采用多层双向长短期记忆(Bi-LSTM... 相似文献

5.

A Robust Speech Recognition System for Communication Robots in Noisy Environments

Ishi C.T. Matsuda S. Kanda T. Jitsuhiro T. Ishiguro H. Nakamura S. Hagita N. 《Robotics, IEEE Transactions on》2008,24(3):759-763

The application range of communication robots could be widely expanded by the use of automatic speech recognition (ASR) systems with improved robustness for noise and for speakers of different ages. In past researches, several modules have been proposed and evaluated for improving the robustness of ASR systems in noisy environments. However, this performance might be degraded when applied to robots, due to problems caused by distant speech and the robot's own noise. In this paper, we implemented the individual modules in a humanoid robot, and evaluated the ASR performance in a real-world noisy environment for adults' and children's speech. The performance of each module was verified by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70-dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy. 相似文献

6.

Mobile robot programming using natural language 总被引：3，自引：0，他引：3

Stanislao Guido Theocharis Ewan 《Robotics and Autonomous Systems》2002,38(3-4):171-181

How will naive users program domestic robots? This paper describes the design of a practical system that uses natural language to teach a vision-based robot how to navigate in a miniature town. To enable unconstrained speech the robot is provided with a set of primitive procedures derived from a corpus of route instructions. When the user refers to a route that is not known to the robot, the system will learn it by combining primitives as instructed by the user. This paper describes the components of the Instruction-Based Learning architecture and discusses issues of knowledge representation, the selection of primitives and the conversion of natural language into robot-understandable procedures. 相似文献

7.

基于非单调共轭梯度算法的声纹识别机器人控制系统设计

下载免费PDF全文

吴俊杰《计算机测量与控制》2020,28(1):116-119

传统声纹识别人控制系统识别准确率低,存在语音识别噪声鲁棒性问题。针对上述问题,基于非单调共轭梯度算法设计了一种新的声纹识别机器人控制系统,采用BioVoice 2.0 标准声纹采集器采集数据,提取声纹特征,根据提取的声纹特征建立模型库,同时引用了两个声纹数据采集终端,型号分别是TMC104-B和TMC104,选用型号为AS-MrobotR的机器人配合采集器和采集终端实现工作。在Windows平台下使用C/C+语言研究了一种专用的程序,在程序内部添加mde-api数据库,完成训练程序和识别程序。实验结果表明,基于非单调共轭梯度算法的声纹识别机器人控制系统能够很好地解决语音识别噪声鲁棒性问题,在有噪声环境下识别准确率提高15.24%,在无噪声环境下识别准确率提高21.55%。相似文献

8.

Teaching a pet-robot to understand user feedback through interactive virtual training tasks

Anja Austermann Seiji Yamada 《Autonomous Agents and Multi-Agent Systems》2010,20(1):85-104

In this paper, we present a human-robot teaching framework that uses “virtual” games as a means for adapting a robot to its user through natural interaction in a controlled environment. We present an experimental study in which participants instruct an AIBO pet robot while playing different games together on a computer generated playfield. By playing the games and receiving instruction and feedback from its user, the robot learns to understand the user’s typical way of giving multimodal positive and negative feedback. The games are designed in such a way that the robot can reliably predict positive or negative feedback based on the game state and explore its user’s reward behavior by making good or bad moves. We implemented a two-staged learning method combining Hidden Markov Models and a mathematical model of classical conditioning to learn how to discriminate between positive and negative feedback. The system combines multimodal speech and touch input for reliable recognition. After finishing the training, the system was able to recognize positive and negative reward with an average accuracy of 90.33%. 相似文献

9.

基于云平台的智能语音交互机器人设计

何松黄维吴昔遥周曾豪杨东泽《软件工程》2021,(4)

现有的语音交互机器人多采用用户提问、机器人回答的单向交流方式,人机交互的智能性和灵活性较差。本文研究运用树莓派(Raspberry Pi)计算机和配套的语音板作为硬件载体,融合语音唤醒、语音识别、语音合成、自然语言处理等人工智能技术,调用科大讯飞开放云平台、在线图灵机器人,搭建一种基于云平台的智能语音交互机器人系统,并结合自主开发的本地知识库和问题库,使智能语音交互机器人能够根据不同环境与任务需求实现双向互动交流,实现由机器人采集信息和交流反馈,以提供高适应性的无接触人机语音交互服务。相似文献

10.

A voice-activated robot with artificial intelligence

Dinesh P. Mital Goh Wee Leng 《Robotics and Autonomous Systems》1989,4(4):339-344

In this paper, a voice activated robot arm with intelligence is presented. The robot arm is controlled with natural connected speech input. The language input allows a user to interact with the robot in terms which are familiar to most people. The advantages of speech activated robots are hands-free and fast data input operations. The proposed robot is capable of understanding the meaning of natural language commands. After interpreting the voice commands a series of control data for performing a tasks are generated. Finally the robot actually performs the task. Artificial Intelligence techniques are used to make the robot understand voice commands and act in the desired mode. It is also possible to control the robot using the keyboard input mode. 相似文献

11.

Sparse imputation for large vocabulary noise robust ASR

Jort Florent Gemmeke Bert Cranen Ulpu Remes 《Computer Speech and Language》2011,25(2):462-479

An effective way to increase noise robustness in automatic speech recognition is to label the noisy speech features as either reliable or unreliable (‘missing’), and replace (‘impute’) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs, frame-based imputation techniques fail because many time frames contain few, if any, reliable features. In previous work, we introduced an exemplar-based method, dubbed sparse imputation, which can impute missing features using reliable features from neighbouring frames. We achieved substantial gains in performance at low SNRs for a connected digit recognition task. In this work, we investigate whether the exemplar-based approach can be generalised to a large vocabulary task.Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal ‘oracle’ reliability of features is used. With error-prone estimates of feature reliability, sparse imputation performance is comparable to our baseline imputation technique in the cleanest conditions, and substantially better at lower SNRs. With noisy speech recorded in realistic noise conditions, sparse imputation performs slightly worse than our baseline imputation technique in the cleanest conditions, but substantially better in the noisier conditions. 相似文献

12.

A two-dimensional facial-affect estimation system for human–robot interaction using facial expression parameters

David Schacter Christopher Wang Beno Benhabib 《Advanced Robotics》2013,27(4):259-273

A social robot should be able to autonomously interpret human affect and adapt its behavior accordingly in order for successful social human–robot interaction to take place. This paper presents a modular non-contact automated affect-estimation system that employs support vector regression over a set of novel facial expression parameters to estimate a person’s affective states using a valence-arousal two-dimensional model of affect. The proposed system captures complex and ambiguous emotions that are prevalent in real-world scenarios by utilizing a continuous two-dimensional model, rather than a traditional discrete categorical model for affect. As the goal is to incorporate this recognition system in robots, real-time estimation of spontaneous natural facial expressions in response to environmental and interactive stimuli is an objective. The proposed system can be combined with affect detection techniques using other modes, such as speech, body language and/or physiological signals, etc., in order to develop an accurate multi-modal affect estimation system for social HRI applications. Experiments presented herein demonstrate the system’s ability to successfully estimate the affect of a diverse group of unknown individuals exhibiting spontaneous natural facial expressions. 相似文献

13.

An intelligent framework for simulating robot-assisted surgical operations

《Expert systems with applications》2005,28(3):425-433

In this paper, a system based on several intelligent techniques, including speech recognition, natural language processing and linear planning is described. These techniques have been employed to generate a sequence of operations understandable by the control system of a robot that is to perform a semi-automatic surgical task.Thus, a system has been implemented that translates some surgeon's ‘natural’ language into robot-executable commands. A robotic simulator has then been implemented in order to test the planned sequence in a virtual environment. 相似文献

14.

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

Zied Sakka Elhem Techini MedSalim Bouhlel 《International Journal of Speech Technology》2017,20(3):645-650

Noise robustness and Arabic language are still considered as the main challenges for speech recognition over mobile environments. This paper contributed to these trends by proposing a new robust Distributed Speech Recognition (DSR) system for Arabic language. A speech enhancement algorithm was applied to the noisy speech as a robust front-end pre-processing stage to improve the recognition performance. While an isolated Arabic word engine was designed, and developed using HMM Model to perform the recognition process at the back-end. To test the engine, several conditions including clean, noisy and enhanced noisy speech were investigated together with speaker dependent and speaker independent tasks. With the experiments carried out on noisy database, multi-condition training outperforms the clean training mode in all noise types in terms of recognition rate. The results also indicate that using the enhancement method increases the DSR accuracy of our system under severe noisy conditions especially at low SNR down to 10 dB. 相似文献

15.

Perception of own and robot engagement in human–robot interactions and their dependence on robotics knowledge

《Robotics and Autonomous Systems》2014,62(3):392-399

Communication between socially assistive robots and humans might be facilitated by intuitively understandable mechanisms. To investigate the effects of some key nonverbal gestures on a human’s own engagement and robot engagement experienced by humans, participants read a series of instructions to a robot that responded with nods, blinks, changes in gaze direction, or a combination of these. Unbeknown to the participants, the robot had no form of speech processing or gesture recognition, but simply measured speech volume levels, responding with gestures whenever a lull in sound was detected. As measured by visual analogue scales, engagement of participants was not differentially affected by the different responses of the robot. However, their perception of the robot’s engagement in the task, its likability and its understanding of the instructions depended on the gesture presented, with nodding being the most effective response. Participants who self-reported greater robotics knowledge reported higher overall engagement and greater success at developing a relationship with the robot. However, self-reported robotics knowledge did not differentially affect the impact of robot gestures. This suggests that greater familiarity with robotics may help to maximise positive experiences for humans involved in human–robot interactions without affecting the impact of the type of signal sent by the robot. 相似文献

16.

Communicating emotions and mental states to robots in a real time parallel framework using Laban movement analysis

Tino Lourens Roos van Berkel Emilia Barakova 《Robotics and Autonomous Systems》2010,58(12):1256-1265

This paper presents a parallel real time framework for emotions and mental states extraction and recognition from video fragments of human movements. In the experimental setup human hands are tracked by evaluation of moving skin-colored objects. The tracking analysis demonstrates that acceleration and frequency characteristics of the traced objects are relevant for classification of the emotional expressiveness of human movements. The outcomes of the emotional and mental states recognition are cross-validated with the analysis of two independent certified movement analysts (CMA’s) who use the Laban movement analysis (LMA) method. We argue that LMA based computer analysis can serve as a common language for expressing and interpreting emotional movements between robots and humans, and in that way it resembles the common coding principle between action and perception by humans and primates that is embodied by the mirror neuron system. The solution is part of a larger project on interaction between a human and a humanoid robot with the aim of training social behavioral skills to autistic children with robots acting in a natural environment. 相似文献

17.

Target Speech Detection and Separation for Communication with Humanoid Robots in Noisy Home Environments

《Advanced Robotics》2013,27(15):2093-2111

People usually talk face to face when they communicate with their partner. Therefore, in robot audition, the recognition of the front talker is critical for smooth interactions. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (SNR) beamformer. This VAD based on CSCC can classify speech signals that are retrieved at the frontal region of two microphones embedded on the robot. The system works in real-time without needing training filter coefficients given in advance even in a noisy environment (SNR > 0 dB). It can cope with speech noise generated from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that our system enhanced extracted target speech signals more than 12 dB (SNR) and the success rate of automatic speech recognition for Japanese words was increased by about 17 points. 相似文献

18.

Human and computer recognition of regional accents and ethnic groups from British English speech

A. Hanani M.J. Russell M.J. Carey 《Computer Speech and Language》2013,27(1):59-74

The paralinguistic information in a speech signal includes clues to the geographical and social background of the speaker. This paper is concerned with automatic extraction of this information from a short segment of speech. A state-of-the-art language identification (LID) system is applied to the problems of regional accent recognition for British English, and ethnic group recognition within a particular accent. We compare the results with human performance and, for accent recognition, the ‘text dependent’ ACCDIST accent recognition measure. For the 14 regional accents of British English in the ABI-1 corpus (good quality read speech), our LID system achieves a recognition accuracy of 89.6%, compared with 95.18% for our best ACCDIST-based system and 58.24% for human listeners. The “Voices across Birmingham” corpus contains significant amounts of telephone conversational speech for the two largest ethnic groups in the city of Birmingham (UK), namely the ‘Asian’ and ‘White’ communities. Our LID system distinguishes between these two groups with an accuracy of 96.51% compared with 90.24% for human listeners. Although direct comparison is difficult, it seems that our LID system performs much better on the standard 12 class NIST 2003 Language Recognition Evaluation task or the two class ethnic group recognition task than on the 14 class regional accent recognition task. We conclude that automatic accent recognition is a challenging task for speech technology, and speculate that the use of natural conversational speech may be advantageous for these types of paralinguistic task. 相似文献

19.

Speech understanding and speech translation by maximum a-posteriori semantic decoding

《Artificial Intelligence in Engineering》1999,13(4):373-384

This paper describes a domain-limited system for speech understanding as well as for speech translation. An integrated semantic decoder directly converts the preprocessed speech signal into its semantic representation by a maximum a-posteriori classification. With the combination of probabilistic knowledge on acoustic, phonetic, syntactic, and semantic levels, the semantic decoder extracts the most probable meaning of the utterance. No separate speech recognition stage is needed because of the integration of the Viterbi-algorithm (calculating acoustic probabilities by the use of Hidden-Markov-Models) and a probabilistic chart parser (calculating semantic and syntactic probabilities by special models). The semantic structure is introduced as a representation of an utterance's meaning. It can be used as an intermediate level for a succeeding intention decoder (within a speech understanding system for the control of a running application by spoken inputs) as well as an interlingua-level for a succeeding language production unit (within an automatic speech translation system for the creation of spoken output in another language). Following the above principles and using the respective algorithms, speech understanding and speech translating front-ends for the domains ‘graphic editor’, ‘service robot’, ‘medical image visualisation’ and ‘scheduling dialogues’ could be successfully realised. 相似文献

20.

LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity

A. Arunkumar S. Sharma R. Agrawal S. Chandrasekaran C. Bryan 《Computer Graphics Forum》2023,42(3):409-421

Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised learning, analyzing ‘bias’ in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre-trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains. 相似文献