期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using knowledge of misunderstandings to increase the robustness of spoken dialogue systems

Ramón López-Cózar Zoraida Callejas David Griol 《Knowledge》2010,23(5):471-485

This paper proposes a new technique to increase the robustness of spoken dialogue systems employing an automatic procedure that aims to correct frames incorrectly generated by the system’s component that deals with spoken language understanding. To do this the technique carries out a training that takes into account knowledge of previous system misunderstandings. The correction is transparent for the user as he is not aware of some mistakes made by the speech recogniser and thus interaction with the system can proceed more naturally. Experiments have been carried out using two spoken dialogue systems previously developed in our lab: Saplen and Viajero, which employ prompt-dependent and prompt-independent language models for speech recognition. The results obtained from 10,000 simulated dialogues show that the technique improves the performance of the two systems for both kinds of language modelling, especially for the prompt-independent language model. Using this type of model the Saplen system increases sentence understanding by 19.54%, task completion by 26.25%, word accuracy by 7.53%, and implicit recovery of speech recognition errors by 20.3%, whereas for the Viajero system these figures increase by 14.93%, 18.06%, 6.98% and 15.63%, respectively. 相似文献

2.

Combining language models in the input interface of a spoken dialogue system

R. Lpez-Czar Z. Callejas 《Computer Speech and Language》2006,20(4):420-440

This paper presents a new technique to enhance the performance of the input interface of spoken dialogue systems based on a procedure that combines during speech recognition the advantages of using prompt-dependent language models with those of using a language model independent of the prompts generated by the dialogue system. The technique proposes to create a new speech recognizer, termed contextual speech recognizer, that uses a prompt-independent language model to allow recognizing any kind of sentence permitted in the application domain, and at the same time, uses contextual information (in the form of prompt-dependent language models) to take into account that some sentences are more likely to be uttered than others at a particular moment of the dialogue. The experiments show the technique allows enhancing clearly the performance of the input interface of a previously developed dialogue system based exclusively on prompt-dependent language models. But most important, in comparison with a standard speech recognizer that uses just one prompt-independent language model without contextual information, the proposed recognizer allows increasing the word accuracy and sentence understanding rates by 4.09% and 4.19% absolute, respectively. These scores are slightly better than those obtained using linear interpolation of the prompt-independent and prompt-dependent language models used in the experiments. 相似文献

3.

汉语股票实时行情查询对话系统 总被引：1，自引：0，他引：1

张琳高峰郭荣毛家菊陆汝占《计算机应用》2004,24(7):61-63

介绍了一个用于股票实时行情查询的口语化的人机对话系统，该系统集成了语音识别、语言理解、对话控制等技术。文中定义了一个情景语义框架模型，较好地处理了口语理解系统的一些难点。相似文献

4.

基于上下文信息的口语意图检测方法

徐扬王建成刘启元李寿山《计算机科学》2020,47(1):205-211

近年来,随着人工智能的发展与智能设备的普及,人机智能对话技术得到了广泛的关注。口语语义理解是口语对话系统中的一项重要任务,而口语意图检测是口语语义理解中的关键环节。由于多轮对话中存在语义缺失、框架表示以及意图转换等复杂的语言现象,因此面向多轮对话的意图检测任务十分具有挑战性。为了解决上述难题,文中提出了基于门控机制的信息共享网络,充分利用了多轮对话中的上下文信息来提升检测性能。具体而言,首先结合字音特征构建当前轮文本和上下文文本的初始表示,以减小语音识别错误对语义表示的影响;其次,使用基于层级化注意力机制的语义编码器得到当前轮和上下文文本的深层语义表示,包含由字到句再到多轮文本的多级语义信息;最后,通过在多任务学习框架中引入门控机制来构建基于门控机制的信息共享网络,使用上下文语义信息辅助当前轮文本的意图检测。实验结果表明,所提方法能够高效地利用上下文信息来提升口语意图检测效果,在全国知识图谱与语义计算大会(CCKS2018)技术评测任务2的数据集上达到了88.1%的准确率(Acc值)和88.0%的综合正确率(F1值),相比于已有的方法显著提升了性能。相似文献

5.

Designing and Evaluating an Adaptive Spoken Dialogue System

Litman Diane J. Pan Shimei 《User Modeling and User-Adapted Interaction》2002,12(2-3):111-137

Spoken dialogue system performance can vary widely for different users, as well for the same user during different dialogues. This paper presents the design and evaluation of an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Based on rules learned from a set of training dialogues, adaptive TOOT constructs a user model representing whether the user is having speech recognition problems as a particular dialogue progresses. Adaptive TOOT then automatically adapts its dialogue strategies based on this dynamically changing user model. An empirical evaluation of the system demonstrates the utility of the approach. 相似文献

6.

On the dynamic adaptation of language models based on dialogue information

J.M. Lucas-Cuesta J. Ferreiros F. Fernández-Martı´nez J.D. Echeverry S. Lutfi 《Expert systems with applications》2013,40(4):1069-1085

We present an approach to adapt dynamically the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM, and one or more of the LMs associated to the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate a LM for each one of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate a LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to significantly improve the performance of the speech recognition, which leads to an improvement in both the language understanding and the dialogue management tasks. 相似文献

7.

Designing and evaluating a wizarded uncertainty-adaptive spoken dialogue tutoring system

Kate Forbes-Riley Diane Litman 《Computer Speech and Language》2011,25(1):105-126

We describe the design and evaluation of two different dynamic student uncertainty adaptations in wizarded versions of a spoken dialogue tutoring system. The two adaptive systems adapt to each student turn based on its uncertainty, after an unseen human “wizard” performs speech recognition and natural language understanding and annotates the turn for uncertainty. The design of our two uncertainty adaptations is based on a hypothesis in the literature that uncertainty is an “opportunity to learn”; both adaptations use additional substantive content to respond to uncertain turns, but the two adaptations vary in the complexity of these responses. The evaluation of our two uncertainty adaptations represents one of the first controlled experiments to investigate whether substantive dynamic responses to student affect can significantly improve performance in computer tutors. To our knowledge we are the first study to show that dynamically responding to uncertainty can significantly improve learning during computer tutoring. We also highlight our ongoing evaluation of our uncertainty-adaptive systems with respect to other important performance metrics, and we discuss how our corpus can be used by the wider computer speech and language community as a linguistic resource supporting further research on effective affect-adaptive spoken dialogue systems in general. 相似文献

8.

Towards situated speech understanding: visual context priming of language models

《Computer Speech and Language》2005,19(2):227-248

相似文献

9.

Error Detection in Spoken Human-Machine Interaction

E. Krahmer M. Swerts M. Theune M. Weegels 《International Journal of Speech Technology》2001,4(1):19-30

相似文献

10.

A robust system for human-machine dialogue in telephony-based applications

D. Albesano P. Baggia M. Danieli R. Gemello E. Gerbino C. Rullent 《International Journal of Speech Technology》1997,2(2):101-111

This paper presents a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users and it has proved robust enough to allow spontaneous interactions even for people with poor recognition performance. The robust behaviour of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, the tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules allows the system to deal with partial or total breakdowns at other levels of analysis. We report the field trial data of the system with respect to speech recognition metrics of word accuracy and sentence understanding rate, time-to-completion, time-to-acquisition of crucial parameters, and degree of success of the interactions in providing the speakers with the information they required. The evaluation data show that most of the subjects were able to interact fruitfully with the system. These results suggest that the design choices made to achieve robust behaviour are a promising way to create usable spoken language telephone systems. 相似文献

11.

基于两阶段分类的口语理解方法

吴尉林陆汝占段建勇刘慧高峰陈玉泉《计算机研究与发展》2008,45(5):861-868

口语理解是实现口语对话系统的关键技术之一.它主要面临两方面的挑战:1)稳健性,因为输入语句往往是病态的;2)可移植性,即口语理解单元应能够快速移植到新的领域和语言.提出了一种新的基于两阶段分类的口语理解方法:第1阶段为主题分类,用来识别用户输入语句的主题;第2阶段为主题相关的语义槽分类,根据识别的主题抽取相应的语义槽/值对.该方法能对用户输入语句进行深层理解,同时也能保持稳健性.它基本上是数据驱动的,而且训练数据的标记也比较容易,可方便地移植到新的领域和语言.实验分别在汉语交通查询领域和英语DARPA Communicator领域进行,结果表明了该方法的有效性. 相似文献

12.

An automatic speech recognition system for spontaneous Punjabi speech corpus

Yogesh Kumar Navdeep Singh 《International Journal of Speech Technology》2017,20(2):297-303

Automatic speech recognition is the central part of the wheel towards the natural person-to-machine interaction technique. Due to the high disparity of speaking styles, speech recognition surely demands composite methods to constitute this irregularity. A speech recognition method can work in numerous distinct states such as speaker dependent/independent speech, isolated/continuous/spontaneous speech recognition, for less to very large vocabulary. The Punjabi language is being spoken by concerning 104 million peoples in India, Pakistan and other countries with Punjabi migrants. The Punjabi language is written in Gurmukhi writing in Indian Punjab, while in Shahmukhi writing in Pakistani Punjab. In the paper, the objective is to build the speaker independent automatic spontaneous speech recognition system for the Punjabi language. The system is also capable to recognize the spontaneous Punjabi live speech. So far, no work has to be achieved in the area of spontaneous speech recognition system for the Punjabi language. The user interfaces for Punjabi live speech system is created by using the java programming. Till now, automatic speech system is trained with 6012 Punjabi words and 1433 Punjabi sentences. The performance measured in terms of recognition accuracy which is 93.79% for Punjabi words and 90.8% for Punjabi sentences. 相似文献

13.

Monaural speech separation and recognition challenge 总被引：2，自引：1，他引：1

Martin Cooke John R. Hershey Steven J. Rennie 《Computer Speech and Language》2010,24(1):1-15

Robust speech recognition in everyday conditions requires the solution to a number of challenging problems, not least the ability to handle multiple sound sources. The specific case of speech recognition in the presence of a competing talker has been studied for several decades, resulting in a number of quite distinct algorithmic solutions whose focus ranges from modeling both target and competing speech to speech separation using auditory grouping principles. The purpose of the monaural speech separation and recognition challenge was to permit a large-scale comparison of techniques for the competing talker problem. The task was to identify keywords in sentences spoken by a target talker when mixed into a single channel with a background talker speaking similar sentences. Ten independent sets of results were contributed, alongside a baseline recognition system. Performance was evaluated using common training and test data and common metrics. Listeners’ performance in the same task was also measured. This paper describes the challenge problem, compares the performance of the contributed algorithms, and discusses the factors which distinguish the systems. One highlight of the comparison was the finding that several systems achieved near-human performance in some conditions, and one out-performed listeners overall. 相似文献

14.

基于深度学习的口语理解联合建模算法综述

魏鹏飞曾碧汪明慧曾安《软件学报》2022,33(11):4192-4216

口语理解是自然语言处理领域的研究热点之一,应用在个人助理、智能客服、人机对话、医疗等多个领域.口语理解技术指的是将机器接收到的用户输入的自然语言转换为语义表示,主要包含意图识别、槽位填充这两个子任务.现阶段,使用深度学习对口语理解中意图识别和槽位填充任务的联合建模方法已成为主流,并且获得了很好的效果.因此,对基于深度学习的口语理解联合建模算法进行总结分析具有十分重要的意义.首先介绍了深度学习技术应用到口语理解的相关工作,然后从意图识别和槽位填充的关联关系对现有的研究工作进行剖析,并对不同模型的实验结果进行了对比分析和总结,最后给出了未来的研究方向及展望. 相似文献

15.

Efficient data selection for speech recognition based on prior confidence estimation using speech and monophone models

《Computer Speech and Language》2014,28(6):1287-1297

This paper proposes an efficient speech data selection technique that can identify those data that will be well recognized. Conventional confidence measure techniques can also identify well-recognized speech data. However, those techniques require a lot of computation time for speech recognition processing to estimate confidence scores. Speech data with low confidence should not go through the time-consuming recognition process since they will yield erroneous spoken documents that will eventually be rejected. The proposed technique can select the speech data that will be acceptable for speech recognition applications. It rapidly selects speech data with high prior confidence based on acoustic likelihood values and using only speech and monophone models. Experiments show that the proposed confidence estimation technique is over 50 times faster than the conventional posterior confidence measure while providing equivalent data selection performance for speech recognition and spoken document retrieval. 相似文献

16.

智能电话查询系统 总被引：1，自引：0，他引：1

蔡云轶王晓东陆汝占《计算机工程》2002,28(6):164-165,215

介绍了一个用于交通信息查询的交互式对话系统-ShanghaiQuest。该系统集成了语音识别、文语转换和自然语言理解等技术，用户可以同这样一个自动的交通代理对话以获取所需的交通信息。相似文献

17.

What should your speech system say?

Bernsen N.O. Dybkjaer H. Dybkjaer L. 《Computer》1997,30(12):25-31

Even as telephone based spoken language dialogue systems (SLDS) are becoming commercially available, developers can benefit from guidelines designed to help remove dialogue problems as early as possible in the design life cycle. SLDS designers generally rely on a Wizard of Oz (WOZ) simulation technique to ensure that the system's dialogue facilitates user interaction as much as possible. In a WOZ simulation, users are made to believe they are interacting with a real system, when in fact they are interacting with a hidden researcher. These researchers record, transcribe, and analyze the dialogues, and then use the results to improve the dialogue in the SLDS being developed. Using current methods, dialogue designers must be both very careful and very lucky, or interaction problems will remain during the implementation and testing stages. We have found that a sound, comprehensive set of dialogue design guidelines is an effective tool to support systematic development and evaluation during early SLDS design. We believe guidelines could significantly reduce development time by reducing the need for lengthy WOZ experimentation, controlled user testing, and field trial cycles 相似文献

18.

Segment-based emotion recognition from continuous Mandarin Chinese speech

Jun-Heng Yeh Tsang-Long PaoChing-Yi Lin Yao-Wei TsaiYu-Te Chen 《Computers in human behavior》2011,27(5):1545-1552

Recognition of emotion in speech has recently matured to one of the key disciplines in speech analysis serving next generation human-machine interaction and communication. However, compared to automatic speech recognition, that emotion recognition from an isolated word or a phrase is inappropriate for conversation. Because a complete emotional expression may stride across several sentences, and may fetch-up on any word in dialogue. In this paper, we present a segment-based emotion recognition approach to continuous Mandarin Chinese speech. In this proposed approach, the unit for recognition is not a phrase or a sentence but an emotional expression in dialogue. To that end, the following procedures are presented: First, we evaluate the performance of several classifiers in short sentence speech emotion recognition architectures. The results of the experiments show that the WD-KNN classifier achieves the best accuracy for the 5-class emotion recognition what among the five classification techniques. We then implemented a continuous Mandarin Chinese speech emotion recognition system with an emotion radar chart which is based on WD-KNN; this system can represent the intensity of each emotion component in speech. This proposed approach shows how emotions can be recognized by speech signals, and in turn how emotional states can be visualized. 相似文献

19.

Is talking to an automated teller machine natural and fun?

《Ergonomics》2012,55(13-14):1386-1407

Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology, multiple mapping and word spotting techniques, the speech driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun can talking to a speech ATM be for these first-time users. Subjects performed the withdrawal and balance enquiry tasks. The ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured on the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use. 相似文献

20.

Two-level speech recognition to enhance the performance of spoken dialogue systems

《Knowledge》2006,19(3):153-163

Spoken dialogue systems can be considered knowledge-based systems designed to interact with users using speech in order to provide information or carry out simple tasks. Current systems are restricted to well-known domains that provide knowledge about the words and sentences the users will likely utter. Basically, these systems rely on an input interface comprised of speech recogniser and semantic analyser, a dialogue manager, and an output interface comprised of response generator and speech synthesiser. As an attempt to enhance the performance of the input interface, this paper proposes a technique based on a new type of speech recogniser comprised of two modules. The first one is a standard speech recogniser that receives the sentence uttered by the user and generates a graph of words. The second module analyses the graph and produces the recognised sentence using the context knowledge provided by the current prompt of the system. We evaluated the performance of two input interfaces working in a previously developed dialogue system: the original interface of the system and a new one that features the proposed technique. The experimental results show that when the sentences uttered by the users are out-of-context analysed by the new interface, the word accuracy and sentence understanding rates increase by 93.71 and 77.42% absolute, respectively, regarding the original interface. The price to pay for this clear enhancement is a little reduction in the scores when the new interface analyses sentences in-context, as they decrease by 2.05 and 3.41% absolute, respectively, in comparison with the original interface. Given that in real dialogues sentences may be out-of-context analysed, specially when they are uttered by inexperienced users, the technique can be very useful to enhance the system performance. 相似文献