Computer recognition of unconstrained handwritten numerals (cited 13 times: 0 self-citations, 13 by others)
Four independently developed expert algorithms for recognizing unconstrained handwritten numerals are presented. All have high recognition rates. Different experimental approaches for incorporating these recognition methods into a more powerful system are also presented. The resulting multiple-expert system shows that the consensus of these methods tends to compensate for individual weaknesses while preserving individual strengths. It is shown that the substitution rate can be reduced to a desired level while maintaining a fairly high recognition rate in the classification of totally unconstrained handwritten ZIP code numerals. If reliability is of the utmost importance, substitutions can be avoided completely (reliability = 100%) while retaining a recognition rate above 90%. Results are compared with those for some of the most effective numeral recognition systems found in the literature.
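The trade-off described above, between substitutions and rejections, can be illustrated with a minimal sketch. It assumes a simple agreement-threshold vote, which is not necessarily the combination scheme used in the paper. The function names `consensus_label` and `rates` are hypothetical, and reliability is taken here to mean correct decisions divided by accepted decisions.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def consensus_label(votes: Sequence[Optional[int]], min_agree: int) -> Optional[int]:
    """Return the label that at least `min_agree` experts agree on, else None (reject).

    A vote of None means that expert rejected the sample.
    This is an illustrative threshold vote, not the paper's scheme.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    label, top = ranked[0]
    # Reject on weak agreement and on ties between the two leading labels.
    if top < min_agree or (len(ranked) > 1 and ranked[1][1] == top):
        return None
    return label


def rates(decisions: Sequence[Optional[int]], truths: Sequence[int]) -> Dict[str, float]:
    """Recognition, substitution, rejection rates and reliability (assumed definition)."""
    n = len(truths)
    correct = sum(d == t for d, t in zip(decisions, truths) if d is not None)
    rejected = sum(d is None for d in decisions)
    substituted = n - correct - rejected
    accepted = correct + substituted
    return {
        "recognition": correct / n,
        "substitution": substituted / n,
        "rejection": rejected / n,
        # Assumed: reliability = correct / (correct + substituted).
        "reliability": correct / accepted if accepted else 1.0,
    }
```

Raising `min_agree` turns more would-be substitutions into rejections, which is the mechanism by which reliability can be pushed up at the cost of a lower recognition rate.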

In the past decade, the performance of spoken language understanding systems has improved dramatically, including speech recognition, dialog systems, speech summarization, and text and speech translation. This has resulted in an increasingly widespread use of speech and language technologies in a wide variety of applications. With more than 6,900 languages in the world and the current trend of globalization, one of the most important challenges in spoken language technologies today is the need to support multiple input and output languages, especially if applications are intended for international markets, linguistically diverse user communities, and nonnative speakers. In many cases these applications even have to support multiple languages simultaneously to meet the needs of a multicultural society. Consequently, new algorithms and tools are required that support the simultaneous recognition of mixed-language input, the summarization of multilingual text and spoken documents, the generation of output in the appropriate language, or the accurate translation from one language to another. This article surveys significant ongoing research programs as well as trends, prognoses, and open research issues, with a special emphasis on multilingual speech processing as described in detail in the work of Schultz and Hirschberg (2006) and multilingual language processing as presented in the work of Fung (2006).

Content-based access to spoken audio (cited 2 times: 0 self-citations, 2 by others)
This article describes approaches to content-based access to spoken audio with a qualitative and tutorial emphasis. We describe how the analysis, retrieval, and delivery phases contribute to making spoken audio content more accessible and outline outstanding research issues. We also discuss the main application domains and identify important issues for future developments. The structure of the article is based on the general system architecture for content-based access. Although the tasks within each processing stage may appear unconnected, the interdependencies and the sequence in which they take place vary.

Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard drive. Speech search has received less attention, perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor for large-scale access to spoken content. In this article, we discuss the technical issues involved in the development of information retrieval systems for spoken audio documents, concentrating on the issue of handling the errorful or incomplete output provided by ASR systems. We focus on the use case where a user enters search terms into a search engine and is returned a collection of spoken document hits.

Networks of adaptive-logic circuits which recognise patterns after being 'taught' by an operator have been studied. The letter describes the way in which the performance of such networks may be estimated from a knowledge of the statistical characteristics of the patterns. Results obtained with hand-printed numerals are discussed.

Spoken language translation (SLT) is of great relevance in our increasingly globalized world, both from a social and economic point of view. It is one of the major challenges in automatic speech recognition (ASR) and machine translation (MT), driving intense research activity in these areas. Speech translation is useful to assist person-to-person communication in limited domains like tourism and traveling and to translate foreign parliamentary speeches and broadcast news. Speech translation is based on a suitable combination of two independent technologies, namely ASR and MT of written language. Thus, the important question is how to pass on the ASR ambiguities to the MT process. A unifying framework for this ASR-MT interface is provided by applying the Bayes decision rule to the speech translation task as a whole rather than to each task individually. Depending on the MT approaches used, such as finite-state transducers or phrase-based modeling, various types of ASR-MT interfaces have been studied, ranging from N-best lists through word lattices to confusion networks. We have discussed experimental results on various tasks, ranging from limited to unrestricted domains. Despite the significant advances and the large number of experimental studies, it is still an open question what type of interface provides a suitable compromise between translation accuracy and computational cost.
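One common way to write the Bayes decision rule for the speech translation task as a whole is sketched below. The notation is an assumption and is not taken from the abstract: $x_1^T$ is the acoustic observation sequence, $f_1^J$ the (hidden) source-language word sequence, and $e_1^I$ the target-language word sequence. The derivation also assumes that the acoustics depend on the target sentence only through the source words, i.e. $\Pr(x_1^T \mid f_1^J, e_1^I) = \Pr(x_1^T \mid f_1^J)$.

```latex
\begin{align}
\hat{e}_1^{\hat{I}}
  &= \operatorname*{arg\,max}_{I,\, e_1^I} \Pr\!\left(e_1^I \mid x_1^T\right) \\
  &= \operatorname*{arg\,max}_{I,\, e_1^I}
     \Big\{ \Pr\!\left(e_1^I\right)
            \sum_{J,\, f_1^J} \Pr\!\left(f_1^J \mid e_1^I\right)
                              \Pr\!\left(x_1^T \mid f_1^J\right) \Big\} \\
  &\approx \operatorname*{arg\,max}_{I,\, e_1^I}
     \Big\{ \Pr\!\left(e_1^I\right)
            \max_{J,\, f_1^J} \Pr\!\left(f_1^J \mid e_1^I\right)
                              \Pr\!\left(x_1^T \mid f_1^J\right) \Big\}
\end{align}
```

Under this reading, an N-best list, a word lattice, or a confusion network each restricts the set of source hypotheses $f_1^J$ over which the sum (or its maximum approximation) is taken, which is where the trade-off between accuracy and computational cost arises.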

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken rather than written language input. A key challenge in moving from text-based documents to such spoken documents is that spoken language lacks explicit punctuation and formatting, which can be crucial for good performance. This article describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating the impact on several language processing tasks. The results also show a need for optimizing segmentation for the end task rather than independently.

Automatic emotion recognition from speech signals without linguistic cues has been an important emerging research area. Integrating emotions in human–computer interaction is of great importance to effectively simulate real-life scenarios. Research has focused on recognizing emotions from acted speech, while little work has been done on natural real-life utterances. English, French, German and Chinese corpora were used for that purpose, while no natural Arabic corpus was found to date. In this paper, emotion recognition in Arabic spoken data is studied for the first time. A realistic speech corpus from Arabic TV shows is collected. The videos are labeled by their perceived emotions, namely happy, angry or surprised. Prosodic features are extracted and thirty-five classification methods are applied. Results are analyzed in this paper, and conclusions and future recommendations are identified.

Oviatt, S. Multimedia, IEEE, 1996, 3(4): 26-35
By modeling difficult sources of linguistic variability in speech and language, we can design interfaces that transparently guide human input to match system processing capabilities. Such work will yield more user-centered and robust interfaces for next-generation spoken language and multimodal systems.

Building modern speech and language systems currently requires large data resources such as texts, voice recordings, pronunciation lexicons, morphological decomposition information and parsing grammars. Based on a study of the most important differences between language groups, we introduce approaches to efficiently deal with the enormous task of covering even a small percentage of the world's languages. For speech recognition, we have reduced the resource requirements by applying acoustic model combination, bootstrapping and adaptation techniques. Similar algorithms have been applied to improve the recognition of foreign accents. Segmenting language into appropriate units reduces the amount of data required to robustly estimate statistical models. The underlying morphological principles are also used to automatically adapt the coverage of our speech recognition dictionaries with the Hypothesis-Driven Lexical Adaptation (HDLA) algorithm. This reduces the out-of-vocabulary problems encountered in agglutinative languages. Speech recognition results are reported for the read GlobalPhone database and some broadcast news data. For speech translation, using a task-oriented Interlingua allows a system with N languages to be built with linear rather than quadratic effort. We have introduced a modular grammar design to maximize reusability and portability. End-to-end translation results are reported on a travel-domain task in the framework of C-STAR.

This work presents an embedded Arabic OCR system. The proposed system is compact and portable, which makes it useful for many applications such as blind assistance and language translation. The OCR system consists of the following sub-systems: image acquisition, pre-processing, segmentation, feature extraction, classification, and post-processing. For each sub-system there are several algorithms and techniques that can be implemented. Working with PCs gives the designer freedom to select the algorithms and techniques according to the required performance, reliability and reusability. With embedded systems, however, there are many problems and challenges associated with memory, speed, and computational power. An FPGA is selected as the hardware platform for realizing the recognition task. An OCR system is designed and implemented on a PC and then transferred to the FPGA after a set of optimization procedures. Utilizing the features of FPGA technology, hardware/software co-design is accomplished on an FPGA board. In that design the system is partitioned into software modules and hardware components to gain the advantages of software flexibility and hardware speed. A database of 3000 Arabic characters is used to train and test the performance of the system. The effects of changing the number of features and classification parameters on accuracy, memory and speed are measured. Design points are selected in order to improve the required memory, speed and computational power without affecting the accuracy.

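Zhang Rui (张蕤), Sun Jiasong (孙甲松). 信息技术 (Information Technology), 2016, (4): 92-95, 104
To address the degradation of spoken language understanding performance caused by speech recognition errors, a discriminative spoken language understanding method that is easy to train and fast to decode is proposed. A binary logistic regression model is first built for each semantic element, and a joint probability model is then built according to the constraints that hold within the domain. Experimental results on the public English dataset DSTC2 show that the method outperforms both a hand-crafted rule method and a semantic tuple classifier model.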

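To address the problem of confused speech recognition in traditional automatic English pronunciation proofreading systems, an automatic pronunciation proofreading system for spoken English is designed. Error-elimination computation is introduced for speech recognition and proofreading. Speech information that has passed through the error-elimination computation can be recognized at a higher order, which avoids the data escalation errors that arise in traditional recognition and proofreading methods. The feedback control system is also optimized, improving the system's ability to recognize speech. To verify the effectiveness of the designed system, a comparative simulation experiment was carried out. The experimental data show that the designed system effectively resolves the problem of confused speech recognition.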

A performance evaluation of sound recognition techniques in recognizing some spoken Arabic words, namely the digits from zero to nine, is presented. One of the main characteristics of all Arabic digits is that they are polysyllabic words, except for zero. The performance analysis is based on different features of phonetically isolated Arabic digits. The main aim of this paper is to compare, analyze, and discuss the outcomes of spoken Arabic digit recognition systems based on three kinds of recognition features: the Yule-Walker spectrum features, the Walsh spectrum features, and the Mel frequency cepstral coefficient (MFCC) features. The MFCC-based recognition system achieves the best average correct recognition. On the other hand, the Yule-Walker-based recognition system achieves the worst average correct recognition.

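Cao Jiankai (曹建凯), Zhang Lianhai (张连海). 信号处理 (Signal Processing), 2017, 33(5): 703-710
An unsupervised query-by-example spoken term detection (QbE-STD) method based on a hierarchical Dirichlet process hidden Markov model (HDPHMM) tokenizer is proposed. The method first applies a hidden Markov model with two state layers, in which the top-layer states represent the discovered acoustic units and the bottom-layer states model the emission probabilities of the top-layer states. Placing a hierarchical Dirichlet process prior on the top-layer states yields the nonparametric Bayesian model HDPHMM. The model is trained on unlabeled speech data and then outputs posterior probability feature vectors for the test speech and the query examples. A non-negative matrix factorization algorithm is applied to the posteriors to obtain new features, and on this basis a modified segmental dynamic time warping algorithm is used for retrieval, forming the QbE-STD system. Experimental results show that, compared with a baseline system based on a Gaussian mixture model tokenizer, the proposed method performs better and significantly improves retrieval precision.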

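A user-defined function converts values written in Arabic numerals into Chinese-character numerals or RMB amounts. The program is used in the form of a function, which makes it practical and easy to port.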
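A conversion of this kind can be sketched as follows. This is an illustrative reconstruction, not the program from the article: the names `to_chinese` and `_section_to_chinese` are hypothetical, only non-negative integers below 10^16 are handled, the everyday numerals (一, 二, 三) are used rather than the financial forms, and RMB amounts with 元/角/分 are not covered. The sketch also writes 10 as 一十 rather than the colloquial 十.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]            # units inside a four-digit section
SECTION_UNITS = ["", "万", "亿", "万亿"]  # units for successive four-digit sections


def _section_to_chinese(section: int) -> str:
    """Convert an integer in 1..9999 to Chinese numerals."""
    out = ""
    zero_pending = False
    for power in range(3, -1, -1):
        digit = section // 10**power % 10
        if digit == 0:
            # Only a zero that follows a written digit may need a 零.
            zero_pending = bool(out)
        else:
            if zero_pending:
                out += "零"
                zero_pending = False
            out += DIGITS[digit] + UNITS[power]
    return out


def to_chinese(n: int) -> str:
    """Convert a non-negative integer below 10**16 to Chinese numerals."""
    if not 0 <= n < 10**16:
        raise ValueError("only integers in [0, 10**16) are supported")
    if n == 0:
        return "零"
    sections = []
    while n:
        sections.append(n % 10000)  # lowest four digits first
        n //= 10000
    out = ""
    need_zero = False
    for idx in range(len(sections) - 1, -1, -1):
        sec = sections[idx]
        if sec == 0:
            need_zero = bool(out)
            continue
        # Insert 零 when a whole section was skipped or this section
        # has a leading zero (value below 1000).
        if out and (need_zero or sec < 1000):
            out += "零"
        out += _section_to_chinese(sec) + SECTION_UNITS[idx]
        need_zero = False
    return out
```

Splitting the number into four-digit sections mirrors how Chinese groups digits by 万 and 亿, which keeps the zero-insertion rules local to each section.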

The first-, second-, and third-order entropies of written Arabic text are calculated. The calculated value of the third-order entropy shows that the text exhibits a redundancy of more than 50 percent.
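A calculation of this kind can be sketched as below. The abstract does not define its terms, so the sketch makes two assumptions: the n-th order entropy is taken as the conditional entropy of a character given the n-1 preceding characters, estimated from n-gram counts, and redundancy is taken as one minus the ratio of that entropy to the maximum entropy log2 of the alphabet size. The function names are hypothetical.

```python
import math
from collections import Counter


def block_entropy(text: str, n: int) -> float:
    """Shannon entropy in bits of the empirical n-gram distribution of `text`."""
    if n == 0:
        return 0.0
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def order_entropy(text: str, n: int) -> float:
    """Assumed definition: entropy of a character given the n-1 preceding ones.

    Computed as H(n-gram) - H((n-1)-gram). The estimate is biased on short
    texts and can even come out slightly negative because of edge effects.
    """
    return block_entropy(text, n) - block_entropy(text, n - 1)


def redundancy(text: str, n: int) -> float:
    """Assumed definition: 1 - H_n / log2(alphabet size)."""
    alphabet = len(set(text))
    if alphabet < 2:
        raise ValueError("need at least two distinct symbols")
    return 1.0 - order_entropy(text, n) / math.log2(alphabet)
```

For example, `redundancy(corpus, 3)` on a large Arabic text string `corpus` would give the third-order figure under these assumptions. A long sample is needed, since the number of possible trigrams grows with the cube of the alphabet size.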
