Similar literature
 20 similar documents found
1.
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, are the driving force behind the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexicons, two of the most important written language resources for the development of ASR (automatic speech recognition) and TTS (text-to-speech) systems. The presented architecture is modular and is particularly suitable for the development of written language resources for inflectional languages. In this paper an implementation is presented for the Slovenian language. The integrated graphical user interface focuses on the morphological and phonetic aspects of language and lets experts perform well during analysis. In multilingual TTS systems, many extensive external written language resources are used, especially in the text processing part. It is therefore very important that the representation of these resources be time- and space-efficient. It is also very important that language resources for new languages can be easily incorporated into the system without modifying the common algorithms developed for multiple languages. In this regard, the use of large external language resources (e.g., morphology and phonetic lexicons) poses a significant problem because of the required space and slow look-up time. This paper presents a method and its results for compiling large lexicons, using examples for compiling German phonetic and morphology lexicons (CISLEX), and Slovenian phonetic (SIflex) and morphology (SImlex) lexicons, into corresponding finite-state transducers (FSTs). 
The German lexicons consisted of about 300,000 words, SIflex of about 60,000 and SImlex of about 600,000 words (of which 40,000 words were used for the finite-state transducer representation). Representation of large lexicons using finite-state transducers is mainly motivated by considerations of space and time efficiency. A great reduction in size and optimal access time was achieved for all lexicons. The starting size was 12.53 MB for the German phonetic lexicon and 18.49 MB for the German morphology lexicon. The starting size was 1.8 MB for the Slovenian phonetic lexicon and 1.4 MB for the Slovenian morphology lexicon. The final size of the corresponding FSTs was 2.78 MB for the German phonetic lexicon, 6.33 MB for the German morphology lexicon, 253 KB for SIflex and 662 KB for SImlex. The achieved look-up time is optimal, since it depends only on the length of the input word and not on the size of the lexicon. Integration of lexicons for new languages into the multilingual TTS system is easy when using such representations and does not require any changes in the algorithms used for such lexicons.
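
The claim that look-up time depends only on the length of the input word can be illustrated with a much simpler structure than a compiled FST. The sketch below uses a plain character trie, which is not the transducer construction described in the abstract (a trie shares prefixes only and is not minimized), but it shows the same access pattern: one transition per input character, regardless of how many entries are stored. The class names and the two toy entries are hypothetical and are not taken from CISLEX, SIflex or SImlex.

```python
from typing import Dict, Optional


class TrieNode:
    """One state of the trie; `output` is set only where a word ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.output: Optional[str] = None


class TrieLexicon:
    """Toy phonetic lexicon; a stand-in for a compiled transducer."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str, transcription: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.output = transcription

    def lookup(self, word: str) -> Optional[str]:
        # Exactly one dictionary access per character of `word`,
        # so the cost is O(len(word)) whatever the lexicon size.
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                return None
            node = nxt
        return node.output


# Hypothetical entries with made-up transcriptions, for illustration only.
lexicon = TrieLexicon()
lexicon.add("haus", "h aU s")
lexicon.add("hund", "h U n t")
```

A real FST representation would additionally share suffixes and emit output symbols along the path, which is where the reported size reductions come from; the trie only demonstrates the look-up cost.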

2.
Sentiment analysis (SA) applications are becoming popular among individuals and organizations for gathering and analysing users' sentiments about products, services, policies, and current affairs. Owing to the availability of a wide range of English lexical resources, such as part-of-speech taggers, parsers, and polarity lexicons, the development of sophisticated SA applications for the English language has attracted many researchers. Although there have been efforts to create polarity lexicons in non-English languages such as Urdu, they suffer from many deficiencies, such as the lack of publicly available sentiment lexicons with a proper scoring mechanism for opinion words and modifiers. In this work, we present a word-level translation scheme for creating a first comprehensive Urdu polarity resource, "Urdu Lexicon", by merging existing resources: a list of English opinion words, SentiWordNet, an English–Urdu bilingual dictionary, and a collection of Urdu modifiers. We assign two polarity scores, positive and negative, to each Urdu opinion word. Moreover, modifiers are collected, classified, and tagged with proper polarity scores. We also perform an extrinsic evaluation in terms of subjectivity detection and sentiment classification, and the evaluation results show that the polarity scores assigned by this technique are more accurate than those of the baseline methods.

3.
4.
This paper describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called lexical conceptual structure (LCS). A primary goal of the LCS research is to demonstrate that synonymous verb senses share distributional patterns. We show how the syntax–semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. We start by describing the structure of the LCS and showing how this representation is used in FLT and MT. We then focus on the problem of building LCS dictionaries for large-scale FLT and MT. First, we describe authoring tools for manual and semi-automatic construction of LCS dictionaries; we then present a more sophisticated approach that uses linguistic techniques for building word definitions automatically. These techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project.

5.
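Based on a study and analysis of construction methods for general speech corpora, and taking into account the requirements of spontaneous speech recognition research and the basic characteristics of spontaneous spoken Tibetan, this work designs a construction scheme and a corresponding annotation specification for a spoken speech corpus suited to Tibetan speech recognition. On this basis, a 50-hour spoken speech corpus of Lhasa Tibetan is built with five layers of annotation: phoneme, semi-syllable, syllable, Tibetan character, and sentence. Statistics show that the corpus preserves the natural properties of spoken speech while providing balanced coverage of commonly used acoustic modeling units such as phonemes and semi-syllables, offering reliable data support for research on speech recognition based on spoken Tibetan data.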

6.
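In recent years, with the development of artificial intelligence and the spread of smart devices, human-machine dialogue technology has received wide attention. Spoken language understanding is an important task in spoken dialogue systems, and spoken intent detection is a key step within it. Because multi-turn dialogue exhibits complex linguistic phenomena such as semantic omission, frame representation, and intent shifts, intent detection for multi-turn dialogue is highly challenging. To address these difficulties, this paper proposes a gating-based information sharing network that makes full use of contextual information in multi-turn dialogue to improve detection performance. Specifically, initial representations of the current-turn text and the context text are first built by combining character and pronunciation features, reducing the impact of speech recognition errors on the semantic representation. Next, a semantic encoder based on a hierarchical attention mechanism produces deep semantic representations of the current turn and the context, covering multi-level semantic information from characters to sentences to multi-turn text. Finally, a gating mechanism is introduced into a multi-task learning framework to build the gating-based information sharing network, which uses contextual semantic information to assist intent detection on the current turn. Experimental results show that the proposed method uses contextual information effectively to improve spoken intent detection, reaching 88.1% accuracy (Acc) and 88.0% F1 on the dataset of Technical Evaluation Task 2 of the China Conference on Knowledge Graph and Semantic Computing (CCKS2018), a significant improvement over existing methods.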

7.
8.
9.
This paper presents algorithms for generating targeted name lists for candidate out-of-vocabulary (OOV) words for applications in language processing, particularly speech recognition. Focusing on names, which are shown to be the dominant class of OOVs in news broadcasts, the approach involves offline generation of a large name list and online pruning based on a phonetic distance. The resulting list can be used in a rescoring pass in automatic speech recognition. We also show that a simple variation of the approach can be used to generate alternate name spellings, which may be useful for query expansion in information retrieval. By using a wide variety of sources, including automatic name phrase tagging of temporally relevant news text, OOV coverage can be improved by nearly a factor of two with only a 10% increase in the word list size. For one source, coverage increased from 13% to 94%. Phonetic pruning can be used to reduce the list size by an order of magnitude with only a small loss in coverage.
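
The online pruning step can be sketched as follows. The abstract does not say which phonetic distance is used, so this example assumes a plain Levenshtein distance over phoneme symbols with unit costs; the function names, the threshold and the toy phoneme strings are all hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def prune_names(hypothesis: Sequence[str],
                candidates: Dict[str, List[str]],
                max_distance: int) -> List[str]:
    """Keep only the names whose pronunciation is close to the hypothesis."""
    return sorted(name for name, phones in candidates.items()
                  if edit_distance(hypothesis, phones) <= max_distance)


# Toy data: the recognizer's phoneme hypothesis and a small offline name list.
hypothesis = ["k", "ae", "t", "r", "ih", "n"]
candidates = {
    "Katrin": ["k", "ae", "t", "r", "ih", "n"],
    "Catherine": ["k", "ae", "th", "r", "ih", "n"],
    "Robert": ["r", "aa", "b", "er", "t"],
}
kept = prune_names(hypothesis, candidates, max_distance=1)
```

A distance that weights substitutions by acoustic confusability would likely prune more sharply than unit costs; the unit-cost version is used here only because it needs no additional data.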

10.
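To address the limited size of pronunciation dictionaries in Russian speech synthesis and speech recognition systems, this paper proposes a Russian word phonetic transcription algorithm based on a long short-term memory (LSTM) sequence-to-sequence model, and designs and implements a prototype transcription system. First, the SAMPA-based Russian phoneme set is revised so that transcriptions reflect the stress position and vowel reduction in Russian words, and a 20,000-word Russian pronunciation dictionary is built using the revised phoneme set. The algorithm is then implemented in the TensorFlow framework: an encoder LSTM converts a Russian word into a fixed-dimensional vector, and a decoder LSTM converts that vector into the target pronunciation sequence. Finally, a Russian word transcription system with features such as interactive word transcription is designed and implemented. Experimental results show that on an out-of-vocabulary test set the algorithm achieves a word accuracy of 74.8% and a phoneme accuracy of 94.5%, both higher than the Phonetisaurus method. The system can effectively support the construction of Russian pronunciation dictionaries.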

11.
This paper proposes and experimentally evaluates a method to determine the stressed syllable of a word in the framework of speech synthesis in Romanian. In order to produce high quality speech, a speech synthesis system needs information about the position of the stress for each word of a sentence to be generated. Incorrect positioning of stress (or, in the worst case, completely ignoring it) translates into poor quality synthesized speech. Since Romanian is a free-stress language (as is English, for example), the position of the stressed syllable within a word is not clearly defined. Consequently, a set of explicit rules that can determine the exact position of the stress is difficult to generate. In order to solve this problem, we propose an original method to find stressing rules for the Romanian language as well as an algorithm to implement this method. Under this algorithm, the position of the stressed syllable is computed from a number of word parameters encompassing morphologic, phonetic, and lexical characteristics of the word. The experimental results show that the errors of the automatic stress assignment using our method do not exceed 6%.

12.
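Knowledge representation is an important foundation of natural language understanding. Inconsistent knowledge representation and the inability to exploit semantic information systematically are pressing open problems. Solving them requires solving the problem of semantic knowledge representation. Based on the Hierarchical Network of Concepts, this paper describes semantic knowledge representation methods at the word, sentence, and discourse levels. Using the word-level representation method described in the paper, a multilingual ontology knowledge base is constructed. The knowledge representation method of this knowledge base can provide a basis for theoretical research on knowledge representation and resource support for applications in fields related to natural language processing.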

13.
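New Advances in Language Model Research for Large-Scale Corpora   Total citations: 2, self-citations: 0, citations by others: 2
N-gram language models are an important tool in many areas of natural language processing research, including statistical machine translation, information retrieval, and speech recognition. Because enlarging the training corpus and increasing the n-gram order both help improve system performance, and because available corpora are growing rapidly, training and using high-order language models (e.g., N≥5) on large-scale training corpora has become a new research focus. This paper surveys the latest progress on this problem, covering several representative approaches: an integrated method combining data divide-and-conquer, compression, and memory mapping; representations based on random-access models; and language model training and querying on distributed parallel architectures. It presents their performance in statistical machine translation and compares their strengths and weaknesses.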

14.
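Distributed word representation learning aims to train low-dimensional, compressed, dense word vectors within a neural network framework. However, such neural word representation models have the following shortcomings: (1) rare words lack sufficient context training data, so the learned rare-word vectors do not fully reflect their semantics in the corpus; (2) when antonyms of a center word appear in its context, words with completely opposite meanings are assigned nearby vectors; (3) when synonymous words never appear in each other's contexts, their learned representations lie far apart in the vector space. To address these three points, this paper proposes a distributed word representation learning algorithm based on multi-source information fusion (MSWE), with four main improvements: (1) explicitly constructing a context feature matrix for words preserves the co-occurrence information of rare words and their context words in the language training model, which reflects fairly accurately the structural semantic associations projected by word structure; (2) constructing an attribute semantic feature matrix from the descriptions or explanatory texts of words compensates for insufficient training caused by sparse context structure features; (3) constructing synonym and antonym feature matrices from synonym and antonym information places synonyms close together and antonyms far apart in the word vector space; (4) fusing the multi-source feature matrices with an inductive matrix completion algorithm yields low-dimensional word vectors. Experimental results show that the proposed MSWE algorithm effectively learns useful feature factors from the multi-source word feature matrices and performs very well on six word similarity evaluation datasets.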

15.
As mobile computing devices grow smaller and as in-car computing platforms become more common, we must augment traditional methods of human-computer interaction. Although speech interfaces have existed for years, the constrained system resources of pervasive devices, such as limited memory and processing capabilities, present new challenges. We provide an overview of embedded automatic speech recognition (ASR) on pervasive devices and discuss its ability to help us develop pervasive applications that meet today's marketplace needs. ASR recognizes spoken words and phrases. State-of-the-art ASR uses a phoneme-based approach for speech modeling: it gives each phoneme (or elementary speech sound) in the language under consideration a statistical representation expressing its acoustic properties.

16.
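This paper proposes a stem-based keyword extraction method for Uyghur and Kazakh (hereafter Uyghur-Kazakh) text. Uyghur and Kazakh are low-resource derivational languages; morpheme structure analysis and stem extraction effectively reduce the size of the lexical unit inventory of such languages and improve its coverage. Uyghur-Kazakh texts are downloaded from the web and segmented into morpheme sequences; word2vec is used to train stem vectors that give a distributed representation of the text content, and the stem vectors are then weighted with the TF-IDF algorithm. Keywords are extracted according to the similarity between keyword stem vectors from the training set and stem vectors from the test set. Experimental results show that morpheme segmentation and stem vector representation are an important step in keyword extraction for derivational languages such as Uyghur and Kazakh, and that this step improves keyword extraction accuracy.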

17.
This paper investigates a data-driven word decompounding algorithm for use in automatic speech recognition. An existing algorithm, called "Morfessor," has been enhanced to address the problem of increased phonetic confusability arising from word decompounding, by incorporating phonetic properties and some constraints on recognition units derived from forced-alignment experiments. Speech recognition experiments have been carried out on a broadcast news task for the Amharic language to validate the approach. The out-of-vocabulary (OOV) word rates were reduced by 35% to 50% and a small reduction in word error rate (WER) was achieved. The algorithm is relatively language independent and requires minimal adaptation to be applied to other languages.

18.
Development of a robust two-way real-time speech translation system exposes researchers and system developers to various challenges of machine translation (MT) and spoken language dialogues. The need for communicating in at least two different languages poses problems not present for a monolingual spoken language dialogue system, where no MT engine is embedded within the process flow. Integration of various component modules for real-time operation poses challenges not present for text translation. In this paper, we present the CCLINC (Common Coalition Language System at Lincoln Laboratory) English–Korean two-way speech translation system prototype trained on doctor–patient dialogues, which integrates various techniques to tackle the challenges of automatic real-time speech translation. Key features of the system include (i) language-independent meaning representation which preserves the hierarchical predicate–argument structure of an input utterance, providing a powerful mechanism for discourse understanding of utterances originating from different languages, word-sense disambiguation and generation of various word orders of many languages, (ii) adoption of the DARPA Communicator architecture, a plug-and-play distributed system architecture which facilitates integration of component modules and system operation in real time, and (iii) automatic acquisition of grammar rules and lexicons for easy porting of the system to different languages and domains. We describe these features in detail and present experimental results.

19.
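Research on Multi-Word Expression Extraction Methods for Uyghur   Total citations: 1, self-citations: 0, citations by others: 1
A multi-word expression is a special linguistic phenomenon, generally consisting of several words that together express a single meaning and that frequently occur together in corpora. Because multi-word expressions are special units, extracting them plays a very important role in many areas of natural language processing. This paper discusses the effect of three common statistical measures, namely mutual information, the log-likelihood ratio, and chi-square, on multi-word expression extraction for Uyghur. In view of the characteristics of Uyghur, the stem is added to the extraction method as a feature. Corpus selection takes coverage and domain into account, and their influence on the extraction methods is examined.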
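
Two of the association measures named above can be computed directly from co-occurrence counts. The sketch below is a generic textbook formulation, not the paper's implementation: it assumes bigram candidates, takes "mutual information" to mean pointwise mutual information, and uses hypothetical function names and made-up counts. The counts could equally be collected over stems rather than surface words.

```python
import math


def pmi(c_xy: int, c_x: int, c_y: int, n: int) -> float:
    """Pointwise mutual information (base 2) of the pair (x, y).

    c_xy: count of the pair, c_x / c_y: counts of x as first element and
    y as second element, n: total number of pairs in the corpus.
    """
    return math.log2((c_xy * n) / (c_x * c_y))


def chi_square(c_xy: int, c_x: int, c_y: int, n: int) -> float:
    """Pearson chi-square statistic for the 2x2 contingency table of (x, y)."""
    o11 = c_xy                  # x followed by y
    o12 = c_x - c_xy            # x followed by something else
    o21 = c_y - c_xy            # something else followed by y
    o22 = n - c_x - c_y + c_xy  # neither x first nor y second
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


# Made-up counts for one candidate pair in a corpus of 1024 pairs.
pair_pmi = pmi(c_xy=8, c_x=16, c_y=32, n=1024)
pair_chi2 = chi_square(c_xy=8, c_x=16, c_y=32, n=1024)
```

Candidates are typically ranked by such a score and the top of the list is inspected; both measures are known to be unreliable for very low counts, which is one reason the log-likelihood ratio is often compared alongside them.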

20.
Modeling the dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers the potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models, where syntactic structure is exploited to represent long-distance relationships among words, the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview is provided first on hierarchically classified types of dynamic speech models in the literature. A detailed account is then given for a specific model type called the hidden trajectory model, and we describe detailed steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Experiments on phonetic recognition evaluation demonstrate superior recognizer performance over a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class.
