Similar Documents
1.
Topic-based language model adaptation should update the language model interpolation weights as quickly as possible and reduce the number of language model calls, so as to meet the real-time requirements of speech recognition. This paper uses a clustering-based method to obtain quantized representations of consecutive adjacent word bigrams, and uses them to characterize both the recognition history and the centers of the text topics. The language model interpolation weights are updated from the similarity between the recognition-history vector and each topic-center vector, and the global language model is discarded. Compared with the conventional EM-based adaptation method, experiments show that this approach clearly improves both recognition performance and real-time behavior, with a 5.1% relative reduction in recognition error rate, indicating that it identifies the topic of the test material fairly accurately.
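The weight update described above reduces to scoring the recognition-history vector against each topic-center vector. A minimal sketch of that idea in Python, assuming cosine similarity and a simple proportional normalization (the function names and the normalization scheme are illustrative, not the paper's exact formulation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def topic_weights(history_vec, topic_centroids):
    """Map similarity scores to normalized LM interpolation weights."""
    sims = [max(cosine(history_vec, c), 0.0) for c in topic_centroids]
    total = sum(sims)
    if total == 0:
        return [1.0 / len(sims)] * len(sims)  # fall back to uniform weights
    return [s / total for s in sims]
```

Because only similarities to a handful of centroids are computed per update, no call into a global language model is needed, which is the cost saving the abstract points to.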

2.
Conventional speech recognition systems are data-driven and rely on the language model to choose the optimal decoding path, which in some scenarios yields decoded output that sounds right but uses the wrong characters. To address this, an end-to-end speech recognition method assisted by prosodic features is proposed: prosodic information in the speech is used to boost the probability of the correct character combination under the language model. On top of an attention-based encoder-decoder recognition framework, prosodic features such as articulation intervals and articulation energy are first extracted from the distribution of the attention coefficients; these features are then combined with the decoder, significantly improving recognition accuracy for identical or similar pronunciations and for semantically ambiguous cases. Experimental results show that the method improves accuracy by 5.2% and 5.0% relative over the end-to-end baseline on 1 000 h and 10 000 h recognition tasks, respectively, further improving the intelligibility of the recognition output.

3.
Speech tracking currently degrades severely under speaker interference, i.e., when a recording contains a mixed signal from several speakers. To address this, a speech tracking algorithm based on cluster analysis and speaker recognition is proposed. The algorithm first separates the speech with an improved clustering method: the centroids in K-means are cached and the sampling rate is lowered, and a regularization term is introduced in the embedding feature space. It then tracks the speech with a GMM-UBM speaker model. Experimental results show that the improved clustering method effectively raises both the real-time performance of the algorithm and the quality of the speech separation, and the GMM-UBM model achieves an 84% identification rate on 3 s test utterances.
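One assign-and-update pass of the K-means step described above can be sketched as follows; the centroid caching mentioned in the abstract amounts to feeding the returned centroids back in as the starting point for the next chunk of audio (this is an illustrative sketch of plain K-means, not the paper's full improved algorithm):

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_iteration(points, centroids):
    """One assign/update pass of K-means over embedding vectors."""
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for p in points:
        # assign each point to its nearest centroid
        j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
        clusters[j].append(p)
    new_centroids = []
    for j, cl in enumerate(clusters):
        if not cl:  # keep an empty cluster's centroid where it was
            new_centroids.append(centroids[j])
            continue
        dim = len(cl[0])
        new_centroids.append(
            [sum(p[d] for p in cl) / len(cl) for d in range(dim)]
        )
    return new_centroids, clusters
```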

4.
Because the expression of human emotion is shaped by culture and society, emotional speech features differ considerably across languages, so a speech emotion recognition model trained on a single language generalizes poorly. To address this, a multilingual speech emotion recognition method based on multi-task attention is proposed. By introducing a language identification auxiliary task, the model learns both the emotion features shared across languages and those specific to each language, improving the cross-language generalization of the multilingual emotion recognition model. Experiments on dimensional emotion corpora in two languages show that, compared with the baseline, the method improves the mean relative UAR by 3.66%–5.58% on the Valence task and by 1.27%–6.51% on the Arousal task; experiments on discrete emotion corpora in four languages show a mean relative UAR improvement of 13.43%–15.75% over the baseline. The proposed method therefore effectively extracts language-related emotion features and improves multilingual emotion recognition performance.

5.
Emotion recognition in speech signals is currently a very active research topic and has attracted much attention in the engineering application area. This paper presents a new approach to robust emotion recognition in speech signals in noisy environments. Using a weighted sparse representation model based on maximum likelihood estimation, an enhanced sparse representation classifier is proposed for robust emotion recognition in noisy speech. The effectiveness and robustness of the proposed method are investigated on clean and noisy emotional speech. The proposed method is compared with six typical classifiers: a linear discriminant classifier, K-nearest neighbors, a C4.5 decision tree, radial basis function neural networks, support vector machines, and a sparse representation classifier. Experimental results on two publicly available emotional speech databases, the Berlin database and the Polish database, demonstrate the promising performance of the proposed method on robust emotion recognition in noisy speech, outperforming the other methods.

6.
In this paper, a novel method for voiced-unvoiced decision within a pitch tracking algorithm is presented. Voiced-unvoiced decision is required in many applications, including modeling for analysis/synthesis, detection of model changes for segmentation purposes, and signal characterization for indexing and recognition. The proposed method is based on the generalized likelihood ratio test (GLRT) and assumes colored Gaussian noise with unknown covariance. Under the voiced hypothesis, a harmonic-plus-noise model is assumed. The derived method is combined with a maximum a posteriori (MAP) scheme to obtain a pitch and voicing tracking algorithm. The performance of the proposed method is tested on several speech databases under different levels of additive noise and telephone speech conditions. Results show that the GLRT is robust to speaker and environmental conditions and performs better than existing algorithms.
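The GLRT idea can be illustrated on a toy scalar problem: decide between "zero-mean noise" and "signal with unknown nonzero mean", plugging maximum-likelihood estimates of the unknown parameters into the likelihood ratio under each hypothesis. This is only a schematic analogue of the paper's harmonic-plus-noise test with colored noise, and the threshold value is an arbitrary assumption:

```python
import math

def glrt_statistic(x):
    """Log generalized likelihood ratio for H1 (nonzero mean) vs H0
    (zero mean), i.i.d. Gaussian samples, variance unknown under both."""
    n = len(x)
    m = sum(x) / n
    s0 = sum(v * v for v in x) / n          # ML variance under H0
    s1 = sum((v - m) ** 2 for v in x) / n   # ML variance under H1
    if s1 == 0.0:                            # perfectly constant signal
        return float("inf")
    return 0.5 * n * math.log(s0 / s1)

def decide_voiced(x, threshold=2.0):
    """Declare 'voiced' when the GLR statistic exceeds the threshold."""
    return glrt_statistic(x) > threshold
```

In the paper the same plug-in principle is applied with a harmonic signal model and an unknown noise covariance rather than a scalar mean.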

7.
Text-independent speech segmentation is a challenging topic in computer-based speech recognition systems. This paper proposes a novel time-domain algorithm based on fuzzy knowledge for the continuous speech segmentation task, via nonlinear speech analysis. Short-term energy, zero-crossing rate, and singularity exponents are the time-domain features calculated at each point of the speech signal in order to extract the information relevant to generating the significant segments. This is done to identify phonemes or syllables and their transition fronts. A fuzzy logic technique fuzzifies the calculated features into three complementary sets (low, medium, high) and performs a matching phase using a set of fuzzy rules. The outputs of the proposed algorithm are silences, phonemes, or syllables. In evaluation, the algorithm produced its best performance, with efficient results, on the Fongbe language (an African tonal language spoken mainly in Benin, Togo, and Nigeria).
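The two simplest time-domain features named above, short-term energy and zero-crossing rate, can be computed per frame as follows (the framing parameters are illustrative choices, not the paper's settings):

```python
def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

High energy with low zero-crossing rate typically indicates voiced speech, low energy indicates silence, and high zero-crossing rate suggests unvoiced fricatives, which is the kind of evidence the fuzzy rules above combine.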

8.
An HTK-based specific-word speech recognition system
After half a century of development, speech recognition technology has matured and is widely used in voice dialing systems, digital remote control, industrial control, and other fields. Because of the limitations of the acoustic and language models in common use, a computer can only recognize a limited set of words or sentences, and a recognition system often produces wrong results when the language changes. To address this, a Chinese/English specific-word recognition system was built on the HTK speech processing toolkit, following hidden Markov model principles. The system drives the whole build process through code, so that it can quickly regenerate the corresponding recognition models after new training data and a new dictionary are supplied.

9.
A novel large-vocabulary continuous speech recognition algorithm with joint language identification
单煜翔, 邓妍, 刘加. 《自动化学报》, 2012, 38(3): 366-374.
A novel large vocabulary continuous speech recognition (LVCSR) algorithm with joint language identification is proposed, and a real-time processing system is built on it. The algorithm makes full use of the phone recognition hypotheses collected during decoding, identifying the language while recognizing the speech content. The system suits multilingual environments: it not only replaces a standalone language identification module at lower overall computational cost, but also copes with recordings in which non-target languages are mixed into the same utterance, greatly reducing the meaningless recognition errors introduced by non-target languages and preventing accumulated errors from misleading subsequent recognition. To integrate content recognition and language identification tightly within a single decoding pass, three algorithms are proposed for adjusting (restructuring) the phone lattice produced by decoding: they remove the target-language bias introduced by the pronunciation dictionary and the language model, and they enrich the phone recognition hypotheses contained in the lattice. Experiments show that phone-lattice restructuring effectively improves the language identification accuracy of the joint recognizer. Tests on a Chinese-English mixed telephone conversation corpus with Chinese as the target language show that the joint algorithm reduces the meaningless errors caused by out-of-set languages by 91.76%, with a Chinese character error rate of 54.98%.

10.
Current automatic speech recognition (ASR) works in off-line mode and needs prior knowledge of the stationary or quasi-stationary test conditions to achieve the expected word recognition accuracy. These requirements limit the application of ASR to real-world tasks, where test conditions are highly non-stationary and not known a priori. This paper presents an innovative frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises, and its application to on-line ASR. The proposed algorithm is based on a soft computing model using Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested against the MCRA noise tracking technique for on-line rapid environmental change learning in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique reduces the delay in spectral change point detection significantly compared to the baseline MCRA and its derivatives. The BOSCPD soft computing model is then tested for joint additive and channel distortion compensation (JAC)-based on-line ASR in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show significant improvement in recognition accuracy compared to the baseline Aurora 2 distributed speech recognition (DSR) system in batch mode.
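Bayesian on-line change-point detection of the kind named above maintains a posterior over the "run length" (time since the last change) and resets it when new data stop matching the current segment. A minimal univariate-Gaussian sketch with known observation variance and a constant hazard rate (the hyperparameter values are illustrative assumptions, and this simplified model is not the paper's BOSCPD formulation):

```python
import math

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bocpd_map_runlength(data, hazard=0.02, obs_var=1.0,
                        prior_mu=0.0, prior_var=10.0):
    """Track the run-length posterior; return the MAP run length per step."""
    run_probs = [1.0]          # P(run length == r)
    means = [prior_mu]         # posterior mean of segment level, per run
    vars_ = [prior_var]        # posterior variance of segment level, per run
    map_runs = []
    for x in data:
        preds = [gauss_pdf(x, means[r], vars_[r] + obs_var)
                 for r in range(len(run_probs))]
        # runs either grow by one or a change point resets them to zero
        growth = [run_probs[r] * preds[r] * (1 - hazard)
                  for r in range(len(run_probs))]
        cp = sum(run_probs[r] * preds[r] * hazard
                 for r in range(len(run_probs)))
        new_probs = [cp] + growth
        z = sum(new_probs)
        run_probs = [p / z for p in new_probs]
        # conjugate Gaussian update of each surviving run's level estimate
        new_means, new_vars = [prior_mu], [prior_var]
        for r in range(len(means)):
            v = 1.0 / (1.0 / vars_[r] + 1.0 / obs_var)
            m = v * (means[r] / vars_[r] + x / obs_var)
            new_means.append(m)
            new_vars.append(v)
        means, vars_ = new_means, new_vars
        map_runs.append(max(range(len(run_probs)),
                            key=lambda r: run_probs[r]))
    return map_runs
```

On stationary noise the MAP run length keeps growing; when the spectrum jumps, it collapses back toward zero, which is the fast change detection the abstract compares against MCRA.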

11.
李伟, 吴及, 吕萍. 《计算机应用》, 2010, 30(10): 2563-2566.
To overcome the slow speed of single-pass word-lattice generation in speech recognition decoding, a fast two-pass decoding algorithm based on forward and backward language models is proposed. The two passes use a forward and a backward language model, respectively, and optimizations reduce the impact on the recognition result of the mismatch between the two models. Experiments show that the algorithm effectively speeds up decoding while maintaining recognition accuracy.

12.
Multilingual identification of broadcast speech based on subband GMM-UBM
A language identification method based on probabilistic statistical models and independent of the linguistic content is proposed; it can identify dozens of languages without expert linguistic knowledge of each. To cope with the heavy noise interference in broadcast speech, a GMM-UBM model is adopted as the language model, improving the noise robustness of the system. Because the background noise of broadcast speech is not simple full-band additive white noise, a language identification system with multiple subsystems based on subband GMM-UBM models is built, with a neural network performing system-level fusion at the back end. Identification experiments on 37 languages and dialects demonstrate the effectiveness of the subband GMM-UBM approach.
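The core GMM-UBM scoring step, before any subband split or fusion, is an average per-frame log-likelihood ratio between a language-specific GMM and the universal background model. A minimal diagonal-covariance sketch (the model parameters in the test below are toy values, not trained models):

```python
import math

def gmm_logpdf(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(ll)
    m = max(comp_logs)  # log-sum-exp over components for stability
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def gmm_ubm_score(frames, lang_gmm, ubm):
    """Average per-frame log-likelihood ratio: language GMM vs UBM."""
    return sum(gmm_logpdf(f, *lang_gmm) - gmm_logpdf(f, *ubm)
               for f in frames) / len(frames)
```

In the subband variant described above, this score is computed per frequency subband and the subband scores are fused by a neural network instead of being summed directly.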

13.
In this paper, a sinusoidal model is proposed for the characterization and classification of different stress classes (emotions) in a speech signal. Frequency, amplitude, and phase features of the sinusoidal model are analyzed and used as input features to a stressed speech recognition system. The performance of the sinusoidal model features is evaluated for recognizing different stress classes with a vector-quantization classifier and a hidden Markov model classifier. To assess the effectiveness of these features for recognizing emotions across languages, speech signals were recorded and tested in two languages, Telugu (an Indian language) and English. Average stressed speech index values are proposed for comparing differences between stress classes in a speech signal. Results show that sinusoidal model features successfully characterize different stress classes in a speech signal, and perform better than linear prediction and cepstral features in recognizing the emotions in a speech signal.

14.
张瑛琪, 彭大卫, 李森, 孙莹, 牛强. 《计算机应用》, 2022, 42(6): 1762-1769.
Recent work has proposed wireless platforms that perform speech recognition with multiple customized, stretchable radio frequency identification (RFID) tags, but such tags struggle to capture the large frequency shifts caused by stretching accurately, multiple tags must be interrogated, and recalibration is needed whenever a tag falls off or wears out naturally. To address these problems, a lip-reading algorithm based on a single RFID tag is proposed: one flexible, easily concealed, non-invasive general-purpose tag is attached to the face, so that lip reading works from facial micro-motions alone even when the user makes no sound. A model first processes the received signal strength (RSS) and phase changes of the single tag, over time and frequency, as received by the RFID reader; a Gaussian function then smooths and denoises the raw data; next, a dynamic time warping (DTW) algorithm evaluates and analyzes the collected signal features, to handle the mismatch in utterance lengths; finally, a wireless recognition system distinguishes the facial movements corresponding to the sounds, achieving lip reading. Experimental results show that, for 200 groups of digit signal features from different users, the RSS accuracy of the method reaches 86.5% or more.
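The DTW step used above to absorb differences in utterance length is the textbook dynamic-programming recurrence; a compact sketch over 1-D feature sequences (the per-frame distance function is a placeholder for whatever distance the RSS/phase features call for):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two sequences,
    tolerant of different lengths and speaking rates."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

A slow and a fast utterance of the same digit then score a small distance against each other, while different digits score large distances, which is what the classification above relies on.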

15.
In relatively low-resource domains of automatic speech recognition (ASR), such as recognition systems for telephone conversations, the statistical language model (LM) suffers from severe data sparsity. This paper proposes a sampling-based corpus generation algorithm built on equal-probability events, which automatically generates domain-related text to strengthen statistical language model training. Experimental results show that adding the sampled corpus generated by this algorithm relieves the sparsity of the language model and thereby improves the performance of the whole recognition system: on the development set, LM perplexity drops by 7.5% relative and character error rate (CER) by 0.2 points absolute; on the test set, perplexity drops by 6% relative and CER by 0.4 points absolute.
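The perplexity figures quoted above can be reproduced in miniature: train a smoothed n-gram model, then measure perplexity on held-out text; adding in-domain text (here, the sampled corpus) lowers the measured perplexity. A toy add-one-smoothed bigram sketch (the tiny corpus and the smoothing choice are illustrative):

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Add-one-smoothed bigram model over a token list with sentence markers."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab = len(set(corpus))
    def logprob(prev, cur):
        # Counter returns 0 for unseen keys, so unseen histories still work
        return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return logprob

def perplexity(tokens, logprob):
    """exp of the average negative log probability per bigram transition."""
    lp = sum(logprob(p, c) for p, c in zip(tokens, tokens[1:]))
    return math.exp(-lp / (len(tokens) - 1))
```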

16.
17.
姚煜, RYAD Chellali. 《计算机应用》, 2018, 38(9): 2495-2499.
To avoid the unreasonable conditional-independence assumptions that hidden Markov models (HMMs) impose in speech recognition, the sequence modeling ability of recurrent neural networks is explored further: an acoustic model based on a bidirectional long short-term memory network is built and trained with the connectionist temporal classification (CTC) criterion, yielding an end-to-end Chinese speech recognition system that does not depend on HMMs. A decoding method based on weighted finite-state transducers (WFSTs) is also designed, which effectively solves the difficulty of integrating the pronunciation dictionary and the language model into the decoding process. Compared with a conventional GMM-HMM system and a hybrid DNN-HMM system, experimental results show that the end-to-end system not only clearly lowers the recognition error rate but also greatly speeds up decoding, indicating that the acoustic model effectively improves model discrimination and simplifies the system structure.
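The CTC criterion mentioned above labels every frame with either a symbol or a blank; greedy decoding then collapses repeated symbols and removes the blanks. A minimal collapse sketch (the blank symbol "-" is an arbitrary choice):

```python
def ctc_collapse(path, blank="-"):
    """Collapse a frame-level CTC path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        # a symbol is emitted only when it differs from the previous frame,
        # so a blank between two identical symbols keeps them both
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)
```

This is why CTC needs no frame-level alignment: many frame paths collapse to the same output string, and training sums over all of them.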

18.
Speech processing is an important research area that includes speaker recognition, speech synthesis, speech codecs, and speech noise reduction. Many languages have distinct speaking styles called accents or dialects. Identifying the accent before recognition can improve the performance of speech recognition systems, and the more accents a language has, the more crucial accent recognition becomes. Telugu is an Indian language widely spoken in southern India. It has several accents, the main ones being coastal Andhra, Telangana, and Rayalaseema. In the present work, speech samples were collected from native speakers of the different accents of Telugu for both training and testing. Mel frequency cepstral coefficient (MFCC) features are extracted from each training and test sample, and a Gaussian mixture model (GMM) is then used to classify the speech by accent. The overall accuracy of the proposed system in recognizing, from accent, the region a speaker belongs to is 91%.

19.
A survey of topic detection and tracking: evaluation and research
Topic detection and tracking (TDT) is an information processing technology for identifying unknown topics and tracking known topics in streams of news media. Since the pioneering exploration in 1996, the repeated large-scale evaluations held in this field have provided a new test bed for related technologies such as information identification, collection, and organization. Because TDT shares much with natural language processing technologies such as information retrieval, information mining, and information extraction, and is applied to news data whose dynamics are both bursty and continuous, it has gradually become a research focus in information processing. This paper briefly introduces the research background, task definitions, evaluation methods, and related technologies of TDT, and analyzes the current state of the TDT field to anticipate future trends.

20.
We present MARS (Multilingual Automatic tRanslation System), a research prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken language translation between English and Mandarin Chinese for limited domains, such as air travel reservations. In MARS, machine translation is embedded within a complex speech processing task, and the translation performance is strongly affected by the performance of other components, such as the recognizer and the semantic parser. All components in the proposed system are statistically trained using an appropriate training corpus. The speech signal is first recognized by an automatic speech recognizer (ASR). Next, the ASR-transcribed text is analyzed by a semantic parser, which uses a statistical decision-tree model that does not require hand-crafted grammars or rules. Furthermore, the parser provides semantic information that helps re-score the speech recognition hypotheses. The semantic content extracted by the parser is formatted into a language-independent tree structure, which is used for an interlingua-based translation. A maximum-entropy sentence-level natural language generation (NLG) approach is used to generate sentences in the target language from the semantic tree representations. Finally, the generated target sentence is synthesized into speech by a speech synthesizer.
Many new features and innovations have been incorporated into MARS: the translation is based on understanding the meaning of the sentence; the semantic parser uses a statistical model and is trained from a semantically annotated corpus; the output of the semantic parser is used to select a more specific language model to refine speech recognition performance; and the NLG component uses a statistical model trained from the same annotated corpus. These features give MARS robustness to speech disfluencies and recognition errors, tighter integration of semantic information into speech recognition, and portability to new languages and domains. These advantages are verified by our experimental results.


