期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

任纪生王作英《高技术通讯》2005,15(8):1-5

提出了基于聚类的方法实现词的快速量化表示,并由此导出潜在语义分析语言模型预测置信度,同时运用新提出的几何加权静态插值方式同三元文法模型相结合,构建了一种新的潜在语义分析语言模型,并将其应用于汉语语音识别。实验表明其效率和性能均优于传统基于奇异值分解的潜在语义分析语言模型,相比于三元文法模型,识别错误率相对下降为3．6％～7．1％左右,并为有效量化表示词对进一步提高潜在语义分析语言模型性能提供了新的途径。相似文献

2.

一种基于相似度的汉语语言模型平滑技术及其在音字转换中的应用

肖镜辉王晓龙刘秉权《高技术通讯》2006,16(2):127-132

针对汉语语言模型中的数据稀疏问题,利用词语语义信息,将词语相似度同back-off平滑技术相结合,提出一种基于词语相似度的汉语语言模型平滑技术,并且设计了一种能够自动优化模型中各项参数的迭代算法,最后,将这种平滑技术由低阶语言模型推广到高阶语言模型中,将上述技术应用到音字转换领域.实验表明,这项技术使语言模型的性能获得了较大的提高,并有效地降低了音字转换系统的错误率. 相似文献

3.

基于统计与规则相结合的汉语计算语言模型及其在语音识别中的应用 总被引：1，自引：0，他引：1

关毅王晓龙张凯《高技术通讯》1998,8(4):16-20

把基于统计的语料概率统计方法与基于规则的自然语言理解方法结合起来，提出了一种新的汉语计算语言模型，并把该模型应用于语音识别后处理模块中，取得了较理想的结果。相似文献

4.

基于融合特征的短语音汉语声调自动识别方法

下载免费PDF全文

沈凌洁王蔚《声学技术》2018,37(2):167-174

提出一种基于韵律特征(基频、时长)和梅尔倒谱系数(Mel-Frequency Cepstral Coefficient,MFCC)特征的融合特征进行短语音汉语声调识别的方法,旨在利用两种特征的优势提高短语音汉语声调识别率。该融合特征包括7个根据不同模型得到的韵律特征和统计参数以及4个从每个音段的梅尔倒谱系数计算得来的对数化后验概率,使用高斯混合模型表示4个声调的倒谱特征的分布。实验分两步:第一步,将基于韵律特征和倒谱特征的分类器在决策阶段混合起来进行声调分类,分别赋予两个分类器权重,计算倒谱特征和韵律特征在声调分类任务中的权重;第二步,将基于字的韵律特征和基于帧的倒谱特征结合起来生成融合特征的超向量,使用融合特征进行汉语声调识别,根据准确率、未加权平均召回率(Unweigted Average Recall,UAR)和科恩卡帕(Cohen’s Kappa)系数3个指标,比较并评估5种分类器(两种设置的高斯混合模型,后向传播神经网络,支持向量机和卷积神经网络(Convolutional Neural Network,CNN))在不平衡数据集上的分类效果。实验结果表明:(1)倒谱特征方法能够提高汉语声调的识别率,该特征在总体分类任务中的权重为0.11;(2)基于融合特征的深度学习(CNN)方法对声调的识别率最高,为87.6%,与高斯混合模型的基线系统相比,提高了5.87%。该研究证明了倒谱特征法能够提供与韵律特征法互补的信息,从而提高短语音汉语声调识别率;同时,该方法可以运用到韵律检测和副语言信息检测等相关研究中。相似文献

5.

基于注意力机制的TDNN-LSTM模型及应用

下载免费PDF全文

金浩朱文博段志奎陈建文李艾园《声学技术》2021,40(4):508-514

在大数据规模下,基于深度学习的语音识别技术已经相当成熟,但在小样本资源下,由于特征信息的关联性有限,模型的上下文信息建模能力不足从而导致识别率不高。针对此问题,提出了一种嵌入注意力机制层（Attention Mechanism）的时延神经网络（Time Delay Neural Network,TDNN）结合长短时记忆递归（Long Short Term Memory,LSTM）神经网络的时序预测声学模型,即TLSTM-Attention,有效地融合了具有重要信息的粗细粒度特征以提高上下文信息建模能力。通过速度扰动技术扩增数据,结合说话人声道信息特征以及无词格最大互信息训练准则,选取不同输入特征、模型结构及节点个数进行对比实验。实验结果表明,该模型相比于基线模型,词错误率降低了3.37个百分点。相似文献

6.

基于后验概率的汉语语音检索方法研究

郑铁然韩纪庆《高技术通讯》2009,19(2)

针对经典的向量空间检索模型直接用于基于音节lattice形式的汉语语音检索存在无法有效区分lattice中包含的正确音节识别候选和错误的识别候选以及不能充分利用lattice中所蕴含的各层级信息的不足,提出了一种基于语音文档邻接音节后验概率矩阵的检索方法.该方法以该矩阵作为文档索引,并计算查询请求被包含在语音文档中的后验概率,并以此来度量查询请求和语音文档间的相关度.后验概率作为可靠的置信测度能够有效区分正确和错误的音节候选,在lattice中后验概率的计算能够充分地利用语音识别结果中的多层级的信息.语音检索实验表明,与基于向量空间模型的检索方法相比,该方法的检索性能有显著提高,是一种适用于汉语语音检索的有效方法. 相似文献

7.

利用语音的频谱空间特征进行汉语抗噪语音识别的方法

下载免费PDF全文

张永锋田勇张阳《声学技术》2015,34(1):51-53

抗噪连续语音识别是当前汉语连续语音识别的重要研究领域。采用通过度量连续语音帧之间频谱的稳定性,将连续语音切分成份,再将切分结果(无论时间长短)变换为与时间无关的大小固定的频谱空间特征,通过与模板库进行比较实现语音识别。新的频谱空间特征,与语音时长无关,同时表现出较好的抗噪声能力。在特定人连续语音识别测试系统中,取得了不错的识别效果。相似文献

8.

基于HMM的实时语音识别方法研究

彭勇《中国科技博览》2013,(32):247-249,251

随着计算机技术和通信技术的快速发展，语音识别技术在国民经济中的各个领域得到了广泛的应用，并有相关产品的问世。但为了提高工作效率和节省企业的成本，有许多特定应用要与语音识别进行融合。针对企业报关系统的特点，采用了一种基于HMM模型的二级单字识别方法，解决了系统识别效率与识别稳定性的问题，使得该语音识别方法最终满足了报关系统的应用要求，并扼要介绍了词汇库维护、新人语音训练及建立语音新模型的过程。相似文献

9.

无端点检测汉语识别算法的实现及改进——动态时间规整和隐马尔可夫统一模型的应用 总被引：1，自引：1，他引：0

《声学技术》1998,(4)

语音识别算法中，动态时间规整（ＤＴＷ）和隐马尔可夫模型（ＨＭＭ）是最有效的识别算法，并且两者之间有着本质的联系和内在的统一［１］，据此前期工作中，已经建立了ＤＴＷ和ＨＭＭ的统一模型（ＤＨＵＭ）［２、３］。本文对ＤＨＵＭ进行了改进，在ＤＨＵＭ中引进寂静段自环，并根据汉语语音的特点，提出了一种无端点检测的语音识别算法。在识别过程中，该算法无需确定语音信号起止点位置，而是从寂静段开始，直接按帧提取特征（帧长２０ｍｓ，帧间重叠５０％），特征向量由１５阶倒谱系数和帧平均能量组成。实验中，用ＤＨＵＭ实现了该算法，对９９个相似汉语单字的识别测试结果表明：无端点检测的识别正识率为９４．９５％，正识率下降很少，但不作端点检测却降低了算法的复杂程度。为进一步改善识别性能，特征向量采用一种听觉模型特征，识别器具有更好的鲁棒性，识别率会略有提高。相似文献

10.

一种基于废料模型的关键词确认方法

马晓梅李雪耀《中国新技术新产品》2008,(15):27-28

关键词确认是语音识别中一个重要的研究方向。对于关键词确认系统来讲,废料模型的结构和类型对整个系统的性能有很大的影响。文中提出了一种基于音节格的废料模型。实验表明,与传统的基于音素类的废料模型相比,关键词确认率有了很大的提高。相似文献

11.

Tibetan Multi-Dialect Speech and Dialect Identity Recognition

Yue Zhao Jianjian Yue Wei Song Xiaona Xu Xiali Li Licheng Wu Qiang Ji 《计算机、材料和连续体（英文）》2019,60(3):1223-1235

Tibetan language has very limited resource for conventional automatic speech recognition so far. It lacks of enough data, sub-word unit, lexicons and word inventories for some dialects. And speech content recognition and dialect classification have been treated as two independent tasks and modeled respectively in most prior works. But the two tasks are highly correlated. In this paper, we present a multi-task WaveNet model to perform simultaneous Tibetan multi-dialect speech recognition and dialect identification. It avoids processing the pronunciation dictionary and word segmentation for new dialects, while, in the meantime, allows training speech recognition and dialect identification in a single model. The experimental results show our method can simultaneously recognize speech content for different Tibetan dialects and identify the dialect with high accuracy using a unified model. The dialect information used in output for training can improve multi-dialect speech recognition accuracy, and the low-resource dialects got higher speech content recognition rate and dialect classification accuracy by multi-dialect and multi-task recognition model than task-specific models. 相似文献

12.

Error analysis to improve the speech recognition accuracy on Telugu language

N USHA RANI P N GIRIJA 《Sadhana》2012,37(6):747-761

Speech is one of the most important communication channels among the people. Speech Recognition occupies a prominent place in communication between the humans and machine. Several factors affect the accuracy of the speech recognition system. Much effort was involved to increase the accuracy of the speech recognition system, still erroneous output is generating in current speech recognition systems. Telugu language is one of the most widely spoken south Indian languages. In the proposed Telugu speech recognition system, errors obtained from decoder are analysed to improve the performance of the speech recognition system. Static pronunciation dictionary plays a key role in the speech recognition accuracy. Modification should be performed in the dictionary, which is used in the decoder of the speech recognition system. This modification reduces the number of the confusion pairs which improves the performance of the speech recognition system. Language model scores are also varied with this modification. Hit rate is considerably increased during this modification and false alarms have been changing during the modification of the pronunciation dictionary. Variations are observed in different error measures such as F-measures, error-rate and Word Error Rate (WER) by application of the proposed method. 相似文献

13.

Tone recognition for continuous mandarin speech with limited training data using selected context‐dependent hidden markov models

Hsin‐Min Wang Lin‐Shan Lee 《中国工程学刊》2013,36(6):775-784

Abstract

Mandarin Chinese is a tonal language, in which every syllable is assigned a tone that has a lexical meaning. Therefore tone recognition is very important for Mandarin speech. This paper presents a method for continuous speech tone recognition. Context‐dependent discrete hidden Markov models (HMM's) are used taking into account the tones of the syllables on both sides, and special efforts were made in selecting the minimum number of key context‐dependent models considering the characteristics of the tones. The results indicate that a total of 23 context‐dependent models have very good potential to describe the complicated tone behavior for all 175 possible tone concatenation conditions in continuous speech, such that the required training data can be reduced to a minimum and the recognition process can be simplified significantly. The best achievable recognition rate is 83.55 %. 相似文献

14.

Ameliorated language modelling for lecture speech recognition of Indian English

Disha Kaur Phull G Bharadwaja Kumar 《Sadhana》2018,43(12):209

相似文献

15.

Mental Illness Disorder Diagnosis Using Emotion Variation Detection from Continuous English Speech

S. Lalitha Deepa Gupta Mohammed Zakariah Yousef Ajami Alotaibi 《计算机、材料和连续体（英文）》2021,69(3):3217-3238

Automatic recognition of human emotions in a continuous dialog model remains challenging where a speaker’s utterance includes several sentences that may not always carry a single emotion. Limited work with standalone speech emotion recognition (SER) systems proposed for continuous speech only has been reported. In the recent decade, various effective SER systems have been proposed for discrete speech, i.e., short speech phrases. It would be more helpful if these systems could also recognize emotions from continuous speech. However, if these systems are applied directly to test emotions from continuous speech, emotion recognition performance would not be similar to that achieved for discrete speech due to the mismatch between training data (from training speech) and testing data (from continuous speech). The problem may possibly be resolved if an existing SER system for discrete speech is enhanced. Thus, in this work the author’s existing effective SER system for multilingual and mixed-lingual discrete speech is enhanced by enriching the cepstral speech feature set with bi-spectral speech features and a unique functional set of Mel frequency cepstral coefficient features derived from a sine filter bank. Data augmentation is applied to combat skewness of the SER system toward certain emotions. Classification using random forest is performed. This enhanced SER system is used to predict emotions from continuous speech with a uniform segmentation method. Due to data scarcity, several audio samples of discrete speech from the SAVEE database that has recordings in a universal language, i.e., English, are concatenated resulting in multi-emotional speech samples. Anger, fear, sad, and neutral emotions, which are vital during the initial investigation of mentally disordered individuals, are selected to build six categories of multi-emotional samples. Experimental results demonstrate the suitability of the proposed method for recognizing emotions from continuous speech as well as from discrete speech. 相似文献

16.

空间遥控机械手的声控识别与应答系统

贺怀清洪炳熔《高技术通讯》1999,(12):23-27

描述了用自然语言控制空间遥控机械手执行远程操作的自然语言接口系统。该系统以孤立词语音识别为核心,着重于系统的可靠性和实用性。为此,在系统的开发过程中,考虑了人类语音交互的听觉感知特点、汉语的一字一音节特点、具体识识时语音的声学模型不必与其语言学模型严格一致的特点以及环境噪声对系统性能的影响等,提出以过渡段＋韵母段作为识别基元,采取多层识别策略进行识别。识别实验与仿真实验结果表明,系统达到了预期性能相似文献