
Research on Language Model Construction Methods for Kazakh LVCSR

Cite as:	Dawel Abilhayer, Nurmemet Yolwas, LIU Yan. Research on language model construction methods for Kazakh LVCSR[J]. Computer Engineering and Applications, 2016, 52(24): 178-181

Author names:	Dawel Abilhayer, Nurmemet Yolwas, LIU Yan

Author affiliation:	College of Information Science and Engineering, Xinjiang University, Urumqi 830046

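Summary:	A good language model not only shrinks the search space during speech recognition but also raises recognition accuracy. The N-gram statistical language model is one of the language models in wide use today. Beginning with text collection and processing, the paper describes the techniques involved in building a Kazakh language model and, on that foundation, implements a baseline system for Kazakh continuous speech recognition. Word-based and syllable-based 3-gram language models are trained separately, and the two are assessed by perplexity and by the results of continuous speech recognition experiments.
Key terms:	Kazakh; language model; speech recognition; corpus construction; text processing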
On language model construction for LVCSR in Kazakh

Dawel Abilhayer, Nurmemet Yolwas, LIU Yan. On language model construction for LVCSR in Kazakh[J]. Computer Engineering and Applications, 2016, 52(24): 178-181

Authors:	Dawel Abilhayer, Nurmemet Yolwas, LIU Yan

Affiliation:	College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Abstract:	A good language model not only reduces the search space of the speech recognition process but also improves recognition accuracy. The N-gram statistical language model is one of the widely used language models at present. Starting from the collection and processing of text, this paper introduces the techniques for constructing a Kazakh language model. On this basis, a baseline Kazakh continuous speech recognition system is implemented. Word-based and syllable-based 3-gram language models are trained, and the two models are evaluated by perplexity and by the results of continuous speech recognition experiments.

Keywords:	Kazakh language; language model; Automatic Speech Recognition (ASR); corpus creation; text processing
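
The abstract describes training 3-gram language models and comparing them by perplexity. The following is a minimal sketch of that idea, not the authors' implementation: it assumes add-one (Laplace) smoothing, which the abstract does not specify, and the function names (`train_trigram`, `prob`, `perplexity`) and the toy tokens are hypothetical. It also assumes every test token was seen in training; out-of-vocabulary handling is left out.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (assumed convention)


def pad(tokens, n=3):
    """Prefix n-1 start markers and append one end marker."""
    return [BOS] * (n - 1) + list(tokens) + [EOS]


def train_trigram(sentences):
    """Count trigrams and their two-token contexts over tokenized sentences."""
    tri, ctx, vocab = Counter(), Counter(), {EOS}
    for sent in sentences:
        toks = pad(sent)
        vocab.update(sent)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            ctx[tuple(toks[i - 2:i])] += 1
    return tri, ctx, vocab


def prob(model, w1, w2, w3):
    """P(w3 | w1, w2) with add-one smoothing (an assumption of this sketch)."""
    tri, ctx, vocab = model
    return (tri[(w1, w2, w3)] + 1) / (ctx[(w1, w2)] + len(vocab))


def perplexity(model, sentences):
    """exp of the negative mean log-probability per predicted token."""
    log_sum, count = 0.0, 0
    for sent in sentences:
        toks = pad(sent)
        for i in range(2, len(toks)):
            log_sum += math.log(prob(model, *toks[i - 2:i + 1]))
            count += 1
    return math.exp(-log_sum / count)


# Toy data: each sentence is a list of units (words here; syllables work the same way).
model = train_trigram([["a", "b"], ["a", "c"]])
seen = perplexity(model, [["a", "b"]])
unseen_order = perplexity(model, [["b", "a"]])
```

Because the model only sees lists of string tokens, the same functions apply whether a sentence is split into words or into syllables. Note that per-token perplexities computed over different units are not directly comparable, since the number of predicted tokens per sentence differs.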
