A Dialectal Chinese Speech Recognition Framework
Authors: Jing Li, Thomas Fang Zheng, William Byrne, and Dan Jurafsky
Affiliations: [1] Center for Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China [2] Machine Intelligence Laboratory, Cambridge University, U.K. [3] Center for Language and Speech Processing, The Johns Hopkins University, U.S.A. [4] Department of Linguistics, Stanford University, U.S.A.
Funding: This paper is based upon a study supported by the US National Science Foundation under Grant No. 0121285. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Abstract: A framework for dialectal Chinese speech recognition is proposed and studied. It uses a relatively small dialectal Chinese speech corpus (that is, a corpus of Chinese influenced by the speaker's native dialect) together with dialect-related knowledge to transform a standard Chinese (Putonghua, abbreviated PTH) speech recognizer into a dialectal Chinese speech recognizer. Two knowledge sources are explored: expert knowledge and a small dialectal Chinese corpus. These sources provide information at four levels: the phonetic level, the lexicon level, the language level, and the acoustic decoder level. The paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to build a WDC speech recognizer from an existing PTH speech recognizer, based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose using context-independent PTH-IF mappings (where IF denotes either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and combining them with supervised maximum likelihood linear regression (MLLR) acoustic model adaptation. The IF mappings introduce a multi-pronunciation lexicon, which can increase lexical confusion and thus degrade performance. To reduce the size of this lexicon, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves a 10-18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% CER increase when recognizing PTH.
The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even other languages.
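
The abstract names the MPE method and its AUP criterion but does not spell out the selection rule. The following Python sketch illustrates one plausible reading, under the assumption that only the most frequent words, those whose accumulated uni-gram probability first reaches a threshold, receive extra pronunciations from the IF mappings. The mapping table, the toy lexicon, the probabilities, the threshold, and all function names are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical context-independent IF mappings: canonical PTH unit ->
# surface variants a dialectal speaker might produce (illustrative only).
IF_MAPPINGS = {
    "zh": ["zh", "z"],
    "ch": ["ch", "c"],
    "sh": ["sh", "s"],
    "ing": ["ing", "in"],
}

def expand_pronunciation(units, mappings):
    """Return every variant of an Initial/Final sequence under the mappings."""
    choices = [mappings.get(u, [u]) for u in units]
    return [tuple(variant) for variant in product(*choices)]

def select_words_by_aup(unigram, threshold):
    """Assumed AUP rule: take words in descending uni-gram probability
    until their accumulated probability reaches the threshold."""
    selected, accumulated = set(), 0.0
    ranked = sorted(unigram.items(), key=lambda kv: kv[1], reverse=True)
    for word, prob in ranked:
        if accumulated >= threshold:
            break
        selected.add(word)
        accumulated += prob
    return selected

def build_lexicon(base_lexicon, unigram, mappings, threshold):
    """Expand pronunciations only for AUP-selected words; others keep one."""
    chosen = select_words_by_aup(unigram, threshold)
    lexicon = {}
    for word, units in base_lexicon.items():
        if word in chosen:
            lexicon[word] = expand_pronunciation(units, mappings)
        else:
            lexicon[word] = [tuple(units)]
    return lexicon

# Toy data (hypothetical): three single-syllable words with made-up probabilities.
BASE_LEXICON = {"shi": ("sh", "i"), "zhong": ("zh", "ong"), "xing": ("x", "ing")}
UNIGRAM = {"shi": 0.5, "zhong": 0.3, "xing": 0.2}

LEXICON = build_lexicon(BASE_LEXICON, UNIGRAM, IF_MAPPINGS, threshold=0.7)
for word, prons in LEXICON.items():
    print(word, prons)
```

In this toy setup the two most frequent words gain a second pronunciation while the least frequent word keeps only its canonical form, which is how a threshold of this kind would limit lexicon growth.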

Chinese keywords: China; dialect; language recognition; speech data; speech model
Received: 2004-10-20
Revised: 2005-01-17

Jing Li, Thomas Fang Zheng, William Byrne, and Dan Jurafsky. A Dialectal Chinese Speech Recognition Framework[J]. Journal of Computer Science and Technology, 2006, 21(1): 106-115.
Keywords: dialectal Chinese speech recognition; initial or final (IF); IF-mapping rule; pronunciation modeling; small quantity of speech data
This article is indexed in CNKI, VIP, Wanfang Data, SpringerLink, and other databases.