首页 | 官方网站   微博 | 高级检索  
     


A framework for efficient development of Slovenian written language resources used in speech processing applications
Authors:Matej Rojc  Darinka Verdonik  Zdravko Kačič
Affiliation:1. Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ulica 17, 2000, Maribor, Slovenia
Abstract:This paper presents a framework for the efficient development and representation of morphological and phonetic lexicons, to be used in speech technology applications. Solutions that would be the most appropriate for developing speech technologies for specific language have to be analyzed when developing the lexicons. In the paper issues such as the development of resources, good word coverage in general texts, efficient coding of lexicons, representation (regarding time and memory space) and the integration of lexicons in speech processing applications are addressed. The construction process within the proposed framework is based on the use of finite-state machines and heterogeneous relation-graphs structures, and significantly reduces the time and effort needed for the construction of large-scale lexica, minimizes any analysis errors, and efficiently represents the lexicons, regarding time and memory usage. The wordlist construction process presented in the paper also guarantees that by using the constructed lexicons high word coverage is achieved in general texts. SIlex lexicons are large-scale phonetic and morphology lexicons for the Slovenian language, constructed within the new framework and with a developed toolset, and represent valuable language resources for the development of various speech processing applications for the Slovenian language.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号