Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced languages
Authors: Lian Tze Lim, Lay-Ki Soon, Tek Yong Lim, Enya Kong Tang, Bali Ranaivo-Malançon
Affiliation: 1. School of Engineering, Science and Technology, KDU College Penang, 32 Jalan Anson, 10400, Georgetown, Penang, Malaysia
2. Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100, Cyberjaya, Selangor, Malaysia
3. Linton University College, Persiaran UTL, Bandar Universiti Teknologi Legenda, Batu 12, 71700, Mantin, Negeri Sembilan, Malaysia
4. Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300, Kota Samarahan, Sarawak, Malaysia
Abstract: Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still being collected. We show how multilingual lexicons that include under-resourced languages can be constructed using simple bilingual translation lists, which are more readily available. The prototype multilingual lexicon developed comprises six member languages: English, Malay, Chinese, French, Thai and Iban, the last of which is an under-resourced language in Borneo. Quick evaluations showed that 91.2% of 500 random multilingual entries in the generated lexicon require minimal or no human correction.
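The idea of building multilingual entries from simple bilingual translation lists can be illustrated with a toy sketch. This is not the authors' Lexicon+TX procedure, which the abstract does not describe; it is a naive pivot join that assumes every list shares English as its source language. The function name `merge_via_pivot`, the language codes and the sample word pairs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_via_pivot(bilingual_lists):
    """Group translations from several pivot-to-target lists by pivot word.

    bilingual_lists maps a target-language code to a list of
    (pivot_word, translation) pairs. Assumption: all lists use the
    same pivot language (English here).
    """
    entries = defaultdict(lambda: defaultdict(set))
    for lang, pairs in bilingual_lists.items():
        for pivot_word, translation in pairs:
            entries[pivot_word][lang].add(translation)
    # Convert nested defaultdicts and sets into plain dicts of sorted lists.
    return {
        word: {lang: sorted(words) for lang, words in by_lang.items()}
        for word, by_lang in entries.items()
    }


# Hypothetical toy data: English paired with Malay, French and Iban.
toy_lists = {
    "ms": [("water", "air"), ("house", "rumah")],
    "fr": [("water", "eau"), ("house", "maison")],
    "iba": [("house", "rumah")],
}

lexicon = merge_via_pivot(toy_lists)
for word, translations in sorted(lexicon.items()):
    print(word, translations)
```

A join this simple ignores word-sense ambiguity in the pivot language, so a realistic pipeline would still need some filtering or human review of the merged entries.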
This article is indexed in SpringerLink and other databases.