
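基于分布式表示的汉字部件表义能力测量与应用 (Measurement and Application of Chinese Component Semantic Ability Based on Distributed Representation)
Cite this article: LIANG Shichen, TANG Xuemei, HU Renfen, WU Jinshan, LIU Zhiying. 基于分布式表示的汉字部件表义能力测量与应用[J]. 中文信息学报 (Journal of Chinese Information Processing), 2021, 35(5): 17-26.
Authors: LIANG Shichen  TANG Xuemei  HU Renfen  WU Jinshan  LIU Zhiying
Affiliations: 1. Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875;
2. UltraPower-BNU Joint Laboratory for Artificial Intelligence, Beijing Normal University, Beijing 100875;
3. School of Systems Science, Beijing Normal University, Beijing 100875
Funding: Research Project of the State Language Commission (ZDI135-42); National Social Science Fund of China (18CYY029); Humanities and Social Sciences Fund of the Ministry of Education (18YJAZH112)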
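Abstract: The meaning-bearing nature of Chinese characters is a major feature that distinguishes them from phonetic scripts. Components, as the units from which characters are built, are closely tied to character meaning. How much meaning components actually convey, however, remains an open question. To address it, this paper starts from character components and proposes a distributed representation model for characters and words that incorporates components. The model achieves some improvement on intrinsic embedding evaluation tasks, and its results on a character motivation measurement task correlate significantly with human ratings. Based on this model, the paper further proposes a method for computing the semantic ability of components, evaluates the semantic ability of Chinese character components as a whole, and combines it with the components' character-forming ability to establish a grading system for modern Chinese character components. The measurements show that modern Chinese character components have some semantic ability, but that this ability is low overall. Finally, the results are applied to teaching Chinese as a foreign language: the paper determines the range of components suited to the component-based teaching method and proposes a corresponding teaching order for characters.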

Keywords: Chinese character components  semantic ability measurement  distributed representation
Received: 2019-09-19

Measurement and Application of Chinese Component Semantic Ability Based on Distributed Representation
LIANG Shichen,TANG Xuemei,HU Renfen,WU Jinshan,LIU Zhiying.Measurement and Application of Chinese Component Semantic Ability Based on Distributed Representation[J].Journal of Chinese Information Processing,2021,35(5):17-26.
Authors:LIANG Shichen  TANG Xuemei  HU Renfen  WU Jinshan  LIU Zhiying
Affiliation:1.Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875, China;2.UltraPower-BNU Joint Laboratory for Artificial Intelligence, Beijing Normal University, Beijing 100875, China;3.School of Systems Science, Beijing Normal University, Beijing 100875, China
Abstract: The meaning-bearing nature of Chinese characters is one of the features that distinguishes them from phonetic scripts. As the units from which characters are built, components are closely related to the meaning of Chinese characters. However, how much meaning Chinese character components actually convey is an issue that remains to be discussed. In this paper, we focus on the components of Chinese characters and train a multi-granularity Chinese word embedding model, which shows gains on intrinsic word embedding evaluation tasks and whose results on measuring the motivation of Chinese characters correlate significantly with human ratings. Based on this model, we further put forward a formula to calculate the semantic ability of components, revealing that components in Chinese characters have a certain but limited semantic ability. We also establish a grading system of components that takes both their semantic ability and their character-forming ability into account. Finally, for the teaching of Chinese as a foreign language, we establish the scope of component teaching and put forward a scheme for the teaching sequence of Chinese characters.
Keywords:Chinese character component  semantic ability measurement  distributed representation  
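
Illustrative note: the abstract mentions a formula for the semantic ability of components but does not state it. The sketch below is not the paper's formula. It only shows one plausible way such a score could be computed, assuming each component and each character already has an embedding vector: the score is taken to be the mean cosine similarity between a component's vector and the vectors of the characters that contain it. The function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def component_semantic_score(component_vec, char_vecs):
    """ASSUMED scoring rule (not from the paper): mean cosine similarity
    between a component vector and the vectors of characters containing it."""
    if not char_vecs:
        return 0.0
    return sum(cosine(component_vec, v) for v in char_vecs) / len(char_vecs)


# Toy, made-up 2-dimensional vectors purely for illustration.
toy_component = [1.0, 0.0]
toy_chars = [[1.0, 0.0], [0.0, 1.0]]  # one aligned character, one orthogonal
score = component_semantic_score(toy_component, toy_chars)
print(score)
```

Under this assumed rule, a component whose vector lies close to the characters it appears in scores near 1, and a component unrelated to them scores near 0.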