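Recognition of Lexical Functions in Academic Texts: Application in Automatic Keyword Extraction
Cite this article: Jiang Yi, Huang Yong, Xia Yikun, Li Pengcheng, Lu Wei. Recognition of Lexical Functions in Academic Texts: Application in Automatic Keyword Extraction[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2021(2): 152-162.
Authors: Jiang Yi  Huang Yong  Xia Yikun  Li Pengcheng  Lu Wei
Affiliations: School of Information Management, Wuhan University; Institute for Information Retrieval and Knowledge Mining, Wuhan University; Center for Studies of Information Resources, Wuhan University
Funding: Major Project of the National Social Science Fund of China, "Research on the Theory and Methods of Academic Paper Evaluation Based on Cognitive Computing" (17ZDA292).
Abstract: Traditional automatic keyword extraction usually builds its features from non-semantic information such as the frequency and position of candidate terms, without considering the specific semantic role that a keyword plays in an academic document, namely its lexical function. Statistics on existing data show that about 67.99% of author-assigned keywords are research-problem or research-method terms. This paper therefore divides the lexical functions of keywords into three classes, "research problem", "research method", and "other", and fuses lexical function features with the traditional term-frequency and position features. Keyword extraction experiments were run on computer science literature under two formulations: classification and ranking. The results show that fusing lexical functions clearly improves keyword extraction. Relative to the baseline, the accuracy (Acc) and F-score of the binary classification model rise by 24.63% and 25.19% in relative terms, reaching 0.840 and 0.666; the MAP, NDCG@5, and P@5 of the ranking model rise by 168.32%, 189.50%, and 148.30% in relative terms, reaching 0.813, 0.828, and 0.447. These results demonstrate that the lexical function features of academic literature play an important role in automatic keyword extraction.

Keywords: lexical function  keyword extraction  support vector machine  learning to rank  academic text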
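
The abstract reports each final score together with its relative improvement over the baseline, but not the baseline scores themselves. A minimal sketch of the arithmetic, assuming "relative improvement" means (new - baseline) / baseline, is shown below. The function name `implied_baseline` and the dictionary layout are illustrative only, and the resulting baselines are back-calculated estimates, not figures stated in the abstract.

```python
def implied_baseline(score: float, relative_gain_pct: float) -> float:
    """Baseline implied by a final score and a relative gain given in percent.

    Assumes relative gain = (score - baseline) / baseline.
    """
    return score / (1.0 + relative_gain_pct / 100.0)


# (final score, relative improvement in %) as reported in the abstract.
reported = {
    "Acc": (0.840, 24.63),
    "F": (0.666, 25.19),
    "MAP": (0.813, 168.32),
    "NDCG@5": (0.828, 189.50),
    "P@5": (0.447, 148.30),
}

# Back-calculated baselines, rounded to three decimals (estimates only).
baselines = {
    name: round(implied_baseline(score, gain), 3)
    for name, (score, gain) in reported.items()
}

if __name__ == "__main__":
    for name, value in baselines.items():
        print(f"{name}: implied baseline = {value}")
```

Because the reported scores are themselves rounded to three decimals, the implied baselines carry a small rounding uncertainty.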

Recognition of Lexical Functions in Academic Texts: Application in Automatic Keyword Extraction
Jiang Yi, Huang Yong, Xia Yikun, Li Pengcheng, Lu Wei. Recognition of Lexical Functions in Academic Texts: Application in Automatic Keyword Extraction[J]. Journal of the China Society for Scientific and Technical Information, 2021(2): 152-162.
Authors: Jiang Yi  Huang Yong  Xia Yikun  Li Pengcheng  Lu Wei
Affiliation: (School of Information Management, Wuhan University, Wuhan 430072; Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072; Center for Studies of Information Resources, Wuhan University, Wuhan 430072)
Abstract: Traditional automatic keyword extraction often constructs features from non-semantic information, such as the frequency and location of candidate keywords, without considering the specific semantic role that keywords play in an academic text, that is, their lexical function. Our statistical analysis found that 67.99% of the keywords in our dataset represented research questions or research methods. Therefore, we classified lexical functions into three categories: Research Questions, Research Methods, and Others. We then combined lexical function features with word frequency and position features, and extracted keywords from computer science papers using both a classification model and a ranking model. The results showed that our method outperformed the baseline that used only the base features. The Acc and F of the classification model improved to 0.840 and 0.666, relative improvements of 24.63% and 25.19%, respectively. The MAP, NDCG@5, and P@5 of the ranking model showed relative improvements of 168.32%, 189.50%, and 148.30%, reaching 0.813, 0.828, and 0.447, respectively. These improvements indicate that lexical functions play an important role in automatic keyword extraction.
Keywords:lexical function  keyword extraction  SVM  learning to rank  academic text
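
For readers unfamiliar with the ranking metrics named in the abstract, the following is a minimal sketch of P@k, average precision (whose mean over documents gives MAP), and NDCG@k under binary relevance. It is not the authors' evaluation code: the function names, the binary-relevance assumption, and the toy ranked list are all illustrative, and average precision here is normalized by the number of gold keywords, which is one common convention.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k ranked candidates that are gold keywords."""
    return sum(1 for term in ranked[:k] if term in gold) / k


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Mean of precision values at each rank where a gold keyword appears.

    Normalized by the number of gold keywords (an assumed convention).
    """
    if not gold:
        return 0.0
    hits, total = 0, 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in gold:
            hits += 1
            total += hits / rank
    return total / len(gold)


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """NDCG@k with binary gains: DCG of the ranking over DCG of an ideal one."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, term in enumerate(ranked[:k], start=1)
        if term in gold
    )
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example (hypothetical data): candidates ranked by a model for one paper.
ranked_candidates = ["keyword extraction", "experiment", "lexical function", "paper", "SVM"]
gold_keywords = {"keyword extraction", "lexical function", "SVM"}

p5 = precision_at_k(ranked_candidates, gold_keywords, 5)
ap = average_precision(ranked_candidates, gold_keywords)
ndcg5 = ndcg_at_k(ranked_candidates, gold_keywords, 5)
```

MAP is then the mean of `average_precision` over all documents in the test set.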
This article is indexed in VIP (维普) and other databases.