
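Automatic Recognition of Theories in Academic Journals Based on Conditional Random Fields
Cite this article: Chen Feng, Zhai Yujia, Wang Fang. Automatic Recognition of Theories in Academic Journals Based on Conditional Random Fields[J]. Library and Information Service (图书情报工作), 2016, 60(2): 122-128.
Authors: Chen Feng, Zhai Yujia, Wang Fang
Affiliations: Department of Information Resources Management, Business School, Nankai University, Tianjin 300071; Research Center for Network Society Governance, Business School, Nankai University, Tianjin 300071
Funding: This paper is one of the research outputs of the National Social Science Fund of China major project "Research on Network Society Governance in China" (Project No. 14ZDA063).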
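Abstract: [Purpose/Significance] Extracting the theories mentioned in academic journal articles is a prerequisite for content analysis of the literature, and automating the recognition of theory names can make that analysis more efficient. [Method/Process] Theory recognition is treated as a kind of named entity recognition (NER) problem. The paper reviews the common NER methods, proposes an NER method based on the idea of semantic generalization, selects external knowledge such as part-of-speech tags and HowNet sememes as features, and runs experiments with a CRF model on the titles and abstracts of 1822 papers from 《情报学报》 (Journal of the China Society for Scientific and Technical Information). [Result/Conclusion] The experiments show that recognition accuracy reaches up to 95.38%, but recall is relatively low. The size of the training corpus has a large effect on performance, and different degrees of semantic generalization affect accuracy and recall in complex ways. How to select semantic features, and how to handle semantic annotation and semantic disambiguation, are new problems that remain to be solved.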
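The feature design described in the abstract can be pictured with a minimal sketch. It assumes that each token carries a part-of-speech tag and that semantic generalization means backing off from the surface word to a coarser semantic class. The `TOY_SEMEMES` table, the function names, and the example sentence are hypothetical illustrations; they are not taken from HowNet or from the paper, and the authors' actual feature templates are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical word -> semantic class lookup standing in for HowNet sememes.
# The entries are illustrative only (assumption, not real HowNet data).
TOY_SEMEMES: Dict[str, str] = {
    "theory": "knowledge",
    "model": "knowledge",
    "diffusion": "spread",
}

Token = Tuple[str, str]  # (word, part-of-speech tag)


def token_features(sent: List[Token], i: int, generalize: bool = True) -> Dict[str, str]:
    """Build a CRF-style feature dict for the i-th token of a sentence."""
    word, pos = sent[i]
    feats = {"word": word, "pos": pos}
    if generalize:
        # Semantic generalization: add a coarser class so that words never
        # seen in training can still share a feature with known words.
        feats["sememe"] = TOY_SEMEMES.get(word, "UNK")
    # Context window of one token on each side.
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = sent[i - 1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i < len(sent) - 1:
        feats["next_word"], feats["next_pos"] = sent[i + 1]
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def sentence_features(sent: List[Token], generalize: bool = True) -> List[Dict[str, str]]:
    """Feature dicts for every token, in the list-of-dicts shape CRF toolkits commonly accept."""
    return [token_features(sent, i, generalize) for i in range(len(sent))]


# A theory name labeled with BIO tags, as is usual for NER training data.
example = [("innovation", "n"), ("diffusion", "n"), ("theory", "n")]
example_labels = ["B-THEORY", "I-THEORY", "I-THEORY"]
example_feats = sentence_features(example)
```

Switching `generalize` off yields the purely lexical baseline, which is one way to compare runs with and without semantic features.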

Keywords: theory recognition; named entity recognition; citation analysis; semantic generalization
Received: 2015-10-25

Automatic Theory Recognition in Academic Journals Based on CRF
Chen Feng, Zhai Yujia, Wang Fang. Automatic Theory Recognition in Academic Journals Based on CRF[J]. Library and Information Service, 2016, 60(2): 122-128.
Authors: Chen Feng, Zhai Yujia, Wang Fang
Affiliation: Department of Information Resources Management, Business School of Nankai University, Tianjin 300071
Abstract: [Purpose/significance] Recognizing theories in academic journals is a precondition for content analysis, so automating theory recognition can improve the efficiency of content analysis. [Method/process] This paper treats theory recognition as a named entity recognition problem, reviews the existing named entity recognition methods, and proposes a theory recognition model based on semantic generalization. Using part-of-speech tags, HowNet sememes, and other external knowledge as features, it conducts a series of experiments with a CRF model on 1822 academic journal papers. [Result/conclusion] The recognition accuracy reaches as high as 95.38%, but the recall rate is low; the size of the training corpus has a large influence on performance. Semantic resources can improve performance, but they decrease the recall rate. How to select semantic features, and how to handle semantic annotation and semantic disambiguation, remain to be solved.
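To make the accuracy and recall trade-off concrete, here is a minimal sketch of entity-level scoring over BIO-tagged sequences. It assumes that the "accuracy rate" in the abstract corresponds to entity-level precision, which the abstract does not state explicitly. The tag sequences and function names are invented for illustration and are not the paper's data or evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I"):
            spans.add((start, i))
            start = None
        if tag.startswith("B"):
            start = i
    return spans


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Exact-match precision and recall over entity spans."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    return precision, recall


# Invented example: the tagger finds one of two theory mentions and makes
# no false predictions, so precision is high while recall is low.
gold_tags = ["B-THEORY", "I-THEORY", "O", "B-THEORY", "I-THEORY", "O"]
pred_tags = ["B-THEORY", "I-THEORY", "O", "O", "O", "O"]
p, r = precision_recall(gold_tags, pred_tags)
```

A conservative tagger that only labels mentions it is sure about produces exactly this pattern, which is consistent with the high accuracy and low recall reported above.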
Keywords: theory recognition; named entity recognition (NER); citation content analysis; semantic generalization