Research on a Parse Tree-based SAO Structure Identification Method
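Cite this article: Yang Chao, Zhu Donghua, Heng Xiaofan, Wang Xuefeng. Research on a Parse Tree-based SAO Structure Identification Method[J]. Library and Information Service (图书情报工作), 2016, 60(21): 113-121.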
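Authors: Yang Chao (杨超)  Zhu Donghua (朱东华)  Heng Xiaofan (衡晓帆)  Wang Xuefeng (汪雪锋)
Affiliation: School of Management and Economics, Beijing Institute of Technology, Beijing 100081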
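Funding: This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Emerging Technology Innovation Path Prediction Based on Semantic TRIZ" (Grant No. 71373019) and the National High Technology Research and Development Program of China project "Big Data Intelligent Service System for Government Management and Its Application Demonstration" (Grant No. 2014AA015105).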
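Abstract: [Purpose/significance] SAO is a triple structure that can represent topic information and the relationships between topics, and it is a fast-developing research direction in bibliometrics. To obtain SAO structures that meet the needs of bibliometric analysis, three problems faced by existing SAO identification methods must be solved: low recall and precision, weak relevance between the identified SAO structures and domain topics, and matrix sparsity. [Method/process] This paper proposes a parse tree-based SAO structure identification method oriented toward bibliometric analysis. It first identifies the core components of SAO structures using a co-occurrence algorithm and the term clumping method, and then extracts SAO structures layer by layer with a parse tree-based extraction algorithm. [Result/conclusion] The case study finds that the method achieves an average precision of 0.8058 and an average recall of 0.8446, that the identified SAO structures are strongly related to domain topics, and that matrix sparsity is considerably reduced, so the method can be applied effectively to bibliometric analysis.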

Keywords: "subject-action-object" (SAO) identification  parse tree  semantic analysis  co-occurrence algorithm  term clumping
Received: 2016-08-25
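
The abstract reports an average precision and an average recall but does not say how they were computed. As a minimal sketch only, the following assumes that extracted SAO triples are compared with a hand-labelled gold set by exact match and that the averages are taken per document; the function names and the sample triples are hypothetical and are not taken from the paper.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted triples against a gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)  # triples that match the gold set exactly
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def average_scores(pairs):
    """Average precision and recall separately over per-document (p, r) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


# Hypothetical gold and extracted (subject, action, object) triples for one document.
gold = {
    ("laser", "heat", "substrate"),
    ("coating", "protect", "electrode"),
    ("sensor", "detect", "gas"),
}
extracted = {
    ("laser", "heat", "substrate"),
    ("sensor", "detect", "gas"),
    ("sensor", "use", "gas"),
}

p, r = precision_recall(extracted, gold)
avg_p, avg_r = average_scores([(p, r), (1.0, 0.5)])
```

Exact matching is the strictest choice; a looser evaluation (for example, matching on head nouns) would give different figures.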

Parse Tree-based SAO Structure Identification
Yang Chao, Zhu Donghua, Heng Xiaofan, Wang Xuefeng. Parse Tree-based SAO Structure Identification[J]. Library and Information Service, 2016, 60(21): 113-121.
Authors: Yang Chao  Zhu Donghua  Heng Xiaofan  Wang Xuefeng
Affiliation: School of Management and Economics, Beijing Institute of Technology, Beijing 100081
Abstract: [Purpose/significance] Subject-Action-Object (SAO) is a triple structure that can both describe topics in detail and capture the relationships between topics. SAO analysis is a fast-growing research field. To obtain SAO structures suitable for bibliometric analysis, three problems with existing identification methods need to be solved: recall and precision are low, the identified SAOs are only weakly related to domain topics, and the resulting matrices are sparse. [Method/process] This paper proposes a parse tree-based SAO identification method for bibliometric analysis. It comprises (1) a model that identifies the core components of SAO structures using co-word analysis and term clumping, and (2) a parse tree-based hierarchical extraction model that identifies the SAO structures. [Result/conclusion] The case study shows that the average precision and average recall of the proposed method are 0.8058 and 0.8446, respectively. The SAOs extracted with this method are closely related to the domain topics and matrix sparsity is reduced, which makes the method an effective tool for bibliometric analysis.
Keywords:subject-action-object (SAO) identification  parse tree  semantic analysis  co-word algorithm  term clumping  
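
The abstract does not give the details of the hierarchical extraction algorithm. The sketch below is therefore not the authors' method. It only illustrates the general idea of reading SAO triples off a constituency parse tree clause by clause, assuming Penn Treebank-style labels, a hand-built tree in place of a real parser, and a hypothetical `core_terms` filter standing in for the core-component identification step.

```python
# A tree node is (label, child, child, ...); a leaf is (POS_tag, "word").

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def words(node, skip_tags=("DT",)):
    """Collect the words under a node, dropping determiners by default."""
    if is_leaf(node):
        return [] if node[0] in skip_tags else [node[1]]
    out = []
    for child in node[1:]:
        out.extend(words(child, skip_tags))
    return out


def first_child(node, label_prefix):
    """Return the first direct child whose label starts with label_prefix."""
    for child in node[1:]:
        if child[0].startswith(label_prefix):
            return child
    return None


def extract_sao(node, core_terms=None):
    """Walk the tree top-down and collect one (S, A, O) triple per clause."""
    triples = []
    if is_leaf(node):
        return triples
    if node[0] == "S":
        subject = first_child(node, "NP")
        vp = first_child(node, "VP")
        if subject is not None and vp is not None:
            verb = first_child(vp, "VB")  # VB, VBZ, VBD, ...
            obj = first_child(vp, "NP")
            if verb is not None and obj is not None:
                s, o = " ".join(words(subject)), " ".join(words(obj))
                # Assumed filter: keep a triple only if it mentions a core term.
                if core_terms is None or s in core_terms or o in core_terms:
                    triples.append((s, verb[1], o))
    # Descend one layer at a time so embedded clauses are handled as well.
    for child in node[1:]:
        triples.extend(extract_sao(child, core_terms))
    return triples


# "The sensor detects gas because the coating absorbs moisture."
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "sensor")),
        ("VP", ("VBZ", "detects"),
         ("NP", ("NN", "gas")),
         ("SBAR", ("IN", "because"),
          ("S",
           ("NP", ("DT", "the"), ("NN", "coating")),
           ("VP", ("VBZ", "absorbs"), ("NP", ("NN", "moisture")))))))

all_triples = extract_sao(tree)
core_only = extract_sao(tree, core_terms={"sensor"})
```

Here `all_triples` holds one triple for the main clause and one for the embedded clause, while `core_only` keeps only the triple that mentions the hypothetical core term. A practical pipeline would also lemmatize the action and handle passives, coordination, and prepositional objects, none of which this sketch attempts.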