Semantic Relationship Recognition of Oil Documents Based on Improved Word Vector
Cite this article: GONG Fa-Ming, ZHU Peng-Hai. Semantic Relationship Recognition of Oil Documents Based on Improved Word Vector[J]. Computer Systems & Applications, 2018, 27(8): 153-158.
Authors: GONG Fa-Ming, ZHU Peng-Hai
Affiliation: College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China (both authors)
Funding: Innovation Method Special Project of the Ministry of Science and Technology (2015IM01030)
Abstract: Semantic relationship recognition is the process of analyzing documents to identify the semantic relationships they contain, and it is an important part of ontology construction. When building an ontology for the petroleum domain, this task is harder than usual because petroleum documents contain many compound terms. The semantic recognition algorithms in current use are mainly based on association rules, and such algorithms are not tailored to any particular domain. By analyzing the characteristics of petroleum documents, this study proposes a semantic relationship recognition algorithm for petroleum documents based on improved word vectors. Starting from the Continuous Bag-Of-Words (CBOW) model, it extends training to cover petroleum terminology and introduces negative sampling and subsampling to improve training accuracy and efficiency. The resulting vector features are then used to train a Support Vector Machine (SVM) classifier that recognizes semantic relationships. The experimental results show that word vectors trained with this method can accurately identify semantic relationships in petroleum documents and offer clear advantages in the petroleum domain.
Keywords: word vector; semantic relationship recognition; Support Vector Machine (SVM)
Received: 2017/12/10
Revised: 2018/1/4
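
The abstract names negative sampling and subsampling but does not give their formulas. As a minimal sketch, assuming the standard word2vec formulation (a token with relative frequency f is kept with probability sqrt(t/f), capped at 1, and negatives are drawn from the unigram distribution raised to the 0.75 power), the two steps might look like the following. The function names, the toy token list, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus; a real run would use tokenized petroleum documents.
TOKENS = ["well"] * 6 + ["reservoir"] * 3 + ["porosity"]


def keep_probability(freq, t=1e-3):
    """Probability of keeping a token with relative frequency `freq`.

    Assumes the word2vec-style rule: discard with probability 1 - sqrt(t / freq).
    """
    return min(1.0, math.sqrt(t / freq))


def subsample(tokens, t=1e-3, rng=None):
    """Randomly drop frequent tokens; rare tokens are mostly kept."""
    if rng is None:
        rng = random.Random(0)
    counts = Counter(tokens)
    total = len(tokens)
    return [w for w in tokens
            if rng.random() < keep_probability(counts[w] / total, t)]


def negative_sampling_distribution(tokens, power=0.75):
    """Unigram distribution raised to `power` and renormalized."""
    counts = Counter(tokens)
    weights = {w: c ** power for w, c in counts.items()}
    norm = sum(weights.values())
    return {w: v / norm for w, v in weights.items()}


def draw_negatives(dist, target, k, rng=None):
    """Draw k negative words (with replacement), never the target itself."""
    if rng is None:
        rng = random.Random(0)
    words = [w for w in dist if w != target]
    return rng.choices(words, weights=[dist[w] for w in words], k=k)
```

Raising counts to the 0.75 power gives rare terms a larger share of the negative samples than their raw frequency would, which is one plausible reason such a scheme could help with infrequent domain terminology. The word vectors produced this way would then serve as input features for the SVM classifier mentioned in the abstract.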