Info-association topology based social relationship mining on Internet
Citation: LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui. Info-association topology based social relationship mining on Internet[J]. Journal of Computer Applications, 2016, 36(7): 1875-1880.
Authors: LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui
Affiliation: 1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230022, China; 2. Suzhou Institute of Advanced Study, University of Science and Technology of China, Suzhou, Jiangsu 215123, China; 3. Suzhou Industrial Park Centers for Disease Control and Prevention, Suzhou, Jiangsu 215123, China
Funding: National Natural Science Foundation of China (61332004); Suzhou Science and Technology Plan Project, Industrial Technology Innovation Special Program (People's Livelihood Science and Technology) (SS201509).
Abstract: Relation extraction methods based on supervised learning require a large amount of labeled training data and a predefined set of relation types. To avoid both requirements, a personal relation extraction method is proposed that constructs a correlation network from word co-occurrence information and performs graph clustering analysis on that network. First, 500 highly related person pairs are obtained from news title data for the relation extraction study. Second, the news data containing these person pairs are crawled and preprocessed, and Term Frequency-Inverse Document Frequency (TF-IDF) is used to obtain the keywords of the sentences in which a person pair co-occurs. Third, the correlations between words are derived from word co-occurrence information, and a keyword correlation network is constructed. Finally, personal relations are obtained by graph clustering analysis on the correlation network. In the relation extraction experiments, compared with a traditional Chinese entity relation extraction method based on word co-occurrence and pattern matching, the proposed method improves precision, recall and F-score by 5.5, 3.7 and 4.4 percentage points, respectively. The experimental results show that the proposed algorithm can effectively extract abundant, high-quality personal relation data from news data without labeled training data.

Keywords: social relation extraction; co-occurrence statistics; word correlation; correlation network; graph clustering
Received: 2016-01-25
Revised: 2016-02-29
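
The four steps described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify the word correlation measure or the graph clustering algorithm, so the sketch assumes raw co-occurrence counts as edge weights and uses connected components over thresholded edges as a simple stand-in for graph clustering. The function names, the `top_k` and `min_weight` parameters, and the English toy tokens are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_keywords(sentences, top_k=3):
    """Return the top_k highest TF-IDF tokens of each tokenized sentence."""
    n = len(sentences)
    # Document frequency: number of sentences that contain each token.
    df = Counter(tok for sent in sentences for tok in set(sent))
    keywords = []
    for sent in sentences:
        tf = Counter(sent)
        scores = {t: (c / len(sent)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def cooccurrence_network(keyword_lists):
    """Edge weight = number of sentences in which two keywords co-occur.

    Assumption: raw counts stand in for the paper's word correlation measure.
    """
    weights = Counter()
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws)), 2):
            weights[(a, b)] += 1
    return weights


def cluster(weights, min_weight=2):
    """Group keywords by connected components over edges >= min_weight.

    Assumption: a simple stand-in for the unspecified graph clustering step.
    """
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for node in sorted(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first traversal of one component
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters


if __name__ == "__main__":
    # Toy keyword lists for sentences mentioning one hypothetical person pair.
    toy = [["wife", "married"], ["wife", "married"],
           ["coach", "team"], ["coach", "team"], ["wife", "team"]]
    print(cluster(cooccurrence_network(toy)))
```

In this toy run, each resulting cluster groups keywords that repeatedly appear together, and a cluster could then be read as a candidate relation type for the person pair. A real pipeline would also need Chinese word segmentation and named entity recognition before the TF-IDF step, which the sketch omits.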