Info-association topology based social relationship mining on Internet


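Citation:	LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui. Info-association topology based social relationship mining on Internet[J]. Journal of Computer Applications, 2016, 36(7): 1875-1880.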

Authors:	LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui

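Affiliations:	1. School of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230022, China; 2. Suzhou Institute of Advanced Study, University of Science and Technology of China, Suzhou Jiangsu 215123, China; 3. Suzhou Industrial Park Centers for Disease Control and Prevention, Suzhou Jiangsu 215123, China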

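Funding:	National Natural Science Foundation of China (61332004); Suzhou Science and Technology Plan, Special Project for Industrial Technology Innovation (Livelihood Science and Technology) (SS201509).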

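Abstract:	Current relation extraction methods based on supervised learning require large amounts of labeled training data and predefined relation types. To avoid both requirements, a person relation extraction method is proposed that builds a correlation network from word co-occurrence information and performs graph clustering analysis on that network. First, 500 highly associated person pairs were obtained from news headline data for the relation extraction study. Next, the news articles containing these person pairs were crawled and preprocessed, and Term Frequency-Inverse Document Frequency (TF-IDF) was used to extract keywords from the sentences in which each person pair co-occurs. Then, correlations between words were derived from word co-occurrence information, and a keyword correlation network was built. Finally, graph clustering analysis was performed on the correlation network to obtain person relations. In relation extraction experiments, compared with a traditional Chinese entity relation extraction method based on word co-occurrence and pattern matching, the proposed method improved precision, recall and F-score by 5.5, 3.7 and 4.4 percentage points respectively. The experimental results show that the proposed algorithm can effectively extract rich, high-quality person relation data from news data without labeled training data.
Keywords:	social relation extraction; co-occurrence statistics; word correlation; correlation network; graph clustering
Received:	2016-01-25
Revised:	2016-02-29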
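The abstract gives no code and does not name the exact word-correlation measure, so the following is only a minimal Python sketch of the keyword and network steps under stated assumptions: sentences are already word-segmented (placeholder English tokens stand in for segmented Chinese words), each co-occurrence sentence is treated as one TF-IDF document, and edge weights use a Jaccard-style co-occurrence ratio as a hypothetical stand-in for the paper's correlation measure. The function names `tfidf_keywords` and `cooccurrence_network` are illustrative, not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_keywords(sentences, top_k=3):
    """Return up to top_k TF-IDF keywords for each tokenized sentence.

    Assumption: each sentence is one document and IDF is computed over
    all co-occurrence sentences of the person pair.
    """
    n_docs = len(sentences)
    # Document frequency: number of sentences containing each word.
    df = Counter()
    for tokens in sentences:
        df.update(set(tokens))
    keywords = []
    for tokens in sentences:
        tf = Counter(tokens)
        scores = {
            w: (c / len(tokens)) * math.log(n_docs / df[w])
            for w, c in tf.items()
        }
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Words present in every sentence get IDF 0 and are dropped.
        keywords.append([w for w in ranked[:top_k] if scores[w] > 0])
    return keywords


def cooccurrence_network(keyword_lists, min_count=1):
    """Build an undirected weighted keyword graph as {word: {neighbor: weight}}.

    Hypothetical weight: Jaccard ratio of the sets of sentences in which
    the two keywords appear, i.e. n_ab / (n_a + n_b - n_ab).
    """
    occurs = Counter()
    pairs = Counter()
    for kws in keyword_lists:
        uniq = sorted(set(kws))
        occurs.update(uniq)
        pairs.update(combinations(uniq, 2))
    graph = {}
    for (a, b), n_ab in pairs.items():
        if n_ab < min_count:
            continue
        w = n_ab / (occurs[a] + occurs[b] - n_ab)
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph
```

In a toy input where both person names appear in every sentence, their IDF is zero, so they drop out of the keyword lists and only context words (for example `married`/`wedding` or `film`/`director`) become nodes of the network.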
Info-association topology based social relationship mining on Internet

LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui. Info-association topology based social relationship mining on Internet[J]. Journal of Computer Applications, 2016, 36(7): 1875-1880.

Authors:	LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui

Affiliation:	1. School of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230022, China;2. Suzhou Institute of Advanced Study, University of Science and Technology of China, Suzhou Jiangsu 215123, China;3. Suzhou Industrial Park Centers for Disease Control and Prevention, Suzhou Jiangsu 215123, China

Abstract:	Relation extraction methods based on supervised learning require a large amount of labeled training data and predefined relation types. To address these problems, a personal relation extraction method was proposed that constructs a correlation network from word co-occurrence information and performs graph clustering analysis on that network. Firstly, 500 highly related person pairs were obtained from news title data for the relation extraction study. Secondly, the news data containing the related person pairs were crawled and preprocessed, and Term Frequency-Inverse Document Frequency (TF-IDF) was used to obtain the keywords in the sentences containing the person pairs. Thirdly, the correlation between words was derived from word co-occurrence information, and a keyword correlation network was constructed. Finally, personal relations were obtained by graph clustering analysis on the correlation network. In the relation extraction experiments, compared with a traditional Chinese relation extraction algorithm based on word co-occurrence and pattern matching, the proposed method improved precision, recall and F-score by 5.5, 3.7 and 4.4 percentage points respectively. The experimental results show that the proposed algorithm can effectively extract abundant, high-quality personal relation data from news data without labeled training data.
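The abstract does not name the graph clustering algorithm. As a deliberately simple, swapped-in stand-in, the Python sketch below drops edges whose weight falls below a hypothetical threshold `min_weight` and returns the connected components of the remaining keyword graph. The input is assumed to be an undirected weighted graph stored as a dict of `{neighbor: weight}` dicts; `cluster_keywords` is an illustrative name, not the paper's.

```python
def cluster_keywords(graph, min_weight=0.5):
    """Cluster keywords of an undirected weighted graph {node: {nbr: weight}}.

    Stand-in method (not necessarily the paper's): prune edges below the
    hypothetical threshold min_weight, then return the connected components
    of the pruned graph, each sorted, in sorted order of their first node.
    """
    # Adjacency sets containing only sufficiently strong edges.
    adj = {node: set() for node in graph}
    for a, nbrs in graph.items():
        for b, w in nbrs.items():
            if w >= min_weight:
                adj[a].add(b)
                adj.setdefault(b, set()).add(a)
    seen = set()
    clusters = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(comp))
    return clusters


# Toy keyword graph for one person pair (illustrative weights).
graph = {
    "married": {"wedding": 1.0, "film": 0.1},
    "wedding": {"married": 1.0},
    "film": {"director": 1.0, "married": 0.1},
    "director": {"film": 1.0},
}
print(cluster_keywords(graph))  # [['director', 'film'], ['married', 'wedding']]
```

Each resulting cluster can be read as a candidate relation for the person pair (in this toy graph, a marriage-related cluster and a film-related cluster). A community-detection method such as label propagation could replace the component step without changing the function's interface.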

Keywords:	social relation extraction; co-occurrence statistics; word correlation; correlation network; graph clustering
