A robust web personal name information extraction system

Authors:	Ying Chen, Sophia Yat Mei Lee, Chu-Ren Huang

Affiliation:	^a College of Information and Electrical Engineering, China Agricultural University, PR China; ^b Language Centre, Hong Kong Baptist University, Hong Kong; ^c Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong

Abstract:	Personal information extraction, which extracts the persons in question and their related information (such as biographical information and occupation) from the web, is an important component in constructing social networks (a kind of semantic web). This practical task raises two important issues: personal named entity ambiguity and the extraction of personal information for a specific person. For personal named entity ambiguity, a common phenomenon in fast-growing web resources, we propose a robust system that extracts lightweight features from broad resources using a totally unsupervised approach. Experiments show that these lightweight features not only improve performance but also increase the robustness of a disambiguation system. To extract the information of the focus person, we introduce an integrated system that effectively reuses and combines current well-developed tools for web data while also identifying the expression properties of web data. We show that our flexible extraction system achieves state-of-the-art performance, in particular high precision, which is very important for real applications.

Keywords:	Information extraction; Named entity disambiguation; Attribute extraction; Relation extraction
This article is indexed in ScienceDirect and other databases.
