A Multi-factor Method for Extracting Information on Science and Technology Experts Based on Web Data Mining
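Citation: ZHU Quan-yin, ZHOU Pei, YIN Yong-hua, CHEN Fu, LIU Jin-ling. A multi-factor method for extracting information on science and technology experts based on Web data mining[J]. Journal of Huaiyin Institute of Technology, 2013(5): 23-27.
Authors: ZHU Quan-yin, ZHOU Pei, YIN Yong-hua, CHEN Fu, LIU Jin-ling
Affiliation: Faculty of Computer Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003
Funding: National Spark Program (2011GA690190); Huai'an Science and Technology Support Program (HAG2011052, HAG2011045, HASZ2012046, HASZ2012050); Huai'an "533 Talent Project"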
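Abstract: Because web pages present information in diverse and complex ways, information extraction based on Web data mining has low accuracy. To improve the accuracy of mining Web information about science and technology experts, this paper proposes a multi-factor method for extracting such information based on Web data mining. Given the uniform resource locator (URL) of a web page, the method first extracts the page's body text. It then combines the positions of feature words in the body text with the distances between feature words to form a shortest-distance matching method, which extracts each expert's name, sex, date of birth, place of birth, professional title, and other information. Experimental results show that the method achieves a recall of 94.43% and an accuracy of 92.34%, which meets application requirements well.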

Keywords: science and technology experts; web page body-text extraction; feature words; shortest distance matching
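
The abstract describes the shortest-distance matching step only at a high level. The Python sketch below shows one way such a matcher could work on body text that has already been extracted. For each attribute, it picks the candidate value closest to a cue (feature) word. The cue words, value regexes, and the names ATTRIBUTES, nearest_value, and extract_profile are hypothetical illustrations, not the paper's rule set. Breaking ties by earlier position is only a crude stand-in for the paper's position factor, whose weighting is not given here. Name and birthplace are left out because they need richer patterns than a single regex.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rule set: for each attribute, the cue (feature) words that label it
# and a regex matching candidate values. Illustrative only, not the paper's rules.
ATTRIBUTES: Dict[str, Tuple[List[str], str]] = {
    "sex": (["性别"], r"男|女"),
    "birth": (["出生年月", "出生于", "生于"], r"\d{4}年\d{1,2}月"),
    # Longer titles come first so "副教授" is not matched as just "教授".
    "title": (["职称"], r"副教授|教授|副研究员|研究员|高级工程师|讲师"),
}


def cue_spans(text: str, cue_words: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of every cue-word occurrence."""
    return [(m.start(), m.end())
            for w in cue_words
            for m in re.finditer(re.escape(w), text)]


def gap(value_start: int, value_end: int, cue_start: int, cue_end: int) -> int:
    """Number of characters separating a candidate value from a cue word."""
    if value_start >= cue_end:             # value follows the cue
        return value_start - cue_end
    return max(cue_start - value_end, 0)   # value precedes (or overlaps) the cue


def nearest_value(text: str, cue_words: List[str], pattern: str) -> Optional[str]:
    """Return the candidate value closest to any cue word, or None if none is found.

    Ties are broken in favour of the candidate that appears earlier in the text.
    """
    cues = cue_spans(text, cue_words)
    best: Optional[Tuple[int, int, str]] = None  # (distance, position, value)
    for m in re.finditer(pattern, text):
        for cs, ce in cues:
            cand = (gap(m.start(), m.end(), cs, ce), m.start(), m.group())
            if best is None or cand[:2] < best[:2]:
                best = cand
    return best[2] if best else None


def extract_profile(body_text: str) -> Dict[str, Optional[str]]:
    """Apply shortest-distance matching to every attribute in ATTRIBUTES."""
    return {attr: nearest_value(body_text, cues, pat)
            for attr, (cues, pat) in ATTRIBUTES.items()}


if __name__ == "__main__":
    sample = "姓名:张三 性别:男 出生年月:1965年3月 籍贯:江苏淮安 1987年7月毕业于南京大学 职称:教授"
    print(extract_profile(sample))
    # prints {'sex': '男', 'birth': '1965年3月', 'title': '教授'}
```

In the sample, the graduation date "1987年7月" also matches the birth-date pattern. It is rejected because it lies farther from the cue "出生年月" than "1965年3月" does, which is the behaviour distance-based matching is meant to provide.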

Multivariate Method for Extracting the Basic Information of Experts in Science and Technology Based on Web Mining
Authors: ZHU Quan-yin, ZHOU Pei, YIN Yong-hua, CHEN Fu, LIU Jin-ling
Affiliation: Faculty of Computer Engineering, Huaiyin Institute of Technology, Huai'an, Jiangsu 223003, China
Abstract: The accuracy of information extraction by Web mining is low because of the diversity and complexity of web pages. To improve this accuracy when building a basic information system of experts in science and technology, this paper proposes a novel multivariate extraction method. Given a URL, the method first extracts the body text of the web page. It then combines the positions of characteristic words in that text with a shortest-word-distance matching method to extract expert information, including each expert's name, sex, date of birth, place of birth, and professional title. Experiments show that the accuracy rate and recall rate reach 92.34% and 94.43% respectively, indicating that the proposed method satisfies the application requirements.
Keywords: experts in science and technology; Web mining; characteristic words; shortest distance matching
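
The results above are reported as two ratios. The short Python sketch below shows how such figures are conventionally computed. It assumes that the reported "accuracy rate" is the precision-style ratio (correct extractions divided by all extractions) usually paired with recall. The counts in the example are hypothetical and are not taken from the paper's experiments. The helper names precision_recall and f1 are illustrative.

```python
from typing import Tuple


def precision_recall(n_correct: int, n_extracted: int, n_expected: int) -> Tuple[float, float]:
    """Precision = correct / extracted; recall = correct / items that should be extracted."""
    if n_extracted <= 0 or n_expected <= 0:
        raise ValueError("counts must be positive")
    if not 0 <= n_correct <= min(n_extracted, n_expected):
        raise ValueError("n_correct must not exceed either total")
    return n_correct / n_extracted, n_correct / n_expected


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; they are not the paper's data.
    p, r = precision_recall(n_correct=180, n_extracted=200, n_expected=190)
    print(f"precision={p:.4f} recall={r:.4f} F1={f1(p, r):.4f}")
    # prints precision=0.9000 recall=0.9474 F1=0.9231
```

Applying f1 to the two reported rates (0.9234 and 0.9443) gives roughly 0.934. That value is a derived summary, not a figure stated in the abstract.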