A Study on Structured Information Extraction Based on Webpage
Original title: 网页结构化信息抽取技术方法研究
Cite this article: Hao Ai-feng. A Study on Structured Information Extraction Based on Webpage[J]. Shanxi Electronic Technology (山西电子技术), 2008, 0(4)
Author: Hao Ai-feng (郝爱峰)
Affiliation: Xinzhou Teachers University, Xinzhou, Shanxi 034100, China
Abstract: This paper analyzes two current mainstream approaches to extracting structured information from web pages: the template-based wrapper method and the template-independent, vision-based extraction method. Building on this analysis, it implements a new algorithm for structured web information extraction that improves extraction efficiency and accuracy to some extent.
Keywords: vertical search engine; information extraction; wrapper; template
This article is indexed in CNKI, VIP, and other databases.