Extraction Rules for Multiple Information Block (MIB) Web Pages

Cite this article:	WANG Qingyi, WANG Jicheng, ZHOU Yuanyuan, YUAN Chunfeng. Extraction Rule of MIB Web Page[J]. Computer Engineering, 2003, 29(9): 42-44, 50.

Authors:	WANG Qingyi, WANG Jicheng, ZHOU Yuanyuan, YUAN Chunfeng

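Affiliation:	State Key Laboratory for Novel Software Technology, Nanjing University; Department of Computer Science and Technology, Nanjing University, Nanjing 210093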

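Funding:	National Natural Science Foundation of China (60073030); National High Technology Research and Development Program of China ("863" Program) (2001AA114041)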

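Abstract:	Previous wrappers mainly target Web pages that contain only one data block; they cannot handle Web pages that contain multiple information blocks, referred to as MIB (Multiple Information Block) Web pages. This paper proposes a new extraction rule that combines the advantages of extraction rules based on document structure with those of extraction rules based on feature-pattern matching, and that can effectively extract the information in MIB Web pages.
Keywords:	Web; information extraction; wrapper; extraction rule; information integration
Article ID:	1000-3428(2003)09-0042-03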
Extraction Rule of MIB Web Page

WANG Qingyi,WANG Jicheng,ZHOU Yuanyuan,YUAN Chunfeng.Extraction Rule of MIB Web Page[J].Computer Engineering,2003,29(9):42-44,50.

Authors:	WANG Qingyi, WANG Jicheng, ZHOU Yuanyuan, YUAN Chunfeng

Abstract:	Existing wrappers cannot correctly extract all the information from a Web page that contains multiple information blocks, called an MIB (multiple information block) Web page. A new kind of extraction rule, which combines the advantages of extraction rules based on document structure and extraction rules based on patterns, is introduced to solve the problem.

Keywords:	Web; Information extraction; Wrapper; Extraction rule; Information integration
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.