Unregulated table structure Web information extraction based on finite state automata

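Citation:	LI Shi-jun, OU Wei-jie, JIAN Wei, HUANG He. Unregulated table structure Web information extraction based on finite state automata[J]. Engineering Journal of Wuhan University, 2005, 38(6): 128-132.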

Authors:	LI Shi-jun, OU Wei-jie, JIAN Wei, HUANG He

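Affiliations:	1. School of Computer, Wuhan University, Wuhan 430072, Hubei; Key Laboratory of Computer Science, Chinese Academy of Sciences, Beijing 100080; 2. School of Computer, Wuhan University, Wuhan 430072, Hubei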

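Funding:	Supported by the National Natural Science Foundation of China (No. 60273072) and the National High Technology Research and Development Program of China (863 Program) (No. 2002AA423450)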

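Abstract:	A large volume of Web information is published in unregulated, irregularly structured tables, and extracting it is a problem that current Web information extraction must solve. Building on existing methods, this paper presents an algorithm for inductively learning sets of contextual rules between adjacent attributes. It then proposes the concepts of an attribute transducer and a finite state automaton (FSA) wrapper, both operating at the granularity of a whole Web page. Finally, it describes an algorithm that uses the FSA wrapper to extract Web information from unregulated table structures.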
Keywords:	information extraction; contextual rule set; finite state automaton (FSA)
Article ID:	1671-8844(2005)06-128-05
Revised:	June 22, 2005
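The abstract names the components of the method (contextual rules induced between adjacent attributes, a page-level attribute transducer, and an FSA wrapper) but does not define them. The Python sketch below is a hypothetical illustration only and does not reproduce the authors' algorithm. It makes three simplifying assumptions:

- Each contextual rule reduces to a pair of literal delimiter strings learned from labeled example pages.
- Each attribute becomes an FSA state, with an edge a → b whenever b directly follows a in some example. Missing or reordered attributes in an unregulated table therefore become alternative paths through the automaton.
- The attribute transducer is not modeled.

The names `learn_wrapper`, `extract`, `START`, and `END` are illustrative only.

```python
from collections import defaultdict
from os.path import commonprefix  # character-wise longest common prefix

START, END = "<START>", "<END>"  # hypothetical initial/final FSA states


def common_suffix(strings):
    """Longest common suffix of a list of strings."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def learn_wrapper(examples):
    """Induce delimiter rules and FSA transitions from labeled pages.

    `examples` is a list of (page_text, [(attribute, value), ...]) with the
    values listed in page order. Attributes may be missing or reordered from
    page to page. Assumption: a rule is just a left and right delimiter string.
    """
    before, after = defaultdict(list), defaultdict(list)
    transitions = defaultdict(set)  # state -> attributes seen right after it
    for page, fields in examples:
        spans, pos, prev = [], 0, START
        for attr, value in fields:
            start = page.index(value, pos)
            spans.append((attr, start, start + len(value)))
            transitions[prev].add(attr)  # adjacent pair (prev, attr) -> edge
            prev, pos = attr, start + len(value)
        transitions[prev].add(END)
        for i, (attr, s, e) in enumerate(spans):
            prev_end = spans[i - 1][2] if i > 0 else 0
            next_start = spans[i + 1][1] if i + 1 < len(spans) else len(page)
            before[attr].append(page[prev_end:s])   # context left of the value
            after[attr].append(page[e:next_start])  # context right of the value
    left = {a: common_suffix(c) for a, c in before.items()}
    right = {a: commonprefix(c) for a, c in after.items()}
    return left, right, dict(transitions)


def extract(page, left, right, transitions):
    """Run the FSA wrapper over one page and return {attribute: value}."""
    state, pos, record = START, 0, {}
    while True:
        best = None  # (match position, attribute) of the earliest firing edge
        for attr in transitions.get(state, ()):
            if attr == END or not left.get(attr):
                continue  # END consumes no text; empty delimiters are skipped
            i = page.find(left[attr], pos)
            if i != -1 and (best is None or i < best[0]):
                best = (i, attr)
        if best is None:
            return record  # no outgoing edge fires: stop and accept
        i, attr = best
        start = i + len(left[attr])
        end = page.find(right[attr], start) if right.get(attr) else -1
        if end == -1:
            end = len(page)  # no right delimiter: value runs to end of page
        record[attr] = page[start:end]
        state, pos = attr, end  # pos strictly increases, so the loop ends
```

Consider two training pages, one with `name`, `age`, and `phone` and one where `age` is missing. The learned automaton then accepts a new page that also omits `age`, because the edge `name → phone` was seen in training. Taking the earliest-firing outgoing edge is just one simple way to resolve nondeterminism; the paper's own rule sets and conflict resolution may differ.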
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
