Study and Design of Network Page Information Extraction System Based on XML


Citation:	YANG Cheng. Study and Design of Network Page Information Extraction System Based on XML[J]. Digital Community & Smart Home, 2009, 0(26)

Author:	YANG Cheng

Affiliation:	Shanghai Jiao Tong University, Shanghai 200240, China

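Abstract:	This paper proposes a method for extracting patterns and data from Web documents described in XML, driven by the topic information a user is interested in. A learning algorithm first derives extraction rules from sample documents, and a matching algorithm then uses those rules to extract data from target documents. XML documents are parsed with an improved parsing method. During pattern extraction, a sequential covering algorithm learns the patterns from a set of sample XML documents. During data extraction, the algorithm searches the parsed XML document tree for the information the user needs, which the author reports it can locate efficiently and accurately.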
Keywords:	XML; data extraction; document parsing
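The abstract names sequential covering for pattern learning and tree matching for extraction, but does not give the rule representation or the improved parser. The Python sketch below is therefore only an illustration under stated assumptions: each element is described by its tag, its parent's tag and its attributes; a rule is a conjunction of such conditions; the sample documents mark the wanted elements with a hypothetical `_target="yes"` attribute; and the standard-library `xml.etree.ElementTree` parser stands in for the paper's improved parser. `learn_one_rule` greedily adds the condition with the highest precision until the rule covers no negative example, and `sequential_covering` repeats this on the positives that are still uncovered, which is the textbook form of sequential covering. All function names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet, List, Set, Tuple

Feature = Tuple[str, str]   # e.g. ("tag", "title") or ("parent", "item")
Rule = FrozenSet[Feature]   # conjunction: a node matches if it has every feature

LABEL = "_target"           # hypothetical attribute marking user-wanted nodes in samples


def node_features(root: ET.Element) -> List[Tuple[ET.Element, Set[Feature]]]:
    """Describe each element by its tag, its parent's tag and its attributes."""
    parent_of: Dict[ET.Element, ET.Element] = {
        child: parent for parent in root.iter() for child in parent
    }
    described = []
    for node in root.iter():  # document order
        parent = parent_of.get(node)
        feats: Set[Feature] = {
            ("tag", node.tag),
            ("parent", parent.tag if parent is not None else "#root"),
        }
        # Attributes become conditions too, except the training label itself.
        feats |= {("attr", f"{k}={v}") for k, v in node.attrib.items() if k != LABEL}
        described.append((node, feats))
    return described


def covers(rule: Rule, feats: Set[Feature]) -> bool:
    return rule <= feats


def learn_one_rule(pos: List[Set[Feature]], neg: List[Set[Feature]]) -> Rule:
    """Greedily specialise an empty rule until it covers no negative example."""
    rule: Set[Feature] = set()
    pos_cov, neg_cov = pos, neg
    while neg_cov:
        candidates = set().union(*pos_cov) - rule
        if not candidates:
            break  # cannot specialise further; accept the remaining negatives
        best, best_key = None, None
        for cand in sorted(candidates):  # sorted for deterministic tie-breaking
            p = sum(cand in f for f in pos_cov)  # p >= 1: cand comes from a positive
            n = sum(cand in f for f in neg_cov)
            key = (p / (p + n), p)  # precision first, then positive coverage
            if best_key is None or key > best_key:
                best, best_key = cand, key
        rule.add(best)
        pos_cov = [f for f in pos_cov if best in f]
        neg_cov = [f for f in neg_cov if best in f]
    return frozenset(rule)


def sequential_covering(pos, neg, max_rules: int = 10) -> List[Rule]:
    """Learn rules one at a time, removing the positives each new rule covers."""
    rules: List[Rule] = []
    remaining = list(pos)
    while remaining and len(rules) < max_rules:
        rule = learn_one_rule(remaining, neg)
        rules.append(rule)
        remaining = [f for f in remaining if not covers(rule, f)]
    return rules


def train(samples: List[str]) -> List[Rule]:
    """Pattern extraction: labelled nodes are positives, all others negatives."""
    pos, neg = [], []
    for xml_text in samples:
        for node, feats in node_features(ET.fromstring(xml_text)):
            (pos if node.get(LABEL) == "yes" else neg).append(feats)
    return sequential_covering(pos, neg)


def extract(rules: List[Rule], xml_text: str) -> List[str]:
    """Data extraction: return the text of every node matched by some rule."""
    return [
        (node.text or "").strip()
        for node, feats in node_features(ET.fromstring(xml_text))
        if any(covers(r, feats) for r in rules)
    ]


SAMPLES = [
    '<page><item><title _target="yes">XML parsing</title><ad>Buy now</ad></item>'
    '<item><title _target="yes">Rule learning</title></item>'
    '<footer><title>About</title></footer></page>'
]
TARGET = ('<page><item><title>Data extraction</title><ad>Sale</ad></item>'
          '<footer><title>Contact</title></footer></page>')

rules = train(SAMPLES)
print(extract(rules, TARGET))  # ['Data extraction']
```

On this toy sample the single learned rule is the conjunction tag = title and parent = item, so the footer's title and the advertisement are both rejected in the target page.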

This article is indexed in CNKI and other databases.
