Web Page Structure Analysis Based on Tag Tree Method

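Cite this article: Chang Yuhong, Jiang Zhe, Zhu Xiaoyan. Web page structure analysis based on tag tree method [J]. Computer Engineering and Applications, 2004, 40(16): 129-132.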

Authors: 常育红 (Chang Yuhong), 姜哲 (Jiang Zhe), 朱小燕 (Zhu Xiaoyan)

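Affiliations: 1. Beijing Jiuzhou Company, Beijing 100081; Department of Computer Science and Technology, Tsinghua University, Beijing 100084; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084. 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084.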

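Abstract: Analysis of page content structure plays an important role in Web information retrieval, classification, and extraction. Starting from the relationship between page layout and content, this paper represents a page file as a tag tree built from the relationships among the tags in the Web file, uses a bottom-up algorithm to extract page content with different semantics, and proposes a method for representing the relationships among these contents as a hierarchical tree structure. On this basis, by imitating the way people browse pages, the method has been successfully applied to a computer screen-reading system for Web pages, which automatically reads the main topic of a page aloud.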
Keywords: Web page layout; page structure; information extraction
Article ID: 1002-8331-(2004)16-0129-04
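The abstract gives no code, so the following is only a minimal sketch, assuming Python's standard `html.parser` for parsing. It builds a tag tree that mirrors the nesting of the HTML, then makes one bottom-up (post-order) pass in which each layout container either becomes a content block or is collapsed into the single block it wraps, while inline tags pass their text up to the enclosing block. The tag set `BLOCK_TAGS`, the collapsing rule, and the names `TagTreeBuilder`, `Node`, `Block`, `build`, `analyze`, and `show` are hypothetical illustrations, not the authors' algorithm.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical choice of layout/container tags; the paper's own tag set is not given here.
BLOCK_TAGS = {"#root", "html", "body", "table", "tr", "td", "div", "p", "ul", "li", "h1", "h2", "h3"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


@dataclass
class Node:
    """One element (or text run, tag '#text') of the tag tree."""
    tag: str
    children: list = field(default_factory=list)
    text: str = ""


@dataclass
class Block:
    """A content block in the resulting hierarchy."""
    tag: str
    text: str
    children: list


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree that mirrors the nesting of tags in the HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void tags never get an end tag, so don't open them
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def build(node):
    """Bottom-up (post-order) pass; returns (inline_text_pieces, blocks) for node."""
    if node.tag == "#text":
        return [node.text], []
    inline, blocks = [], []
    for child in node.children:  # children are processed before their parent
        pieces, child_blocks = build(child)
        inline.extend(pieces)
        blocks.extend(child_blocks)
    if node.tag not in BLOCK_TAGS:
        return inline, blocks  # inline tag (a, b, span, ...): pass content upward
    own_text = " ".join(inline)
    if not own_text and len(blocks) == 1:
        return [], blocks  # container that only wraps one block: collapse it
    if not own_text and not blocks:
        return [], []  # empty layout cell: drop it
    return [], [Block(node.tag, own_text, blocks)]


def analyze(html):
    """Parse HTML into a tag tree and return the top-level content blocks."""
    parser = TagTreeBuilder()
    parser.feed(html)
    parser.close()
    return build(parser.root)[1]


def show(block, depth=0):
    """Print the block hierarchy as an indented outline."""
    print("  " * depth + f"<{block.tag}> {block.text}")
    for child in block.children:
        show(child, depth + 1)


if __name__ == "__main__":
    page = "<table><tr><td>Nav <a>Home</a></td><td><p>Main</p></td></tr></table>"
    for b in analyze(page):
        show(b)
```

For the sample table, `analyze` returns a single `tr` block whose two children hold the navigation cell text ("Nav Home") and the main paragraph ("Main"): the `table` wrapper and the second `td` are collapsed because each only wraps one block. Collapsing single-child containers is one simple way to keep the hierarchy close to the visible layout rather than to raw tag nesting.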
Web Page Structure Analysis Based on Tag Tree Method

Chang Yuhong, Jiang Zhe, Zhu Xiaoyan. Web Page Structure Analysis Based on Tag Tree Method [J]. Computer Engineering and Applications, 2004, 40(16): 129-132.

Authors: Chang Yuhong, Jiang Zhe, Zhu Xiaoyan

Affiliation: Chang Yuhong (1, 2, 3), Jiang Zhe (2, 3), Zhu Xiaoyan (2, 3)

Abstract: Web page content structure is very helpful for applications such as information retrieval, classification, and information extraction. This paper analyzes the structure of a Web page according to the relation between its layout and its content. It uses a tag tree to represent the Web content, presents a bottom-up approach to extract the semantically distinct contents of the page, and proposes a tree structure to represent the relations among them. Finally, by simulating how people browse Web pages, the method is successfully applied in a screen reader that reads Web pages aloud.

Keywords: Web page layout; Web page structure; information extraction
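The abstract does not spell out how the screen reader decides what to read first. As a hedged illustration only, the sketch below walks a block hierarchy and substitutes a commonly used link-density heuristic: leaf blocks whose text is mostly link text are treated as navigation and skipped, and the longest remaining block is taken as the page's main topic. The `Block` fields, the `max_link_density=0.5` threshold, and the function names `leaves`, `link_density`, and `reading_order` are hypothetical; this is not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A node of a (hypothetical) semantic block hierarchy."""
    text: str
    link_text: str = ""  # portion of `text` that sits inside <a> tags
    children: list = field(default_factory=list)


def leaves(block):
    """Depth-first, left-to-right: leaf blocks in document order."""
    if not block.children:
        return [block]
    out = []
    for child in block.children:
        out.extend(leaves(child))
    return out


def link_density(block):
    """Share of a block's characters that are link text (1.0 for empty blocks)."""
    return len(block.link_text) / len(block.text) if block.text else 1.0


def reading_order(root, max_link_density=0.5):
    """Hypothetical screen-reader policy: main block first, then other content blocks."""
    content = [b for b in leaves(root) if link_density(b) <= max_link_density]
    if not content:
        return []
    main = max(content, key=lambda b: len(b.text))  # longest block = assumed main topic
    return [main] + [b for b in content if b is not main]
```

Reading the assumed main block first and then the remaining content blocks in document order is one way to mimic a reader who skims past navigation bars; other orderings are equally plausible.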
This article is indexed in CNKI, VIP (Weipu), Wanfang Data, and other databases.
