Research on main web information extraction based on structure and content (基于结构与内容的Web主要信息提取方法研究)

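Cite this article:	ZHANG Wen-dong, LI Wei. Research on main web information extraction based on structure and content [J]. Computer Engineering and Design (计算机工程与设计), 2008, 29(24).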

Authors:	ZHANG Wen-dong (张文东), LI Wei (李伟)

Affiliation:	College of Computer and Communication Engineering, China University of Petroleum, Dongying, Shandong 257061

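Abstract:	The main information on a Web page is surrounded by useless information such as advertisements and hyperlinks, which is a difficult problem that automatic Web information processing has to solve. Traditional information extraction methods start either from content or from structure and rarely combine the two, so a method for extracting the main information of a Web page is proposed. Working from both the structure and the content of the page, the method accurately partitions the Web content into blocks and then analyzes and processes those blocks to extract the page's main information.
Keywords:	Web page; content; structure; blocking; information extraction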
Research on main web information extraction based on structure and content

ZHANG Wen-dong, LI Wei. Research on main web information extraction based on structure and content [J]. Computer Engineering and Design, 2008, 29(24).

Authors:	ZHANG Wen-dong, LI Wei

Affiliation:	ZHANG Wen-dong, LI Wei (Institute of Computer and Communication Engineering, China University of Petroleum, Dongying 257061, China)

Abstract:	The main information on a web page is usually surrounded by advertisements, hyperlinks, and other useless information, which is a major problem for the automatic processing of web information. Traditional methods for extracting the main web information are based on either content or structure, rarely both. A method for extracting the main web information based on both structure and content is presented. It first partitions the web content into blocks accurately, then analyzes the blocks, and finally extracts the main web information.

Keywords:	web pages; content; structure; blocking; information extraction
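
The abstract does not state the blocking or scoring rules, so the following is only a minimal sketch of the general idea (partition the page by structure, then judge each block by its content), not the authors' method. It assumes that blocks are delimited by common block-level tags and that a block dominated by link text is noise; `BlockSplitter`, `main_block`, and the `max_link_density` threshold are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags mark structural block boundaries.
BLOCK_TAGS = {"body", "div", "table", "tr", "td", "p", "ul", "ol", "li", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Split a page into flat text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished blocks as (text, linked_chars)
        self._text = []     # text pieces of the current block
        self._linked = 0    # characters of the current block inside <a>
        self._in_link = 0   # nesting depth of <a>
        self._skip = 0      # nesting depth of <script>/<style>

    def flush_block(self):
        """Close the current block and store it if it holds any text."""
        text = " ".join(self._text)
        if text:
            self.blocks.append((text, self._linked))
        self._text, self._linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush_block()
        elif tag == "a":
            self._in_link += 1
        elif tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush_block()
        elif tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in SKIP_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        piece = " ".join(data.split())  # normalize whitespace
        if piece and not self._skip:
            self._text.append(piece)
            if self._in_link:
                self._linked += len(piece)


def main_block(html, max_link_density=0.5):
    """Return the longest block whose share of link text is below the threshold."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    splitter.flush_block()
    best = ""
    for text, linked in splitter.blocks:
        link_density = linked / len(text)
        if link_density <= max_link_density and len(text) > len(best):
            best = text
    return best
```

Calling `main_block(html)` on a page whose navigation bars and advertisements consist mostly of anchors returns the longest remaining text block. A real system would likely need a nested block tree and richer content features than link density alone.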
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.
