A Web Information Extraction Method Based on Page Blocks

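Cite this article:	MENG Ren, SHAO Yan-zhen, YUAN Ding-rong. A Web Information Extraction Method Based on Page Blocks[J]. Microcomputer Development, 2010(1): 197-200.

Authors:	MENG Ren, SHAO Yan-zhen, YUAN Ding-rong

Affiliation:	Guangxi Normal University

Funding:	Guangxi Natural Science Foundation (Grant No. 桂科自0640069)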

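Abstract:	Information extraction based on page structure is one of the three major research areas in Web data mining. The key technique in this area is recognizing how a Web page is organized and mining the required page information from it. Based on the semantic blocks of a page, this paper presents a new block topic extraction algorithm. Compared with traditional Web information extraction, which treats the whole page as the unit, the approach better reflects how real pages are built and has a clear advantage in granularity. The algorithm assigns different weights to the blocks of a page according to their importance, then uses the weights to decide which page information to keep and present to the user. A simulation experiment was carried out on the algorithm, and the results indicate that it is practical and effective to a certain degree.
Keywords:	semantic Block; Block weight; Block topic extraction; Web information mining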
A Web Information Extraction Algorithm Based on Web Page

MENG Ren, SHAO Yan-zhen, YUAN Ding-rong. A Web Information Extraction Algorithm Based on Web Page[J]. Microcomputer Development, 2010(1): 197-200.

Authors:	MENG Ren SHAO Yan-zhen YUAN Ding-rong

Affiliation:	MENG Ren, SHAO Yan-zhen, YUAN Ding-rong (Guangxi Normal University, Guilin 541004, China)

Abstract:	Information extraction based on web page structure is one of the three research fields of web data mining. The key problem in this research is how to recognize the organization of a web page and mine the needed information from it. This paper introduces a new block topic extraction algorithm based on semantic blocks. Compared with traditional information extraction that takes the whole web page as the unit, it better matches real pages, and its granularity advantage is evident. This algorithm gives different block weight values according to the importance of diff...

Keywords:	semantic block; block weight; block topic extraction; web data mining
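
The abstract describes weighting the blocks of a page by importance and keeping page information according to those weights. A minimal Python sketch of that idea follows. The Block class, the 0.5 threshold, the example weights, and the weight-scaled term-frequency scoring are all illustrative assumptions; this record does not state how the paper computes weights or topics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """One semantic block of a page (hypothetical structure, not from the paper)."""
    text: str
    weight: float  # importance of the block, assumed here to lie in [0, 1]


def select_blocks(blocks, threshold=0.5):
    """Keep only the blocks whose weight reaches the assumed threshold."""
    return [b for b in blocks if b.weight >= threshold]


def block_topics(blocks, top_k=3):
    """Rank terms by frequency scaled by the weight of the block they occur in."""
    scores = Counter()
    for block in blocks:
        for term in block.text.lower().split():
            scores[term] += block.weight
    return [term for term, _ in scores.most_common(top_k)]


# Toy page: two content blocks and one low-weight navigation block.
page = [
    Block("web mining block extraction", 0.9),
    Block("home login register", 0.1),
    Block("block weight block topic", 0.6),
]
kept = select_blocks(page)            # drops the navigation block
topics = block_topics(kept, top_k=2)  # highest-scoring terms of the kept blocks
```

In this sketch the navigation block is discarded before any topic terms are scored, which is one way to read the granularity advantage the abstract claims over page-level extraction.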
This article is indexed in CNKI, VIP (维普), and other databases.
