Analysis and implementation of extraction algorithm of Web hierarchy structure (Web站点层次结构抽取算法的分析和实现)


Cite this article:	FENG Yan, WANG Shen-kang. Analysis and implementation of extraction algorithm of Web hierarchy structure [J]. 浙江大学学报(自然科学版), 2005, 39(10): 1507-1511.

Author names:	FENG Yan (冯雁), WANG Shen-kang (王申康)

Author affiliation:	FENG Yan, WANG Shen-kang (College of Computer Science, Zhejiang University, Hangzhou 310027, Zhejiang, China)

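Abstract (translated from the Chinese):	To improve the operating efficiency of search engines, Web site management, and recommender systems, a method for reconstructing the hierarchical structure of a Web site is proposed. Based on artificial intelligence and graph theory, the method analyzes tag information, the site's directory information, and link information to define and build a data model of the site, the structure tag graph, and then applies a shortest-path algorithm (Dijkstra) to extract the hierarchical structure of the Web site. The algorithm architecture consists of five layers: display layer, Web site layer, page analysis layer, preprocessing layer, and connection layer. Experimental results show that the method builds the site hierarchy correctly and runs quickly.
Keywords (translated from the Chinese):	Web structure mining; tag graph; directory information
Article ID:	1008-973X(2005)10-1507-05
Received:	2004-05-16
Revised:	2004-05-16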
Analysis and implementation of extraction algorithm of Web hierarchy structure

FENG Yan, WANG Shen-kang. Analysis and implementation of extraction algorithm of Web hierarchy structure [J]. Journal of Zhejiang University (Engineering Science), 2005, 39(10): 1507-1511.

Authors:	FENG Yan, WANG Shen-kang

Affiliation:	College of Computer Science, Zhejiang University, Hangzhou 310027, China

Abstract:	A method for rebuilding the hierarchical structure of a Web site is proposed in order to improve the efficiency of search engines, Web site management, and recommender systems. By analyzing structural information such as tag information, directory information, and link information, a data model of the site, the Web tag structure graph, is defined and built on the basis of artificial intelligence and graph theory. Dijkstra's shortest-path algorithm is then applied to extract the hierarchy of the Web site. The algorithm architecture is composed of five layers: display layer, Web layer, page analysis layer, pretreatment layer, and link layer. Experimental results show that the method builds the Web hierarchy structure correctly and runs quickly.
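
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: running Dijkstra's algorithm over a weighted link graph and reading the shortest-path tree as the site hierarchy. The function name `extract_hierarchy`, the sample URLs, and the edge weights are all hypothetical; in particular, the weighting here (cheaper edges for links that descend into a subdirectory) is an assumption standing in for the paper's tag structure graph, which is not described on this page.

```python
import heapq


def extract_hierarchy(graph, root):
    """Return {page: parent} for the shortest-path tree rooted at `root`.

    `graph` maps each page to {linked_page: weight}. The weights are an
    assumed stand-in for whatever the tag structure graph encodes.
    """
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for target, weight in graph.get(page, {}).items():
            nd = d + weight
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                parent[target] = page  # best known parent in the hierarchy
                heapq.heappush(heap, (nd, target))
    return parent


# Hypothetical four-page site: links into a subdirectory cost 1.0,
# cross links and back links cost more.
site = {
    "/": {"/docs/": 1.0, "/news/": 1.0, "/docs/api.html": 3.0},
    "/docs/": {"/docs/api.html": 1.0, "/": 2.0},
    "/news/": {"/docs/api.html": 2.5},
    "/docs/api.html": {"/": 2.0},
}
hierarchy = extract_hierarchy(site, "/")
```

In this toy graph, `/docs/api.html` ends up under `/docs/` rather than directly under the home page, because the path through the directory page is cheaper than the direct cross link.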

Keywords:	Web
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.