
MatchLink: A Focused Crawling Method

Cite as:	JIANG Zong-li, LU Guo-xiang. MatchLink: A Focused Crawling Method[J]. Journal of Beijing Polytechnic University, 2007, 33(11): 1227-1232.

Authors:	JIANG Zong-li, LU Guo-xiang

Affiliation:	College of Computer Science, Beijing University of Technology, Beijing 100022

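Abstract:	To find the information users care about more quickly in the vast body of Web content, this paper proposes MatchLink, a focused crawling method. It uses a document vector model to evaluate the topic relevance of the links on a web page, and uses the Naive Bayes algorithm together with multilayer classification to compute the topic relevance of the page that contains those links. Based on these 2 relevance scores, it downloads topic-relevant pages first. Experiments show that its results are better than those of BestFirst and BreadthFirst.
Keywords:	focused crawler; document vector model; Naive Bayes
Article ID:	0254-0037(2007)11-1227-06
Received:	2006-08-31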
MatchLink: A Focused Crawling Method

JIANG Zong-li, LU Guo-xiang. MatchLink: A Focused Crawling Method[J]. Journal of Beijing Polytechnic University, 2007, 33(11): 1227-1232.

Authors:	JIANG Zong-li, LU Guo-xiang

Abstract:	Finding what a user wants in the tremendous amount of Web information is a great challenge for Web search engines. By restricting downloads to web pages in a given domain, focused crawlers can save a great deal of work and improve the quality of the information they provide. We put forward a focused crawling method, MatchLink. It uses a document vector model to evaluate the topic relevance of each anchor, and uses the Naive Bayes algorithm with a multilayer classification method to compute the topic relevance of the web page containing the anchor. Based on these two relevance scores, topic-relevant web pages are given priority for download. Experiments show that its results are better than those of BestFirst and BreadthFirst.

Keywords:	search engines; document handling; Naive Bayes methods
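
The link-prioritization idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Frontier` class, the weight `alpha`, and the linear combination of the two scores are all assumptions, since the abstract does not say how the two relevance values are combined. The page relevance is taken as an input number (in the paper it would come from the Naive Bayes classifier), and the anchor relevance is approximated with cosine similarity over plain term-frequency vectors.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class Frontier:
    """Hypothetical crawl frontier that pops the highest-scoring URL first."""

    def __init__(self, topic_text, alpha=0.5):
        self.topic = term_vector(topic_text)  # vector describing the target topic
        self.alpha = alpha                    # assumed weight of the anchor score
        self._heap = []                       # min-heap of (-score, url)
        self._seen = set()                    # URLs already queued

    def add(self, url, anchor_text, page_relevance):
        """Queue a link; page_relevance is assumed to be a classifier score in [0, 1]."""
        if url in self._seen:
            return
        self._seen.add(url)
        link_relevance = cosine(term_vector(anchor_text), self.topic)
        # Assumed combination rule: a weighted sum of the two relevance scores.
        score = self.alpha * link_relevance + (1 - self.alpha) * page_relevance
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        """Return (url, score) for the most promising queued link."""
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


frontier = Frontier("focused web crawler")
frontier.add("http://example.com/a", "focused crawler tutorial", 0.9)
frontier.add("http://example.com/b", "cooking recipes", 0.9)
frontier.add("http://example.com/c", "web crawler", 0.1)
while len(frontier):
    print(frontier.pop())
```

A BreadthFirst crawler would correspond to replacing the heap with a FIFO queue, which is one way to see why scoring links before download can change the crawl order.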
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
