
Crawlers Crawling Strategy of Deep Web Based on Keywords Relevant Weight (基于关键词相关度的Deep Web爬虫爬行策略)

Cite this article:	TIAN Ye, DING Yue-wei. Crawlers Crawling Strategy of Deep Web Based on Keywords Relevant Weight [J]. Computer Engineering, 2008, 34(15): 220-222.

Authors:	TIAN Ye, DING Yue-wei

Affiliation:	Institute of Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093

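Abstract:	The Deep Web holds rich, high-quality information resources. To obtain pages from a Deep Web site, a user has to enter a series of keyword sets. Because no static links point directly to Deep Web pages, most current search engines cannot discover them. This paper proposes a crawling strategy for Deep Web crawlers that can download Deep Web pages effectively. Because such a site offers only a query interface, the main challenge in designing a Deep Web crawler is how to select the best query keywords so as to generate meaningful queries. Experiments show that the proposed selection method, which is based on the relevance weights of different keywords, is effective.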
Keywords:	Deep Web; page crawling strategy; keyword selection; relevance weight; coverage rate
Crawlers Crawling Strategy of Deep Web Based on Keywords Relevant Weight

TIAN Ye, DING Yue-wei. Crawlers Crawling Strategy of Deep Web Based on Keywords Relevant Weight [J]. Computer Engineering, 2008, 34(15): 220-222.

Authors:	TIAN Ye, DING Yue-wei

Affiliation:	(Institute of Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093)

Abstract:	The Deep Web contains plenty of high-quality information, but users have to enter several keywords to search for and reach its pages. Traditional crawlers cannot reach these Hidden Web pages because no direct links point to them. This paper presents a crawling strategy that can download Deep Web pages effectively. Because a query interface is the only access a Deep Web site provides, the biggest challenge for a Deep Web crawler is choosing the best keywords so that its queries are effective. The paper proposes a new selection method based on the relevance weights of different keywords. Experiments show that this method is effective.

Keywords:	Deep Web; crawling strategy; keyword selection; relevance weight; coverage rate
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.