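Design of a Specific Web Information-Collecting System Based on Mobile Crawler (基于移动爬虫的专用Web信息收集系统的设计)
Cite this article: Pan Chunhua, Feng Taiming, Wu Gangshan. Design of a Specific Web Information-Collecting System Based on Mobile Crawler[J]. Computer Engineering and Applications, 2003, 39(36): 153-156.
Authors: Pan Chunhua  Feng Taiming  Wu Gangshan
Affiliation: Department of Computer Science and Technology, Nanjing University, Nanjing 210093
Funding: Supported by the National Natural Science Foundation of China (No. 60073030), the Ministry of Education key project "Research on Key Technologies for Modern Distance Education", and a Fujitsu research project
Abstract: Search engines have become an important tool for navigating the Web. To provide powerful search capabilities, a search engine maintains a detailed index of the documents accessible on the Web. Creating and maintaining this index is the job of Web crawlers, which recursively traverse and download Web pages on behalf of the search engine. After download, the pages are analyzed and indexed by the search engine, which then offers retrieval services over them. This paper presents a more efficient method of building Web indices, based on mobile crawlers (MobileCrawler). The proposed crawler is first transferred to the site where the data resides, and any unwanted data is filtered out locally there before the results are sent back to the search engine. The method is especially suitable for implementing so-called "smart" crawling algorithms, which decide on an efficient crawling path from the contents of the Web pages visited so far. The mobile crawler combines two technology trends, mobile computing and specialized search engines, and offers a sound technical answer to the problems that current general-purpose search engines face.

Keywords: Information collecting  Search engine  Mobile crawler  WWW
Article ID: 1002-8331-(2003)36-0153-04
Revised: December 1, 2002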

Design of a Specific Web Information-Collecting System Based on Mobile Crawler
Pan Chunhua, Feng Taiming, Wu Gangshan. Design of a Specific Web Information-Collecting System Based on Mobile Crawler[J]. Computer Engineering and Applications, 2003, 39(36): 153-156.
Authors: Pan Chunhua  Feng Taiming  Wu Gangshan
Abstract: Search engines have become important tools for Web navigation. To provide powerful search facilities, search engines maintain comprehensive indices of documents available on the Web. The creation and maintenance of Web indices is done by Web crawlers, which recursively traverse and download Web pages on behalf of search engines. Analysis of the collected information is performed after the data has been downloaded. This paper presents an alternative, more efficient approach to building Web indices based on mobile crawlers. The proposed crawlers are transferred to the source(s) where the data resides in order to filter out any unwanted data locally before transferring it back to the search engine. Our approach to Web crawling is particularly well suited for implementing so-called "smart" crawling algorithms, which determine an efficient crawling path based on the contents of Web pages that have been visited so far. The mobile crawler results from two technology trends, specialized search engines and mobile computing, and it promises to solve the difficult issues faced by current general-purpose search engines.
Keywords: Information-gathering  Search engine  Mobile crawler  World Wide Web
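
The core idea in the abstract (ship the crawler to the data, filter locally, and let page contents steer the crawl path) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `crawl_locally`, the in-memory `SITE` dictionary standing in for a remote Web server, and the keyword-match relevance test are all assumptions made for illustration. No code migration or network transfer is modeled; the function simply represents the part of the crawler that would run at the remote site.

```python
from collections import deque


def crawl_locally(site, start, keywords, max_pages=100):
    """Hypothetical crawler body that would execute at the data source.

    site     -- dict mapping url -> (text, list_of_links); stands in for the
                pages the crawler can read locally (an assumption)
    keywords -- lowercase terms defining relevance (an assumed, very simple
                relevance test)
    Returns only the relevant pages, i.e. what would be sent back to the
    search engine.
    """
    seen = {start}
    queue = deque([start])
    kept = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        page = site.get(url)
        if page is None:
            continue  # dangling link, nothing to read
        visited += 1
        text, links = page
        if not any(k in text.lower() for k in keywords):
            # Filtered out locally: the page is never transferred, and its
            # links are not followed ("smart" path selection by content).
            continue
        kept[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


# Toy site used only for this example.
SITE = {
    "/": ("Index of crawler research", ["/a", "/b"]),
    "/a": ("Mobile crawler design notes", ["/c"]),
    "/b": ("Cafeteria menu", ["/d"]),
    "/c": ("Crawler scheduling", []),
    "/d": ("Crawler archive behind an irrelevant page", []),
}

result = crawl_locally(SITE, "/", ["crawler"])
print(sorted(result))  # ['/', '/a', '/c']
```

In this sketch five pages exist at the site but only three would be returned to the search engine. Page `/d` is relevant yet never reached, because the only link to it sits on a filtered page; that is the trade-off of letting content decide the crawl path.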
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.