
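Design and Implementation of a Lightweight Chinese Search Engine Model
Cite this article: HUANG Yu-da, WEI Xia, WANG Yi-ran. Design and Implementation of a Lightweight Chinese Search Engine Model [J]. Computer Technology and Development (计算机技术与发展), 2012(9): 201-204, 209.
Authors: HUANG Yu-da (黄宇达)  WEI Xia (魏霞)  WANG Yi-ran (王迤冉)
Affiliations: 1. College of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010
2. Department of Information Engineering, Zhoukou Vocational and Technical College, Zhoukou, Henan 466000
3. College of Computer Science and Technology, Zhoukou Normal University, Zhoukou, Henan 466001
Funding: Henan Province Basic and Frontier Technology Research Program (No. 112300410307)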
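Abstract: This paper first describes in detail the overall design of a lightweight Chinese search engine system model built on the PC Windows platform. The main functional modules are then designed and implemented using techniques such as multi-threaded breadth-first traversal and a Chinese word segmentation method that combines maximum matching with minimum matching. The model was tested on its multi-threaded web crawler, its user interface, and other components. The test results show that the model implements the basic functions of a simple Chinese search engine well, that its interface is simple and practical, and that it achieves a relatively high resource retrieval rate while keeping the search results accurate.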

Keywords: web crawler  URL library  Chinese word segmentation  inverted file index  multi-threading
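
The abstract names a segmentation method that combines maximum matching and minimum matching, but it does not say how the two are combined. As a minimal sketch, assuming a plain set-based dictionary and a forward (left-to-right) scan, the two base strategies might look like the following. The function names, the `max_len` parameter, and the toy dictionary are hypothetical and are not taken from the paper.

```python
def forward_max_match(text, dictionary, max_len=4):
    """At each position, take the LONGEST dictionary word (single char as fallback)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def forward_min_match(text, dictionary, max_len=4):
    """At each position, take the SHORTEST dictionary word of two or more characters."""
    tokens = []
    i = 0
    while i < len(text):
        limit = min(max_len, len(text) - i)
        for size in range(2, limit + 1):
            piece = text[i:i + size]
            if piece in dictionary:
                tokens.append(piece)
                i += size
                break
        else:
            # No multi-character word starts here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"中文", "搜索", "引擎", "搜索引擎"}

coarse = forward_max_match("中文搜索引擎", TOY_DICT)  # ['中文', '搜索引擎']
fine = forward_min_match("中文搜索引擎", TOY_DICT)    # ['中文', '搜索', '引擎']
```

Maximum matching favors long compound terms, while minimum matching yields finer-grained tokens; one plausible reason to combine them is to index both granularities, though that is an assumption rather than something the abstract states.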

Design and Implementation of System Model of a Lightweight Chinese Search Engine
HUANG Yu-da,WEI Xia,WANG Yi-ran.Design and Implementation of System Model of a Lightweight Chinese Search Engine[J].Computer Technology and Development,2012(9):201-204,209.
Authors:HUANG Yu-da  WEI Xia  WANG Yi-ran
Affiliation: 1. College of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China; 2. Information and Engineering Department, Zhoukou Vocational and Technical College, Zhoukou 466000, China; 3. College of Computer Science and Technology, Zhoukou Normal University, Zhoukou 466001, China
Abstract: This paper first describes in detail the overall design of a lightweight Chinese search engine system model based on the PC Windows platform. The major functional modules are then designed and implemented using a multi-threaded breadth-first traversal method, a Chinese word segmentation method that combines the maximum matching method with the minimum matching method, and other techniques. Tests of the multi-threaded Web crawler, the user interface, and other parts of the model were then carried out. The experimental results show that the lightweight Chinese search engine system model achieves the basic functions of a simple Chinese search engine with good operating results, that the system interface is simple and practical, and that the system has a relatively high resource retrieval rate while ensuring the accuracy of search results.
Keywords:Web crawler  URL library  Chinese word segmentation  inverted file index  multi-threaded
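
The abstract mentions a multi-threaded breadth-first crawler and a URL library but gives no implementation details. The sketch below is one possible reading, not the authors' design: it fetches each breadth-first level concurrently with a thread pool and keeps a set of seen URLs as a stand-in for the URL library. The names `crawl_bfs`, `fetch_links`, and `TOY_WEB` are hypothetical, and an in-memory link graph replaces real HTTP requests so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": page -> outgoing links (stands in for HTTP fetches).
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def fetch_links(url):
    """Return the outgoing links of a page; a real crawler would download and parse it."""
    return TOY_WEB.get(url, [])


def crawl_bfs(seed, fetch, max_depth=2, workers=4):
    """Breadth-first crawl in which each level is fetched by a pool of threads."""
    seen = {seed}        # plays the role of the URL library (deduplication)
    visited = [seed]     # URLs in order of discovery
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            if not frontier:
                break
            next_frontier = []
            # pool.map runs fetches concurrently but yields results in frontier order,
            # so the discovery order is deterministic.
            for links in pool.map(fetch, frontier):
                for url in links:
                    if url not in seen:
                        seen.add(url)
                        visited.append(url)
                        next_frontier.append(url)
            frontier = next_frontier
    return visited


order = crawl_bfs("a", fetch_links, max_depth=2)  # ['a', 'b', 'c', 'd']
```

Only the fetches run in worker threads; the seen set is updated in the calling thread, which avoids the need for a lock in this simplified version.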
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.