
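Research and Design of a Lucene-Based Chinese Full-Text Retrieval System
Cite this article: SUO Hong-guang, SUN Xin. Research and design of a Lucene-based Chinese full-text retrieval system[J]. Computer Engineering and Design, 2008, 29(19).
Authors: SUO Hong-guang, SUN Xin
Affiliation: College of Computer and Communication Engineering, China University of Petroleum (East China), Dongying, Shandong 257061, China
Abstract: A model for a Lucene-based Chinese full-text retrieval system is proposed. Based on an analysis of Lucene's architecture, the system adopts a statistics-based technique for extracting the main text of web pages, and adds a Chinese word segmentation module and an index-document preprocessing module to improve retrieval efficiency and precision. Search results are processed with text clustering so that they are displayed by category, which helps users find what they need more quickly. Experimental data indicate that, when searching Chinese web pages, the system performs markedly better in efficiency, precision, and result processing.

Keywords: full-text retrieval; web page text extraction; Chinese word segmentation module; index document preprocessing; text clustering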

Research and development of Chinese full text search engine based on Lucene
SUO Hong-guang, SUN Xin. Research and development of Chinese full text search engine based on Lucene[J]. Computer Engineering and Design, 2008, 29(19).
Authors: SUO Hong-guang, SUN Xin
Affiliation: College of Computer and Communication Engineering, China University of Petroleum (East China), Dongying 257061, China
Abstract: A system model for a Chinese full-text search engine based on Lucene is proposed. To improve the performance of Lucene in searching Chinese web pages, statistics-based web page text extraction, a Chinese word segmentation module, and a preprocessing module for documents to be indexed are added to the system, based on an analysis of Lucene's structure. To help people find the information they need more efficiently, document clustering is applied to the search results. Th...
Keywords: full text search; web page text extraction; Chinese word segmentation; preprocessing of documents for indexing; document clustering
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.