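An Improved Wu-Manber Multiple-pattern Matching Algorithm and Its Application
Cite this article: SUN Xiaoshan, WANG Qiang, GUAN Yi, WANG Xiaolong. An Improved Wu-Manber Multiple-pattern Matching Algorithm and Its Application [J]. Journal of Chinese Information Processing, 2006, 20(2): 49-54.
Authors: SUN Xiaoshan  WANG Qiang  GUAN Yi  WANG Xiaolong
Affiliation: School of Computer Science, Harbin Institute of Technology
Funding: Project funded by the Chinese Academy of Sciences; Harbin Institute of Technology university research and teaching reform project
Abstract: This paper addresses a weakness of the Wu-Manber multiple-pattern matching algorithm in handling suffix patterns and presents an improved suffix-pattern processing algorithm that reduces the number of character comparisons during matching and improves running efficiency. Full-text retrieval experiments were run on 52,067 documents randomly selected from TREC2000, comparing the number of character comparisons made during matching by three algorithms: the Wu-Manber algorithm, the improved algorithm that uses suffix patterns, and a simple improvement that does not use suffix patterns. The results show that the proposed improvement reduces the number of character comparisons fairly consistently and increases matching speed and efficiency.

Keywords: computer application  Chinese information processing  multiple-pattern matching  suffix pattern  string matching  full-text retrieval  information retrieval
Article ID: 1003-0077(2006)02-0047-06
Received: 2005-01-18
Revised: 2005-05-17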

An Improved Wu-Manber Multiple-pattern Matching Algorithm and Its Application
SUN Xiao-shan, WANG Qiang, GUAN Yi, WANG Xiao-long. An Improved Wu-Manber Multiple-pattern Matching Algorithm and Its Application [J]. Journal of Chinese Information Processing, 2006, 20(2): 49-54.
Authors: SUN Xiao-shan  WANG Qiang  GUAN Yi  WANG Xiao-long
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology
Abstract: The Wu-Manber multiple-pattern matching algorithm does not work well when some patterns are suffixes of other patterns. To solve this problem, an improved algorithm is introduced that reduces the number of comparisons during pattern matching and therefore matches faster. The text retrieval experiments use 52,067 passages randomly selected from TREC2000. Three algorithms are compared: the Wu-Manber algorithm, the improved algorithm, and an algorithm that simply breaks halfway. The results show that the improved algorithm steadily reduces the number of character comparisons and thus works more efficiently.
Keywords: computer application  Chinese information processing  multiple-pattern matching  suffix pattern  string matching  full text retrieval  information retrieval
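
For readers unfamiliar with the baseline, the following is a minimal sketch of a textbook-style Wu-Manber search that also counts character comparisons, the metric used in the experiments. It is not the authors' code, and the paper's improved suffix-pattern handling is not described in this abstract and is not reproduced here. The function name `wu_manber_search`, the block size of 2, and the use of Python dictionaries for the shift and hash tables are all assumptions made for illustration.

```python
from collections import defaultdict


def wu_manber_search(text, patterns, block=2):
    """Baseline Wu-Manber sketch (illustrative, not the paper's implementation).

    Returns (matches, comparisons): a list of (start, pattern) pairs and the
    number of character comparisons spent verifying candidates.
    """
    m = min(len(p) for p in patterns)  # only the first m chars drive the tables
    if m < block:
        raise ValueError("every pattern needs at least `block` characters")

    default_shift = m - block + 1
    shift = {}                    # block -> safe distance to slide the window
    buckets = defaultdict(list)   # block ending the m-prefix -> candidate patterns

    for p in patterns:
        for end in range(block, m + 1):
            key = p[end - block:end]
            shift[key] = min(shift.get(key, default_shift), m - end)
        buckets[p[m - block:m]].append(p)

    matches, comparisons = [], 0
    pos = m                       # exclusive end of the current m-wide window
    while pos <= len(text):
        key = text[pos - block:pos]
        step = shift.get(key, default_shift)
        if step > 0:
            pos += step
            continue
        start = pos - m
        # Every pattern in the bucket is verified character by character.
        for p in buckets.get(key, []):
            if start + len(p) > len(text):
                continue
            matched = True
            for i, ch in enumerate(p):
                comparisons += 1
                if text[start + i] != ch:
                    matched = False
                    break
            if matched:
                matches.append((start, p))
        pos += 1
    return matches, comparisons


found, cost = wu_manber_search("the cat sat on the mat", ["cat", "mat", "sat"])
print(found, cost)
```

In this sketch, patterns that end in the same block land in the same bucket and each one is verified separately whenever that block is seen, which is the kind of repeated comparison work that the abstract's comparison counts measure.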
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.