首页 | 官方网站   微博 | 高级检索  
     

一种有效的字符串有序跳跃模式近似匹配算法
引用本文:沈洲,王永成,刘功申.一种有效的字符串有序跳跃模式近似匹配算法[J].数据采集与处理,2001,16(4):459-465.
作者姓名:沈洲  王永成  刘功申
作者单位:上海交通大学电子信息学院,
基金项目:国家863计划(编号:863-306-ZD03-04-1)资助项目.
摘    要:字符串的模式匹配问题是计算机科学的基本问题之一,而近似模式匹配更是近期的研究热点。本文分析了文本分析领域中出现的一种特殊的近似模式匹配问题,即字符串有序跳跃模式近似匹配问题,提出了一种基于有限自动机的组件组合分析算法。算法的特点在于将组件匹配过程与组配过程进行分离,这样既降低了问题的复杂度,又可以实现按策略组配的灵活性。组件匹配过程中利用有限自动机对跳跃模式的组件进行匹配查找;组件的组配过程中先对查找到的组件进行组合分析,然后再对各种组合进行初步筛选和基于策略的优选。初步筛选工作是依据顺序性、唯一性和最大数三条原则进行;而优选工作是根据四个设计的评价参数选择其中最佳组合。实验结果表明,该算法的确能解决字符串有序跳跃模式匹配问题,完全可以适用于句型匹配与主题词跳词匹配。

关 键 词:有限状态自动机  字符串  模式匹配  计算机  有序跳跃模式
文章编号:1004-9037(2001)04-0459-07
修稿时间:2000年10月20

Efficient Skip-Pattern Matching Algorithm for Approximate String Sequential Problem
Abstract:The string pattern matching is always viewed as a fundamental problem in computer science. Furthermore, the approximate pattern matching has attracted more interest recently and now becomes a new research field. This paper analyzes a kind of special approximate pattern matching problem i.e. sequential skip pattern matching in the field of text analysis. A component combinatorial analysis algorithm based on the deterministic state automata is presented. The main feature of the algorithm is that the process of component matching and assembling is separated. So, it not only decrease the complexity of problem, but also provide the flexible solution for assembling component depended on policy. The automata is used to search the components of the patterns in the process of matching component. In the process of assembling the components, combinatorial analysis is done firstly for the found components, and preliminary selection and further selection depended on po licy are made secondly. The preliminary selection is based on three principles: sequential rule, sole rule, and maximum occurrence number rule. And the further selection is based four parameters of evaluation to select the best result. Experiments demonstrate that the algorithm can solve such approximate string matching problem, and it can be used in the filed of sentence pattern matching and skip word matching.
Keywords:matching  approximate pattern matching  combinational analysis  deterministic state automata
本文献已被 维普 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号