首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
近年来, XML数据查询成为一个重要的研究课题。处理小枝查询是XML查询实现的核心操作,针对小枝模式查询,提出了一种改进的小枝模式匹配算法。该算法通过剪去无用的数据流以减少待处理结点的数目,从而节省处理时间,提高查询的准确率。实验结果表明,该算法能够有效提高查询效率。  相似文献   

2.
A common problem of XML query algorithms is that execution time and input size grows rapidly as the size of XML document increases. In this paper, we propose a version-labeling scheme and TwigVersion algorithm to address this problem. The version-labeling scheme is utilized to identify all repetitive structures in XML documents, and the Version Tree is constructed to hold such version information. To process a query, TwigVersion generates a filter through the created Version Tree, and the final answer to the query can be retrieved from the database easily through the filtering process. Both theoretical proof and experimental results reported in this paper demonstrate that the concise structure of Version Tree and the reduced input size make TwigVersion outperform the existing approaches.  相似文献   

3.
姚全珠  郭祯  房美君 《计算机应用》2011,31(10):2782-2785
给定一个小枝模式查询,如何快速地在XML数据集中找到所有感兴趣的信息,已成为当前研究的热点。针对TwigStack算法在处理含有父子节点的情况下会产生大量的中间结果等问题,通过栈来对非叶子节点缓存和对叶子节点延迟输出的思想,提出了一种改进的小枝模式匹配算法--cTwigStack。采用Treebank数据集进行测验,结果表明该算法不仅仅在处理祖孙/后继节点时能使输出结果的准确性达到最优,而且在处理父子节点时,相对目前提出的算法,也是非常高效的。  相似文献   

4.
针对目前不确定XML小枝模式匹配算法均基于归并,易造成很大的空间和时间浪费问题,提出基于P-文档模型的连续不确定XML的非归并的小枝模式匹配算法.算法在节点入队列和出队列时分别进行过滤剪枝操作,减少待处理节点的个数,匹配过程使用相互关联的链表存储中间结果,不需要归并.理论分析与实验结果表明,该算法是一种高效的连续不确定XML查询算法.  相似文献   

5.
In this paper we present a novel methodology for sequence classification, based on sequential pattern mining and optimization algorithms. The proposed methodology automatically generates a sequence classification model, based on a two stage process. In the first stage, a sequential pattern mining algorithm is applied to a set of sequences and the sequential patterns are extracted. Then, the score of every pattern with respect to each sequence is calculated using a scoring function and the score of each class under consideration is estimated by summing the specific pattern scores. Each score is updated, multiplied by a weight and the output of the first stage is the classification confusion matrix of the sequences. In the second stage an optimization technique, aims to finding a set of weights which minimize an objective function, defined using the classification confusion matrix. The set of the extracted sequential patterns and the optimal weights of the classes comprise the sequence classification model. Extensive evaluation of the methodology was carried out in the protein classification domain, by varying the number of training and test sequences, the number of patterns and the number of classes. The methodology is compared with other similar sequence classification approaches. The proposed methodology exhibits several advantages, such as automated weight assignment to classes using optimization techniques and knowledge discovery in the domain of application.
Dimitrios I. FotiadisEmail:
  相似文献   

6.
This paper describes a method for robust real time pattern matching. We first introduce a family of image distance measures, the "Image Hamming Distance Family". Members of this family are robust to occlusion, small geometrical transforms, light changes and non-rigid deformations. We then present a novel Bayesian framework for sequential hypothesis testing on finite populations. Based on this framework, we design an optimal rejection/acceptance sampling algorithm. This algorithm quickly determines whether two images are similar with respect to a member of the Image Hamming Distance Family. We also present a fast framework that designs a near-optimal sampling algorithm. Extensive experimental results show that the sequential sampling algorithm performance is excellent. Implemented on a Pentium 4 3 GHz processor, detection of a pattern with 2197 pixels, in 640 x 480 pixel frames, where in each frame the pattern rotated and was highly occluded, proceeds at only 0.022 seconds per frame.  相似文献   

7.
G. Davies  S. Bowsher 《Software》1986,16(6):575-601
This paper describes four algorithms of varying complexity used for pattern matching, and investigates their behaviour. The algorithms are tested using patterns of varying length from several alphabets. It is concluded that although there is no overall ‘best’ algorithm, the more complex algorithms are worth considering as they are generally more efficient in terms of number of comparisons made and execution time.  相似文献   

8.
介绍了 twig pattern查询处理和索引技术的研究现状 ,对一些典型的 twig pattern查询处理方法进行了分析和评价 ,指出其中存在的优点和不足 ,展望了未来 twig pattern查询处理研究的关键问题和研究方向。  相似文献   

9.
As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this featu...  相似文献   

10.
李素清  陶世群 《计算机应用》2007,27(12):3021-3025
XML已经成为Internet上一种普遍的数据交换标准,目前已经出现了多种对XML文档的查询方法。针对小枝模式的XML查询,提出了一种改进的小枝栈算法。该算法将路径栈算法的思想应用到它的主算法中实现了小枝模式查询。与仅使用路径栈算法相比,改进后的小枝栈算法在运行过程中不会产生中间结果,而且提高了找到小枝模式根元素后的查询效率。  相似文献   

11.
Humans are fascinated by patterns, and they can spot them well - in fact, that's one area where humans excel over computers. But research is producing interesting competition as scientists discover and employ new methods of automated pattern recognition. Practical applications include finding genes, detecting cancer-causing chemicals in molecules, searching out potential terrorists, and predicting terrorist threat levels, as well as recognizing speech patterns and creating a nanotechnology resource library.  相似文献   

12.
This paper investigates the correspondence matching of point-sets using spectral graph analysis. In particular, we are interested in the problem of how the modal analysis of point-sets can be rendered robust to contamination and drop-out. We make three contributions. First, we show how the modal structure of point-sets can be embedded within the framework of the EM algorithm. Second, we present several methods for computing the probabilities of point correspondences from the modes of the point proximity matrix. Third, we consider alternatives to the Gaussian proximity matrix. We evaluate the new method on both synthetic and real-world data. Here we show that the method can be used to compute useful correspondences even when the level of point contamination is as large as 50%. We also provide some examples on deformed point-set tracking.  相似文献   

13.
In secure delegatable computation, computationally weak devices (or clients) wish to outsource their computation and data to an untrusted server in the cloud. While most earlier work considers the general question of how to securely outsource any computation to the cloud server, we focus on concrete and important functionalities and give the first protocol for the pattern matching problem in the cloud. Loosely speaking, this problem considers a text T that is outsourced to the cloud \({\textsc {S}}\) by a sender \({\textsc {SEN}}\). In a query phase, receivers \({\textsc {REC}}_1, \ldots , {\textsc {REC}}_l\) run an efficient protocol with the server \({\textsc {S}}\) and the sender \({\textsc {SEN}}\) in order to learn the positions at which a pattern of length m matches the text (and nothing beyond that). This is called the outsourced pattern matching problem which is highly motivated in the context of delegatable computing since it offers storage alternatives for massive databases that contain confidential data (e.g., health-related data about patient history). Our constructions are simulation-based secure in the presence of semi-honest and malicious adversaries (in the random oracle model) and limit the communication in the query phase to O(m) bits plus the number of occurrences—which is optimal. In contrast to generic solutions for delegatable computation, our schemes do not rely on fully homomorphic encryption but instead use novel ideas for solving pattern matching, based on a reduction to the subset sum problem. Interestingly, we do not rely on the hardness of the problem, but rather we exploit instances that are solvable in polynomial time. A follow-up result demonstrates that the random oracle is essential in order to meet our communication bound.  相似文献   

14.
一种快速的单模式匹配算法   总被引:4,自引:0,他引:4  
在对Boyer-Moore(BM)算法及其改进的Tuned Boyer-Moore(TunedBM)算法进行分析的基础上,提出了一种更加快速的单模式匹配算法--NFS.该算法利用当前尝试中匹配失败字符的位置信息进行更大的尝试位置移动,使算法具有更高的效率.实验结果表明,NFS算法的性能优于同类的其他算法,特别是在模式长度较短的情况下,优势更为明显.  相似文献   

15.
Defining functions by pattern matching over the arguments is advantageous for understanding and reasoning, but it tends to expose the implementation of a datatype. Significant effort has been invested in tackling this loss of modularity; however, decoupling patterns from concrete representations while maintaining soundness of reasoning has been a challenge. Inspired by the development of invertible programming, we propose an approach to program refactoring based on a right-invertible language rinv—every function has a right (or pre-) inverse. We show how this new design is able to permit a smooth incremental transition from programs with algebraic datatypes and pattern matching, to ones with proper encapsulation, while maintaining simple and sound reasoning.  相似文献   

16.
模式匹配在入侵检测系统中有着广泛的应用。在对BM以及相关算法分析的基础上,提出了一种基于BM算法的改进算法。该算法同时运用BMH和BMHS算法的思想对模式进行移动,并利用了模式串末字符与首字符的组合性,缩短了比较过程,有效地减少了匹配过程中的字符比较次数。实验证明,该算法具有高的匹配效率。  相似文献   

17.
Given a text T and a pattern P, the order-preserving pattern matching (OPPM) problem is to find all substrings in T which have the same relative orders as P. The OPPM has been studied in the fields of finding some patterns affected by relative orders, not by their absolute values. In this paper, we present a method of deciding the order-isomorphism between two strings even when there are same characters. Then, we show that the bad character rule of the Horspool algorithm for generic pattern matching problems can be applied to the OPPM problem and we present a space-efficient algorithm for computing shift tables for text search. Finally, we combine our bad character rule with the KMP-based algorithm to improve the worst-case running time. We give experimental results to show that our algorithm is about 2 to 6 times faster than the KMP-based algorithm in reasonable cases.  相似文献   

18.
World Wide Web - A sitemap represents an explicit specification of the design concept and knowledge organization of a website and is therefore considered as the website’s basic ontology. It...  相似文献   

19.
20.
Order-preserving pattern matching has been introduced recently, but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same relative order as the pattern elements. For this problem, we consider the offline version in which we build an index for the reference sequence so that subsequent searches can be completed very efficiently. We propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on the new approach of decomposing the indexed sequence into an order component, containing ordering information, and a δ component, containing information on the absolute values. Experiments show that this approach is viable, is faster than the available alternatives, and is the first one offering simultaneously small space usage and fast retrieval.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号