首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 187 毫秒
1.
A time-series database is a set of data sequences, each of which is a list of changing values of an object in a given period of time. Subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence in a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to solve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of their distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. Given a set of pairs 〈lengthfrequency〉 of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings performed in a target application. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we show the superiority of our approach by performing extensive experiments with a real-life stock data set and a large volume of synthetic data sets.  相似文献   

2.
《Information Systems》2001,26(4):279-293
In this paper, we propose a new subsequence matching method, Dual Match. Dual Match exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and thus, is a dual approach of the one by Faloutsos et al. (Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, Washington, 1994, pp. 419–429.) (FRM in short), which divides data sequences into sliding windows and the query sequence into disjoint windows. FRM causes a lot of false alarms (i.e., candidates that do not qualify) by storing minimum bounding rectangles rather than individual points representing windows to save storage space for the index. Dual Match solves this problem by directly storing points without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides large improvement both in false alarms and performance over FRM given the same amount of storage space. In particular, for low selectivities (less than 10−4), Dual Match significantly improves performance up to 430-fold. On the other hand, for high selectivities (more than 10−2), it shows a very minor degradation (less than 29%). For selectivities in between (10−4–10−2), Dual Match shows performance slightly better than that of FRM. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that improves performance significantly in large database applications.  相似文献   

3.
We consider the problem of partial shape matching. We propose to transform shapes into sequences and utilize an algorithm that determines a subsequence of a target sequence that best matches a query. In the proposed algorithm we map the problem of the best matching subsequence to the problem of a cheapest path in a directed acyclic graph (DAG). The approach allows us to compute the optimal scale and translation of sequence values, which is a nontrivial problem in the case of subsequence matching. Our experimental results demonstrate that the proposed algorithm outperforms the commonly used techniques in retrieval accuracy.  相似文献   

4.
Moving average transform is very useful in finding the trend of time-series data by reducing the effect of noise, and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, are problematic in that, since they must build multiple indexes in supporting transform of arbitrary order, they incur index overhead both in storage space and in update maintenance. To solve this problem, we propose a single-index approach to subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single-index approach, we can reduce both the storage space and the index maintenance overhead. In explaining the single-index approach, we first introduce the notion of poly-order moving average transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. We also propose two subsequence matching methods based on poly-order transform that efficiently support moving average transform of arbitrary order. Experimental results for real stock data show that, compared with the sequential scan, our methods improve average performance significantly, by a factor of 22.6-33.6. Also, compared with cases in which an index is built for every moving average order, our methods reduce storage space and maintenance effort significantly while incurring only marginal performance degradation. Our approach entails the additional advantage of being generalized to support many other transforms in addition to moving average transform. Therefore, we believe that our approach will be widely used in many transform-based subsequence matching methods.  相似文献   

5.
在大数据时代背景下,越来越多的用户或者企业将大量的数据上传至云端存储以便减轻本地存储的压力和获得高效的数据共享服务管理,由此可搜索加密技术应运而生,检索效率与保证数据安全一直是研究的热点。因此,本文提出一种基于特征匹配的快速降维排序搜索方法(DRFM)。通过提出的特征得分算法,创建每一篇文档的索引特征向量;通过提出的匹配得分算法,创建查询关键词的查询匹配向量。使用K-L变换算法对所有文档索引特征向量以及查询匹配向量进行降维,提高算法效率。理论分析与实验结果表明所提的方案高效且可行。  相似文献   

6.
Video similarity matching has broad applications such as copyright detection, news tracking and commercial monitoring, etc. Among these applications, one typical task is to detect the local similarity between two videos without the knowledge on positions and lengths of each matched subclip pair. However, most studies so far on video detection investigate the global similarity between two short clips using a pre-defined distance function. Although there are a few works on video subsequence detection, all these proposals fail to provide an effective query processing mechanism. In this paper, we first generalize the problem of video similarity matching. Then, a novel solution called consistent keyframe matching (CKM) is proposed to solve the problem of subsequence matching based on video segmentation. CKM is designed with two goals: (1) good scalability in terms of the query sequence length and the size of video database and (2) fast video subsequence matching in terms of processing time. Good scalability is achieved by employing a batch query paradigm, where keyframes sharing the same query space are summarized and ordered. As such, the redundancy of data access is eliminated, leading to much faster video query processing. Fast subsequence matching is achieved by comparing the keyframes of different video sequences. Specifically, a keyframe matching graph is first constructed and then divided into matched candidate subgraphs. We have evaluated our proposed approach over a very large real video database. Extensive experiments demonstrate the effectiveness and efficiency of our approach.  相似文献   

7.
A signature-based intrusion detection system identifies intrusions by comparing the data traffic with known signature patterns. In this process, matching of packet strings against signature patterns is the most time-consuming step and dominates the overall system performance. Many signature-based network intrusion detection systems (NIDS), e.g., the Snort, employ one or multiple pattern matching algorithms to detect multiple attack types. So far, many pattern matching algorithms have been proposed. Most of them use single-byte standard unit for search, while a few algorithms such as the Modified Wu-Manber (MWM) algorithm use typically two-byte unit, which guarantees better performance than others even as the number of different signatures increases. Among those algorithms, the MWM algorithm has been known as the fastest pattern matching algorithm when the patterns in a rule set rarely appear in packets. However, the matching time of the MWM algorithm increases as the length of the shortest pattern in a signature group decreases.In this paper, by extending the length of the shortest pattern, we minimize the pattern matching time of the algorithm which uses multi-byte unit. We propose a new pattern matching algorithm called the L+1-MWM algorithm for multi-pattern matching. The proposed algorithm minimizes the performance degradation that is originated from the dependency on the length of the shortest pattern. We show that the L+1-MWM algorithm improves the performance of the MWM algorithm by as much as 20% in average under various lengths of shortest patterns and normal traffic conditions. Moreover, when the length of the shortest pattern in a rule set is less than 5, the L+1-MWM algorithm shows 38.87% enhancement in average. We also conduct experiments on a real campus network and show that 12.48% enhancement is obtained in average. In addition, it is shown that the L+1-MWM algorithm provides a better performance than the MWM algorithm by as much as 25% in average under various numbers of signatures and normal traffic conditions, and 20.12% enhancement in average with real on-line traffic.  相似文献   

8.
近年来,图的可达性查询已经成为一个研究热点。传统的可达性查询算法——GRAIL在处理k步可达性查询时具有较高的查询效率,但不适合处理不同分支顶点之间的k步可达性查询。为了解决上述问题,提出了一种新的双向双区间标签索引,进而实现了RE-GRAIL算法,从而有效解决了k步可达性查询问题。最后,在5个不同特征的数据集上进行实验,并从索引构建时间、索引大小、查询时间、扩展性4个方面进行验证。 实验结果表明,与众多同类算法相比,RE-GRAIL算法具有更好的性能。  相似文献   

9.
基于极值点特征的时间序列相似性查询方法*   总被引:4,自引:2,他引:2  
为了提高时间序列子序列匹配的准确度和效率,提出了基于极值点特征的时间序列相似性查询方法。首先识别出时间序列中的极值特征点,根据极值点使用多层次极值划分法对长序列进行划分;然后对划分得到的多层次子序列集使用改进的动态时间弯曲方法与查询序列进行相似性匹配;最后找到与查询序列最相似的子序列。实验表明,此方法在保证准确度的情况下大大提高了相似性搜索过程的效率。  相似文献   

10.
Subsequence matching is an operation that finds subsequences whose changing patterns are similar to a given query sequence from time-series databases. This paper identifies a performance bottleneck in subsequence matching, and then proposes an effective method that substantially improves the performance of entire subsequence matching by resolving the performance bottleneck. First, we analyze the disk access and CPU processing times required during the index searching and post-processing steps of subsequence matching through preliminary experiments. Based on these results, we show that the post-processing step is a main performance bottleneck in subsequence matching. Then, we argue that the optimization of the post-processing step is a crucial issue overlooked in previous approaches. In order to resolve the performance bottleneck, we propose a simple yet highly effective method for expediting the post-processing step. By rearranging the order of candidate subsequences to be compared with a query sequence, our method completely eliminates the redundancies of disk accesses and CPU processing that occur in the post-processing step. Our method is fairly efficient, and does not incur any false dismissal. We quantitatively demonstrate the superiority of our method through extensive experimentation. The results show that our method produces a significantly faster post-processing step; When using a data set of real-world stock sequences, our method was 43.36-96.75 times faster than previous methods, and when using data sets of large numbers of synthetic sequences, our method was 12.48-26.95 times faster than previous methods. Also, the results show that our method reduces the weight of the post-processing step over entire subsequence matching from more than 97% to less than 67%. This implies that our method successfully resolves the performance bottleneck in subsequence matching. As a result, our method provides excellent performance in entire subsequence matching. Compared with previous methods, our method is 16.17-32.64 times faster when using a data set of real-world stock sequences and 8.64-14.29 times faster when using data sets of large numbers of synthetic sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号