期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using multiple indexes for efficient subsequence matching in time-series databases 总被引：1，自引：0，他引：1

Seung-Hwan Lim Sang-Wook Kim 《Information Sciences》2007,177(24):5691-5706

A time-series database is a set of data sequences, each of which is a list of changing values of an object in a given period of time. Subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence in a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to solve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of their distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. Given a set of pairs 〈length, frequency〉 of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings performed in a target application. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we show the superiority of our approach by performing extensive experiments with a real-life stock data set and a large volume of synthetic data sets. 相似文献

2.

Efficient time-series subsequence matching using duality in constructing windows

《Information Systems》2001,26(4):279-293

In this paper, we propose a new subsequence matching method, Dual Match. Dual Match exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and thus, is a dual approach of the one by Faloutsos et al. (Proceedings of the ACM SIGMOD International Conference on Management of Data, Seattle, Washington, 1994, pp. 419–429.) (FRM in short), which divides data sequences into sliding windows and the query sequence into disjoint windows. FRM causes a lot of false alarms (i.e., candidates that do not qualify) by storing minimum bounding rectangles rather than individual points representing windows to save storage space for the index. Dual Match solves this problem by directly storing points without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides large improvement both in false alarms and performance over FRM given the same amount of storage space. In particular, for low selectivities (less than 10⁻⁴), Dual Match significantly improves performance up to 430-fold. On the other hand, for high selectivities (more than 10⁻²), it shows a very minor degradation (less than 29%). For selectivities in between (10⁻⁴–10⁻²), Dual Match shows performance slightly better than that of FRM. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that improves performance significantly in large database applications. 相似文献

3.

An elastic partial shape matching technique

Longin Jan Latecki^{Author Vitae} Vasileios Megalooikonomou Author VitaeAuthor Vitae Deguang Yu Author Vitae 《Pattern recognition》2007,40(11):3069-3080

We consider the problem of partial shape matching. We propose to transform shapes into sequences and utilize an algorithm that determines a subsequence of a target sequence that best matches a query. In the proposed algorithm we map the problem of the best matching subsequence to the problem of a cheapest path in a directed acyclic graph (DAG). The approach allows us to compute the optimal scale and translation of sequence values, which is a nontrivial problem in the case of subsequence matching. Our experimental results demonstrate that the proposed algorithm outperforms the commonly used techniques in retrieval accuracy. 相似文献

4.

Efficient moving average transform-based subsequence matching algorithms in time-series databases 总被引：1，自引：0，他引：1

Yang-Sae Moon Jinho Kim 《Information Sciences》2007,177(23):5415-5431

Moving average transform is very useful in finding the trend of time-series data by reducing the effect of noise, and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, are problematic in that, since they must build multiple indexes in supporting transform of arbitrary order, they incur index overhead both in storage space and in update maintenance. To solve this problem, we propose a single-index approach to subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single-index approach, we can reduce both the storage space and the index maintenance overhead. In explaining the single-index approach, we first introduce the notion of poly-order moving average transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. We also propose two subsequence matching methods based on poly-order transform that efficiently support moving average transform of arbitrary order. Experimental results for real stock data show that, compared with the sequential scan, our methods improve average performance significantly, by a factor of 22.6-33.6. Also, compared with cases in which an index is built for every moving average order, our methods reduce storage space and maintenance effort significantly while incurring only marginal performance degradation. Our approach entails the additional advantage of being generalized to support many other transforms in addition to moving average transform. Therefore, we believe that our approach will be widely used in many transform-based subsequence matching methods. 相似文献

5.

基于特征匹配的快速降维排序搜索方法

徐惠铁治欣舒莹《计算机与现代化》2022,(2):85-91

在大数据时代背景下,越来越多的用户或者企业将大量的数据上传至云端存储以便减轻本地存储的压力和获得高效的数据共享服务管理,由此可搜索加密技术应运而生,检索效率与保证数据安全一直是研究的热点。因此,本文提出一种基于特征匹配的快速降维排序搜索方法(DRFM)。通过提出的特征得分算法,创建每一篇文档的索引特征向量;通过提出的匹配得分算法,创建查询关键词的查询匹配向量。使用K-L变换算法对所有文档索引特征向量以及查询匹配向量进行降维,提高算法效率。理论分析与实验结果表明所提的方案高效且可行。相似文献

6.

Efficient subsequence matching over large video databases

Xiangmin Zhou Xiaofang Zhou Lei Chen Athman Bouguettaya 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(4):489-508

Video similarity matching has broad applications such as copyright detection, news tracking and commercial monitoring, etc. Among these applications, one typical task is to detect the local similarity between two videos without the knowledge on positions and lengths of each matched subclip pair. However, most studies so far on video detection investigate the global similarity between two short clips using a pre-defined distance function. Although there are a few works on video subsequence detection, all these proposals fail to provide an effective query processing mechanism. In this paper, we first generalize the problem of video similarity matching. Then, a novel solution called consistent keyframe matching (CKM) is proposed to solve the problem of subsequence matching based on video segmentation. CKM is designed with two goals: (1) good scalability in terms of the query sequence length and the size of video database and (2) fast video subsequence matching in terms of processing time. Good scalability is achieved by employing a batch query paradigm, where keyframes sharing the same query space are summarized and ordered. As such, the redundancy of data access is eliminated, leading to much faster video query processing. Fast subsequence matching is achieved by comparing the keyframes of different video sequences. Specifically, a keyframe matching graph is first constructed and then divided into matched candidate subgraphs. We have evaluated our proposed approach over a very large real video database. Extensive experiments demonstrate the effectiveness and efficiency of our approach. 相似文献

7.

A fast pattern matching algorithm with multi-byte search unit for high-speed network security

Yoon-Ho Choi Moon-Young Jung 《Computer Communications》2011,34(14):1750-1763

A signature-based intrusion detection system identifies intrusions by comparing the data traffic with known signature patterns. In this process, matching of packet strings against signature patterns is the most time-consuming step and dominates the overall system performance. Many signature-based network intrusion detection systems (NIDS), e.g., the Snort, employ one or multiple pattern matching algorithms to detect multiple attack types. So far, many pattern matching algorithms have been proposed. Most of them use single-byte standard unit for search, while a few algorithms such as the Modified Wu-Manber (MWM) algorithm use typically two-byte unit, which guarantees better performance than others even as the number of different signatures increases. Among those algorithms, the MWM algorithm has been known as the fastest pattern matching algorithm when the patterns in a rule set rarely appear in packets. However, the matching time of the MWM algorithm increases as the length of the shortest pattern in a signature group decreases.In this paper, by extending the length of the shortest pattern, we minimize the pattern matching time of the algorithm which uses multi-byte unit. We propose a new pattern matching algorithm called the L⁺¹-MWM algorithm for multi-pattern matching. The proposed algorithm minimizes the performance degradation that is originated from the dependency on the length of the shortest pattern. We show that the L⁺¹-MWM algorithm improves the performance of the MWM algorithm by as much as 20% in average under various lengths of shortest patterns and normal traffic conditions. Moreover, when the length of the shortest pattern in a rule set is less than 5, the L⁺¹-MWM algorithm shows 38.87% enhancement in average. We also conduct experiments on a real campus network and show that 12.48% enhancement is obtained in average. In addition, it is shown that the L⁺¹-MWM algorithm provides a better performance than the MWM algorithm by as much as 25% in average under various numbers of signatures and normal traffic conditions, and 20.12% enhancement in average with real on-line traffic. 相似文献

8.

基于双向双区间标签实现k步可达性查询

宋亚青武优西刘靖宇李艳《计算机科学》2018,45(3):178-181

近年来,图的可达性查询已经成为一个研究热点。传统的可达性查询算法——GRAIL在处理k步可达性查询时具有较高的查询效率,但不适合处理不同分支顶点之间的k步可达性查询。为了解决上述问题,提出了一种新的双向双区间标签索引,进而实现了RE-GRAIL算法,从而有效解决了k步可达性查询问题。最后,在5个不同特征的数据集上进行实验,并从索引构建时间、索引大小、查询时间、扩展性4个方面进行验证。实验结果表明,与众多同类算法相比,RE-GRAIL算法具有更好的性能。相似文献

9.

基于极值点特征的时间序列相似性查询方法* 总被引：4，自引：2，他引：2

吴学雁黄道平莫赞《计算机应用研究》2010,27(6):2068-2070

为了提高时间序列子序列匹配的准确度和效率,提出了基于极值点特征的时间序列相似性查询方法。首先识别出时间序列中的极值特征点,根据极值点使用多层次极值划分法对长序列进行划分;然后对划分得到的多层次子序列集使用改进的动态时间弯曲方法与查询序列进行相似性匹配;最后找到与查询序列最相似的子序列。实验表明,此方法在保证准确度的情况下大大提高了相似性搜索过程的效率。相似文献

10.

Performance bottleneck of subsequence matching in time-series databases: Observation, solution, and performance evaluation

Sang-Wook Kim Byeong-Soo Jeong 《Information Sciences》2007,177(22):4841-4858

Subsequence matching is an operation that finds subsequences whose changing patterns are similar to a given query sequence from time-series databases. This paper identifies a performance bottleneck in subsequence matching, and then proposes an effective method that substantially improves the performance of entire subsequence matching by resolving the performance bottleneck. First, we analyze the disk access and CPU processing times required during the index searching and post-processing steps of subsequence matching through preliminary experiments. Based on these results, we show that the post-processing step is a main performance bottleneck in subsequence matching. Then, we argue that the optimization of the post-processing step is a crucial issue overlooked in previous approaches. In order to resolve the performance bottleneck, we propose a simple yet highly effective method for expediting the post-processing step. By rearranging the order of candidate subsequences to be compared with a query sequence, our method completely eliminates the redundancies of disk accesses and CPU processing that occur in the post-processing step. Our method is fairly efficient, and does not incur any false dismissal. We quantitatively demonstrate the superiority of our method through extensive experimentation. The results show that our method produces a significantly faster post-processing step; When using a data set of real-world stock sequences, our method was 43.36-96.75 times faster than previous methods, and when using data sets of large numbers of synthetic sequences, our method was 12.48-26.95 times faster than previous methods. Also, the results show that our method reduces the weight of the post-processing step over entire subsequence matching from more than 97% to less than 67%. This implies that our method successfully resolves the performance bottleneck in subsequence matching. As a result, our method provides excellent performance in entire subsequence matching. Compared with previous methods, our method is 16.17-32.64 times faster when using a data set of real-world stock sequences and 8.64-14.29 times faster when using data sets of large numbers of synthetic sequences. 相似文献