首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 485 毫秒
1.
Sequential pattern mining is one of the most important data mining techniques. Previous research on mining sequential patterns discovered patterns from point-based event data, interval-based event data, and hybrid event data. In many real life applications, however, an event may involve many statuses; it might not occur only at one certain point in time or over a period of time. In this work, we propose a generalized representation of temporal events. We treat events as multi-label events with many statuses, and introduce an algorithm called MLTPM to discover multi-label temporal patterns from temporal databases. The experimental results show that the efficiency and scalability of the MLTPM algorithm are satisfactory. We also discuss interesting multi-label temporal patterns discovered when MLTPM was applied to historical Nasdaq data.  相似文献   

2.
Mining Nonambiguous Temporal Patterns for Interval-Based Events   总被引:2,自引:0,他引:2  
Previous research on mining sequential patterns mainly focused on discovering patterns from point-based event data. Little effort has been put toward mining patterns from interval-based event data, where a pair of time values is associated with each event. Kam and Fu's work in 2000 identified 13 temporal relationships between two intervals. According to these temporal relationships, a new variant of temporal patterns was defined for interval-based event data. Unfortunately, the patterns defined in this manner are ambiguous, which means that the temporal relationships among events cannot be correctly represented in temporal patterns. To resolve this problem, we first define a new kind of nonambiguous temporal pattern for interval-based event data. Then, the TPrefixSpan algorithm is developed to mine the new temporal patterns from interval-based events. The completeness and accuracy of the results are also proven. The experimental results show that the efficiency and scalability of the TPrefixSpan algorithm are satisfactory. Furthermore, to show the applicability and effectiveness of temporal pattern mining, we execute experiments to discover temporal patterns from historical Nasdaq data  相似文献   

3.
The goal of analyzing a time series database is to find whether and how frequent a periodic pattern is repeated within the series. Periodic pattern mining is the problem that regards temporal regularity. However, most of the existing algorithms have a major limitation in mining interesting patterns of users interest, that is, they can mine patterns of specific length with all the events sequentially one after another in exact positions within this pattern. Though there are certain scenarios where a pattern can be flexible, that is, it may be interesting and can be mined by neglecting any number of unimportant events in between important events with variable length of the pattern. Moreover, existing algorithms can detect only specific type of periodicity in various time series databases and require the interaction from user to determine periodicity. In this paper, we have proposed an algorithm for the periodic pattern mining in time series databases which does not rely on the user for the period value or period type of the pattern and can detect all types of periodic patterns at the same time, indeed these flexibilities are missing in existing algorithms. The proposed algorithm facilitates the user to generate different kinds of patterns by skipping intermediate events in a time series database and find out the periodicity of the patterns within the database. It is an improvement over the generating pattern using suffix tree, because suffix tree based algorithms have weakness in this particular area of pattern generation. Comparing with the existing algorithms, the proposed algorithm improves generating different kinds of interesting patterns and detects whether the generated pattern is periodic or not. We have tested the performance of our algorithm on both synthetic and real life data from different domains and found a large number of interesting event sequences which were missing in existing algorithms and the proposed algorithm was efficient enough in generating and detecting periodicity of flexible patterns on both types of data.  相似文献   

4.
A constraint-based qualitative reasoning system that integrates Allen's interval calculus, point calculus and pan of Simmons' quantity lattice is presented in this paper. The highlight of the work is a simple but powerful logical system for expressing both quantitative and qualitative information managed by a temporal manager (TM). Allen's algorithm, which deals with time intervals, is extended to reason about time points. The hybrid method of propagating temporal constraints permits flexible control over both systems. We try to offset the limitations of an interval-based representation by the advantages of a point-based representation.  相似文献   

5.
We present a novel approach to client-side mining of temporal API specifications based on static analysis. Specifically, we present an interprocedural analysis over a combined domain that abstracts both aliasing and event sequences for individual objects. The analysis uses a new family of automata-based abstractions to represent unbounded event sequences, designed to disambiguate distinct usage patterns and merge similar usage patterns. Additionally, our approach includes an algorithm that summarizes abstract traces based on automata clusters, and effectively rules out spurious behaviors. We show experimental results mining specifications from a number of Java clients and APIs. The results indicate that effective static analysis for client-side mining requires fairly precise treatment of aliasing and abstract event sequences. Based on the results, we conclude that static client-side specification mining shows promise as a complement or alternative to dynamic approaches.  相似文献   

6.
杜诗晴  王鹏  汪卫 《计算机工程》2021,47(2):118-125
日志数据是互联网系统产生的过程性事件记录数据,从日志数据中挖掘出高质量序列模式可帮助工程师高效开展系统运维工作。针对传统模式挖掘算法结果冗余的问题,提出一种从时序日志序列中挖掘序列模式(DTS)的算法。DTS采用启发式思路挖掘能充分代表原序列中事件关系和时序规律的模式集合,并将最小描述长度准则应用于模式挖掘,设计一种考虑事件关系和时序关系的编码方案,以解决模式规模爆炸问题。在真实日志数据集上的实验结果表明,与SQS、CSC与ISM等序列模式挖掘算法相比,该算法能高效挖掘出含义丰富且冗余度低的序列模式。  相似文献   

7.
The basic goal of scene understanding is to organize the video into sets of events and to find the associated temporal dependencies. Such systems aim to automatically interpret activities in the scene, as well as detect unusual events that could be of particular interest, such as traffic violations and unauthorized entry. The objective of this work, therefore, is to learn behaviors of multi-agent actions and interactions in a semi-supervised manner. Using tracked object trajectories, we organize similar motion trajectories into clusters using the spectral clustering technique. This set of clusters depicts the different paths/routes, i.e., the distinct events taking place at various locations in the scene. A temporal mining algorithm is used to mine interval-based frequent temporal patterns occurring in the scene. A temporal pattern indicates a set of events that are linked based on their relationship with other events in the set, and we use Allen's interval-based temporal logic to describe these relations. The resulting frequent patterns are used to generate temporal association rules, which convey the semantic information contained in the scene. Our overall aim is to generate rules that govern the dynamics of the scene and perform anomaly detection. We apply the proposed approach on two publicly available complex traffic datasets and demonstrate considerable improvements over the existing techniques.  相似文献   

8.
肖波  张亮  徐前方  蔺志青  郭军 《软件学报》2010,21(4):659-671
超团模式是一种新型的关联模式,这种模式所包含的项目相互间具有很高的亲密度.超团模式中某个项目在事务中的出现很强地暗示了模式中其他项目也会相应地出现.极大超团模式是一组超团模式更加紧凑的表示,可被用于多种应用.挖掘这两种模式的标准算法是完全不同的.提出一种基于FP-tree(frequent pattern tree)的快速挖掘算法——混合超团模式增长(hybrid hyperclique pattern growth,简称HHCP-growth),统一了两种模式的挖掘.算法采用递归挖掘方法,并应用多种有效的剪枝策略.提出并证明几个相关命题来说明剪枝策略的有效性和算法的正确性.实验结果表明,HHCP-growth算法相对于标准的超团模式挖掘算法和极大超团模式挖掘算法都具有更高的效率,尤其对于大数据集或在低支持度条件下更为显著.  相似文献   

9.
本文以标记有序树作为半结构化数据的数据模型 ,研究了半结构化数据的树状最大频繁模式挖掘问题 .已有挖掘算法通常挖掘所有频繁模式 ,其中很多模式为其它模式的子模式 ,针对该问题 ,设计实现了一种最大模式挖掘算法 .该算法采用最右扩展枚举方法无重复枚举所有候选模式 ,利用频繁模式扩展森林实现高效剪枝扩展和挖掘频繁叶模式 ,通过计算频繁叶模式间的包含关系挖掘树状最大频繁模式 .试验结果表明该算法具有良好性能  相似文献   

10.
王树怡  董东 《计算机科学》2017,44(Z6):486-490
在软件开发过程中,开发人员经常需要遵循特定的API用法模式,而这些用法模式几乎没有相关文档作为参考。为了挖掘API用法模式,提出基于聚类和频繁闭合偏序序列的API用法模式挖掘途径。通过抽象语法树对源代码进行解析,对提取API方法调用序列进行层次聚类,最后使用频繁闭合偏序挖掘算法DFP进行API用法模式的挖掘。实验结果表明,在相同的数据集上,与SPADE算法和BIDE算法相比,所得候选API用法模式集更加精简。  相似文献   

11.
Given a time stamped transaction database and a user-defined reference sequence of interest over time, similarity-profiled temporal association mining discovers all associated item sets whose prevalence variations over time are similar to the reference sequence. The similar temporal association patterns can reveal interesting relationships of data items which co-occur with a particular event over time. Most works in temporal association mining have focused on capturing special temporal regulation patterns such as cyclic patterns and calendar scheme-based patterns. However, our model is flexible in representing interesting temporal patterns using a user-defined reference sequence. The dissimilarity degree of the sequence of support values of an item set to the reference sequence is used to capture how well its temporal prevalence variation matches the reference pattern. By exploiting interesting properties such as an envelope of support time sequence and a lower bounding distance for early pruning candidate item sets, we develop an algorithm for effectively mining similarity-profiled temporal association patterns. We prove the algorithm is correct and complete in the mining results and provide the computational analysis. Experimental results on real data as well as synthetic data show that the proposed algorithm is more efficient than a sequential method using a traditional support-pruning scheme.  相似文献   

12.
The mobile wireless market has been attracting many customers. Technically, the paradigm of anytime-anywhere connectivity raises previously unthinkable challenges, including the management of million of mobile customers, their profiles, the profiles-based selective information dissemination, and server-side computing infrastructure design issues to support such a large pool of users automatically and intelligently. In this paper, we propose a data mining technique for discovering frequent behavioral patterns from a collection of trajectories gathered by Global Positioning System. Although the search space for spatiotemporal knowledge is extremely challenging, imposing spatial and temporal constraints on spatiotemporal sequences makes the computation feasible. Specifically, the mined patterns are incorporated with synthetic constraints, namely spatiotemporal sequence length restriction, minimum and maximum timing gap between events, time window of occurrence of the whole pattern, inclusion or exclusion event constraints, and frequent movement patterns predictive of one ore more classes. The algorithm for mining all frequent constrained patterns is named cAllMOP. Moreover, to control the density of pattern regions a clustering algorithm is exploited. The proposed method is efficient and scalable. Its efficiency is better than that of the previous algorithms AllMOP and GSP with respect to the compactness of discovered knowledge, execution time, and memory requirement.  相似文献   

13.
Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases.  相似文献   

14.
15.
In the temporal database literature, every fact stored in a database may be equipped with two temporal dimensions: the valid time, which describes the time when the fact is true in the modeled reality, and the transaction time, which describes the time when the fact is current in the database and can be retrieved. Temporal functional dependencies (TFDs) add valid time to classical functional dependencies (FDs) in order to express database integrity constraints over the flow of time. Currently, proposals dealing with TFDs adopt a point-based approach, where tuples hold at specific time points, to express integrity constraints such as “for each month, the salary of an employee depends only on his role”. To the best of our knowledge, there are no proposals dealing with interval-based temporal functional dependencies (ITFDs), where the associated valid time is represented by an interval and there is the need of representing both point-based and interval-based data dependencies. In this paper, we propose ITFDs based on Allen’s interval relations and discuss their expressive power with respect to other TFDs proposed in the literature: ITFDs allow us to express interval-based data dependencies, which cannot be expressed through the existing point-based TFDs. ITFDs allow one to express constraints such as “employees starting to work the same day with the same role get the same salary” or “employees with a given role working on a project cannot start to work with the same role on another project that will end before the first one”. Furthermore, we propose new algorithms based on B-trees to efficiently verify the satisfaction of ITFDs in a temporal database. These algorithms guarantee that, starting from a relation satisfying a set of ITFDs, the updated relation still satisfies the given ITFDs.  相似文献   

16.
Given a large spatio-temporal database of events, where each event consists of the fields event ID, time, location, and event type, mining spatio-temporal sequential patterns identifies significant event-type sequences. Such spatio-temporal sequential patterns are crucial to the investigation of spatial and temporal evolutions of phenomena in many application domains. Recent research literature has explored the sequential patterns on transaction data and trajectory analysis on moving objects. However, these methods cannot be directly applied to mining sequential patterns from a large number of spatio-temporal events. Two major research challenges still remain: 1) the definition of significance measures for spatio-temporal sequential patterns to avoid spurious ones and 2) the algorithmic design under the significance measures, which may not guarantee the downward closure property. In this paper, we propose a sequence index as the significance measure for spatio-temporal sequential patterns, which is meaningful due to its interpretability using spatial statistics. We propose a novel algorithm called Slicing-STS-miner to tackle the algorithmic design challenge using the spatial sequence index, which does not preserve the downward closure property. We compare the proposed algorithm with a simple algorithm called STS-miner that utilizes the weak monotone property of the sequence index. Performance evaluations using both synthetic and real-world data sets show that the slicing-STS-miner is an order of magnitude faster than STS-Miner for large data sets.  相似文献   

17.
使用序列模式精简基挖掘序列模式   总被引:3,自引:1,他引:3  
传统的序列模式挖掘方法在挖掘由短的频繁序列模式组成的数据库时有良好的性能.但在挖掘长的序列模式或支持度阈值很低时,这些方法可能遇到固有的困难,因为产生的频繁序列模式的数量经常太大.在许多情况下,用户可能只需要那些覆盖许多短模式的长模式.此外,在很多应用中,只要得到产生的频繁序列模式的近似支持度就已足够,而不需要它们的精确支持度.介绍了能将误差控制在确定范围内的频繁序列模式精简基的概念,并开发了一个挖掘这种序列模式精简基的算法.实验结果显示计算频繁序列模式精简基是很有前途的.  相似文献   

18.
王乐  熊松泉  常艳芬  王水 《自动化学报》2015,41(9):1616-1626
高效用模式挖掘是数据挖掘领域的一个重要研究内容; 由于其计算过程包含对模式的内、外效用值的处理, 计算复杂度较大, 因此挖掘算法的主要研究热点问题就是提高算法的时间效率.针对此问题, 本文给出一个基于模式增长方式的高效用模式挖掘算法HUPM-FP, 该算法可以从全局树上挖掘高效用模式, 避免产生候选项集.实验中, 采用6个典型数据集进行实验, 并和目前效率较好的算法FHM (Faster high-utility itemset mining)做了对比, 实验结果表明本文给出的算法时空效率都有较大的提高, 特别是时间效率提高较大, 可以达到1个数量级以上.  相似文献   

19.
Propositional interval temporal logics are quite expressive temporal logics that allow one to naturally express statements that refer to time intervals. Unfortunately, most such logics turn out to be (highly) undecidable. In order to get decidability, severe syntactic or semantic restrictions have been imposed to interval-based temporal logics to reduce them to point-based ones. The problem of identifying expressive enough, yet decidable, new interval logics or fragments of existing ones that are genuinely interval-based is still largely unexplored. In this paper, we focus our attention on interval logics of temporal neighborhood. We address the decision problem for the future fragment of Neighborhood Logic (Right Propositional Neighborhood Logic, RPNL for short), and we positively solve it by showing that the satisfiability problem for RPNL over natural numbers is NEXPTIME-complete. Then, we develop a sound and complete tableau-based decision procedure, and we prove its optimality.  相似文献   

20.
Temporally uncertain data widely exist in many real-world applications. Temporal uncertainty can be caused by various reasons such as conflicting or missing event timestamps, network latency, granularity mismatch, synchronization problems, device precision limitations, data aggregation. In this paper, we propose an efficient algorithm to mine sequential patterns from data with temporal uncertainty. We propose an uncertain model in which timestamps are modeled by random variables and then design a new approach to manage temporal uncertainty. We integrate it into the pattern-growth sequential pattern mining algorithm to discover probabilistic frequent sequential patterns. Extensive experiments on both synthetic and real datasets prove that the proposed algorithm is both efficient and scalable.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号