17 similar documents found (search time: 296 ms)
1.
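At the end of the last century, data stream technology emerged to meet the requirements of applications such as network monitoring, intrusion detection, intelligence analysis, and the management and analysis of commercial transactions. The distinctive characteristics of data streams pose major challenges to traditional data processing methods. This paper introduces the basic concepts of data streams and the characteristics of data stream mining, and discusses the current state of research on data stream mining. Finally, it illustrates applications of data stream mining with examples and outlines future research directions.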
2.
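《数字社区&智能家居》 (Digital Community & Smart Home), 2008, (Z2)
This paper introduces the definition and characteristics of data streams and the basic concepts of frequent patterns in data streams. In light of these characteristics, it discusses and analyzes current frequent-pattern mining algorithms for data streams from China and abroad, together with their properties and applications, and finally outlines further research on frequent-pattern mining in data streams.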
3.
4.
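A Survey of Classification Techniques for Data Stream Mining (total citations: 7; self-citations: 0; citations by others: 7)
Data stream mining, the technique of extracting useful information from continuous data streams, has in recent years become a research hotspot in data mining and has broad application prospects. Data in a stream arrive continuously, arrive quickly, and are enormous in volume, so novel algorithms are needed to cope with these properties. Classification is a particularly active topic within data stream mining. This paper reviews the current state of international research on classification algorithms for data stream mining, systematically introduces and analyzes these methods under two settings (stationary data distributions and data with concept drift), and finally summarizes the open problems and development trends of classification techniques for data stream mining.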
5.
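An Analysis of Data Stream Management and Mining Techniques (total citations: 2; self-citations: 1; citations by others: 1)
Data stream management and mining is one of the new research directions in the database field. This paper outlines development trends in database technology and the concept, characteristics, architecture, and application areas of data streams; analyzes the construction of synopsis data structures for data streams and continuous approximate query techniques over data streams; and finally introduces data stream mining techniques. It aims to describe the overall development of data stream management and mining techniques and to provide a useful reference for further research.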
6.
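The characteristics of data streams mean that static mining methods no longer meet requirements. Researchers in China and abroad have proposed many new methods and techniques for mining frequent patterns in data streams. This paper surveys these techniques and algorithms. It first introduces the concept and characteristics of data streams, then analyzes the state of research in China and abroad, summarizes the characteristics of frequent-pattern mining in data streams, lists the techniques commonly used by mining methods along with representative algorithms based on them, and finally discusses future research directions.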
7.
8.
9.
10.
11.
12.
Process mining aims at gaining insights into business processes by analyzing the event data that is generated and recorded during process execution. The vast majority of existing process mining techniques work offline, i.e. using static, historical data stored in event logs. Recently, the notion of online process mining has emerged, in which techniques are applied to live event streams, i.e. as the process executions unfold. Analyzing event streams allows us to gain instant insights into business processes. However, most online process mining techniques assume the input stream to be completely free of noise and other anomalous behavior. Hence, applying these techniques to real data leads to results of inferior quality. In this paper, we propose an event processor that enables us to filter out infrequent behavior from live event streams. Our experiments show that we are able to effectively filter out events from the input stream and, as such, improve online process mining results.
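The idea of filtering infrequent behavior from a live event stream can be illustrated with a minimal Python sketch. It is not the event processor proposed in the paper, whose design the abstract does not describe. The class name `InfrequentTransitionFilter`, the `min_ratio` parameter, and the directly-follows frequency heuristic are all assumptions made for illustration.

```python
from collections import defaultdict


class InfrequentTransitionFilter:
    """Drop events whose transition from the previous activity is rare.

    Hypothetical heuristic: an event is kept only if the share of times
    its activity has followed the case's last accepted activity is at
    least `min_ratio`.
    """

    def __init__(self, min_ratio=0.1):
        self.min_ratio = min_ratio
        # case id -> last accepted activity (missing means case start)
        self.last_activity = {}
        # previous activity -> {next activity -> observed count}
        self.follows = defaultdict(lambda: defaultdict(int))

    def accept(self, case_id, activity):
        """Update the counts with one event and report whether to keep it."""
        prev = self.last_activity.get(case_id)  # None marks a case start
        successors = self.follows[prev]
        successors[activity] += 1
        total = sum(successors.values())
        keep = successors[activity] / total >= self.min_ratio
        if keep:
            # Only accepted events advance the state of the case.
            self.last_activity[case_id] = activity
        return keep


if __name__ == "__main__":
    stream_filter = InfrequentTransitionFilter(min_ratio=0.2)
    for case in range(9):
        stream_filter.accept(case, "a")
        stream_filter.accept(case, "b")
    print(stream_filter.accept(99, "a"))  # common start activity
    print(stream_filter.accept(99, "x"))  # rare successor of "a"
```

A filter of this kind runs in one pass and keeps only per-transition counters, which suits the online setting; a real deployment would also need to expire old cases and old counts.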
13.
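Advanced applications such as fraud detection and trend learning have driven the development of frequent-pattern mining over data streams. Unlike mining static data, data stream mining faces problems such as time and space constraints and the combinatorial explosion of itemsets. This paper surveys existing frequent-pattern mining algorithms for data streams and analyzes both classic and recent algorithms. Classified by the completeness of the pattern set, frequent patterns in data streams divide into full-set patterns and compressed patterns. Compressed patterns mainly include closed patterns, maximal patterns, top-k patterns, and combinations of the three. The difference is that closed patterns are a lossless compression, whereas the others are lossy. To obtain interesting frequent patterns, patterns can be mined under user-specified constraints. To handle recent transactions in a data stream, algorithms are divided into window-model-based and decay-model-based methods. Pattern mining over data streams also commonly covers sequential patterns and high-utility patterns, for which classic and recent algorithms are introduced. Finally, directions for future work on pattern mining over data streams are given.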
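The decay model mentioned above can be sketched in Python as follows. This is a minimal illustration that counts single items rather than itemsets and assumes an exponential decay applied once per transaction. The class name `DecayedItemCounter` and the `decay` parameter are hypothetical and do not come from any of the surveyed algorithms.

```python
from collections import defaultdict


class DecayedItemCounter:
    """Item counts in which older transactions weigh exponentially less."""

    def __init__(self, decay=0.9):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.counts = defaultdict(float)

    def add_transaction(self, items):
        """Age every stored count, then credit the items just seen."""
        for item in self.counts:
            self.counts[item] *= self.decay
        for item in set(items):
            self.counts[item] += 1.0

    def frequent(self, threshold):
        """Return the items whose decayed count reaches `threshold`."""
        return {item for item, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    counter = DecayedItemCounter(decay=0.5)
    for transaction in (["a", "b"], ["a"], ["a"]):
        counter.add_transaction(transaction)
    print(counter.frequent(1.0))
```

With `decay=1.0` the counter reduces to plain counting over the whole stream. A window model would instead discard transactions outright once they leave the window.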
14.
Chowdhury Farhan Ahmed Syed Khairuzzaman Tanbeer Byeong-Soo Jeong Ho-Jin Choi 《Expert systems with applications》2012,39(15):11979-11991
High utility pattern (HUP) mining over data streams has become a challenging research issue in data mining. When a data stream flows through, the old information may not be interesting in the current time period. Therefore, incremental HUP mining is necessary over data streams. Even though some methods have been proposed to discover recent HUPs by using a sliding window, they suffer from the level-wise candidate generation-and-test problem. Hence, they need a large amount of execution time and memory. Moreover, their data structures are not suitable for interactive mining. To solve these problems of the existing algorithms, in this paper, we propose a novel tree structure, called HUS-tree (high utility stream tree) and a new algorithm, called HUPMS (high utility pattern mining over stream data) for incremental and interactive HUP mining over data streams with a sliding window. By capturing the important information of stream data into an HUS-tree, our HUPMS algorithm can mine all the HUPs in the current window with a pattern growth approach. Furthermore, HUS-tree is very efficient for interactive mining. Extensive performance analyses show that our algorithm is very efficient for incremental and interactive HUP mining over data streams and significantly outperforms the existing sliding window-based HUP mining algorithms.
15.
Recent research shows that rule-based models perform well when classifying large data sets such as data streams with concept drifts. A genetic algorithm is a strong rule-based classification algorithm, but it has been used only for mining small static data sets. If the genetic algorithm can be made scalable and adaptable by reducing its I/O intensity, it will become an efficient and effective tool for mining large data sets like data streams. In this paper a scalable and adaptable online genetic algorithm is proposed to mine classification rules for data streams with concept drifts. Since data streams are generated continuously at a rapid rate, the proposed method does not use a fixed static data set for fitness calculation. Instead, it extracts a small snapshot of training examples from the current part of the data stream whenever data is required for the fitness calculation. The proposed method also builds rules for all the classes separately in a parallel, independent, iterative manner. This makes the proposed method scalable to data streams and also adaptable to the concept drifts that occur in the data stream in a fast and more natural way, without storing the whole stream or a part of the stream in a compressed form as done by other rule-based algorithms. The results of the proposed method are comparable with those of other standard methods used for mining data streams.
16.
In recent years, data stream mining has become an important research topic. With the emergence of new applications, the data we process are no longer static, but continuous dynamic data streams. Examples include network traffic analysis, Web click stream mining, network intrusion detection, and on-line transaction analysis. In this paper, we propose a new framework for data stream mining, called the weighted sliding window model. The proposed model allows the user to specify the number of windows for mining, the size of a window, and the weight for each window. Thus users can assign a higher weight to a more significant data section, which brings the mining result closer to the user's requirements. Based on the weighted sliding window model, we propose a single-pass algorithm, called WSW, to efficiently discover all the frequent itemsets from data streams. By analyzing data characteristics, an improved algorithm, called WSW-Imp, is developed to further reduce the time needed to decide whether a candidate itemset is frequent or not. Empirical results show that WSW-Imp outperforms WSW under the weighted sliding window model.
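The weighted sliding window idea can be illustrated with a short Python sketch that scores one itemset across several windows with user-supplied weights. The abstract does not give the WSW algorithm or its exact definition of weighted support, so the normalization used here (weighted hits divided by weighted transaction count) and the function name `weighted_support` are assumptions.

```python
def weighted_support(windows, weights, itemset):
    """Weighted share of transactions that contain `itemset`.

    `windows` is a list of windows, each a list of transactions
    (iterables of items); `weights` holds one weight per window.
    The normalization is an illustrative assumption, not the
    definition used by WSW.
    """
    if len(windows) != len(weights):
        raise ValueError("need exactly one weight per window")
    target = frozenset(itemset)
    weighted_hits = 0.0
    weighted_total = 0.0
    for transactions, weight in zip(windows, weights):
        hits = sum(1 for t in transactions if target <= set(t))
        weighted_hits += weight * hits
        weighted_total += weight * len(transactions)
    return weighted_hits / weighted_total if weighted_total else 0.0


if __name__ == "__main__":
    older = [{"a", "b"}, {"a"}]
    newer = [{"b"}, {"a", "b"}]
    # The newer window is given four times the weight of the older one.
    print(weighted_support([older, newer], [0.2, 0.8], {"a"}))
```

An itemset would then be reported as frequent when this value reaches a user-chosen minimum support. Raising the weight of a window makes patterns from that window count for more.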
17.
In many applications of information systems, learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements on algorithms: they must incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Among the newly proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research.