期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An efficient algorithm for mining temporal high utility itemsets from data streams 总被引：1，自引：0，他引：1

Chun-Jung Chu Tyne Liang 《Journal of Systems and Software》2008,81(7):1105-1117

Utility of an itemset is considered as the value of this itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets such that the execution time can be reduced substantially in mining all high utility itemsets in data streams. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods like Two-Phase algorithm under various experimental conditions. 相似文献

2.

减少候选项集的数据流高效用项集挖掘算法

茹蓓贺新征《计算机应用研究》2017,34(11)

大数据环境下高效用项集挖掘算法中过多的候选项集极大地降低了算法的时空效率,提出了一种减少候选项集的数据流高效用项集挖掘算法。首先,通过数据流中当前窗口的一次扫描建立一个全局树,并降低全局树中头表入口与节点的冗余效用值;然后,基于全局树生成候选模式,基于增长算法降低局部树的候选项集效用;最终,从候选模式中选出高效用模式。基于真实数据流的实验结果表明,本算法的时空效率与内存占用比均优于其他数据流的高效用模式挖掘算法。相似文献

3.

High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates

《Expert systems with applications》2014,41(8):3861-3878

High utility itemset mining considers the importance of items such as profit and item quantities in transactions. Recently, mining high utility itemsets has emerged as one of the most significant research issues due to a huge range of real world applications such as retail market data analysis and stock market prediction. Although many relevant algorithms have been proposed in recent years, they incur the problem of generating a large number of candidate itemsets, which degrade mining performance. In this paper, we propose an algorithm named MU-Growth (Maximum Utility Growth) with two techniques for pruning candidates effectively in mining process. Moreover, we suggest a tree structure, named MIQ-Tree (Maximum Item Quantity Tree), which captures database information with a single-pass. The proposed data structure is restructured for reducing overestimated utilities. Performance evaluation shows that MU-Growth not only decreases the number of candidates but also outperforms state-of-the-art tree-based algorithms with overestimated methods in terms of runtime with a similar memory usage. 相似文献

4.

An efficient fast algorithm for discovering closed+ high utility itemsets

Jayakrushna Sahoo Ashok Kumar Das A. Goswami 《Applied Intelligence》2016,45(1):44-74

In recent years, high utility itemsets (HUIs) mining from the transactional databases becomes one of the most emerging research topic in the field of data mining due to its wide range of applications in online e-commerce data analysis, identifying interesting patterns in biomedical data and for cross marketing solutions in retail business. It aims to discover the itemsets with high utilities efficiently by considering item quantities in a transaction and profit values of each item. However, it produces a tremendous number of HUIs, which imposes further burden in analysis of the extracted patterns and also degrades the performance of mining methods. Mining the set of closed ⁺ high utility itemsets (CHUIs) solves this issue as it is a loss-less and condensed representation of all HUIs. In this paper, we aim to present a new algorithm for finding CHUIs from a transactional database, called the CHUM (Closed ⁺ High Utility itemset Miner), which is scalable and efficient. The proposed mining algorithm adopts a tricky aimed vertical representation of the database in order to speed up the execution time in generating itemset closures and compute their utility information without accessing the database. The proposed method makes use of the item co-occurrences strategy in order to further reduce the number of intersections needed to be performed. Several experiments are conducted on various sparse and dense datasets and the simulation results clearly show the scalability and superior performance of our algorithm as compared to those for the existing state-of-the-art CHUD (Closed ⁺ High Utility itemset Discovery) algorithm. 相似文献

5.

基于聚类划分的高效用模式并行挖掘算法 总被引：4，自引：0，他引：4

邢淑凝刘方爱赵晓晖《计算机应用》2016,36(8):2202-2206

针对在大规模数据库中挖掘高效用模式产生大量基于内存的效用模式树,从而导致内存空间占用较大以及丢失一些高效用项集的问题,提出在Hadoop分布式计算平台下的基于聚类划分的高效用模式并行挖掘算法PUCP。首先,采用聚类的方法把数据库中相似的事务划分为若干数据子集;然后,把若干划分好的数据子集分配到Hadoop平台的各个节点中构造效用模式树;最后,把各个节点中相同项的条件模式基分配到同一个节点中进行挖掘,以减少各个节点交叉操作的次数。通过实验结果和理论分析表明：PUCP算法在不影响挖掘结果可靠性的前提下,与主流串行高效用模式挖掘——效用模式增长挖掘算法（UP-Growth）和现有的并行高效用模式挖掘算法PHUI-Growth相比,挖掘效率分别提高了61.2%和16.6%;并且使用了Hadoop计算平台,能有效缓解挖掘大规模数据的内存压力。相似文献

6.

Fuzzy utility mining with upper-bound measure

《Applied Soft Computing》2015

Fuzzy utility mining has been an emerging research issue because of its simplicity and comprehensibility. Different from traditional fuzzy data mining, fuzzy utility mining considers not only quantities of items in transactions but also their profits for deriving high fuzzy utility itemsets. In this paper, we introduce a new fuzzy utility measure with the fuzzy minimum operator to evaluate the fuzzy utilities of itemsets. Besides, an effective fuzzy utility upper-bound model based on the proposed measure is designed to provide the downward-closure property in fuzzy sets, thus reducing the search space of finding high fuzzy utility itemsets. A two-phase fuzzy utility mining algorithm, named TPFU, is also proposed and described for solving the problem of fuzzy utility mining. At last, the experimental results on both synthetic and real datasets show that the proposed algorithm has good performance. 相似文献

7.

数据流频繁模式挖掘综述

韩萌丁剑《计算机应用》2019,39(3):719-727

一些先进应用如欺诈检测和趋势学习等带来了数据流频繁模式挖掘的发展。不同于静态数据,数据流挖掘面临着时空约束和项集组合爆炸等问题。对已有数据流频繁模式挖掘算法进行综述并对经典和最新算法进行分析。按照模式集合的完整程度进行分类,数据流中频繁模式分为全集模式和压缩模式。压缩模式主要包括闭合模式、最大模式、top-k模式以及三者的组合模式。不同之处是闭合模式是无损压缩的,而其他模式是有损压缩的。为了得到有趣的频繁模式,可以挖掘基于用户约束的模式。为了处理数据流中的新近事务,将算法分为基于窗口模型和基于衰减模型的方法。数据流中模式挖掘常见的还包含序列模式和高效用模式,对经典和最新算法进行介绍。最后给出了数据流模式挖掘的下一步工作。相似文献

8.

窗口模式下在线数据流中频繁项集的挖掘* 总被引：1，自引：1，他引：0

李可任家东《计算机应用研究》2010,27(5):1711-1713

拟采用一种基于滑动窗模式的单遍挖掘算法,专注于处理近期数据;为了减少处理时间和占用的内存,设计了一种新的事务表示方法。通过处理这个事务的表达式,频繁项集可以被高效输出,并解决了使用基于Apriori理论的算法时,由候选频繁1-项集生成频繁2-项集时数据项顺序判断不准确问题。该算法称为MRFI-SW算法。相似文献

9.

面向数据流的频繁模式挖掘研究*

孟彩霞《计算机应用研究》2009,26(11):4054-4056

数据流的无限性、高速性使得经典的频繁模式挖掘方法难以适用到数据流中。针对数据流的特点,对数据流中频繁模式挖掘问题进行了研究,提出了数据流频繁模式挖掘算法FP-SegCount。该算法将数据流分段并利用改进的FP-growth算法挖掘分段中的频繁项集,然后利用Count-Min Sketch进行项集计数。算法解决了压缩统计和计算快速高效的问题。通过实验分析,FP-SegCount算法是有效的。相似文献

10.

Effective utility mining with the measure of average utility

Tzung-Pei Hong Cho-Han Lee Shyue-Liang Wang 《Expert systems with applications》2011,38(7):8259-8265

Frequent-itemset mining only considers the frequency of occurrence of the items but does not reflect any other factors, such as price or profit. Utility mining is an extension of frequent-itemset mining, considering cost, profit or other measures from user preference. Traditionally, the utility of an itemset is the summation of the utilities of the itemset in all the transactions regardless of its length. The average utility measure is thus adopted in this paper to reveal a better utility effect of combining several items than the original utility measure. It is defined as the total utility of an itemset divided by its number of items within it. The average-utility itemsets, as well as the original utility itemsets, does not have the “downward-closure” property. A mining algorithm is then proposed to efficiently find the high average-utility itemsets. It uses the summation of the maximal utility among the items in each transaction with the target itemset as the upper bound to overestimate the actual average utilities of the itemset and processes it in two phases. As expected, the mined high average-utility itemsets in the proposed way will be fewer than the high utility itemsets under the same threshold. The proposed approach can thus be executed under a larger threshold than the original, thus with a more significant and relevant criterion. Experimental results also show the performance of the proposed algorithm. 相似文献

11.

Mining high utility itemsets by dynamically pruning the tree structure 总被引：2，自引：2，他引：0

Wei Song Yu Liu Jinhong Li 《Applied Intelligence》2014,40(1):29-43

Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Mining such itemsets from a transaction database involves finding those itemsets with utility above a user-specified threshold. In this paper, we propose an efficient concurrent algorithm, called CHUI-Mine (Concurrent High Utility Itemsets Mine), for mining high utility itemsets by dynamically pruning the tree structure. A tree structure, called the CHUI-Tree, is introduced to capture the important utility information of the candidate itemsets. By recording changes in support counts of candidate high utility items during the tree construction process, we implement dynamic CHUI-Tree pruning, and discuss the rationality thereof. The CHUI-Mine algorithm makes use of a concurrent strategy, enabling the simultaneous construction of a CHUI-Tree and the discovery of high utility itemsets. Our algorithm reduces the problem of huge memory usage for tree construction and traversal in tree-based algorithms for mining high utility itemsets. Extensive experimental results show that the CHUI-Mine algorithm is both efficient and scalable. 相似文献

12.

Approximate mining of maximal frequent itemsets in data streams with different window models

Hua-Fu Li Suh-Yin Lee 《Expert systems with applications》2008,35(3):781-789

A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because the huge, high-speed and continuous characteristics of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (stands for Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal frequent itemsets in landmark windows over data streams. A new summary data structure called summary frequent itemset forest (abbreviated as SFI-forest) is developed for incremental maintaining the essential information about maximal frequent itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal frequent itemsets over the entire history of the data streams. 相似文献

13.

HMiner: Efficiently mining high utility itemsets

《Expert systems with applications》2017

High utility itemset mining problem uses the notion of utilities to discover interesting and actionable patterns. Several data structures and heuristic methods have been proposed in the literature to efficiently mine high utility itemsets. This paper advances the state-of-the-art and presents HMiner, a high utility itemset mining method. HMiner utilizes a few novel ideas and presents a compact utility list and virtual hyperlink data structure for storing itemset information. It also makes use of several pruning strategies for efficiently mining high utility itemsets. The proposed ideas were evaluated on a set of benchmark sparse and dense datasets. The execution time improvements ranged from a modest thirty percent to three orders of magnitude across several benchmark datasets. The memory consumption requirements also showed up to an order of magnitude improvement over the state-of-the-art methods. In general, HMiner was found to work well in the dense regions of both sparse and dense benchmark datasets. 相似文献

14.

Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits 总被引：2，自引：1，他引：1

Hua-Fu Li Hsin-Yun Huang Suh-Yin Lee 《Knowledge and Information Systems》2011,28(3):495-522

Mining utility itemsets from data steams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Based on the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform than the existing approaches for discovering high-utility itemsets from data streams over sliding windows. Beside, we also propose the adapted approaches of algorithms MHUI-BIT and MHUI-TID in order to handle the case when we are interested in mining utility itemsets with negative item profits. Experiments show that the variants of algorithms MHUI-BIT and MHUI-TID are efficient approaches for mining high-utility itemsets with negative item profits over stream transaction-sensitive sliding windows. 相似文献

15.

High utility pattern mining over data streams with sliding window technique

《Expert systems with applications》2016

Processing changeable data streams in real time is one of the most important issues in the data mining field due to its broad applications such as retail market analysis, wireless sensor networks, and stock market prediction. In addition, it is an interesting and challenging problem to deal with the stream data since not only the data have unbounded, continuous, and high speed characteristics but also their environments have limited resources. High utility pattern mining, meanwhile, is one of the essential research topics in pattern mining to overcome major drawbacks of the traditional framework for frequent pattern mining that takes only binary databases and identical item importance into consideration. This approach conducts mining processes by reflecting characteristics of real world databases, non-binary quantities and relative importance of items. Although relevant algorithms were proposed for finding high utility patterns in stream environments, they suffer from a level-wise candidate generation-and-test and a large number of candidates by their overestimation techniques. As a result, they consume a huge amount of execution time, which is a significant performance issue since a rapid process is necessary in stream data analysis. In this paper, we propose an algorithm for mining high utility patterns from resource-limited environments through efficient processing of data streams in order to solve the problems of the overestimation-based methods. To improve mining performance with fewer candidates and search space than the previous ones, we develop two techniques for reducing overestimated utilities. Moreover, we suggest a tree-based data structure to maintain information of stream data and high utility patterns. The proposed tree is restructured by our updating method with decreased overestimation utilities to keep up-to-date stream information whenever the current window slides. Our approach also has an important effect on expert and intelligent systems in that it can provide users with more meaningful information than traditional analysis methods by reflecting the characteristics of real world non-binary databases in stream environments and emphasizing on recent data. Comprehensive experimental results show that our algorithm outperforms the existing sliding window-based one in terms of runtime efficiency and scalability. 相似文献

16.

数据流上的最大频繁项集挖掘方法

下载免费PDF全文

李海峰章宁《计算机工程》2012,38(21):45-48

最大频繁项集适用于内存空间有限的数据流挖掘。为此,提出一种基于界碑模型的最大频繁项集挖掘方法,采用最大频繁项集树的数据结构,增量式地维护最大频繁项集与部分附属信息,实现项集的快速搜索和裁剪。在MUSHROOM和BMS-POS数据集上的实验结果表明,该方法具有较高的挖掘效率。相似文献

17.

Top-k high average-utility itemsets mining with effective pruning strategies

Ronghui Wu Zhan He 《Applied Intelligence》2018,48(10):3429-3445

High average-utility itemset (HAUI) mining has recently received interest in the data mining field due to its balanced utility measurement, which considers not only profits and quantities of items but also the lengths of itemsets. Although several algorithms have been designed for the task of HAUI mining in recent years, it is hard for users to determine an appropriate minimum average-utility threshold for the algorithms to work efficiently and control the mining result precisely. In this paper, we address this issue by introducing a framework of top-k HAUI mining, where \(k\) is the desired number of high average-utility itemsets to be mined instead of setting a minimum average-utility threshold. An efficient list based algorithm named TKAU is proposed to mine the top-k high average-utility itemsets in a single phase. TKAU introduces two novel strategies, named EMUP and EA to avoid performing costly join operations for calculating the utilities of itemsets. Moreover, three strategies named RIU, CAD, and EPBF are also incorporated to raise its internal minimal average-utility threshold effectively, and thus reduce the search space. Extensive experiments on both real and synthetic datasets show that the proposed algorithm has excellent performance and scalability. 相似文献

18.

基于效用表的快速高平均效用挖掘算法

王敬华罗相洲吴倩《计算机应用》2016,36(11):3062-3066

高效用项集挖掘在数据挖掘领域中受到了广泛的关注,但是高效用项集挖掘并没有考虑项集长度对效用值的影响,所以高平均效用项集挖掘被提出;而目前的一些高平均效用项集挖掘算法需要耗费大量的时间才能挖掘出有效的高平均效用项集。针对此问题,给出了一个高平均效用项集挖掘的改进算法——FHAUI。FHAUI算法将效用信息保存到效用列表中,通过效用列表的比较来挖掘出所有的高平均效用值,同时FHAUI算法还采用了一个二维矩阵来有效减少二项效用值的连接比较次数。最后将FHAUI算法在多个经典的数据集上测试。实验结果表明,FHAUI算法在效用列表的连接比较次数上有了极大的降低,同时其时间性能也有非常大提高。相似文献

19.

一种高效用项集并行挖掘算法

下载免费PDF全文

宋威吉红蕾李晋宏《计算机工程与科学》2015,37(3):422-428

由于能反映用户的偏好,可以弥补传统频繁项集挖掘仅由支持度来衡量项集重要性的不足,高效用项集正在成为当前数据挖掘研究的热点。为使高效用项集挖掘更好地适应数据规模不断增大的实际需求,提出了一种高效用项集的并行挖掘算法PHUI-Mine。提出了记录挖掘高效用项集信息的DHUI-树结构,描述了DHUI-树的构造方法,论证了DHUI-树的动态剪枝策略。在此基础上,给出了高效用项集挖掘的并行算法描述。实验结果表明,PHUI-Mine算法具有较高的挖掘效率及较低的存储开销。相似文献

20.

Mining top-k maximal reference sequences from streaming web click-sequences with a damped sliding window

Hua-Fu Li 《Expert systems with applications》2009,36(8):11304-11311

Online mining of path traversal patterns from continuous Web click streams is one of the challenging research problems of Web usage mining. Most of previous works focus on mining path traversal patterns over the entire history of Web click streams. Mining the recent changes of Web click streams can provide valuable information for the analysis of the Web click streams. In this paper, we propose a new, online mining algorithm, called Top-DSW (top-k path traversal patterns of stream Damped Sliding Window), to discover the set of top-k path traversal patterns from streaming maximal forward references, where k is the desired number of path traversal patterns to be mined. An effective summary data structure, called TKP-DSW-list (a list of top-k path traversal patterns of stream Damped Sliding Windows) is developed to maintain the essential information about the top-k path traversal patterns from the maximal forward references within a stream damped sliding window. An effective space pruning mechanism, called TKR-list-maintain, is developed to control the memory requirement of the TKP-DSW-list. Experimental studies show that the proposed Top-DSW algorithm is an efficient, single-pass algorithm for online mining of the set of top-k path traversal patterns over stream damped sliding windows. 相似文献