
1.

Efficient data mining for calling path patterns in GSM networks

Anthony J. T. Lee Yao-Te Wang 《Information Systems》2003,28(8):929-948

In this paper, we explore a new data mining capability: mining calling path patterns in global system for mobile communication (GSM) networks. Our proposed method consists of two phases. First, we devise a data structure to convert the original calling paths in the log file into a frequent calling path graph. Second, we design an algorithm to mine the calling path patterns from that graph. Because it mines from the frequent calling path graph, our proposed algorithm does not generate unnecessary candidate patterns and requires fewer database scans. If the calling path graph of the GSM network fits in main memory, the algorithm scans the database only once. Otherwise, the cellular structure of the GSM network is divided into several partitions so that the calling path sub-graph of each partition fits in main memory; the number of database scans then equals the number of partitioned sub-graphs. Our proposed algorithm is therefore more efficient than PrefixSpan and Apriori-like approaches. The experimental results show that it outperforms the Apriori-like and PrefixSpan approaches by several orders of magnitude.
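The first phase can be pictured with a minimal sketch. The abstract does not specify the data structure, so everything here is an assumption made for illustration: a calling path is taken to be a list of cell identifiers, the graph is a nested dictionary of directed edges, and the function name `build_frequent_path_graph` and the sample log are hypothetical.

```python
from collections import defaultdict

def build_frequent_path_graph(paths, min_support):
    """Scan the call log once and keep edges that occur in >= min_support paths."""
    edge_support = defaultdict(int)
    for path in paths:  # single pass over the log
        # Count each directed edge at most once per calling path.
        for edge in set(zip(path, path[1:])):
            edge_support[edge] += 1
    graph = defaultdict(dict)
    for (src, dst), count in edge_support.items():
        if count >= min_support:
            graph[src][dst] = count
    return dict(graph)

# Hypothetical log: each calling path lists the cells a call passed through.
log = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "D"],
    ["A", "B", "C", "D"],
]
graph = build_frequent_path_graph(log, min_support=2)
print(graph)
```

Longer patterns would then be searched inside this graph instead of rescanning the log, which is the intuition behind the single-scan claim.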

2.

Mining closed patterns in multi-sequence time-series databases

Anthony J.T. Huei-Wen Tzu-Yu Ying-Ho Kuo-Tay 《Data & Knowledge Engineering》2009,68(10):1071-1090

In this paper, we propose an efficient algorithm, called CMP-Miner, to mine closed patterns in a time-series database where each record, also called a transaction, contains multiple time-series sequences. Our proposed algorithm consists of three phases. First, we transform each time-series sequence in a transaction into a symbolic sequence. Second, we scan the transformed database to find frequent patterns of length one. Third, for each frequent pattern found in the second phase, we recursively enumerate frequent patterns using a frequent pattern tree in a depth-first manner. During enumeration, we apply several efficient pruning strategies to remove frequent but non-closed patterns. Thus, the CMP-Miner algorithm can efficiently mine the closed patterns from a time-series database. The experimental results show that our proposed algorithm outperforms the modified Apriori and BIDE algorithms.
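The abstract does not say how a time series becomes a symbolic sequence. One common option, assumed here purely for illustration, is to cut the value range at fixed breakpoints and assign one letter per interval. The function `to_symbols`, the breakpoints, and the sample transaction are all hypothetical.

```python
def to_symbols(series, breakpoints, alphabet="abcd"):
    """Map each numeric value to the symbol of the interval it falls in."""
    symbols = []
    for value in series:
        # Count the breakpoints at or below the value; that is the interval index.
        idx = sum(value >= b for b in breakpoints)
        symbols.append(alphabet[idx])
    return "".join(symbols)

# Hypothetical transaction holding two time-series sequences.
transaction = {"price": [1.0, 2.5, 3.5, 0.5], "volume": [3.0, 3.0, 1.0, 2.0]}
breakpoints = [1.0, 2.0, 3.0]  # sorted; needs len(alphabet) == len(breakpoints) + 1
symbolic = {name: to_symbols(vals, breakpoints) for name, vals in transaction.items()}
print(symbolic)
```

After this step, each transaction is a small set of strings, so the later phases can work with ordinary sequence-pattern machinery.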

3.

An individual-based spatio-temporal travel demand mining method and its application in improving rebalancing for free-floating bike-sharing system

《Advanced Engineering Informatics》2021

After rapid expansion in the early stage, many enterprises have closed down or withdrawn from the free-floating bike sharing (FFBS) market, and the few remaining giants are also generally at a loss at present. One main reason is the serious imbalance between FFBS supply and demand. Because there are no fixed docking stations, individual-based identification methods such as trip-chain, which are used to identify travel demand and behaviour in traditional station-based bike sharing (SBBS), cannot be used in FFBS. The resulting lack of methods for obtaining in-depth demand makes reasonable rebalancing unachievable. This study constructs an individual-based spatio-temporal travel demand mining methodology, the first disaggregate travel demand mining model suitable for FFBS. The proposed methodology consists of three steps. First, a spatio-temporal trajectory clustering algorithm is developed to obtain an individual's frequent trajectory clusters. Second, a sequential pattern mining algorithm is applied to users who have multiple spatio-temporal trajectory clusters to extract travel patterns and the sequential relations among trajectories within those patterns. Finally, a point clustering method is used to identify spatial relationships among different trajectory clusters. In addition, a zone aggregating method is proposed whose aggregation granularity can be flexibly adjusted for zone demand imbalance analysis. Based on these, the study examines how the identified frequent pattern trajectories can be used to improve rebalancing. The proposed methodology is applied to a Beijing Mobike dataset, and six frequent travel patterns are mined and analyzed in detail. On this basis, imbalance and rebalancing analyses are carried out in a case study. Consequently, this research contributes a powerful tool for accurate FFBS demand analysis and rebalancing.

4.

Mining frequent closed patterns in pointset databases

Anthony J.T. Lee Wen-Kwang Tsao Po-Yin Chen Ming-Chih Lin Shih-Hui Yang 《Information Systems》2010

In this paper, we propose an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns using a frequent pattern tree in a depth-first manner. Since PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the compared algorithms by more than one order of magnitude.
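For readers unfamiliar with the term, a frequent pattern is closed when no proper super-pattern has the same support. The sketch below illustrates that definition on plain itemsets, which is a simplification: pointset patterns are geometric, and PCP-Miner prunes during enumeration rather than filtering afterwards. The function `closed_patterns` and the sample supports are hypothetical.

```python
def closed_patterns(frequent):
    """Keep patterns that have no proper superset with the same support."""
    closed = {}
    for pattern, support in frequent.items():
        absorbed = any(
            pattern < other and support == other_support
            for other, other_support in frequent.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed

# Hypothetical frequent itemsets with their supports.
frequent = {
    frozenset("a"): 3,
    frozenset("b"): 4,
    frozenset("ab"): 3,
}
result = closed_patterns(frequent)
print(result)
```

Here `{a}` is dropped because `{a, b}` occurs just as often, while `{b}` survives because it occurs more often than any of its supersets.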

5.

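An improved algorithm for mining sequential patterns with time constraints

胡学钢 张圆圆 《智能系统学报》2007,2(2):89-93

For sequential patterns with time constraints, an improved mining algorithm, TSPM, is proposed. It overcomes the drawbacks of traditional sequential pattern mining methods: high time and space overhead, and a huge number of results that lack focus. The algorithm introduces a graph structure to represent frequent 2-sequences, and a single database scan suffices to map the information relevant to the mining task into the graph. The graph representation lets the mining process make full use of the order relations among items, which improves the efficiency of generating frequent sequences. In addition, the algorithm uses the position information of sequences to compute support, which reduces the complexity of handling time constraints and avoids repeated sequence-containment tests. Experiments show that the algorithm outperforms traditional sequential pattern discovery algorithms in both time and space.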

6.

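An efficient incremental sequential pattern mining algorithm

刘佳新 《计算机工程》2012,38(12):39-41

Existing incremental mining algorithms must re-mine the sequence database when the support changes. To reduce the resulting time and space costs, an efficient incremental sequential pattern mining algorithm is proposed. The algorithm uses a frequent sequence tree as the storage structure for sequences. When the sequence database and the minimum support change, it updates the frequent sequence tree through update operations and finds all sequential patterns in the sequence database by a depth-first traversal of the tree. Experimental results show that the algorithm mines more efficiently than the IncSpan and PrefixSpan algorithms.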

7.

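Mining frequent subgraph patterns from uncertain graphs (total citations: 8; self-citations: 0; citations by others: 8)

邹兆年 李建中 高宏 张硕 《软件学报》2009,20(11):2965-2976

This paper studies the mining of uncertain graph data, focusing on the problem of mining frequent subgraph patterns from uncertain graphs. A data model is introduced to represent the uncertainty of graphs, and an expected support is introduced to evaluate the significance of subgraph patterns. Using the Apriori property of expected support, a mining algorithm based on a depth-first search strategy is given. The algorithm uses an efficient method for computing expected support and a technique for pruning the search space, which reduce the number of subgraph isomorphism tests needed to compute the expected support of a subgraph pattern from exponential to linear. Experimental results show that the algorithm is 3 to 5 orders of magnitude faster than a naive depth-first search algorithm and is highly efficient and scalable.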
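The abstract does not define expected support. The sketch below assumes one plausible reading: each edge exists independently with a given probability, and expected support is the pattern's support averaged over all possible worlds. To keep the example small, the pattern is a single edge, so each uncertain graph reduces to the existence probabilities of its matching edges. All function names and the data are hypothetical. The brute-force version enumerates every world, and the closed form needs one term per graph.

```python
from itertools import product

def occurrence_probability(edge_probs):
    """P(at least one matching edge exists), assuming independent edges."""
    p_none = 1.0
    for p in edge_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

def expected_support(database):
    """Average, over uncertain graphs, of the pattern's occurrence probability."""
    return sum(occurrence_probability(g) for g in database) / len(database)

def expected_support_bruteforce(database):
    """Same quantity, computed by enumerating every possible world (exponential)."""
    total = 0.0
    for g in database:
        for world in product([0, 1], repeat=len(g)):
            prob = 1.0
            for present, p in zip(world, g):
                prob *= p if present else 1.0 - p
            if any(world):  # the pattern occurs in this world
                total += prob
    return total / len(database)

# Hypothetical database: existence probabilities of the edges matching the pattern.
database = [[0.5, 0.5], [0.9], []]
print(expected_support(database), expected_support_bruteforce(database))
```

Both functions agree on this input. For larger patterns the occurrence probability is much harder to compute, which is where the paper's own techniques would come in.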

8.

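A candidate sequence generation algorithm based on graph structure (total citations: 3; self-citations: 1; citations by others: 3)

郭平 刘潭仁 《计算机科学》2004,31(1):136-139

Generating candidate sequences, testing whether each candidate is frequent, and finally obtaining the frequent sequences is the general structure of candidate-based mining algorithms in sequence data mining, such as Apriori-like algorithms, GSP, and SPADE. Studying candidate sequence generation algorithms is therefore of general significance. This paper first studies the relationship between sequence datasets (sequence databases) and graph structures, and proves that a necessary condition for a sequence to be frequent is that it corresponds to a complete subgraph. On this basis, a candidate sequence generation algorithm based on graph structure is proposed, and a proof of its correctness is given. Mining experiments on the T25110D10K and T25120D100K datasets show that mining with the proposed candidate sequence generation algorithm is more efficient than mining with the Apriori algorithm.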
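One way to read the complete-subgraph condition is sketched below. It assumes a graph whose vertices are items and whose directed edges are the frequent 2-sequences. The abstract does not give the construction, so this is an interpretation and not the paper's definition. Under that reading, a candidate can be frequent only if every ordered pair of its items is an edge. The function `passes_pair_check` and the sample edges are hypothetical.

```python
from itertools import combinations

def passes_pair_check(candidate, frequent_pairs):
    """Necessary condition: every ordered pair in the candidate is a frequent 2-sequence."""
    return all((a, b) in frequent_pairs for a, b in combinations(candidate, 2))

# Hypothetical frequent 2-sequences, i.e. the edges of the graph.
frequent_pairs = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}

print(passes_pair_check(("a", "b", "c"), frequent_pairs))  # all three pairs are edges
print(passes_pair_check(("a", "c", "d"), frequent_pairs))  # ("a", "d") is not an edge
```

The check is necessary but not sufficient, so surviving candidates still have to be counted against the database.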

9.

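Mining frequent jump patterns from graph databases (total citations: 4; self-citations: 0; citations by others: 4)

刘勇 李建中 高宏 《软件学报》2010,21(10):2477-2493

Many frequent subgraph mining algorithms have been proposed. However, they produce too many frequent subgraphs for users to exploit effectively. This paper therefore poses a new research problem: mining frequent jump patterns in graph databases. Mining frequent jump patterns can greatly reduce the number of output patterns while retaining meaningful graph patterns in the result. Jump patterns are also robust to noise. However, because jump patterns do not have the anti-monotone property, mining them is very challenging. By studying the characteristics of jump patterns, two new pruning techniques are proposed: internal-extension-based pruning and external-extension-based pruning. Building on these, an efficient mining algorithm, GraphJP (an algorithm for mining jump patterns from graph databases), is given. The correctness of the pruning techniques and of GraphJP is rigorously proved. Experimental results show that the proposed pruning techniques effectively prune the graph pattern search space and that GraphJP is efficient and scalable.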

10.

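A constraint-based algorithm for frequent pattern mining

孟彩霞 《智能系统学报》2009,4(2):142-147

Few algorithms allow constraints to be changed dynamically during frequent pattern mining. A constraint-based frequent pattern mining algorithm, MCFP, is proposed. MCFP first builds a frequent pattern tree according to the properties of the constraints, scanning the database only once. It then builds a conditional tree for each item, mines the maximal frequent patterns prefixed by that item, and stores them in a maximal pattern tree. Finally, it derives all frequent patterns with exact support from the maximal patterns. MCFP allows users to change constraints dynamically while mining frequent patterns. Experiments show that the algorithm is effective compared with the iCFP algorithm.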

11.

Parallel Bifold: Large-scale parallel pattern mining with constraints

Mohammad El-Hajj Osmar R. Zaïane 《Distributed and Parallel Databases》2006,20(3):225-243

When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets because of their sheer size: not only the number of existing patterns but, above all, the magnitude of the search space. Many approaches have suggested applying constraints to the patterns or searching for frequent patterns in parallel. So far, those approaches are still not genuinely effective for mining extremely large datasets. We propose a method that combines both strategies efficiently, i.e. mining in parallel for the set of patterns while pushing constraints. Using this approach we could mine significantly large datasets, with sizes never reported in the literature before. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than an hour and a half. Recommended by: Ahmed Elmagarmid

12.

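A fast frequent pattern mining algorithm based on B-list

李校林 杜托 刘彪 《计算机应用》2017,37(8):2357-2361

To address the complex tree construction and low mining efficiency of existing frequent pattern mining algorithms, a frequent pattern mining algorithm based on a constructed list (B-list), called BLFPM, is proposed. BLFPM uses a new data structure, the B-list, to represent frequent itemsets. By joining the B-lists of two frequent (k-1)-itemsets, the support of a k-itemset can be obtained quickly, which avoids multiple database scans. To address the high time complexity of joining two B-lists, a join method with linear time complexity is given, improving the time efficiency of BLFPM. BLFPM also uses a set enumeration tree to represent the search space and applies a pruning strategy based on infrequent subsets, which shrinks the search space and speeds up execution. Experimental results show that, compared with the NSFI and PrePost algorithms, BLFPM improves time efficiency by about 12% to 29% and space efficiency by about 10% to 24%, and that it performs well on both sparse and dense databases.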

13.

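An improvement of the Apriori algorithm in association rule mining (total citations: 2; self-citations: 0; citations by others: 2)

孔芳 钱雪忠 《计算机工程与设计》2008,29(16)

Through a detailed analysis of association rule mining algorithms, a dynamic frequent itemset mining algorithm based on an undirected itemset graph is proposed. When the transaction database and the minimum support change, the algorithm only needs to traverse the undirected itemset graph once more to obtain the new frequent itemsets. The algorithm is simple and scans the database only once, and it also offers fast search and low memory use.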

14.

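A maximal pattern mining algorithm based on an improved FP-tree (total citations: 2; self-citations: 0; citations by others: 2)

孟祥萍 王华金 王贤勇 任纪川 鞠传香 《计算机工程与应用》2005,41(14):179-181,228

Frequent pattern mining is a very important branch of data mining. Because of its inherent computational complexity, however, mining the complete set of frequent patterns from dense data is very difficult, and the number of patterns is often so large that they are hard to understand and apply. The maximal frequent patterns (maximal patterns) implicitly and compactly contain all frequent patterns and occupy far less storage than the complete set, so maximal pattern mining is of great significance. This paper improves the traditional FP-tree structure and proposes an effective maximal pattern mining algorithm based on the improved FP-tree, IFP-Max. By introducing the concept of a suffix subtree, the algorithm need not generate a candidate set of maximal frequent patterns during mining, which greatly improves its time efficiency and space scalability. Experiments show that IFP-Max mines about one order of magnitude faster than MAFIA and GenMax.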
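The claim that maximal patterns implicitly contain all frequent patterns can be checked on a toy example. The sketch below works on plain itemsets and is not IFP-Max: it filters an already-mined set, whereas the paper avoids candidate generation altogether. The function names and the data are hypothetical.

```python
from itertools import combinations

def maximal_patterns(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return {p for p in frequent if not any(p < q for q in frequent)}

def expand(maximal):
    """Recover every frequent itemset as a non-empty subset of some maximal one."""
    recovered = set()
    for pattern in maximal:
        items = sorted(pattern)
        for size in range(1, len(items) + 1):
            recovered.update(frozenset(c) for c in combinations(items, size))
    return recovered

# Hypothetical complete set of frequent itemsets.
frequent = {frozenset(s) for s in ["a", "b", "c", "ab", "ac", "bc", "abc", "d"]}
maximal = maximal_patterns(frequent)
print(maximal)
print(expand(maximal) == frequent)
```

Eight frequent itemsets compress to two maximal ones. Only membership is recoverable this way: the individual supports of the subsets are lost.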

15.

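Research on an algorithm for mining maximal target frequent patterns (total citations: 2; self-citations: 0; citations by others: 2)

李清勇 秦亮曦 施智平 史忠植 《计算机工程与应用》2004,40(33):184-188

Traditional frequent pattern mining algorithms often produce hundreds or thousands of result patterns, and users usually need a "secondary mining" pass to find the useful target patterns among them. The goal of this paper is to mine the target patterns a user is interested in directly from the user's requirements. Based on the FP-tree, the paper designs a compact, non-redundant TFP-tree that filters out items and transactions unrelated to the target patterns and keeps only the relevant information, reducing the size of the tree. Based on the regularities and characteristics of the TFP-tree, the authors design an algorithm for mining maximal target frequent patterns, whose result patterns have two properties: (1) they are target patterns that satisfy the user's requirements; (2) they are maximal patterns. Experimental results verify that the TFP-tree algorithm is effective and significantly improves on the performance of the FP-tree algorithm.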

16.

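An FP-tree-based incremental updating algorithm for mining maximal frequent patterns

李忠哗 任春龙 何丕廉 《计算机应用与软件》2007,24(5):47-49

Mining association rules is an important research direction in data mining. Many algorithms have been proposed for discovering association rules in databases, but the incremental maintenance of association rules has received less study. This paper analyzes incremental updating in depth and adopts and improves FP-Max, currently one of the more efficient maximal frequent pattern mining algorithms. The basic ideas are: ① the approach is based on the FP-tree; ② it considers how the FP-tree is updated when data are added to the dataset; ③ it improves the FP-Max algorithm to update and maintain the maximal frequent patterns already mined.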

17.

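A new method for sequential pattern discovery

胡学钢 张圆圆 《计算机应用研究》2008,25(4):1003-1005

For sequential pattern mining, a frequent 2-sequence graph (F2SG) is proposed to represent the sequence information in a database. With a single database scan, the information relevant to the mining task is mapped into the F2SG, and on this basis a new sequential pattern discovery algorithm, GBSP, is proposed. GBSP makes full use of the order relations among items represented in the F2SG to mine frequent sequences, which improves the efficiency of generating them. Theoretical analysis and experiments show that the algorithm outperforms traditional sequential pattern discovery algorithms in both time and space.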

18.

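An improved algorithm for mining weighted frequent itemsets

王艳 薛海燕 李玲玲 孙新德 《计算机工程与应用》2010,46(23):135-137

FP-growth is a classic algorithm for mining frequent itemsets. It uses the FP-tree, a compact data structure, to store all the information in the transaction database that is relevant to frequent itemset mining, but it is not suitable for mining weighted frequent itemsets. This paper analyzes the problems in existing weighted frequent itemset mining algorithms, improves the FP-tree by constructing a new weighted FP-tree, and proposes an algorithm that mines weighted frequent itemsets effectively. An example illustrates the mining process, and experiments verify the effectiveness of the algorithm.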

19.

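An improved closed graph mining algorithm

郭景峰 陈晓 赵丽 邹晓红 《计算机研究与发展》2009,46(Z2)

Frequent subgraph mining is the foundation and the bottleneck of many kinds of graph mining. To improve the efficiency of frequent subgraph mining, a new algorithm, BPCG, is proposed on the basis of frequent closed graph methods. First, a new table structure is used to store the set of frequent subgraphs, so that the most frequent adjacent edges can be extended and the support threshold computed directly without scanning the graph set. The algorithm then applies a sibling pruning strategy and deletes locally frequent edges to shrink the search space and reduce unnecessary operations. Experiments show that the algorithm outperforms other subgraph mining algorithms.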

20.

Mining spatial association rules in image databases (total citations: 2; self-citations: 0; citations by others: 2)

Anthony J.T. Lee Ruey-Wen Hong Wen-Kwang Tsao 《Information Sciences》2007,177(7):1593-1608

In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ≥ 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern's support and check whether it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of the impossible candidates, it is more efficient than the Apriori algorithm. The experimental results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm.
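The level-wise loop described here has the same shape as the classic Apriori procedure. The sketch below shows that generic procedure on plain itemsets. It does not model the 9DLT representation or the additional pruning the paper describes, and the function name `apriori` and the sample transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise mining: frequent k-itemsets generate candidate (k+1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Each call scans the whole database.
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        # Join step: union pairs of frequent k-itemsets that differ in one item.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

# Hypothetical transaction database.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(transactions, min_support=3)
print(result)
```

The loop stops at the first level that yields no frequent pattern, matching the termination rule in the abstract.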