期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Frequent Closed Sequence Mining without Candidate Maintenance

Jianyong Wang Jiawei Han Chun Li 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(8):1042-1056

Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only a more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and- test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called Bl-Directional Extension and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method. A thorough performance study with both sparse and dense, real, and synthetic data sets has demonstrated that BIDE significantly outperforms the previous algorithm: It consumes an order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size. 相似文献

2.

Incremental sequence-based frequent query pattern mining from XML queries

Guoliang Li Jianhua Feng Jianyong Wang Lizhu Zhou 《Data mining and knowledge discovery》2009,18(3):472-516

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i ⁺, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly. 相似文献

3.

Finding hot query patterns over an XQuery stream

Liang?Huai?Yang Email author Mong?Li?Lee Wynne?Hsu 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(4):318-332

Caching query results is one efficient approach to improving the performance of XML management systems. This entails the discovery of frequent XML queries issued by users. In this paper, we model user queries as a stream of XML query pattern trees and mine the frequent query patterns over the query stream. To facilitate the one-pass mining process, we devise a novel data structure called DTS to summarize the pattern trees seen so far. By grouping the incoming pattern trees into batches, we can dynamically mark the active portion of the current batch in DTS and limit the enumeration of candidate trees to only the currently active pattern trees. We also design another summary data structure called ECTree that provides for the incremental computation of the frequent tree patterns over the query stream. Based on the above two constructs, we present two mining algorithms called XQSMinerI and XQSMinerII. XQSMinerI is fast, but it tends to overestimate, while XQSMinerII adopts a filter-and-refine approach to minimize the amount of overestimation. Experimental results show that the proposed methods are both efficient and scalable and require only small memory footprints.Received: 17 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004Edited by: J. Gehrke and J. Hellerstein. 相似文献

4.

基于频繁叶模式的XML最大频繁查询模式挖掘算法

陈超祥丁健龙华成金林樵《计算机应用与软件》2009,26(6):85-87,197

在XML频繁查询模式挖掘稠密数据集、长数据集中,为克服项目集挖掘过程中挖掘的项目过多、不利于结果利用等问题,提出基于频繁叶模式的最大频繁查询模式挖掘算法MFRSTMiner。该算法通过构造频繁模式扩展森林,在扩展森林的叶节点中挖掘出最大频繁子树。试验结果表明该算法能够有效地挖掘动态事务集的最大频繁查询模式。相似文献

5.

Bottom-up discovery of frequent rooted unordered subtrees

Yijun Bei Gang Chen Lidan Shou Xiaoyan Li Jinxiang Dong 《Information Sciences》2009,179(1-2):70-88

In the past decade, XML has emerged as the standard language for information exchanging over the Internet. Due to its tree-structure paradigm, XML is superior for its capability of storing, querying, and manipulating complex data. Therefore, discovering frequent tree patterns over tree-structured data has become an interesting topic for XML data management. In this paper, we propose a tree mining algorithm, named BUXMiner, for finding a special class of frequent trees, called rooted unordered trees, from a tree-structured database. BUXMiner employs an efficient bottom-up approach to enumerate all candidate trees over a compact global tree guide and computes the frequent trees based on the tree guide. In addition to BUXMiner, we also propose a mining approach called BUMXMiner to discover the maximal frequent rooted unordered trees. We compare BUXMiner with previous tree-structure mining algorithms, namely XQPMinerTID and FastXMiner, which were also proposed to discover rooted unordered trees. The experimental results show that our algorithm outperforms XQPMinerTID and FastXMiner in terms of efficiency. The performance results from real-world applications also indicate the usefulness of our proposed tree mining algorithms in a variety of web applications, such as analysis of web page access patterns and mining frequent XML query patterns for caching. 相似文献

6.

Closed inter-sequence pattern mining

《Journal of Systems and Software》2013,86(6):1603-1612

Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases. 相似文献

7.

Mining lossless closed frequent patterns with weight constraints

《Knowledge》2007,20(1):86-97

Frequent pattern mining is one of main concerns in data mining tasks. In frequent pattern mining, closed frequent pattern mining and weighted frequent pattern mining are two main approaches to reduce the search space. Although many related studies have been suggested, no mining algorithm considers both paradigms. Even if closed frequent pattern mining represents exactly the same knowledge and weighted frequent pattern mining provides a way to discover more important patterns, the incorporation of closed frequent pattern mining and weight frequent pattern mining may loss information. Based on our analysis of joining orders, we propose closed weighted frequent pattern mining, and present how to discover succinct but lossless closed frequent pattern with weight constraints. To our knowledge, ours is the first work specifically to consider both constraints. An extensive performance study shows that our algorithm outperforms previous algorithms. In addition, it is efficient and scalable. 相似文献

8.

改进的XML小枝模式匹配方法

下载免费PDF全文

魏东平朱新向吴玉雁《计算机工程与应用》2013,49(8):125-128

近年来, XML数据查询成为一个重要的研究课题。处理小枝查询是XML查询实现的核心操作,针对小枝模式查询,提出了一种改进的小枝模式匹配算法。该算法通过剪去无用的数据流以减少待处理结点的数目,从而节省处理时间,提高查询的准确率。实验结果表明,该算法能够有效提高查询效率。相似文献

9.

基于压缩结构树的XML数据频繁模式挖掘研究

下载免费PDF全文

曹洪其牛天耘孙志挥《计算机工程》2006,32(19):108-110

XML文档频繁模式挖掘是XML相关研究工作中的重要内容。在现有的频繁树结构挖掘算法WL的基础上，提出了一种高效的基于压缩结构树存储结构的XML数据频繁模式挖掘算法AFPMX_CST。该算法压缩了搜索空间，减少了扫描次数，相对于WL算法在时间效率和空间效率方面具有更加良好的性能。同时，该文进一步研究了将挖掘结果转换为相应的DTD格式的方法及过程。实验结果表明AFPMX_CST算法是可行和有效的。相似文献

10.

关联规则中基于降维的最大频繁模式挖掘算法

钱雪忠惠亮《计算机应用》2011,31(5):1339-1343

基于FP-tree的最大频繁模式挖掘算法是目前较为高效的频繁模式挖掘算法,针对这些算法需要递归生成条件FP-tree、产生大量候选最大频繁项集等问题,在分析FPMax、DMFIA算法的基础上,提出基于降维的最大频繁模式挖掘算法(BDRFI)。该算法改传统的FP-tree为数字频繁模式树DFP-tree,提高了超集检验的效率;采用的预测剪枝策略减少了挖掘的次数;基于降低项集维度的挖掘方式,减少了候选项的数目,避免了递归地产生条件频繁模式树,提高了算法的效率。实验结果表明,BDRFI的效率是同类算法的2~8倍。相似文献

11.

An efficient algorithm of frequent XML query pattern mining for ebXML applications in e-commerce

Tsui-Ping Chang Shih-Ying Chen 《Expert systems with applications》2012,39(2):2183-2193

Providing efficient query to XML data for ebXML applications in e-commerce is crucial, as XML has become the most important technique to exchange data over the Internet. ebXML is a set of specifications for companies to exchange their data in e-commerce. Following the ebXML specifications, companies have a standard method to exchange business messages, communicate data, and business rules in e-commerce. Due to its tree-structure paradigm, XML is superior for its capability of storing and querying complex data for ebXML applications. Therefore, discovering frequent XML query patterns has become an interesting topic for XML data management in ebXML applications. In this paper, we present an efficient mining algorithm, namely ebXMiner, to discover the frequent XML query patterns for ebXML applications. Unlike the existing algorithms, we propose a new idea by collecting the equivalent XML queries and then enumerating the candidates from infrequent XML queries in our ebXMiner. Furthermore, our simulation results show that ebXMiner outperforms other algorithms in its execution time. 相似文献

12.

Tree pattern mining with tree automata constraints

Sandra de Amo Nyara A. Silva Ronaldo P. Silva Fabiola S. Pereira 《Information Systems》2010

Most work on pattern mining focuses on simple data structures such as itemsets and sequences of itemsets. However, a lot of recent applications dealing with complex data like chemical compounds, protein structures, XML and Web log databases and social networks, require much more sophisticated data structures such as trees and graphs. In these contexts, interesting patterns involve not only frequent object values (labels) appearing in the graphs (or trees) but also frequent specific topologies found in these structures. Recently, several techniques for tree and graph mining have been proposed in the literature. In this paper, we focus on constraint-based tree pattern mining. We propose to use tree automata as a mechanism to specify user constraints over tree patterns. We present the algorithm CoBMiner which allows user constraints specified by a tree automata to be incorporated in the mining process. An extensive set of experiments executed over synthetic and real data (XML documents and Web usage logs) allows us to conclude that incorporating constraints during the mining process is far more effective than filtering the interesting patterns after the mining process. 相似文献

13.

Efficient mining of frequent XML query patterns with repeating-siblings

《Information and Software Technology》2008,50(5):375-389

A recent approach to improve the performance of XML query evaluation is to cache the query results of frequent query patterns. Unfortunately, discovering these frequent query patterns is an expensive operation. In this paper, we develop a two-pass mining algorithm 2PXMiner that guarantees the discovery of frequent query patterns by scanning the database at most twice. By exploiting a transaction summary data structure, and an enumeration tree, we are able to determine the upper bounds of the frequencies of the candidate patterns, and to quickly prune away the infrequent patterns. We also design an index to trace the repeating candidate subtrees generated by sibling repetition, thus avoiding redundant computations. Experiments results indicate that 2PXMiner is both efficient and scalable. 相似文献

14.

有效挖掘闭合组合序列模式

闫雷鸣孙志挥张柏礼杨明姚蓓《计算机科学》2010,37(6):186-190

序列模式的挖掘是近年来的研究热点之一,目前很多研究都集中在闭合频繁项集与闭合序列模式的挖掘,较少涉及更加复杂、有重要应用价值的组合序列模式.针对任意长度和任意组合次数的频繁组合序列模式,提出了一种挖掘全部闭合的组合序列的算法CloCSP.为克服指数量级的候选序列进行闭合检验的困难,提出了既能生成频繁组合序列,又能有效剪枝,并同时完成闭合检验的混合扩展策略,该策略无需维护候选集.实验表明,CloCSP算法能够有效挖掘出隐藏在序列数据中,尤其是稠密数据集内的闭合组合序列模式,有助于揭示更加复杂的序列模式. 相似文献

15.

不产生候选的快速投影频繁模式树挖掘算法 总被引：8，自引：0，他引：8

何炎祥向剑文朱骁峰孔维强《计算机科学》2002,29(11):71-75

1.概述近年来,对事务数据库、时序数据库和各种其它类型数据库中的频繁模式挖掘的研究越来越普及。许多先前的研究都是采用Apriori或类似的候选产生—检查迭代算法,使用候选项集来找频繁项集。这些算法都基于一种重要的反单调的Apriori性质:任何非频繁的(k—1)-项集都不可能是频繁k-项集的子集。因此,如果一个候选k-项集的(k—1)-子集不在频繁(k—1)-项集中,则该候选也不可能是频繁的,从而可相似文献

16.

关联规则中FP-tree的最大频繁模式非检验挖掘算法 总被引：1，自引：0，他引：1

惠亮钱雪忠《计算机应用》2010,30(7):1922-1925

基于FP-tree的最大频繁模式挖掘算法是目前较为高效的频繁模式挖掘算法,针对这些算法需要递归生成条件FP-tree、做超集检验等问题,在分析DMFIA-1算法的基础上,提出了最大频繁模式的非检验挖掘算法NCMFP。该算法改进了FP-tree的结构,使挖掘过程中不需要生成条件频繁模式树也不需要超集检验。算法采用的预测剪枝策略减少了挖掘的次数,采用的求取公共交集的方式保证了挖掘结果的完整性。实验结果表明在支持度相对较小情况下,NCMFP的效率是同类算法的2~5倍。相似文献

17.

A weighted common structure based clustering technique for XML documents

Jeong Hee Hwang Author Vitae 《Journal of Systems and Software》2010,83(7):1267-1274

XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, because of its varied applicability in numerous applications. Therefore, XML documents constitute an important data mining domain. In this paper, we propose a new method of XML document clustering by a global criterion function, considering the weight of common structures. Our approach initially extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. Then, we perform clustering of an XML document by the weight of common structures, without a measure of pairwise similarity, assuming that an XML document is a transaction and frequent structures extracted from documents are items of the transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach. 相似文献

18.

DTD约束下的树模式查询的一致性判断

张剑妹陶世群《计算机应用》2008,28(11):2961-2963

树模式查询被广泛地应用XML数据查询中。树模式查询的一致性判断可以避免不必要的计算,节省查询时间,从而提高查询效率。给出了查询一致性的定义,基于子路径的概念,提出文档类型定义(DTD)约束下的树模式查询的一致性判断算法,并对算法的时间复杂度进行了分析。通过分析比较,该算法是有效的。相似文献

19.

挖掘最大频繁模式的新方法 总被引：11，自引：0，他引：11

刘君强孙晓莹王勋潘云鹤《计算机学报》2004,27(10):1328-1334

由于其内在的计算复杂性，挖掘密集型数据集的频繁模式完全集非常困难，解决方案之一是挖掘最大频繁模式集．该文在频繁模式完全集挖掘算法Opportune Project基础上，提出了挖掘最大频繁模式的新算法MOP．它采用宽度与深度优先相结合的混合搜索策略，能恰当地选择不同的支持集表示和投影方法，将闭合性剪裁和一般性剪裁相结合，并适时前窥，实现搜索与剪裁效率最优化．实验表明，MOP效率是MaxMiner的2～8倍，比MAFIA高2个数量级以上．相似文献

20.

用变异FP-树改进CLOSET算法

刘迎意吴春旭沈陵峰《计算机仿真》2010,27(3):98-101

频繁闭项集提供了频繁项集的一种完整、最小表示,对频繁闭项集的挖掘是近年来数据挖掘领域研究的热点,研究人员从不同角度对算法改进以提高算法的效率。基于频繁项集中共生项集的性质,提出无须进行子集检查的频繁闭项集挖掘方法,并设计一种变异的FP-树结构,利用FP-树结构来存储结点共生项集信息,以改进CLOSET算法,算法无须遍历结果集进行闭合性检查。实验表明,在支持度阈值减小,结果集变大时,改进算法的时间增长率比原有算法小。相似文献