Similar literature
 20 similar documents found (search time: 78 ms)
1.
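A spatial co-location pattern describes instances of spatial objects that are frequently located together in the same region. Much work has been done on mining co-location patterns from certain and uncertain data, with many results, but maximal co-location pattern mining has received little attention, and maximal co-location pattern mining for fuzzy objects has not yet been reported. This paper proposes the Mevent-tree algorithm to mine maximal co-location patterns of fuzzy objects. It first builds a spatial object tree for each object to obtain candidate patterns, then builds a HUT tree for the candidate pattern set, and finally searches the HUT tree depth-first for maximal co-location patterns, starting from the highest-order candidates and stopping at order 2, and prunes the HUT tree whenever a maximal pattern is found. Two improved algorithms are then proposed: one prunes fuzzy objects in the preprocessing stage, and the other prunes co-location candidate patterns before the HUT tree is constructed. Extensive experiments verify the effectiveness and efficiency of the Mevent-tree algorithm and the improved algorithms.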

2.
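Research on pruning algorithms for product feature mining in Chinese reviews   Cited: 2 (self-citations: 0, citations by others: 2)
Li Shi, Li Qiushi. Computer Engineering (《计算机工程》), 2011, 37(23): 43-45
To address product feature mining in Chinese online customer reviews, an unsupervised mining method based on the Apriori algorithm is proposed. The Apriori algorithm is used to mine the candidate feature set, an adjacency-rule pruning algorithm and a minimum-independent-support pruning algorithm are designed, and the adjacency-rule distance value and the minimum independent support are determined experimentally. Experimental results show that both pruning algorithms effectively improve the precision and recall of product feature mining.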
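The candidate-mining step named in this abstract is the standard Apriori algorithm. As a rough illustration only, the following Python sketch shows plain Apriori with its join and prune steps. It does not implement the adjacency-rule or minimum-independent-support pruning described above, and the function name `apriori` and the sample review data are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset whose count is >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical data: each "transaction" is the set of nouns found in one review sentence.
reviews = [
    {"screen", "battery"},
    {"screen", "battery", "price"},
    {"screen", "price"},
    {"battery"},
]
frequent_sets = apriori(reviews, min_support=2)
```

Here the three-item candidate is discarded in the prune step because its subset {battery, price} is not frequent, so it is never counted against the data.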

3.

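A formal representation of the data model with varying item weights is given, a new weighted-itemset pruning strategy and its pattern evaluation framework SCCI (support-confidence-correlation-interest) are constructed, and an algorithm for mining weighted positive and negative association rules based on varying item weights and the SCCI framework is proposed. The algorithm takes the characteristics of data with varying item weights into account, adopts the new pruning method and evaluation framework, and mines valid weighted positive and negative association rules through simple computation and comparison of itemset weights. Experimental results show that the algorithm effectively reduces the number of candidate itemsets and the mining time, mines interesting association patterns, avoids invalid patterns, achieves higher mining efficiency than the existing algorithms it is compared with, and solves the problem of mining weighted negative patterns with varying item weights.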

4.
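A maximal frequent itemset mining algorithm based on the FP-tree, DMFIA-D, is proposed. The algorithm uses a bidirectional search strategy: guided by the structural characteristics of the FP-tree, it selects maximal frequent candidate itemsets top-down, then counts and prunes the candidate itemsets bottom-up to determine the final maximal frequent itemsets. Because it reduces the number of maximal frequent candidates and prunes the candidates effectively, it shortens the mining time and improves mining efficiency.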

5.
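To address the problem of continuously updated mobile log databases, this paper proposes an incremental mining method for mining users' movement patterns. The main idea is to reuse the mining results from the original database: candidate patterns are split into two parts and pruned appropriately, and their supports in the original database and in the newly added database are computed to obtain the frequent patterns, which improves mining efficiency.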

6.
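This paper proposes ECFIMA, a closed frequent itemset mining algorithm based on ESEquivPS (the extended support equivalence pruning strategy). The algorithm traverses the search space with a combination of depth-first and breadth-first strategies, stores and represents itemsets and the transaction database in a vertical bitmap vector format, and prunes the candidate space using the basic pruning strategy, the equivalence pruning strategy, extended support equivalence pruning strategy 1, and extended support equivalence pruning strategy 2. Experiments are run on several test datasets with different characteristics. The results show that ECFIMA is an efficient closed frequent itemset mining algorithm that outperforms the CHARM algorithm on a variety of test datasets; in particular, on test datasets with many long closed frequent itemsets, its efficiency is about 2 to 3 times higher than that of CHARM.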
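Two ideas in this abstract can be shown in a few lines: a vertical bitmap layout, where each item maps to a bit vector over transactions, and the definition of a closed itemset, which has no proper superset with the same support. The Python sketch below is a brute-force illustration under those definitions only. It does not reproduce ECFIMA or any of its pruning strategies, and all function names and the sample data are assumptions made for this example.

```python
from itertools import combinations


def vertical_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains the item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(itemset, bitmaps, n):
    """Support count = number of set bits in the AND of the items' bitmaps."""
    bits = (1 << n) - 1  # start with every transaction selected
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bin(bits).count("1")


def closed_frequent_itemsets(transactions, min_support):
    """Brute force: frequent itemsets that have no proper superset with equal support."""
    n = len(transactions)
    bitmaps = vertical_bitmaps(transactions)
    items = sorted(bitmaps)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, bitmaps, n)
            if s >= min_support:
                frequent[frozenset(combo)] = s
    # A superset with equal support is itself frequent, so checking `frequent` suffices.
    return {
        x: s for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }


# Hypothetical toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
closed = closed_frequent_itemsets(db, min_support=2)
```

In this toy database {b} is frequent but not closed, because {a, b} occurs in exactly the same transactions. The enumeration is exponential in the number of items, so it is only suitable for illustrating the definitions.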

7.
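To obtain frequent paths efficiently from massive logistics data, a logistics data model is designed according to the characteristics of logistics networks and logistics, together with PMWTI (Path Mining With Topology Information), a frequent path sequence mining algorithm that takes full account of the topology information of the logistics network. PMWTI includes a cost-tolerance pruning method for deep pruning of candidate path sequences. On top of pruning with the Apriori property, this method further removes some candidate path sequences that cannot be frequent, which reduces the number of candidate path sequences to some extent and thus reduces scans of the dataset. Experiments show that PMWTI mines frequent paths more efficiently than an equivalent algorithm without this pruning method.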

8.
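The KNN query is one of the most representative query types in multimedia database management systems. Unlike a range query, a KNN query has no fixed pruning threshold. To prune, KNN algorithms use conservative KNN-distance pruning, usually taking the distance to the K-th nearest point visited so far as the pruning threshold. Traditional KNN query processing algorithms cannot produce a pruning threshold until K candidate results have been found, so every node visited in the meantime is placed in the queue of nodes to be visited. This paper proposes a method for estimating the pruning threshold in KNN query processing: before K candidate results are found, it estimates the threshold by analyzing the page regions visited so far. Experiments show that pruning with the estimated threshold effectively shortens the queue of nodes to be visited.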
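The conventional pruning rule described in this abstract, where the distance to the K-th nearest point seen so far serves as the threshold, can be sketched as follows. This is a minimal Python illustration of that baseline only. The estimated threshold proposed in the paper is not reproduced, and the page layout (a bounding box plus a list of points), the function names, and the sample data are assumptions made for this example.

```python
import heapq
import math


def mindist(q, box):
    """Smallest Euclidean distance from point q to an axis-aligned box (lo, hi)."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def knn(q, pages, k):
    """pages: list of (box, points). Returns (sorted k nearest distances, pages read)."""
    # Visit pages in order of their minimum possible distance to the query.
    queue = [(mindist(q, box), i) for i, (box, _) in enumerate(pages)]
    heapq.heapify(queue)
    best = []  # max-heap of the k best distances, stored negated
    pages_read = 0
    while queue:
        d, i = heapq.heappop(queue)
        # Pruning threshold: distance to the current k-th nearest point.
        # Until k candidates exist there is no threshold, so nothing can be pruned.
        if len(best) == k and d > -best[0]:
            break  # all remaining pages are at least this far away
        pages_read += 1
        for p in pages[i][1]:
            dist = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, -dist)
            elif dist < -best[0]:
                heapq.heapreplace(best, -dist)
    return sorted(-x for x in best), pages_read


# Hypothetical pages: (bounding box, points stored in the page).
pages = [
    (((0, 0), (1, 1)), [(0.1, 0.1), (0.9, 0.9)]),
    (((5, 5), (6, 6)), [(5, 5), (6, 6)]),
    (((10, 10), (11, 11)), [(10, 10)]),
]
distances, pages_read = knn((0, 0), pages, k=2)
```

With this data the first page already supplies two candidates, so the two farther pages are pruned without being read. With a larger k, more pages would have to be read before any threshold exists, which is the gap the abstract's estimated threshold targets.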

9.
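Association rule mining is currently one of the hot research topics in data mining. Its goal is to discover interesting association rules in databases. Based on analyses of association rules and of the Apriori algorithm, many effective improved algorithms have been proposed to address the bottlenecks of Apriori. This paper proposes the QPCA algorithm. Using matrix analysis, the algorithm needs to scan the database only once; it also optimizes the join and prune operations, so that fast pruning and joining quickly yield a minimal set of candidate itemsets and avoid repeated join checks between frequent itemsets, which greatly improves the efficiency of the algorithm. Experimental results show that the algorithm substantially reduces mining time.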

10.
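Zhang Kun, Chen Yue, Zhu Yangyong. Computer Engineering (《计算机工程》), 2007, 33(19): 69-71
Mining new patterns on the basis of existing patterns reduces the number of times the original database must be mined. This paper points out problems in the IncSpan+ algorithm, explains the defects of incremental mining algorithms based on semi-frequent patterns, and proposes an incremental sequential pattern mining algorithm. The algorithm constructs a prefix tree to represent the sequential patterns and maintains the structure of the prefix tree with breadth pruning and depth pruning. Experiments show that the algorithm performs well.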

11.
Incremental frequent itemset mining refers to the maintenance and utilization of the knowledge discovered in previous mining operations for later frequent itemset mining. This paper describes an incremental algorithm for maintaining the generator representation in dynamic datasets. The generator representation is a lossless, concise representation of the set of frequent itemsets. It may be orders of magnitude smaller than the set of frequent itemsets. Furthermore, the algorithm utilizes a novel optimization based on generator borders for the first time in the literature. Generator borders are the borderline between frequent generators and other itemsets. New frequent generators can be generated by monitoring them. Extensive experiments show that this algorithm is more efficient than previous solutions.

12.
A complete set of frequent itemsets can get undesirably large due to redundancy when the minimum support threshold is low or when the database is dense. Several concise representations have been previously proposed to eliminate the redundancy. Generator based representations rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes even exceeds the total number of frequent itemsets. In this paper, we propose to use a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is always no larger than the corresponding complete set of frequent itemsets, so it is a true concise representation. The generalized form of this representation is also proposed. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. The GrGrowth algorithm uses the depth-first-search strategy to explore the search space, which is much more efficient than the breadth-first-search strategy adopted by most of the existing generator mining algorithms. Our experimental results show that the GrGrowth algorithm is significantly faster than level-wise algorithms for mining generator based representations, and is comparable to the state-of-the-art algorithms for mining frequent closed itemsets.
Guimei Liu
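For readers unfamiliar with the term used in the two abstracts above, an itemset is usually called a generator when no proper subset has the same support. The Python sketch below enumerates frequent generators by brute force under that common definition. It is not GrGrowth and it does not compute positive or negative borders, and the function names and sample data are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_generators(transactions, min_support):
    """Brute force: frequent itemsets whose proper subsets all have larger support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    generators = {}
    for k in range(len(items) + 1):  # k = 0 covers the empty itemset
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, transactions)
            if s < min_support:
                continue
            # Support never increases when items are added, so it is enough to
            # compare against the subsets obtained by removing a single item.
            if all(support(x - {i}, transactions) > s for i in x):
                generators[x] = s
    return generators


# Hypothetical toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
gens = frequent_generators(db, min_support=2)
```

In this toy database the item a occurs in every transaction, so {a} has the same support as the empty itemset and is not a generator, and neither is any itemset containing a.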

13.
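Workflow mining techniques can construct processes from a system's execution logs. Most process mining methods use a graphical representation of the model, namely the control-flow graph. This paper discusses workflow pattern graph mining, which is in fact an extension of workflow mining, analyzes the issues involved, and introduces a pattern graph mining algorithm.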

14.
A metapattern (also known as a metaquery) is a new approach for integrated data mining systems. As opposed to a typical “toolbox”-like integration, where components must be picked and chosen by users without much help, metapatterns provide a common representation for inter-component communication as well as a human interface for hypothesis development and search control. One weakness of this approach, however, is that the task of generating fruitful metapatterns is still a heavy burden for human users. In this paper, we describe a metapattern generator and an integrated discovery loop that can automatically generate metapatterns. Experiments in both artificial and real-world databases have shown that this new system goes beyond the existing machine learning technologies, and can discover relational patterns without requiring humans to pre-label the data as positive or negative examples for some given target concepts. With this technology, future data mining systems could discover high-quality, human-comprehensible knowledge in a much more efficient and focused manner, and data mining could be managed easily by both expert and less-expert users.

15.
Experiencing SAX: a novel symbolic representation of time series   Cited: 15 (self-citations: 3, citations by others: 15)
Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.
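The abstract does not spell out how the symbolic representation is built. As SAX is commonly described, a series is z-normalized, reduced with piecewise aggregate approximation (PAA), and each segment mean is mapped to a symbol using breakpoints that divide the standard normal distribution into equiprobable regions. The Python sketch below follows that common description. The function name `sax`, the four-letter alphabet, and the requirement that the series length be a multiple of the segment count are assumptions made for this example, and the lower-bounding distance measure is not shown.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word with one symbol per segment.

    Assumes the series is not constant and that len(series) is a multiple
    of n_segments.
    """
    # 1. Z-normalize so the values are roughly standard normal.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]

    # 2. PAA: replace each equal-length segment by its mean (dimensionality reduction).
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]

    # 3. Breakpoints that cut N(0, 1) into len(alphabet) equiprobable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / a) for j in range(1, a)]

    # 4. Map each segment mean to the symbol of the region it falls in.
    return "".join(alphabet[sum(1 for b in breakpoints if v >= b)] for v in paa)


word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)  # "abcd"
```

An eight-point rising series is reduced to a four-symbol word, which illustrates the dimensionality reduction mentioned in the abstract.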

16.
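A survey of Web mining research   Cited: 33 (self-citations: 0, citations by others: 33)
This paper introduces the concept of Web mining, points out the problems in Web mining, and presents three categories of Web mining research: Web content mining, Web structure mining, and Web usage mining. For each category it describes the research objects, representation methods, processing methods, application areas, and recent research, and it looks ahead to future research directions for Web mining.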

17.
18.
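Research on image mining techniques   Cited: 9 (self-citations: 0, citations by others: 9)
This paper surveys the current state of research and applications in image mining. It first describes the characteristics of image data and the main problems of image mining, then analyzes image representation models and framework models for image mining, introduces the main image mining techniques, and finally looks ahead to image mining applications and future research directions.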

19.
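Research on XML knowledge representation methods for information mining   Cited: 2 (self-citations: 0, citations by others: 2)
Building on a study of the XML-based knowledge discovery process, the authors propose an XML knowledge representation method. The paper focuses on three kinds of knowledge representation implemented with XML in the information mining process: knowledge representation for data preprocessing, for mining algorithms, and for mining results. Finally, taking cluster analysis as an example, it presents an application of the knowledge representation and summarizes the advantages of the XML knowledge representation method.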

20.
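The main goal of fine-grained opinion mining is to extract sentiment elements from opinionated text and determine sentiment polarity. Most existing methods are based on sequence labeling models but seldom use sentiment lexicon resources. This paper proposes a fine-grained opinion mining method based on feature representations from a domain sentiment lexicon: the domain lexicon is used to build feature representations over the opinionated text, which are added to the input of a sequence labeling model. A new sentiment lexicon for the e-commerce domain is first constructed; then, on real e-commerce review data, lexicon-based feature representations are designed for two commonly used sequence labeling models, the conditional random field (CRF) and the bidirectional long short-term memory network with a CRF layer (BiLSTM-CRF). Experimental results show that the feature representation based on the e-commerce sentiment lexicon performs well on both models and outperforms other sentiment lexicons.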
