A Data Mining Algorithm for Matrix and Sort Index Association Rules
Citation: LIU Yan-rong, YANG Yun. A Data Mining Algorithm for Matrix and Sort Index Association Rules[J]. Computer Technology and Development, 2021(2).
Authors: LIU Yan-rong, YANG Yun
Affiliations: School of Information Engineering, Shaanxi Institute of International Trade & Commerce, Xi'an 712000, China; School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi'an 710021, China
Funding: Shaanxi Province Key Research and Development Program (2019NY-185); Natural Science Foundation of Shaanxi Province (2017JM6111).
Abstract: Among association rule mining algorithms, Apriori scans the database repeatedly and generates many candidate sets; the repeated scans incur heavy I/O overhead and lower mining efficiency. Matrix-based association rule algorithms do not delete infrequent itemsets during mining, so they perform many invalid scans and improve mining efficiency only marginally. This paper proposes an improved association rule mining algorithm based on a matrix and a sorted index. First, unneeded transactions and items are deleted; frequent 2-itemsets are then obtained by matrix multiplication and a lookup table, and the remaining frequent k-itemsets are obtained with the help of the sorted index. Compared with the matrix association rule algorithm and the Apriori algorithm, the proposed algorithm can directly find frequent itemsets and scan the database. When many frequent itemsets are generated or the database needs to be updated dynamically, the proposed algorithm offers better feasibility and execution efficiency. Experiments show that the proposed matrix and sorted index algorithm reduces memory usage and I/O overhead, improves mining efficiency, and scales well.

Keywords: data mining; association rules; Apriori algorithm; matrix algorithm; sorting index; sequence marker
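
The abstract names three steps: pruning unneeded items and transactions, obtaining frequent 2-itemsets by matrix multiplication, and obtaining larger itemsets with a sorted index. The sketch below is one possible reading of those steps, not the authors' implementation. The abstract gives no details of the lookup table or the sorted index, so the function name `mine_frequent_itemsets`, the use of an absolute support count, and the level-wise extension over sorted item positions are all assumptions made for illustration.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} (hypothetical helper, not from the paper)."""
    # Step 1: count items, drop infrequent ones, and drop transactions
    # that keep fewer than two frequent items (they cannot support a pair).
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    items = sorted(i for i, c in counts.items() if c >= min_support)
    index = {item: j for j, item in enumerate(items)}  # assumed "sorted index"
    rows = [sorted(index[i] for i in set(t) if i in index) for t in transactions]
    rows = [r for r in rows if len(r) >= 2]

    frequent = {(i,): counts[i] for i in items}

    # Step 2: Boolean matrix M (transactions x items). The product M^T M
    # holds in cell (a, b) the number of transactions containing both items.
    n = len(items)
    matrix = [[1 if j in r else 0 for j in range(n)] for r in rows]
    cooc = [[sum(row[a] * row[b] for row in matrix) for b in range(n)]
            for a in range(n)]
    level = {}
    for a, b in combinations(range(n), 2):
        if cooc[a][b] >= min_support:
            level[(a, b)] = cooc[a][b]

    # Step 3 (assumption): extend each frequent k-itemset only with items
    # at a higher sorted position, so every candidate is generated once.
    while level:
        for idx, sup in level.items():
            frequent[tuple(items[j] for j in idx)] = sup
        next_level = {}
        for idx in level:
            for j in range(idx[-1] + 1, n):
                cand = idx + (j,)
                # Keep the candidate only if all of its k-subsets are frequent.
                if all(sub in level for sub in combinations(cand, len(idx))):
                    sup = sum(1 for row in matrix if all(row[c] for c in cand))
                    if sup >= min_support:
                        next_level[cand] = sup
        level = next_level
    return frequent


# Toy data, invented for illustration only.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c"], ["d"]]
result = mine_frequent_itemsets(transactions, 3)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

In this sketch the pruned matrix is the only structure consulted after the first pass over the transactions, which mirrors the abstract's stated goal of reducing repeated database scans.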
This article is indexed in VIP (维普) and other databases.