
A Data Mining Algorithm for Matrix and Sort Index Association Rules

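Cite this article:	LIU Yan-rong (刘彦戎), YANG Yun (杨云). A Data Mining Algorithm for Matrix and Sort Index Association Rules[J]. Computer Technology and Development (计算机技术与发展), 2021(2).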

Authors:	LIU Yan-rong (刘彦戎), YANG Yun (杨云)

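Affiliation:	School of Information and Engineering, Shaanxi Institute of International Trade & Commerce (陕西国际商贸学院信息工程学院); School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology (陕西科技大学电子信息与人工智能学院)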

Funding:	Key Research and Development Program of Shaanxi Province (2019NY-185); Natural Science Foundation of Shaanxi Province (2017JM6111).

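Abstract:	Among association rule mining algorithms, Apriori scans the database repeatedly and generates many candidate sets; the repeated scans incur heavy I/O overhead and lower mining efficiency. Matrix-based association rule algorithms do not delete infrequent itemsets during mining, so many scans are wasted and the gain in mining efficiency is not significant. This paper proposes an improved association rule mining algorithm based on a matrix and a sort index. It first deletes unneeded transactions and items, then obtains the frequent 2-itemsets through matrix multiplication and a lookup table, and finally derives the remaining frequent k-itemsets with the help of the sort index. Compared with the matrix association rule algorithm and Apriori, the proposed algorithm can find frequent itemsets directly and scan the database; when many frequent itemsets are produced or the database must be updated dynamically, it offers good feasibility and execution efficiency. Experiments show that the proposed matrix sort-index algorithm reduces memory usage and I/O overhead, improves mining efficiency, and scales well.
Keywords:	data mining; association rules; Apriori algorithm; matrix algorithm; sort index; sequence marker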
A Data Mining Algorithm for Matrix and Sort Index Association Rules

LIU Yan-rong, YANG Yun. A Data Mining Algorithm for Matrix and Sort Index Association Rules[J]. Computer Technology and Development, 2021(2).

Authors:	LIU Yan-rong, YANG Yun

Affiliation:	School of Information and Engineering, Shaanxi Institute of International Trade & Commerce, Xi’an 712000, China; School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an 710021, China

Abstract:	Among association rule mining algorithms, Apriori scans the database multiple times and generates many candidate sets, which leads to heavy I/O overhead and low mining efficiency. Matrix-based association rule algorithms do not delete infrequent itemsets during mining, which results in many invalid scans, so the improvement in mining efficiency is not obvious. An improved association rule mining algorithm based on a matrix and a sorted index is proposed. First, unwanted transactions and items are deleted; frequent 2-itemsets are then obtained by matrix multiplication and lookup tables, and the remaining frequent k-itemsets are obtained by combining the sorted index. Compared with the matrix association rule algorithm and the Apriori algorithm, the proposed algorithm can directly find frequent itemsets and scan the database. When many frequent itemsets are generated or the database needs to be dynamically updated, the proposed algorithm has better feasibility and execution efficiency. Experiments show that the proposed algorithm reduces memory utilization and I/O overhead, improves data mining efficiency, and has better scalability.

Keywords:	data mining; association rules; Apriori algorithm; matrix algorithm; sorting index; sequence marker
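
The pruning and matrix-multiplication steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `prune` and `frequent_pairs`, the toy transactions, and the support threshold of 3 are all hypothetical. The lookup table and the sort-index step for k-itemsets with k > 2 are left out, because the abstract does not describe them in enough detail to reproduce.

```python
from itertools import combinations


def prune(transactions, min_support):
    """Drop infrequent items, then drop transactions too short to hold a 2-itemset.

    Hypothetical helper: returns the sorted frequent items and a 0/1 matrix
    with one row per kept transaction and one column per frequent item.
    """
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    items = sorted(i for i, c in counts.items() if c >= min_support)
    col = {item: j for j, item in enumerate(items)}

    matrix = []
    for t in transactions:
        row = [0] * len(items)
        for item in set(t):
            if item in col:
                row[col[item]] = 1
        # A row with fewer than two frequent items cannot support any pair.
        if sum(row) >= 2:
            matrix.append(row)
    return items, matrix


def frequent_pairs(items, matrix, min_support):
    """Compute C = M^T * M; C[i][j] is the support count of {items[i], items[j]}."""
    n = len(items)
    cooc = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    # C is symmetric, so only entries above the diagonal are inspected.
    return {(items[i], items[j]): cooc[i][j]
            for i, j in combinations(range(n), 2)
            if cooc[i][j] >= min_support}


# Toy data (assumed for illustration only).
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C", "D"],
    ["D"],
]
items, matrix = prune(transactions, min_support=3)
print(frequent_pairs(items, matrix, min_support=3))
# → {('A', 'B'): 3, ('A', 'C'): 3, ('B', 'C'): 3}
```

In this toy run, item D (support 2) and the single-item transaction are removed before the multiplication, so the product is taken over a 5 × 3 matrix instead of a 6 × 4 one. The diagonal of the product holds each item's count over the kept rows.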
This article is indexed in databases including VIP (维普).
