An Efficient Method for the Parallel Mining of Frequent Itemsets in Very Large Text Databases
Cite this article: WANG Yong-heng, YANG Shu-qiang, JIA Yan. An Efficient Method for the Parallel Mining of Frequent Itemsets in Very Large Text Databases[J]. Computer Engineering & Science, 2007, 29(9): 110-113.
Authors: WANG Yong-heng  YANG Shu-qiang  JIA Yan
Affiliation: School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
Funding: National High Technology Research and Development Program of China (863 Program)
Abstract: Frequent itemset mining is a common and useful data mining task, and it is equally important in text mining. However, most current mining algorithms cannot be applied to very large text databases. To address the special requirements of frequent itemset mining in such databases, we propose a new parallel mining algorithm, parFIM. Built on a simple data structure, H-Struct, parFIM partitions the data vertically so that mining can proceed in parallel. The algorithm also removes short patterns and reduces duplicated patterns. Experimental results show that parFIM is well suited to frequent itemset mining in very large text databases.

Keywords: text mining  very large text database  frequent itemset  parallel
Article ID: 1007-130X(2007)09-0110-04
Revision dates: 2005-12-07; 2006-04-21
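
The abstract gives no detail on H-Struct or on how parFIM divides its work, so the following is only a rough illustration of the general idea of mining a vertical data layout in independent partitions. It is not parFIM and does not use H-Struct. It is a minimal Eclat-style sketch in Python in which every name (`build_tidlists`, `mine_partition`, `frequent_itemsets`, `min_length`) is hypothetical. Each partition holds the itemsets that start with one frequent term, a fixed term order ensures that no itemset is generated twice, and itemsets shorter than `min_length` are dropped at the end. Whether this matches what the paper means by short or duplicated patterns is an assumption.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_tidlists(docs):
    """Vertical layout: map each term to the set of ids of documents containing it."""
    tidlists = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            tidlists[term].add(doc_id)
    return tidlists


def extend(prefix, prefix_tids, candidates, min_support, out):
    """Depth-first extension of `prefix` with items that come later in the order."""
    for i, (item, tids) in enumerate(candidates):
        new_tids = prefix_tids & tids  # documents containing prefix + item
        if len(new_tids) >= min_support:
            itemset = prefix + (item,)
            out[itemset] = len(new_tids)
            extend(itemset, new_tids, candidates[i + 1:], min_support, out)


def mine_partition(start, items, min_support):
    """Mine all frequent itemsets whose first item is items[start]."""
    item, tids = items[start]
    out = {(item,): len(tids)}
    extend((item,), tids, items[start + 1:], min_support, out)
    return out


def frequent_itemsets(docs, min_support, min_length=2, workers=4):
    """Return {itemset: support} for itemsets with at least `min_length` items."""
    tidlists = build_tidlists(docs)
    # Keep frequent single terms only, in a fixed order, so that each itemset
    # belongs to exactly one partition (the one of its first term).
    items = sorted(
        ((t, s) for t, s in tidlists.items() if len(s) >= min_support),
        key=lambda pair: pair[0],
    )
    result = {}
    # Threads only illustrate the partitioning; real speedup would need
    # separate processes or machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(mine_partition, i, items, min_support)
            for i in range(len(items))
        ]
        for future in futures:
            result.update(future.result())
    return {k: v for k, v in result.items() if len(k) >= min_length}
```

For example, calling `frequent_itemsets(docs, min_support=3)` on a list of tokenized documents returns a dictionary from term tuples to their document counts. Because the partitions share no state, each one could in principle be handed to a different worker.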
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.