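Bayesian Classification Algorithm for Data Streams Based on Emerging Patterns
Cite this article as: DU Chao, WANG Zhi-Hai, JIANG Jing-Jing, SUN Yan-Ge. Bayesian Classifier Algorithm Based on Emerging Pattern for Data Stream [J]. Journal of Software, 2017, 28(11): 2891-2904.
Authors: DU Chao, WANG Zhi-Hai, JIANG Jing-Jing, SUN Yan-Ge
Affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China (all four authors)
Funding: National Natural Science Foundation of China (61672086)
Abstract: Pattern-based Bayesian classification models are an effective approach to classification problems in data mining. However, most pattern-based Bayesian classifiers consider only a pattern's support in the target-class dataset and ignore its support in the opposing-class dataset. Moreover, pattern-based Bayesian classifiers built for static datasets are not applicable to unbounded data streams that change dynamically at high speed. To address these problems, this paper proposes EPDS (Bayesian classifier algorithm based on emerging pattern for data stream), a Bayesian classification model for data streams based on emerging patterns. The model uses a simple hybrid forest structure to maintain the itemsets of in-memory transactions and adopts a fast pattern extraction mechanism to speed up the algorithm. EPDS adopts a semi-lazy learning strategy to continuously update the emerging patterns, and it builds a local classification model under each class for every transaction to be classified. Extensive experimental results show that the algorithm achieves higher accuracy than other data stream classification models.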

Keywords: data stream; emerging pattern; Bayesian; data mining
Received: 2017/5/15
Revised: 2017/6/16

Bayesian Classifier Algorithm Based on Emerging Pattern for Data Stream
DU Chao, WANG Zhi-Hai, JIANG Jing-Jing and SUN Yan-Ge. Bayesian Classifier Algorithm Based on Emerging Pattern for Data Stream [J]. Journal of Software, 2017, 28(11): 2891-2904.
Authors: DU Chao, WANG Zhi-Hai, JIANG Jing-Jing and SUN Yan-Ge
Affiliation: School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044, China (all four authors)
Abstract: Pattern-based Bayesian models are one solution to the classification problem in data mining. Most pattern-based Bayesian classifiers consider only the supports of patterns in the dataset of the home class and ignore the supports of those patterns in the counterpart class. In addition, pattern-based Bayesian classifiers designed for static datasets cannot handle infinite data streams that change dynamically at high speed. To overcome these problems, EPDS (Bayesian classifier algorithm based on emerging pattern for data stream) is proposed. EPDS is a Bayesian classification model based on the emerging patterns discovered over a data stream. It uses a simple hybrid forest (HYF) data structure to maintain the itemsets of the transactions in memory, and a fast pattern extraction mechanism to accelerate the algorithm. EPDS adopts a partially lazy learning strategy to update emerging itemsets continuously, and establishes a local classification model in each class for the test transaction. Experimental results on real and synthetic data streams show that EPDS achieves higher classification accuracy than other classic classifiers.
Keywords: data stream; emerging pattern; Bayesian; data mining
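
The abstract contrasts a pattern's support in the home class with its support in the counterpart class. As a rough illustration of why both matter, the sketch below computes the two supports for an itemset and their ratio, the growth rate commonly used in the emerging pattern literature. It is a minimal example under stated assumptions and not the EPDS implementation: the function names `support` and `growth_rate`, the toy transactions, and the handling of a zero denominator are all hypothetical choices made for illustration, and the abstract does not specify how EPDS itself scores patterns.

```python
from typing import FrozenSet, Sequence

Transaction = FrozenSet[str]


def support(pattern: FrozenSet[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item of the pattern."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def growth_rate(
    pattern: FrozenSet[str],
    home: Sequence[Transaction],
    counterpart: Sequence[Transaction],
) -> float:
    """Ratio of home-class support to counterpart-class support.

    Assumed convention: the ratio is infinite when the pattern occurs
    only in the home class, and 0.0 when it occurs in neither class.
    """
    s_home = support(pattern, home)
    s_other = support(pattern, counterpart)
    if s_other == 0.0:
        return float("inf") if s_home > 0.0 else 0.0
    return s_home / s_other


# Toy data (hypothetical): four transactions per class.
home = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
counterpart = [
    frozenset({"a"}),
    frozenset({"c"}),
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
]

pattern = frozenset({"a", "b"})
print(support(pattern, home), support(pattern, counterpart))
print(growth_rate(pattern, home, counterpart))
```

A classifier that looked only at the home-class support would treat two patterns with the same home support as equally informative, even if one of them is just as frequent in the counterpart class. The ratio makes that difference visible.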