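Concept drift detection algorithm based on a small number of class labels
Cite this article: LI Nan, GUO Gong-de, CHEN Li-fei. Concept drift detection algorithm based on a small number of class labels[J]. Journal of Computer Applications, 2012, 32(8): 2176-2185. DOI: 10.3724/SP.J.1087.2012.02176
Authors: LI Nan, GUO Gong-de, CHEN Li-fei
Affiliations: 1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007; 2. Key Laboratory of Network Security and Cryptography of Fujian Province University (Fujian Normal University), Fuzhou 350007
Funding: National Natural Science Foundation of China; Major Science and Technology Project for Industry-University Cooperation of Fujian Higher Education Institutions
Abstract: Traditional classification algorithms for data streams with concept drift usually rely on the true class labels of test data to detect whether concept drift has occurred, and then adjust the classification model as needed. However, assigning true class labels consumes substantial human and material resources, and the continuous arrival of high-speed data streams makes this approach difficult to realize in practice. To address this problem, a concept drift detection algorithm based on a small number of class labels is proposed. It exploits the fact that the fast KNNModel algorithm classifies data using model clusters: without knowing the class labels of the data being classified, it decides whether concept drift has occurred by testing whether the number of instances in the current data block that are not covered by any model cluster has increased significantly, at a given significance level, compared with the previous data block. When concept drift occurs, domain experts label only the small number of instances not covered by the model clusters, and these instances are used to self-correct the model. This approach effectively addresses both drift detection and model self-updating. Experimental results show that the method classifies data streams quickly while adaptively handling concept drift, and achieves classification accuracy similar to or higher than that of traditional data stream classification algorithms.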
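As an illustration of the label-free detection step, the following Python sketch counts how many instances of a block fall outside every model cluster and tests whether that rate has risen. The abstract does not specify the cluster representation or the statistical test, so both are assumptions here: each model cluster is a hypothetical `ModelCluster` with a center, a radius and a class label, coverage uses Euclidean distance, and the test is a one-sided two-proportion z-test with `z_crit = 1.645` (roughly a 5% significance level).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ModelCluster:
    # Hypothetical representation of a model cluster: center, coverage radius, class label.
    center: Sequence[float]
    radius: float
    label: str


def is_covered(x: Sequence[float], clusters: List[ModelCluster]) -> bool:
    """True if x lies within the radius of at least one model cluster (Euclidean distance)."""
    return any(math.dist(x, c.center) <= c.radius for c in clusters)


def uncovered_count(block: List[Sequence[float]], clusters: List[ModelCluster]) -> int:
    """Number of instances in the block not covered by any model cluster; no labels needed."""
    return sum(1 for x in block if not is_covered(x, clusters))


def drift_detected(prev_block, cur_block, clusters, z_crit: float = 1.645) -> bool:
    """One-sided two-proportion z-test (an assumed choice of test): has the
    uncovered rate of the current block risen significantly above that of
    the previous block? z_crit = 1.645 is about a 5% one-sided level."""
    n1, n2 = len(prev_block), len(cur_block)
    k1 = uncovered_count(prev_block, clusters)
    k2 = uncovered_count(cur_block, clusters)
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # both blocks entirely covered or entirely uncovered
        return False
    return (p2 - p1) / se > z_crit


clusters = [ModelCluster((0.0, 0.0), 1.0, "a")]
prev = [(0.0, 0.0)] * 95 + [(10.0, 10.0)] * 5   # 5% uncovered
cur = [(0.0, 0.0)] * 60 + [(10.0, 10.0)] * 40   # 40% uncovered
print(drift_detected(prev, cur, clusters))  # True
```

The point of the design is that `drift_detected` never looks at class labels; labels would only be requested once drift has been flagged.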

Keywords: concept drift; data stream; classification; KNNModel; model cluster
Received: 2012-01-16
Revised: 2012-03-08

Concept drift detection method with limited amount of labeled data
LI Nan, GUO Gong-de, CHEN Li-fei. Concept drift detection method with limited amount of labeled data[J]. Journal of Computer Applications, 2012, 32(8): 2176-2185. DOI: 10.3724/SP.J.1087.2012.02176
Authors: LI Nan, GUO Gong-de, CHEN Li-fei
Affiliation: 1. Key Laboratory of Network Security and Cryptography of Fujian Province University (Fujian Normal University), Fuzhou Fujian 350007, China; 2. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou Fujian 350007, China
Abstract: Most existing algorithms for data stream mining use the true labels of testing data to detect concept drift and adjust the current model as required. This is impractical in real-world applications, because manually labeling instances that arrive continuously at high speed requires substantial human and material resources. Therefore, a concept drift detection method with a limited amount of labeled data was proposed. The method used the model clusters generated by the fast KNNModel algorithm to classify instances. It detected concept drift by testing whether the number of instances in the current block that were not covered by any model cluster increased significantly, at a given significance level, compared with the prior block. Once concept drift occurred, domain experts were asked to label the few instances not covered by the model clusters, and these representative instances were used to update the current model. The experimental results show that, compared with traditional classification algorithms, the proposed method not only adapts to concept drift but also achieves similar or better classification accuracy.
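The model-update step can be sketched in the same spirit. The abstract only states that experts label the few uncovered instances and that these are used to update the model, so the procedure below is an assumption: a simplified, greedy KNNModel-style construction in which each new cluster is centered on a labeled instance and grows up to (but not including) the nearest instance of another class, keeping the candidate that covers the most still-uncovered points. `ModelCluster`, `build_clusters`, `update_model` and `oracle` (a stand-in for the domain expert) are all hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, ...]


@dataclass
class ModelCluster:
    # Hypothetical cluster representation: center, coverage radius, class label.
    center: Point
    radius: float
    label: str


def covered(x: Point, clusters: List[ModelCluster]) -> bool:
    """True if x lies within the radius of at least one cluster."""
    return any(math.dist(x, c.center) <= c.radius for c in clusters)


def build_clusters(points: List[Point], labels: List[str]) -> List[ModelCluster]:
    """Simplified greedy KNNModel-style construction (an assumption):
    repeatedly keep the candidate cluster covering the most
    still-uncovered points of a single class."""
    remaining = set(range(len(points)))
    clusters: List[ModelCluster] = []
    while remaining:
        best = None  # (members, center index, radius)
        for i in sorted(remaining):
            # The nearest point of another class bounds how far the cluster may grow.
            limit = min(
                (math.dist(points[i], points[j]) for j in range(len(points)) if labels[j] != labels[i]),
                default=math.inf,
            )
            # Same-class, still-uncovered points strictly inside that bound (i itself always counts).
            members = [j for j in remaining
                       if labels[j] == labels[i] and (j == i or math.dist(points[i], points[j]) < limit)]
            radius = max(math.dist(points[i], points[j]) for j in members)
            if best is None or len(members) > len(best[0]):
                best = (members, i, radius)
        members, i, radius = best
        clusters.append(ModelCluster(points[i], radius, labels[i]))
        remaining -= set(members)
    return clusters


def update_model(clusters: List[ModelCluster], block: List[Point],
                 oracle: Callable[[Point], str]) -> List[ModelCluster]:
    """Ask the (hypothetical) expert oracle to label only the uncovered
    instances, then append clusters built from them to the model."""
    uncovered = [x for x in block if not covered(x, clusters)]
    if not uncovered:
        return list(clusters)
    labels = [oracle(x) for x in uncovered]
    return list(clusters) + build_clusters(uncovered, labels)
```

Only instances outside every existing cluster reach the oracle, which is what keeps the labeling cost small when most of a block is still covered.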
Keywords: concept drift; data stream; classification; KNNModel; model cluster
This article is indexed in databases including CNKI and Wanfang Data.